Farrar,, B. G., Altschul, D. M., Fischer, J., van der Mescht, J., Placì, S., Troisi, C. A., Vernouillet, A., Clayton, N. S., & Ostojić, L. (2020). Trialling meta-research in comparative cognition: Claims and statistical inference in animal physical cognition. Animal Behavior and Cognition, 7(3), 419-444. doi: https://doi.org/10.26451/abc.07.03.09.2020
Scientific disciplines face concerns about replicability and statistical inference, and these concerns are also relevant in animal cognition research. This paper presents a first attempt to assess how researchers make and publish claims about animal physical cognition, and the statistical inferences they use to support them. We surveyed 116 published experiments from 63 papers on physical cognition, covering 43 different species. The most common tasks in our sample were trap-tube tasks (14 papers), other tool use tasks (13 papers), means-end understanding and string-pulling tasks (11 papers), object choice and object permanence tasks (9 papers) and access tasks (5 papers). This sample is not representative of the full scope of physical cognition research; however, it does provide data on the types of statistical design and publication decisions researchers have adopted. Across the 116 experiments, the median sample size was 7. Depending on the definitions we used, we estimated that between 44% and 59% of our sample of papers made positive claims about animals’ physical cognitive abilities, between 24% and 46% made inconclusive claims, and between 10% and 17% made negative claims. Several failures of animals to pass physical cognition tasks were reported. Although our measures had low inter-observer reliability, these findings show that negative results can and have been published in the field. However, publication bias is still present, and consistent with this, we observed a drop in the frequency of p-values above .05. This suggests that some non-significant results have not been published. More promisingly, we found that researchers are likely making many correct statistical inferences at the individual-level. The strength of evidence of statistical effects at the group-level was weaker, and its p-value distribution was consistent with some effect sizes being overestimated. Studies such as ours can form part of a wider investigation into statistical reliability in comparative cognition. However, future work should focus on developing the validity and reliability of the measurements they use, and we offer some starting points.
Physical cognition, Folk physics, Evidence, Statistical inference, Publication bias