A comparison between humans and AI at recognizing objects in unusual poses

Authors: Netta Ollikka, Amro Kamal Mohamed Abbas, Andrea Perin, Markku Kilpeläinen, Stéphane Deny

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we compare human subjects with state-of-the-art deep networks for vision and state-of-the-art large vision-language models at recognizing objects in various poses. We collected a dataset of objects viewed in different poses (upright and rotated out-of-plane), to test the ability of humans to recognize these objects, and compare this ability to state-of-the-art deep networks (Figure 1).
Researcher Affiliation | Academia | Netta Ollikka (EMAIL), Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Amro Abbas (EMAIL), The African Institute for Mathematical Sciences, Mbour-Thies, Senegal; Andrea Perin (EMAIL), Department of Computer Science, Aalto University, Espoo, Finland; Markku Kilpeläinen (EMAIL), Department of Psychology and Logopedics, University of Helsinki, Finland; Stéphane Deny (EMAIL), Department of Neuroscience and Biomedical Engineering and Department of Computer Science, Aalto University, Espoo, Finland
Pseudocode | No | The paper describes methods and procedures in narrative text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All code and data is available at https://github.com/BRAIN-Aalto/unusual_poses.
Open Datasets | Yes | All code and data is available at https://github.com/BRAIN-Aalto/unusual_poses. We chose 51 different object categories from the ImageNet classes (see Appendix D for the list of objects)
Dataset Splits | Yes | Each observer performed 49 trials, in which the image was in one of three types of poses: upright in 17 trials, rotated-correct (correctly classified by EfficientNet, see Dataset collection 2.1) in 17 trials, and rotated-incorrect (incorrectly classified by EfficientNet) in 15 trials.
Hardware Specification | No | The paper mentions a "22.5″ VIEWPixx display" used for human experiments, but does not provide specific details about the GPUs, CPUs, or other computational hardware used for running the machine tests or model evaluations.
Software Dependencies | No | The paper mentions using the "MATLAB Psychophysics Toolbox" and models from the "PyTorch Image Models library (timm)", the "Hugging Face Transformers library", and "Torch Hub". However, no specific version numbers are provided for these software components.
Experiment Setup | Yes | For the large vision-language models (all models excluding SigLIP), the experiment was conducted via the API. Each model was shown the 147 images and provided with the following prompt (see examples in Appendix A): What's in this image? A. [label 1] B. [label 2] Choose either A or B and answer in one or two words. For pure vision networks, the choice was made by looking at the highest activation of the softmax output layer for these two labels.
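The forced-choice rule for pure vision networks described in the row above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes we already have a model's softmax output vector and the class indices of the two candidate labels (the function name, vector size, and indices here are all hypothetical).

```python
def forced_choice(softmax_output, idx_a, idx_b):
    """Two-alternative forced choice: answer 'A' if the softmax
    activation at label A's class index exceeds label B's, else 'B'."""
    return "A" if softmax_output[idx_a] > softmax_output[idx_b] else "B"

# Toy example with a fake 5-way softmax vector (illustrative values/indices).
probs = [0.05, 0.60, 0.10, 0.20, 0.05]
print(forced_choice(probs, 1, 3))  # -> A  (0.60 > 0.20)
```

In practice the softmax vector would come from a 1000-way ImageNet classifier, and only the two entries corresponding to the candidate labels are compared; all other classes are ignored.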