Contour Integration Underlies Human-Like Vision

Authors: Ben Lonnqvist, Elsa Scialom, Abdulkadir Gokce, Zehra Merchant, Michael Herzog, Martin Schrimpf

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our study systematically dissects where and why models struggle with contour integration, a hallmark of human vision, by designing an experiment that tests object recognition under various levels of object fragmentation. Humans (n=50) perform at high accuracy, even with few object contours present. This is in contrast to models, which exhibit substantially lower sensitivity to increasing object contours, with most of the over 1,000 models we tested barely performing above chance. Only at very large scales (5B training dataset size) do models begin to approach human performance.
Researcher Affiliation Academia École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Pseudocode No The paper describes stimulus synthesis using an algorithm with mathematical formulas in Appendix A.11, but it does not present it as a structured pseudocode block or a clearly labeled algorithm block.
Open Source Code No The paper mentions third-party tools like 'rembg' (Gatis, 2023) and 'percept simulator' (Rotermund et al., 2024), but it does not provide any statement about releasing the authors' own implementation code for the methodology described in the paper, nor does it provide a link to a code repository.
Open Datasets Yes For our RGB base images we used the BOSS dataset (Brodeur et al., 2010; 2014), which consists of high-quality background-extracted images of everyday objects. We generated a total of 19 different datasets (Rotermund et al., 2024) from these images: contour-extracted images, as well as nine different levels of fragmentation for each of our two experimental conditions (directionless phosphenes and directional segments). We trained models using three datasets: ImageNet-1k (Russakovsky et al., 2015), ImageNet-21k (Ridnik et al., 2021), and EcoSet (Mehrer et al., 2021).
Dataset Splits Yes We trained models using three datasets: ImageNet-1k (Russakovsky et al., 2015), ImageNet-21k (Ridnik et al., 2021), and EcoSet (Mehrer et al., 2021). We trained models on full datasets, as well as subsets of the datasets ranging from 500 training samples to the full dataset. For each of our 12 object categories, we first selected 10 ImageNet images from the corresponding ImageNet categories. We then removed backgrounds from these images (Gatis, 2023) and generated 120 novel fragmented images for all percentage levels; 10 images per object category. We fit linear decoders on the penultimate layer activations.
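The linear-decoder evaluation quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decoder type (multinomial logistic regression), the feature dimensionality, and the synthetic activations are all assumptions standing in for real penultimate-layer features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for penultimate-layer activations: 120 stimuli (10 images for
# each of the 12 object categories), each a 512-d feature vector.
n_per_class, n_classes, n_features = 10, 12, 512
labels = np.repeat(np.arange(n_classes), n_per_class)

# Synthetic, class-separated features in place of real model activations.
class_means = rng.normal(size=(n_classes, n_features))
features = class_means[labels] + 0.1 * rng.normal(
    size=(n_classes * n_per_class, n_features)
)

# Fit a linear decoder on the activations and score it.
decoder = LogisticRegression(max_iter=1000).fit(features, labels)
accuracy = decoder.score(features, labels)
```

With real data, `features` would come from a forward pass through a frozen backbone, and accuracy would be measured on held-out fragmented images rather than the training set.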
Hardware Specification Yes All of our fragmented object models were trained for 100 epochs using two A100 GPUs per model, with a total batch size of 512.
Software Dependencies No The paper mentions optimizers like SGD and AdamW (Loshchilov & Hutter, 2019) and libraries such as PyTorch Image Models (Wightman, 2019) and 'rembg' (Gatis, 2023), but it does not specify exact version numbers for any of these software components.
Experiment Setup Yes All of our fragmented object models were trained for 100 epochs using two A100 GPUs per model, with a total batch size of 512. We train multiple instances of ResNet-18 (He et al., 2016) on various combinations of four datasets: ImageNet, ImageNet-contours, ImageNet-phosphenes, and ImageNet-segments. During joint training on multiple datasets, we first select a batch of images and load the variations of those for which λ_i ≠ 0. For instance, if the model is trained on base ImageNet and contours, we load a batch of RGB images and their contour counterparts. For phosphene and segment variations, we only use the 100% condition, and load the image accordingly. Except the ImageNet-trained variant, which is trained using SGD, all model variants are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a learning rate of 10^-3, a weight decay of 5×10^-2, and a cosine learning rate decay schedule. We employ a linear warm-up for the first 5 epochs and train for a total of 100 epochs. The batch size is 512, and standard ImageNet augmentations (random resized cropping and horizontal flipping) are applied throughout training.
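The warm-up plus cosine schedule described in the quoted setup can be sketched as below. The paper only names the schedule types, so the exact ramp shape (linear to the base rate over 5 epochs) and the decay floor (zero) are assumptions of this sketch.

```python
import math

BASE_LR = 1e-3       # learning rate reported in the setup
WARMUP_EPOCHS = 5    # linear warm-up duration reported in the setup
TOTAL_EPOCHS = 100   # total training length reported in the setup

def lr_at_epoch(epoch: int) -> float:
    """Per-epoch learning rate: linear warm-up, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR/WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining (post-warm-up) epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

In a PyTorch training loop this per-epoch value would typically be realized with built-in schedulers (e.g. a warm-up scheduler chained with cosine annealing) rather than computed by hand.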