Contour Integration Underlies Human-Like Vision

Authors: Ben Lonnqvist, Elsa Scialom, Abdulkadir Gokce, Zehra Merchant, Michael Herzog, Martin Schrimpf

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our study systematically dissects where and why models struggle with contour integration, a hallmark of human vision, by designing an experiment that tests object recognition under various levels of object fragmentation. Humans (n=50) perform at high accuracy, even with few object contours present. This is in contrast to models, which exhibit substantially lower sensitivity to increasing object contours, with most of the over 1,000 models we tested barely performing above chance. Only at very large scales (5B training dataset size) do models begin to approach human performance.
Researcher Affiliation Academia École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Pseudocode No The paper describes stimulus synthesis using an algorithm with mathematical formulas in Appendix A.11, but it does not present it as a structured pseudocode block or a clearly labeled algorithm block.
Open Source Code No The paper mentions third-party tools like 'rembg' (Gatis, 2023) and 'percept simulator' (Rotermund et al., 2024), but it does not provide any statement about releasing the authors' own implementation code for the methodology described in the paper, nor does it provide a link to a code repository.
Open Datasets Yes For our RGB base images we used the BOSS dataset (Brodeur et al., 2010; 2014), which consists of high-quality background-extracted images of everyday objects. We generated a total of 19 different datasets (Rotermund et al., 2024) from these images: contour-extracted images, as well as nine different levels of fragmentation for each of our two experimental conditions (directionless phosphenes and directional segments). We trained models using three datasets: ImageNet-1k (Russakovsky et al., 2015), ImageNet-21k (Ridnik et al., 2021), and EcoSet (Mehrer et al., 2021).
Dataset Splits Yes We trained models using three datasets: ImageNet-1k (Russakovsky et al., 2015), ImageNet-21k (Ridnik et al., 2021), and EcoSet (Mehrer et al., 2021). We trained models on full datasets, as well as subsets of the datasets ranging from 500 training samples to the full dataset. For each of our 12 object categories, we first selected 10 ImageNet images from the corresponding ImageNet categories. We then removed backgrounds from these images (Gatis, 2023) and generated 120 novel fragmented images for all percentage levels; 10 images per object category. We fit linear decoders on the penultimate layer activations.
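The linear-decoder evaluation quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decoder type (multinomial logistic regression), the feature dimensionality, and the synthetic activations are all assumptions standing in for real penultimate-layer features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for penultimate-layer activations: 120 stimuli (10 images for
# each of the 12 object categories), each a 512-d feature vector.
n_per_class, n_classes, n_features = 10, 12, 512
labels = np.repeat(np.arange(n_classes), n_per_class)

# Synthetic, class-separated features in place of real model activations.
class_means = rng.normal(size=(n_classes, n_features))
features = class_means[labels] + 0.1 * rng.normal(
    size=(n_classes * n_per_class, n_features)
)

# Fit a linear decoder on the activations and score it.
decoder = LogisticRegression(max_iter=1000).fit(features, labels)
accuracy = decoder.score(features, labels)
```

With real data, `features` would come from a forward pass through a frozen backbone, and accuracy would be measured on held-out fragmented images rather than the training set.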
Hardware Specification Yes All of our fragmented object models were trained for 100 epochs using two A100 GPUs per model, with a total batch size of 512.
Software Dependencies No The paper mentions optimizers like SGD and AdamW (Loshchilov & Hutter, 2019) and libraries such as PyTorch Image Models (Wightman, 2019) and 'rembg' (Gatis, 2023), but it does not specify exact version numbers for any of these software components.
Experiment Setup Yes All of our fragmented object models were trained for 100 epochs using two A100 GPUs per model, with a total batch size of 512. We train multiple instances of ResNet-18 (He et al., 2016) on various combinations of four datasets: ImageNet, ImageNet-contours, ImageNet-phosphenes, and ImageNet-segments. During joint training on multiple datasets, we first select a batch of images and load the variations of those for which λ_i ≠ 0. For instance, if the model is trained on base ImageNet and contours, we load a batch of RGB images and their contour counterparts. For phosphene and segment variations, we only use the 100% condition, and load the image accordingly. Except the ImageNet-trained variant, which is trained using SGD, all model variants are trained with the AdamW (Loshchilov & Hutter, 2019) optimizer using a learning rate of 10^-3, a weight decay of 5×10^-2, and a cosine learning rate decay schedule. We employ a linear warm-up for the first 5 epochs and train for a total of 100 epochs. The batch size is 512, and standard ImageNet augmentations (random resized cropping and horizontal flipping) are applied throughout training.
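The warm-up plus cosine schedule described in the quoted setup can be sketched as below. The paper only names the schedule types, so the exact ramp shape (linear to the base rate over 5 epochs) and the decay floor (zero) are assumptions of this sketch.

```python
import math

BASE_LR = 1e-3       # learning rate reported in the setup
WARMUP_EPOCHS = 5    # linear warm-up duration reported in the setup
TOTAL_EPOCHS = 100   # total training length reported in the setup

def lr_at_epoch(epoch: int) -> float:
    """Per-epoch learning rate: linear warm-up, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR/WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining (post-warm-up) epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

In a PyTorch training loop this per-epoch value would typically be realized with built-in schedulers (e.g. a warm-up scheduler chained with cosine annealing) rather than computed by hand.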