Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
Authors: Abdulkadir Gokce, Martin Schrimpf
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we explore scaling laws for modeling the primate visual ventral stream by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and behavior. We find that while behavioral alignment continues to scale with larger models, neural alignment saturates. |
| Researcher Affiliation | Academia | EPFL. Correspondence to: Abdulkadir Gokce <EMAIL>. |
| Pseudocode | No | The paper describes mathematical formulas for power-law curves (Eq. 1, 3, 5, 6, 8, 9) and their minimization, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We publicly release our training code, evaluation pipeline, and over 600 checkpoints for models trained in a controlled manner to enable future research. All resources are available at: https://github.com/epflneuroailab/scaling-primate-vvs. |
| Open Datasets | Yes | For our experiments, we selected two image classification datasets: ImageNet (Deng et al., 2009) and EcoSet (Mehrer et al., 2021). ... To create subsets of ImageNet and EcoSet, we sampled d ∈ {1, 3, 10, 30, 100, 300} images per category. ... Additionally, we explored the impact of adversarial finetuning on alignment performance. In Figure 7b, ResNet models trained on subsets of ImageNet were fine-tuned adversarially for 10 epochs using the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015; Wong et al., 2020). ... We trained ResNet18 models on subsets of several large-scale image datasets: ImageNet-21k-P, WebVision-P, iNaturalist, and Places365. ... The MNIST dataset (LeCun et al., 1998) ... we utilize the Infinite MNIST (Infimnist) tool (Loosli et al., 2007), ... CIFAR-10 (Krizhevsky et al.) |
| Dataset Splits | Yes | A linear regression is trained on 90% of the images to correlate model and neural data, with prediction accuracy for the remaining 10% evaluated using Pearson correlation, repeated ten times for cross-validation. The behavioral benchmark assesses model predictions for 240 images against primate behavioral data from (Rajalingham et al., 2018) using a logistic classifier trained on 2,160 labeled images. ... To create subsets of ImageNet and EcoSet, we sampled d ∈ {1, 3, 10, 30, 100, 300} images per category. For d ∈ {1, 10, 100}, we repeated the runs with three random seeds to ensure robustness. ... CIFAR-10 (Krizhevsky et al.) is a widely used benchmark of 60,000 low-resolution (32×32) color images divided evenly into 10 object classes. The dataset comprises 50,000 training images and 10,000 test images, with 6,000 samples per class. To match our scaling protocol, we created class-balanced subsets by sampling d ∈ {10, 30, 100, 300, 1000, 3000} images per class. |
| Hardware Specification | No | The paper mentions using 'Composer (Team, 2021) employed as the GPU orchestration tool' but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | Yes | Our experiments are conducted using the PyTorch framework (Paszke et al., 2019), with Composer (Team, 2021) employed as the GPU orchestration tool to efficiently manage computational resources. For image augmentations, we leverage the Albumentations (Buslaev et al., 2020) library... we use the Lightly (Susmelj et al., 2020) library to facilitate the implementation of self-supervised losses, augmentations, and model heads. To generate adversarial examples for adversarial fine-tuning, we employ the Torchattacks library (Kim, 2020). |
| Experiment Setup | Yes | The remaining models were trained for 100 epochs using a minibatch size of 512. We employed a stochastic gradient descent (SGD) optimizer with a cosine decaying learning rate schedule, starting with a peak learning rate of 0.1 and incorporating a linear warm-up phase spanning five epochs. We maintained the momentum at 0.9 and applied a weight decay of 10⁻⁴. Cross-entropy loss was used as the minimization objective. We utilized standard ImageNet data augmentations, specifically random resized cropping and horizontal flipping. |
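The class-balanced subsetting described above (sampling d images per category, with repeated seeds) can be sketched as follows. This is an illustrative stand-in, not the paper's actual data pipeline; the function name and interface are assumptions.

```python
import random
from collections import defaultdict

def sample_class_balanced_subset(labels, d, seed=0):
    """Return indices of a class-balanced subset with d samples per class.

    `labels` is a flat list of class labels, one per image. Illustrative
    only; the paper's pipeline operates on ImageNet/EcoSet directories.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subset = []
    for indices in by_class.values():
        # Sample without replacement, capped at class size.
        subset.extend(rng.sample(indices, min(d, len(indices))))
    return sorted(subset)

# Example: 3 classes with 5 images each, keep d = 2 per class.
labels = ["cat"] * 5 + ["dog"] * 5 + ["bird"] * 5
subset = sample_class_balanced_subset(labels, d=2)
```

Varying the seed argument reproduces the multi-seed repeats reported for d ∈ {1, 10, 100}.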
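The neural-alignment evaluation quoted under Dataset Splits (linear regression on 90% of images, Pearson correlation on the held-out 10%, repeated ten times) can be sketched as below. This is a simplified stand-in for the Brain-Score-style pipeline: it uses plain least squares where the actual benchmark may use regularized regression, and all names here are assumptions.

```python
import numpy as np

def alignment_score(features, responses, train_frac=0.9, n_repeats=10, seed=0):
    """Cross-validated linear predictivity of model features for neural data.

    Fits a least-squares mapping from features to responses on 90% of
    images and reports the mean held-out Pearson r across sites and
    repeats. Simplified relative to the paper's benchmark pipeline.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    n_train = int(train_frac * n)
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        # Append a bias column and solve the least-squares mapping.
        X_tr = np.c_[features[tr], np.ones(len(tr))]
        X_te = np.c_[features[te], np.ones(len(te))]
        W, *_ = np.linalg.lstsq(X_tr, responses[tr], rcond=None)
        pred = X_te @ W
        # Pearson r per recording site on held-out images.
        for site in range(responses.shape[1]):
            scores.append(np.corrcoef(pred[:, site], responses[te][:, site])[0, 1])
    return float(np.mean(scores))

# Synthetic check: responses are a noisy linear readout of the features.
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 10))
R = F @ rng.normal(size=(10, 4)) + 0.1 * rng.normal(size=(200, 4))
score = alignment_score(F, R)
```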
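The paper's scaling analysis fits power-law curves (Eq. 1, 3, 5, 6, 8, 9); the exact functional forms include additional terms and are minimized directly. As a minimal illustration of the idea, a pure power law y = a·xᵇ can be fit by least squares in log-log space:

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ a * x**b via linear regression on log-transformed data.

    A minimal stand-in for the paper's power-law fits, which include
    offset terms and are minimized in their original parameterization.
    """
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(log_a), b

# Recover a known power law from clean synthetic data.
x = np.array([1e3, 1e4, 1e5, 1e6])
y = 2.0 * x ** -0.3
a, b = fit_power_law(x, y)
```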
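The FGSM adversarial fine-tuning mentioned under Open Datasets perturbs each input by a fixed step in the direction of the loss gradient's sign. A sketch of the perturbation step is below; computing the input gradient (e.g. via autograd in PyTorch, or via Torchattacks as the paper does) is outside this fragment, and eps here is an assumed value.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=8 / 255):
    """One FGSM step: shift each pixel by eps along the sign of the
    loss gradient, then clip back to the valid [0, 1] image range."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

# Toy check on a 2x2 "image" with a synthetic gradient.
x = np.array([[0.5, 0.2], [0.9, 0.0]])
grad = np.array([[1.0, -3.0], [0.5, -0.1]])
x_adv = fgsm_perturb(x, grad, eps=0.1)
```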
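The learning-rate schedule in the experiment setup (peak 0.1, five-epoch linear warm-up, cosine decay over 100 epochs) can be written out explicitly. This matches the numbers quoted above, but boundary handling in the paper's Composer-based code may differ slightly.

```python
import math

def lr_at_epoch(epoch, peak_lr=0.1, warmup_epochs=5, total_epochs=100):
    """Learning rate under linear warm-up followed by cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from peak_lr / warmup_epochs up to peak_lr.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from peak_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

# Warm-up reaches the 0.1 peak at epoch 4, then decays toward 0 by epoch 99.
```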