Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
Authors: Abdulkadir Gokce, Martin Schrimpf
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we explore scaling laws for modeling the primate visual ventral stream by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and behavior. We find that while behavioral alignment continues to scale with larger models, neural alignment saturates. |
| Researcher Affiliation | Academia | EPFL. Correspondence to: Abdulkadir Gokce <EMAIL>. |
| Pseudocode | No | The paper describes mathematical formulas for power-law curves (Eq. 1, 3, 5, 6, 8, 9) and their minimization, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We publicly release our training code, evaluation pipeline, and over 600 checkpoints for models trained in a controlled manner to enable future research. All resources are available at: https://github.com/epflneuroailab/scaling-primate-vvs. |
| Open Datasets | Yes | For our experiments, we selected two image classification datasets: ImageNet (Deng et al., 2009) and EcoSet (Mehrer et al., 2021). ... To create subsets of ImageNet and EcoSet, we sampled d ∈ {1, 3, 10, 30, 100, 300} images per category. ... Additionally, we explored the impact of adversarial finetuning on alignment performance. In Figure 7b, ResNet models trained on subsets of ImageNet were fine-tuned adversarially for 10 epochs using the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015; Wong et al., 2020). ... We trained ResNet18 models on subsets of several large-scale image datasets: ImageNet-21k-P, WebVision-P, iNaturalist, and Places365. ... The MNIST dataset (LeCun et al., 1998) ... we utilize the Infinite MNIST (Infimnist) tool (Loosli et al., 2007), ... CIFAR-10 (Krizhevsky et al.) |
| Dataset Splits | Yes | A linear regression is trained on 90% of the images to correlate model and neural data, with prediction accuracy for the remaining 10% evaluated using Pearson correlation, repeated ten times for cross-validation. The behavioral benchmark assesses model predictions for 240 images against primate behavioral data from (Rajalingham et al., 2018) using a logistic classifier trained on 2,160 labeled images. ... To create subsets of ImageNet and EcoSet, we sampled d ∈ {1, 3, 10, 30, 100, 300} images per category. For d ∈ {1, 10, 100}, we repeated the runs with three random seeds to ensure robustness. ... CIFAR-10 (Krizhevsky et al.) is a widely used benchmark of 60,000 low-resolution (32×32) color images divided evenly into 10 object classes. The dataset comprises 50,000 training images and 10,000 test images, with 6,000 samples per class. To match our scaling protocol, we created class-balanced subsets by sampling d ∈ {10, 30, 100, 300, 1000, 3000} images per class. |
| Hardware Specification | No | The paper mentions using 'Composer (Team, 2021) employed as the GPU orchestration tool' but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | Yes | Our experiments are conducted using the PyTorch framework (Paszke et al., 2019), with Composer (Team, 2021) employed as the GPU orchestration tool to efficiently manage computational resources. For image augmentations, we leverage the Albumentations (Buslaev et al., 2020) library... we use the Lightly (Susmelj et al., 2020) library to facilitate the implementation of self-supervised losses, augmentations, and model heads. To generate adversarial examples for adversarial fine-tuning, we employ the Torchattacks library (Kim, 2020). |
| Experiment Setup | Yes | The remaining models were trained for 100 epochs using a minibatch size of 512. We employed a stochastic gradient descent (SGD) optimizer with a cosine decaying learning rate schedule, starting with a peak learning rate of 0.1 and incorporating a linear warm-up phase spanning five epochs. We maintained the momentum at 0.9 and applied a weight decay of 10⁻⁴. Cross-entropy loss was used as the minimization objective. We utilized standard ImageNet data augmentations, specifically random resized cropping and horizontal flipping. |
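The class-balanced subsetting described above (sampling d images per category, with repeated seeds) can be sketched as follows. This is an illustrative stand-in, not the paper's actual data pipeline; the function name and interface are assumptions.

```python
import random
from collections import defaultdict

def sample_class_balanced_subset(labels, d, seed=0):
    """Return indices of a class-balanced subset with d samples per class.

    `labels` is a flat list of class labels, one per image. Illustrative
    only; the paper's pipeline operates on ImageNet/EcoSet directories.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subset = []
    for indices in by_class.values():
        # Sample without replacement, capped at class size.
        subset.extend(rng.sample(indices, min(d, len(indices))))
    return sorted(subset)

# Example: 3 classes with 5 images each, keep d = 2 per class.
labels = ["cat"] * 5 + ["dog"] * 5 + ["bird"] * 5
subset = sample_class_balanced_subset(labels, d=2)
```

Varying the seed argument reproduces the multi-seed repeats reported for d ∈ {1, 10, 100}.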
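The neural-alignment evaluation quoted under Dataset Splits (linear regression on 90% of images, Pearson correlation on the held-out 10%, repeated ten times) can be sketched as below. This is a simplified stand-in for the Brain-Score-style pipeline: it uses plain least squares where the actual benchmark may use regularized regression, and all names here are assumptions.

```python
import numpy as np

def alignment_score(features, responses, train_frac=0.9, n_repeats=10, seed=0):
    """Cross-validated linear predictivity of model features for neural data.

    Fits a least-squares mapping from features to responses on 90% of
    images and reports the mean held-out Pearson r across sites and
    repeats. Simplified relative to the paper's benchmark pipeline.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    n_train = int(train_frac * n)
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        # Append a bias column and solve the least-squares mapping.
        X_tr = np.c_[features[tr], np.ones(len(tr))]
        X_te = np.c_[features[te], np.ones(len(te))]
        W, *_ = np.linalg.lstsq(X_tr, responses[tr], rcond=None)
        pred = X_te @ W
        # Pearson r per recording site on held-out images.
        for site in range(responses.shape[1]):
            scores.append(np.corrcoef(pred[:, site], responses[te][:, site])[0, 1])
    return float(np.mean(scores))

# Synthetic check: responses are a noisy linear readout of the features.
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 10))
R = F @ rng.normal(size=(10, 4)) + 0.1 * rng.normal(size=(200, 4))
score = alignment_score(F, R)
```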
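The paper's scaling analysis fits power-law curves (Eq. 1, 3, 5, 6, 8, 9); the exact functional forms include additional terms and are minimized directly. As a minimal illustration of the idea, a pure power law y = a·xᵇ can be fit by least squares in log-log space:

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ a * x**b via linear regression on log-transformed data.

    A minimal stand-in for the paper's power-law fits, which include
    offset terms and are minimized in their original parameterization.
    """
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(log_a), b

# Recover a known power law from clean synthetic data.
x = np.array([1e3, 1e4, 1e5, 1e6])
y = 2.0 * x ** -0.3
a, b = fit_power_law(x, y)
```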
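The FGSM adversarial fine-tuning mentioned under Open Datasets perturbs each input by a fixed step in the direction of the loss gradient's sign. A sketch of the perturbation step is below; computing the input gradient (e.g. via autograd in PyTorch, or via Torchattacks as the paper does) is outside this fragment, and eps here is an assumed value.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=8 / 255):
    """One FGSM step: shift each pixel by eps along the sign of the
    loss gradient, then clip back to the valid [0, 1] image range."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

# Toy check on a 2x2 "image" with a synthetic gradient.
x = np.array([[0.5, 0.2], [0.9, 0.0]])
grad = np.array([[1.0, -3.0], [0.5, -0.1]])
x_adv = fgsm_perturb(x, grad, eps=0.1)
```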
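The learning-rate schedule in the experiment setup (peak 0.1, five-epoch linear warm-up, cosine decay over 100 epochs) can be written out explicitly. This matches the numbers quoted above, but boundary handling in the paper's Composer-based code may differ slightly.

```python
import math

def lr_at_epoch(epoch, peak_lr=0.1, warmup_epochs=5, total_epochs=100):
    """Learning rate under linear warm-up followed by cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from peak_lr / warmup_epochs up to peak_lr.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from peak_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

# Warm-up reaches the 0.1 peak at epoch 4, then decays toward 0 by epoch 99.
```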