Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry
Authors: Chi-Ning Chou, Hang Le, Yichen Wang, SueYeon Chung
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show, in both theoretical and empirical settings, that as networks learn features, task-relevant manifolds untangle, with changes in manifold geometry revealing distinct learning stages and strategies beyond the lazy-rich dichotomy. This framework provides novel insights into feature learning across neuroscience and machine learning, shedding light on structural inductive biases in neural circuits and the mechanisms underlying out-of-distribution generalization. |
| Researcher Affiliation | Collaboration | 1Center for Computational Neuroscience, Flatiron Institute, New York, NY, USA 2University of California, Los Angeles, Los Angeles, CA, USA 3Center for Neural Science, New York University, New York, NY, USA. Correspondence to: Chi-Ning Chou <EMAIL>, Hang Le <EMAIL>, SueYeon Chung <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Estimate simulated manifold capacity... Algorithm 2 Estimate manifold capacity and effective geometric measures |
| Open Source Code | Yes | All code required to reproduce the figures presented is available under an MIT License at https://github.com/chungneuroai-lab/feature-learning-geometry |
| Open Datasets | Yes | Specifically, we considered VGG-11 (Simonyan & Zisserman, 2015) and ResNet-18 (He et al., 2016) and datasets CIFAR-10 (Krizhevsky & Hinton, 2009), CIFAR-100 (Krizhevsky & Hinton, 2009), CIFAR-10C (Hendrycks & Dietterich, 2018). |
| Dataset Splits | Yes | The CIFAR-10 dataset (Krizhevsky & Hinton, 2009) consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. ... The CIFAR-100 dataset (Krizhevsky & Hinton, 2009) is similar to CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. |
| Hardware Specification | No | All experiments were performed using the Flatiron Institute's high-performance computing cluster. |
| Software Dependencies | No | Optimizer: We use Stochastic Gradient Descent with momentum (implemented as torch.optim.SGD(momentum=0.9)) to train the models. ... The error bar indicates the bootstrapped 95% confidence interval calculated using seaborn.lineplot(errorbar=('ci', 95)). |
| Experiment Setup | Yes | Optimizer: We use Stochastic Gradient Descent with momentum (implemented as torch.optim.SGD(momentum=0.9)) to train the models. Data augmentation: We apply the following data augmentation during training: RandomCrop(32, padding=4), RandomHorizontalFlip. Learning rate and learning schedule: We follow the practice in (Chizat et al., 2019) and set initial learning rate η0 = 1.0 for VGG-11 and η0 = 0.2 for ResNet-18. The learning rate schedule is defined as ηt = η0 / (1 + (1/3)t). |
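The learning-rate schedule quoted in the setup row, ηt = η0 / (1 + (1/3)t), reduces to a one-line function. A minimal sketch in plain Python (the function name and reading t as the step/epoch index are assumptions; the report only gives the formula and the two initial rates):

```python
def lr_schedule(eta0: float, t: int) -> float:
    """Decayed learning rate eta_t = eta_0 / (1 + t/3),
    matching the schedule quoted in the experiment setup."""
    return eta0 / (1.0 + t / 3.0)

# VGG-11 starts at eta_0 = 1.0; ResNet-18 at eta_0 = 0.2.
print(lr_schedule(1.0, 0))  # t = 0 -> 1.0 (the initial rate)
print(lr_schedule(1.0, 3))  # t = 3 -> 0.5
print(lr_schedule(0.2, 3))  # t = 3 -> 0.1
```

Since the decay is multiplicative in η0, the same schedule could be attached to the quoted torch.optim.SGD optimizer via torch.optim.lr_scheduler.LambdaLR.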
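The bootstrapped 95% confidence interval that seaborn.lineplot(errorbar=('ci', 95)) draws can be approximated with a percentile bootstrap over the mean. A stdlib-only sketch (the function name, sample data, and resample count are illustrative, not from the paper):

```python
import random

def bootstrap_ci(data, n_boot=1000, ci=95, seed=0):
    """Percentile-bootstrap confidence interval for the mean,
    mirroring what seaborn computes per x-value when
    errorbar=('ci', 95) is passed to lineplot."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    # Take the central `ci`% of the bootstrap distribution.
    lo_idx = int((1 - ci / 100) / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return means[lo_idx], means[hi_idx]

lo, hi = bootstrap_ci([0.71, 0.74, 0.69, 0.73, 0.70, 0.72])
print(lo, hi)  # a (lo, hi) interval bracketing the sample mean
```

Seaborn's implementation differs in details (it vectorizes the resampling and handles grouping), but the percentile-of-resampled-means idea is the same.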