A Rainbow in Deep Network Black Boxes
Authors: Florentin Guth, Brice Ménard, Gaspar Rochette, Stéphane Mallat
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also verify numerically our modeling assumptions on deep CNNs trained on image classification tasks, and show that the trained networks approximately satisfy the rainbow hypothesis. In particular, rainbow networks sampled from the corresponding random feature model achieve similar performance as the trained networks. Our results highlight the central role played by the covariances of network weights at each layer, which are observed to be low-rank as a result of feature learning. Keywords: deep neural networks, infinite-width limit, random features, representation alignment, weight covariance. |
| Researcher Affiliation | Academia | Florentin Guth EMAIL Center for Data Science, New York University, 60 5th Avenue, New York, NY 10011, USA; Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA. Brice Ménard EMAIL Department of Physics & Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA. Gaspar Rochette EMAIL Département d'informatique, École Normale Supérieure, CNRS, PSL University, 45 rue d'Ulm, 75005 Paris, France. Stéphane Mallat EMAIL Collège de France, 11 place Marcelin-Berthelot, 75231 Paris, France; Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA |
| Pseudocode | No | The paper describes methods verbally and mathematically but does not present explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce all our experiments can be found at https://github.com/FlorentinGuth/Rainbow. |
| Open Datasets | Yes | Architectures and tasks. In this paper, we consider two architectures, learned scattering networks (Zarka et al., 2021; Guth et al., 2022) and ResNets (He et al., 2016), trained on two image classification datasets, CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | The alignment rotations are computed using the CIFAR-10 train set, while network accuracy is evaluated on the test set, so that the measured performance is not a result of overfitting. |
| Hardware Specification | No | We thank the Scientific Computing Core at the Flatiron Institute for the use of their computing resources. |
| Software Dependencies | No | Network weights are initialized with i.i.d. samples from a uniform distribution (Glorot and Bengio, 2010) with so-called Kaiming variance scaling (He et al., 2015), which is the default in the PyTorch library (Paszke et al., 2019). |
| Experiment Setup | Yes | Scattering networks are trained for 150 epochs with an initial learning rate of 0.01 which is divided by 10 every 50 epochs, with a batch size of 128. ResNets are trained for 90 epochs with an initial learning rate of 0.1 which is divided by 10 every 30 epochs, with a batch size of 256. We use the SGD optimizer with a momentum of 0.9 and a weight decay of 10⁻⁴ (except for Figures 4 and 10 where weight decay has been disabled). We use classical data augmentations: horizontal flips and random crops for CIFAR, random resized crops of size 224 and horizontal flips for ImageNet. |
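The step learning-rate schedule quoted in the Experiment Setup row (initial rate divided by 10 every fixed number of epochs) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `step_lr` is hypothetical, though the behavior matches PyTorch's built-in `StepLR` scheduler that such a setup would typically use.

```python
def step_lr(initial_lr: float, epoch: int, step_size: int, gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under a step schedule:
    multiplied by `gamma` (here 1/10) every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)

# Scattering networks: 150 epochs, lr 0.01 divided by 10 every 50 epochs
scattering_lrs = [step_lr(0.01, e, 50) for e in (0, 50, 100)]

# ResNets: 90 epochs, lr 0.1 divided by 10 every 30 epochs
resnet_lrs = [step_lr(0.1, e, 30) for e in (0, 30, 60)]
```

With these settings, the scattering network sees rates of 0.01, 0.001, and 0.0001 over its three 50-epoch phases, and the ResNet sees 0.1, 0.01, and 0.001 over its three 30-epoch phases.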