Scaling ResNets in the Large-depth Regime

Authors: Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training. Keywords: ResNets, deep learning theory, neural ODE, neural network initialization, continuous-time models. We experimentally investigate in this section two questions. ... We train ResNets on the datasets MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky, 2009). The results in terms of accuracy are presented in Figure 9 (light orange = good performance, blue = bad performance).
Researcher Affiliation | Collaboration | Pierre Marion (EMAIL), Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, F-75005 Paris, France; Adeline Fermanian (EMAIL), MINES ParisTech, PSL Research University, CBIO, F-75006 Paris, France; Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; Gérard Biau (EMAIL), Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, Institut universitaire de France, F-75005 Paris, France; Jean-Philippe Vert (EMAIL), Google Research, Brain team, Paris, France
Pseudocode | No | The paper contains extensive mathematical derivations and analyses (e.g., equations 1-14, proofs in Appendix A) but does not include any explicitly labeled pseudocode or algorithm blocks. The methods are described through prose and mathematical formulas rather than structured algorithmic steps.
Open Source Code | Yes | Our code is available at https://github.com/PierreMarion23/scaling-resnets.
Open Datasets | Yes | To investigate the link between regularity of the weights at initialization, scaling, and performance after training, we train ResNets on the datasets MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky, 2009). The best performance on the learning rate grid is reported in the figure. Dataset sources: 1. http://yann.lecun.com/exdb/mnist 2. https://www.cs.toronto.edu/~kriz/cifar.html
Dataset Splits | No | The paper mentions training ResNets on MNIST and CIFAR-10 and training for 10 epochs. While these datasets often have standard splits, the paper does not explicitly state the specific training, validation, or test splits used for its experiments, nor does it provide a citation for the specific splits used.
Hardware Specification | No | The paper describes various experimental setups including hyperparameters, but does not specify any particular hardware (e.g., GPU models, CPU types, or memory amounts) used for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer for training, but it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used in the experiments. The Adam optimizer itself is an algorithm, not a software package with a version.
Experiment Setup | Yes | Our code is available at https://github.com/PierreMarion23/scaling-resnets. To obtain Figures 1 to 3, we initialize ResNets from res-3 with the hyperparameters of Table 2: d = 40, n_in = 64, n_out = 1, L = 10 to 1000, β ∈ {0.25, 0.5, 1}, weight distribution U(−√(3/d), √(3/d)), data distribution standard Gaussian. ... We train on MNIST and CIFAR-10 using the Adam optimizer (Kingma and Ba, 2015) for 10 epochs. The learning rate is divided by 10 after 5 epochs. The best performance on the learning rate grid is reported in the figure. Table 4 (hyperparameters of Figure 9): learning rate grid {10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 1}.
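The reported setup can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: the residual recursion h_{k+1} = h_k + L^{−β} V_{k+1} g(h_k) with uniform weights U(−√(3/d), √(3/d)) follows the hyperparameters quoted above, while the ReLU activation and the input/output projections A and B are assumptions made here for a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_in, n_out = 40, 64, 1   # hidden width, input dim, output dim (Table 2)
L, beta = 100, 0.5           # one point of the grids L in [10, 1000], beta in {0.25, 0.5, 1}

# U(-sqrt(3/d), sqrt(3/d)) has variance 1/d, matching the quoted weight distribution
bound = np.sqrt(3.0 / d)
A = rng.uniform(-bound, bound, size=(d, n_in))    # input projection (assumed)
V = rng.uniform(-bound, bound, size=(L, d, d))    # residual weights V_1, ..., V_L
B = rng.uniform(-bound, bound, size=(n_out, d))   # output projection (assumed)

def forward(x):
    """Scaled residual recursion h_{k+1} = h_k + L^{-beta} V_{k+1} relu(h_k)."""
    h = A @ x
    for k in range(L):
        h = h + L ** (-beta) * (V[k] @ np.maximum(h, 0.0))
    return B @ h

x = rng.standard_normal(n_in)   # standard Gaussian data, as in Table 2
y = forward(x)
print(y.shape)  # (1,)
```

Varying β here reproduces the scaling regimes the report refers to: smaller β makes each residual increment larger relative to depth, so the hidden state grows faster through the layers.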