Understanding Optimization in Deep Learning with Central Flows

Authors: Jeremy Cohen, Alex Damian, Ameet Talwalkar, Zico Kolter, Jason Lee

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. ... Although this derivation employs informal mathematical reasoning, our experiments demonstrate that this central flow can successfully predict long-term optimization trajectories on a variety of neural networks with a high degree of numerical accuracy."
Researcher Affiliation | Academia | 1 Carnegie Mellon University, 2 Princeton University, 3 Flatiron Institute
Pseudocode | No | The paper includes mathematical equations for optimizers (e.g., w_{t+1} = w_t − η∇L(w_t) (GD)) and differential equations, but no explicitly labeled pseudocode blocks or algorithms with structured steps.
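The gradient descent update referenced above, w_{t+1} = w_t − η∇L(w_t), can be illustrated with a minimal runnable sketch. The toy quadratic loss and step size below are illustrative assumptions, not values or code from the paper:

```python
import numpy as np

def grad_L(w):
    # Gradient of the toy loss L(w) = 0.5 * ||w||^2 is just w.
    return w

eta = 0.1                      # illustrative step size (hypothetical)
w = np.array([1.0, -2.0])      # illustrative starting point
for _ in range(100):
    w = w - eta * grad_L(w)    # the (GD) update: w_{t+1} = w_t - eta * grad L(w_t)

print(np.linalg.norm(w))       # approaches 0 as GD converges to the minimizer
```

On this strongly convex toy problem the iterates contract by a factor (1 − η) per step, so the parameter norm decays geometrically toward the minimizer at the origin.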
Open Source Code | Yes | "Our code can be found at: http://github.com/locuslab/central_flows."
Open Datasets | Yes | "We test the vision architectures (CNN, ResNet, ViT) on a subset of CIFAR-10 (Krizhevsky, 2009). We test the sequence architectures (LSTM, Transformer, Mamba) on a synthetic sorting task (Karpathy, 2020)."
Dataset Splits | No | "We test the vision architectures on a subset of CIFAR-10 that contains 1000 training examples, all from the first 4 CIFAR-10 classes. ... The size of the training dataset was usually 1,000 (except for Mamba, where it was 250)." The paper states the size of the training dataset but does not specify validation or test splits, nor does it refer to a standard split in sufficient detail.
Hardware Specification | No | The paper discusses the computational expense and time complexity of its methods but does not provide specific details about the hardware used, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions software such as JAX, PyTorch, and Python+NumPy programs. However, it does not provide version numbers for any of these software components, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | "A ViT is trained on CIFAR-10 using gradient descent with η = 2/200 (blue). ... Scalar RMSProp with η = 2/400 and β2 = 0.99 (blue). ... RMSProp with η = 2 × 10⁻⁵ and β2 = 0.99 (blue). ... The quality of the central flow approximation is enhanced by starting it 10-15 steps into training. ... We initialize the weights and biases of the final linear layer to be zero, as this makes the curvature low at initialization."
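To make the quoted hyperparameters concrete, the following is a hedged sketch of a scalar-RMSProp-style update with η = 2/400 and β2 = 0.99, run on a toy quadratic loss. The loss, iteration count, epsilon, and the exact form of the preconditioner (a single scalar EMA of the squared gradient norm) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def grad_L(w):
    # Gradient of the toy loss L(w) = 0.5 * ||w||^2 (hypothetical problem).
    return w

eta, beta2, eps = 2 / 400, 0.99, 1e-8   # eta and beta2 quoted from the setup
w = np.array([1.0, -2.0])               # illustrative starting point
nu = 0.0                                # scalar EMA of the squared gradient norm
for _ in range(1000):
    g = grad_L(w)
    nu = beta2 * nu + (1 - beta2) * float(g @ g)   # update the scalar EMA
    w = w - eta / (np.sqrt(nu) + eps) * g          # preconditioned step

print(np.linalg.norm(w))   # small: the iterates settle near the minimizer
```

Because the entire gradient is divided by one scalar, the update direction matches the raw gradient while the EMA adaptively rescales the step size, which distinguishes this variant from coordinate-wise RMSProp.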