Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How Feature Learning Can Improve Neural Scaling Laws

Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our finding that feature learning improves the scaling law for hard tasks but not for easy and super-easy tasks with experiments of nonlinear MLPs fitting functions with power-law Fourier spectra on the circle and CNNs learning vision tasks.
Researcher Affiliation | Academia | B.B. is supported by a Google PhD Fellowship. A.A. is supported by a Fannie and John Hertz Fellowship. C.P. is supported by NSF grant DMS-2134157, NSF CAREER Award IIS-2239780, and a Sloan Research Fellowship. This work has been made possible in part by a gift from the Chan Zuckerberg Initiative Foundation to establish the Kempner Institute for the Study of Natural and Artificial Intelligence.
Pseudocode | No | The paper describes mathematical models and dynamics using equations and textual explanations, but it does not include any clearly labeled pseudocode or algorithm blocks. The dynamics are described as a continuous mathematical process rather than a step-by-step algorithm.
Open Source Code | No | The paper does not contain an explicit statement by the authors releasing their code, nor does it provide a direct link to a code repository for the methodology described. While it references a third-party GitHub link for a dataset (Pearce, 2022), this is not code for the paper's specific methodology.
Open Datasets | Yes | We adopt larger versions of these datasets: MNIST-1M and CIFAR-5M. We generate MNIST-1M using the denoising diffusion model (Ho et al., 2020) in Pearce (2022). We use CIFAR-5M from Nakkiran et al. (2021).
Dataset Splits | No | The paper mentions the use of MNIST-1M and CIFAR-5M datasets for experiments. However, it does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined standard splits).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or cloud computing specifications.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed to replicate the experimental setup.
Experiment Setup | Yes | The MLPs in Figure 5 were depth L = 4 with nonlinearities ϕ(h) = ReLU(h)^{q_ϕ}... These MLPs are depth 4 and width 512. The CNN experiment on MNIST uses a depth L = 4 architecture with two convolutional layers and two Dense layers. A depth L = 4 decoder-only transformer (16 heads with d_head = 128) trained on next-word prediction with SGD... to lazy learning curve γ = 10^-4 over the interval from 10^6 to 3 × 10^9 tokens... simulation at γ = 0.1.
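For reference, the MLP described in the quote above (depth L = 4, width 512, homogeneous nonlinearity ϕ(h) = ReLU(h)^{q_ϕ}) can be sketched as follows. This is a minimal illustrative NumPy implementation, not the authors' code; the 1/sqrt(fan-in) initialization scale, the function names, and the scalar output dimension are assumptions not specified in the report.

```python
import numpy as np

def relu_q(h, q=1.0):
    # Homogeneous nonlinearity phi(h) = ReLU(h)**q from the quoted setup;
    # q = 1 recovers the standard ReLU.
    return np.maximum(h, 0.0) ** q

def init_mlp(d_in, width=512, depth=4, d_out=1, seed=0):
    # Depth-4, width-512 MLP matching the dimensions quoted above.
    # Gaussian weights scaled by 1/sqrt(fan-in) -- an assumed init scheme.
    rng = np.random.default_rng(seed)
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    return [rng.standard_normal((m, n)) / np.sqrt(m)
            for m, n in zip(dims[:-1], dims[1:])]

def forward(params, x, q=1.0):
    # Three nonlinear hidden layers followed by a linear readout.
    h = x
    for W in params[:-1]:
        h = relu_q(h @ W, q)
    return h @ params[-1]
```

A quick shape check: for inputs of shape `(batch, d_in)`, `forward(init_mlp(d_in), x)` returns predictions of shape `(batch, 1)`.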