From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
Authors: Noa Rubin, Kirsten Fischer, Javed Lindner, Inbar Seroussi, Zohar Ringel, Michael Krämer, Moritz Helias
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work presents a theoretical framework of multi-scale adaptive feature learning bridging these two views. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. ... In Fig. 2, we compare theoretical values for training and test discrepancies against empirical measurements for linear networks trained on a linearly separable Ising task (see App. C.3 for details). Comparing to the NNGP as a baseline, we find that, while the NNGP fails to match network outputs, the multi-scale adaptive theory accurately predicts the values observed in trained networks. |
| Researcher Affiliation | Academia | 1The Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel 2Institute for Advanced Simulation (IAS-6), Computational and Systems Neuroscience, Jülich Research Centre, Jülich, Germany 3RWTH Aachen University, Aachen, Germany 4Department of Physics, RWTH Aachen University, Aachen, Germany 5Institute for Theoretical Particle Physics and Cosmology, RWTH Aachen University, Aachen, Germany 6Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel. Correspondence to: Noa Rubin <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Annealing of solutions across scaling regimes. Input: data X, labels Y, scales {χᵢ}ᵢ. Compute NNGP train predictors f_α^NNGP from data X and labels Y; set the initial value to the NNGP predictor f_α^NNGP. For χ in {χᵢ}ᵢ: set g_w ← g_w/χ; solve the self-consistency equations for the tree-level approximation f_α^TL with initial value f_α^NNGP; solve the self-consistency equations for the one-loop approximation f_α^1-Loop with initial value f_α^TL. End for. |
| Open Source Code | Yes | The code for theory and experiments can be found at https://doi.org/10.5281/zenodo.15480898. |
| Open Datasets | Yes | In addition, our theory does not make any assumptions on the data set; we show results for an Ising task, a teacher-student task, and MNIST. |
| Dataset Splits | Yes | Parameters: γ = 1, P_train = 80, N = 100, D = 200, κ0 = 1, P_test = 10^3, g_v = g_w = 0.5, p = 0.1. |
| Hardware Specification | No | The authors gratefully acknowledge the computing time granted by the JARA Vergabegremium and provided on the JARA Partition part of the supercomputer JURECA at Forschungszentrum Jülich (computation grant JINB33). |
| Software Dependencies | No | The time-discrete version of (164) is implemented in our PyTorch code as |
| Experiment Setup | Yes | Parameters: γ = 1, P_train = 80, N = 100, D = 200, κ0 = 1, P_test = 10^3, g_v = g_w = 0.5, p = 0.1. |
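The annealing procedure quoted in the Pseudocode row (Algorithm 1) warm-starts each self-consistent solution from the one found at the previous scale: the NNGP predictor seeds the tree-level solution, which in turn seeds the one-loop solution, and the converged result carries over to the next value of χ. A minimal sketch of that control flow, assuming hypothetical `tree_level_update` and `one_loop_update` callables standing in for the paper's actual self-consistency equations:

```python
import numpy as np

def fixed_point(update, f0, tol=1e-8, max_iter=1000):
    # Generic fixed-point iteration: repeat f <- update(f) until converged.
    f = np.asarray(f0, dtype=float)
    for _ in range(max_iter):
        f_new = update(f)
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f

def anneal_predictors(f_nngp, scales, tree_level_update, one_loop_update):
    """Sketch of Algorithm 1: anneal self-consistent solutions across scales.

    tree_level_update / one_loop_update are placeholder callables
    (f, chi) -> f; the paper's real updates follow from its
    tree-level and one-loop self-consistency equations.
    """
    f = np.asarray(f_nngp, dtype=float)  # start from the NNGP predictor
    for chi in scales:  # step away from the NNGP limit scale by scale
        # tree-level solution, initialized from the previous solution
        f_tl = fixed_point(lambda g: tree_level_update(g, chi), f)
        # one-loop solution, initialized from the tree-level solution
        f = fixed_point(lambda g: one_loop_update(g, chi), f_tl)
    return f

# Toy contraction maps with known fixed points, for illustration only.
y = np.array([1.0, -1.0])
tl = lambda g, chi: 0.5 * g + 0.5 * y / (1.0 + chi)
ol = lambda g, chi: 0.5 * g + 0.5 * y / (1.0 + 0.9 * chi)
out = anneal_predictors(np.zeros(2), [0.1, 1.0, 10.0], tl, ol)
```

With these toy maps the final one-loop fixed point at χ = 10 is y / (1 + 0.9 · 10) = y / 10; the warm starts only speed up convergence, mirroring how the algorithm continues solutions smoothly between scaling regimes rather than solving each scale from scratch.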