Grokking at the Edge of Linear Separability
Authors: Alon Beck, Noam Itzhak Levi, Yohai Bar-Sinai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Fig. 1, we show numerical results depicting the gradient-descent dynamics of the model across three values of λ = d/N. Notably, we observe a significant grokking effect, both in the non-monotonicity of the test loss, and a delayed rise in test accuracy, only when λ ≈ λc = 1/2. (...) In Fig. 4 we present numerical simulations supporting this behavior. |
| Researcher Affiliation | Academia | ¹Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel; ²École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland. Correspondence to: Alon Beck <EMAIL>, Noam Levi <EMAIL>. |
| Pseudocode | No | The paper describes methods using mathematical equations and text, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statements about releasing code or provide links to a code repository. |
| Open Datasets | No | The paper describes generating a synthetic dataset: "We study a typical logistic binary classification problem, with the goal of finding a linear separator between two Gaussians with distinct labels." and "x_i ∼ N(0, σ²I_d)". It does not use or provide access to any pre-existing public datasets. |
| Dataset Splits | Yes | The parameters are N = 4 × 10⁴, σ = 5, η = 0.01. The number of test samples is N_test = 10⁴. Additional details regarding the experiments can be found in App. K. |
| Hardware Specification | No | The paper does not specify any particular hardware (CPU, GPU, etc.) used for running the experiments. |
| Software Dependencies | No | The paper mentions "using adaptive momentum based optimizers like ADAM (Kingma & Ba, 2017)" in Section 3.5 and "ADAM optimizer with PyTorch's default parameters" in Fig. 11, but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | The parameters are N = 4 × 10⁴, σ = 5, η = 0.01. The direction of S(t = 0) was drawn isotropically with S₀ = 0.1 and b(t = 0) = 0. (...) using ADAM optimizer (with β₁ = 0.8, β₂ = 0.9), instead of GD. The parameters are λ = d/N = 0.495, N = 4000 and σ = 1. |
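
Since the paper provides neither code nor pseudocode, the quoted setup can only be approximated. The following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the excerpts above do not confirm: labels are drawn uniformly from ±1 independently of the inputs, S is read as the weight vector and b as the bias of a linear classifier, S₀ is read as the initial norm of S, and the loss is the mean logistic loss. All function names are hypothetical.

```python
import numpy as np

def make_dataset(n, d, sigma, rng):
    """Draw x_i ~ N(0, sigma^2 I_d).
    ASSUMPTION: labels are random +/-1, independent of x."""
    x = rng.normal(0.0, sigma, size=(n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    return x, y

def init_params(d, s0, rng):
    """Isotropic random direction for S with norm s0; bias b starts at 0.
    ASSUMPTION: S0 in the paper denotes the initial norm of S."""
    s = rng.normal(size=d)
    s *= s0 / np.linalg.norm(s)
    return s, 0.0

def logistic_loss_and_grads(s, b, x, y):
    """Mean logistic loss log(1 + exp(-m)) with margin m = y * (x @ s + b)."""
    margins = y * (x @ s + b)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d loss / d m = -sigmoid(-m); sigmoid(-m) = 0.5 * (1 - tanh(m / 2)) is overflow-safe
    coeff = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))
    grad_s = x.T @ coeff / len(y)
    grad_b = np.mean(coeff)
    return loss, grad_s, grad_b

def train_gd(x, y, s, b, eta, steps):
    """Plain full-batch gradient descent; returns final parameters and the loss history."""
    losses = []
    for _ in range(steps):
        loss, grad_s, grad_b = logistic_loss_and_grads(s, b, x, y)
        losses.append(loss)
        s = s - eta * grad_s
        b = b - eta * grad_b
    return s, b, losses

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, lam, sigma, eta = 400, 0.495, 1.0, 0.01  # n is reduced here for speed
    d = int(lam * n)
    x, y = make_dataset(n, d, sigma, rng)
    s, b = init_params(d, 0.1, rng)
    s, b, losses = train_gd(x, y, s, b, eta, steps=200)
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

To approach the quoted settings, one would raise `n` to the reported N, set `sigma` and `eta` accordingly, and evaluate on a separately drawn test set of the reported size. The number of training steps is not given in the excerpts, so `steps` here is arbitrary.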