Swing-by Dynamics in Concept Learning and Compositional Generalization

Authors: Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, the SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights; e.g., we find a novel mechanism for non-monotonic learning dynamics of test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models." Empirical confirmation of the predicted Swing-by phenomenon in diffusion models: "We verify the predicted mechanism of Swing-by Dynamics in text-conditioned diffusion models, observing the non-monotonic evolution of generalization accuracy for unseen combinations of concepts, as predicted by our theory."
Researcher Affiliation: Collaboration. 1. CBS-NTT Physics of Intelligence Program, Harvard University, Cambridge, MA, USA; 2. Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA; 3. Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA; 4. Department of Physics, Harvard University, Cambridge, MA, USA.
Pseudocode: No. The paper describes mathematical analysis and theoretical derivations (e.g., equations 2.1, 4.1, 4.2, 4.3, and sections such as "THEORETICAL EXPLANATION" and "PROOFS AND CALCULATIONS"), but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: No. The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository.
Open Datasets: Yes. "In Fig. 6, we repeat the experiments done in Section 4 of Park et al. (2024): we consider a synthetic setup where the model learns to generate the image of a circle of the indicated size and color given by a text input. ... We borrow part of the compositional data generating process (DGP) introduced in Park et al. (2024)."
Dataset Splits: Yes. "The training set $D = \bigcup_{p \in [s]} \{x_k^{(p)}\}_{k=1}^{n}$ is generated by the following process: for each $p \in [s]$, each training point of the $p$-th cluster is sampled i.i.d. from a Gaussian distribution $x_k^{(p)} \sim \mathcal{N}(\mu_p \mathbf{1}_p, \operatorname{diag}(\sigma)^2)$. ... In all SIM experiments... the number of training samples in each Gaussian cluster is 5000. ... The training set only contains samples from input pairs (red, big), (blue, big) and (red, small), and the test set contains samples from an OOD input pair (blue, small)."
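The quoted data-generating process (i.i.d. Gaussian clusters with mean $\mu_p \mathbf{1}_p$ and diagonal covariance, 5000 samples per cluster) can be sketched as follows. This is a hypothetical illustration: the number of clusters `s`, the dimension `d`, the mean scale `mu`, and the noise scale `sigma` are assumed values, not taken from the paper; only `n = 5000` matches the quote.

```python
import numpy as np

def sample_sim_training_set(s=3, n=5000, d=8, mu=1.0, sigma=0.1, seed=0):
    """Sketch of the quoted DGP: for each cluster p in [s], draw n i.i.d.
    samples from N(mu_p * 1_p, diag(sigma)^2). All defaults except n are
    illustrative assumptions."""
    rng = np.random.default_rng(seed)
    clusters = []
    for p in range(s):
        mean = np.zeros(d)
        mean[p] = mu                       # mu_p * 1_p (one-hot direction, assumed)
        noise = np.full(d, sigma)          # per-coordinate std, diag(sigma)
        clusters.append(rng.normal(mean, noise, size=(n, d)))
    return np.concatenate(clusters)        # training set D, shape (s * n, d)
```

With the defaults this yields a 15000 x 8 array whose p-th block of 5000 rows is centered on the p-th coordinate axis.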
Hardware Specification: No. The paper mentions training models and a U-Net architecture but does not specify any particular hardware, such as GPU or CPU models, used for these experiments.
Software Dependencies: No. The paper mentions using the AdamW optimizer, U-Net, ResNet layers, Layer Norm, and GELU activations, but it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup: Yes. "In all SIM experiments... all the models are trained using stochastic gradient descent with a batch size of 128 and a learning rate of 0.1 for 40 epochs. ... We train our model with the AdamW optimizer (Loshchilov & Hutter, 2019) with learning rate $1 \times 10^{-3}$ and weight decay 0.01. We use a batch size of 128 and train for 20k steps."
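The quoted SIM training recipe (plain SGD, batch size 128, learning rate 0.1, 40 epochs) can be sketched as a minimal loop. The linear model and squared loss below are illustrative assumptions standing in for the paper's networks; only the optimizer hyperparameters come from the quote.

```python
import numpy as np

def train_sgd(X, y, lr=0.1, batch_size=128, epochs=40, seed=0):
    """Minimal SGD loop matching the quoted hyperparameters.
    Model: linear regression with squared loss (an assumption,
    not the paper's architecture)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))              # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            # Gradient of mean squared error on the mini-batch.
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w
```

The second quoted setup (AdamW, lr 1e-3, weight decay 0.01, 20k steps) would replace the inner update with an AdamW step; it is omitted here to keep the sketch dependency-free.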