When does compositional structure yield compositional generalization? A kernel theory.
Authors: Samuel Lippl, Kimberly Stachenfeld
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a theory of compositional generalization in kernel models with fixed, compositionally structured representations. This provides a tractable framework for characterizing the impact of training data statistics on generalization. We find that these models are limited to functions that assign values to each combination of components seen during training, and then sum up these values (conjunction-wise additivity). This imposes fundamental restrictions on the set of tasks compositionally structured kernel models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that they can learn in principle, we identify novel failure modes in compositional generalization (memorization leak and shortcut bias) that arise from biases in the training data. Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. |
| Researcher Affiliation | Collaboration | Samuel Lippl, Center for Theoretical Neuroscience, Columbia University, New York, NY, USA, EMAIL; Kimberly Stachenfeld, Google DeepMind and Center for Theoretical Neuroscience, Columbia University, New York, NY, USA, EMAIL |
| Pseudocode | No | The paper describes mathematical derivations and theoretical findings but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code required to reproduce all experiments can be found under https://github.com/sflippl/compositional-generalization. |
| Open Datasets | Yes | Deep networks trained on MNIST and CIFAR versions of compositional tasks. |
| Dataset Splits | Yes | After training models on certain combinations of components Z_train ⊆ Z = ∏_{c=1}^{C} Z_c, we assess generalization on all other combinations Z_test := Z \ Z_train. |
| Hardware Specification | No | The paper discusses software frameworks like PyTorch and PyTorch Lightning and types of neural networks (ConvNets, ResNets, ViTs) but does not provide any specific details about the hardware (e.g., GPU or CPU models) used for training or running experiments. |
| Software Dependencies | No | All networks were trained with PyTorch and PyTorch Lightning (Paszke et al., 2019). We fit the kernel models by hand-specifying the kernel and fitting either a support vector regression or classification using scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | We consider ReLU networks with one hidden layer and H = 1000 units. We initialize by σ√(2/H), considering σ ∈ [10⁻⁶, 1]... We considered networks with four convolutional layers (kernel size five; two layers have 32 filters, two have 64 filters) and two densely connected layers (with 512 and 1024 units)... We trained these networks with SGD using a learning rate of 10⁻⁴ and momentum of 0.9... We trained a residual neural network with eight blocks in total... using the Adam optimizer with a learning rate of 10⁻³ for 100 epochs... Finally, we trained a Vision Transformer (ViT) with six attention heads, 256 dimensions for both the attention layer and the MLP, and a depth of four, using Adam with a learning rate of 10⁻⁴ for 200 epochs. |
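
The simplest model in the experiment setup, a one-hidden-layer ReLU network with H = 1000 units and weights drawn with standard deviation σ√(2/H), can be sketched as follows. This is a hypothetical stdlib-only illustration, not the authors' code (which is in the linked repository); the input dimension, the σ value, and the application of the same scale to both layers are assumptions for the example.

```python
# Hypothetical sketch of the paper's one-hidden-layer ReLU network:
# H = 1000 hidden units, Gaussian weights with std sigma * sqrt(2 / H).
# Not the authors' implementation; input size and sigma are illustrative.
import math
import random

H = 1000          # hidden units, as stated in the setup
SIGMA = 1e-3      # one value from the reported range [1e-6, 1]

def init_weights(n_in, n_out, sigma, rng):
    """n_out x n_in Gaussian weight matrix with std sigma * sqrt(2 / H)."""
    scale = sigma * math.sqrt(2.0 / H)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def forward(x, W1, W2):
    """Scalar output of the two-layer ReLU network."""
    return matvec(W2, relu(matvec(W1, x)))[0]

rng = random.Random(0)
n_in = 10                                   # assumed input dimension
W1 = init_weights(n_in, H, SIGMA, rng)      # input -> hidden
W2 = init_weights(H, 1, SIGMA, rng)         # hidden -> scalar output
y = forward([1.0] * n_in, W1, W2)
```

Small σ pushes the network toward the lazy (kernel) regime the theory analyzes, which is presumably why the setup sweeps σ over several orders of magnitude.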