Does Data Scaling Lead to Visual Compositional Generalization?
Authors: Arnas Uselis, Andrea Dittadi, Seong Joon Oh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. Our experiments reveal a clear principle: compositional generalization is driven by data diversity, not mere data scale. |
| Researcher Affiliation | Academia | 1Tübingen AI Center, University of Tübingen 2Helmholtz AI 3Technical University of Munich 4Munich Center for Machine Learning (MCML) 5Max Planck Institute for Intelligent Systems, Tübingen. Correspondence to: Arnas Uselis <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Recovering factored concept representations for k = 2 concepts |
| Open Source Code | Yes | github.com/oshapio/visual-compositional-generalization We release our code and datasets publicly to promote reproducible research and responsible development of these capabilities. |
| Open Datasets | Yes | We use DSPRITES (Matthey et al., 2017) (using only heart shape to avoid symmetries), 3DSHAPES (Kim & Mnih, 2019), PUG (Bordes et al., 2023), COLOREDMNIST (Arjovsky et al., 2020), and a dataset we introduce of perceptually-challenging shapes without symmetries to which we refer as FSPRITES. Details in Appendix D. We release our code and datasets publicly to promote reproducible research and responsible development of these capabilities. |
| Dataset Splits | Yes | For each concept value i, we observe combinations with values j where (i − j + n) mod n < k, and evaluate on all other combinations. This creates a clear distinction between combinations seen during training and those requiring compositional generalization. ... The training combinations (c¹ᵢ, c²ᵢ) are drawn from the restricted subset Strain ⊆ C1 × C2. We refer to this as in-distribution (ID) data. (2) Testing: Evaluate on combinations from Stest = (C1 × C2) \ Strain, i.e., concept pairs that never co-occurred during training. We refer to this as out-of-distribution (OOD) data. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It mentions models like RESNET-50 and DINO ViT-L but no specific hardware specifications. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2017) optimizer' but does not provide specific version numbers for any software libraries or dependencies. It also references model architectures like RESNET-50 and ViT without associated software versions. |
| Experiment Setup | Yes | Optimization. All models are trained using the Adam (Kingma & Ba, 2017) optimizer. Based on an initial grid search, we use a learning rate of 10⁻⁴ for ResNet training from scratch and 10⁻³ for probing pre-trained features. All models are trained for 100 epochs with a batch size of 64. |
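The dataset-split rule quoted above can be sketched in code. This is a minimal illustration, not the authors' released implementation: it assumes two concept axes with n values each and reads the (partially garbled) condition as (i − j + n) mod n < k, so each concept value i co-occurs with exactly k values of j during training and all remaining pairs are held out as OOD.

```python
def compositional_split(n, k):
    """Hypothetical sketch of the paper's combination-coverage split.

    Each concept pair (i, j) with (i - j + n) % n < k is assigned to the
    in-distribution (ID) training set; every other pair is held out as
    out-of-distribution (OOD) test data, so no test pair is seen in training.
    """
    train, test = [], []
    for i in range(n):
        for j in range(n):
            if (i - j + n) % n < k:
                train.append((i, j))
            else:
                test.append((i, j))
    return train, test


train, test = compositional_split(n=4, k=2)
# Each concept value i appears with exactly k=2 partner values in training,
# and every pair lands in exactly one split.
assert len(train) + len(test) == 4 * 4
```

Larger k covers more combinations per concept value, smoothly interpolating between a sparse compositional split (k = 1) and full coverage (k = n).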