Interaction Asymmetry: A General Principle for Learning Composable Abstractions
Authors: Jack Brady, Julius von Kügelgen, Sébastien Lachapelle, Simon Buchholz, Thomas Kipf, Wieland Brendel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results unify recent theoretical results for learning concepts of objects, which we show are recovered as special cases with n = 0 or 1. We provide results for up to n = 2, thus extending these prior works to more flexible generator functions, and conjecture that the same proof strategies generalize to larger n. Practically, our theory suggests that, to disentangle concepts, an autoencoder should penalize its latent capacity and the interactions between concepts during decoding. We propose an implementation of these criteria using a flexible Transformer-based VAE, with a novel regularizer on the attention weights of the decoder. On synthetic image datasets consisting of objects, we provide evidence that this model can achieve comparable object disentanglement to existing models that use more explicit object-centric priors. |
| Researcher Affiliation | Collaboration | 1 Max Planck Institute for Intelligent Systems, Tübingen 2 Tübingen AI Center 3 ETH Zürich 4 Samsung SAIT AI Lab, Montreal 5 Google DeepMind 6 ELLIS Institute, Tübingen |
| Pseudocode | No | The provided text describes the method and theory but does not include any clearly labeled pseudocode or algorithm blocks. It mentions code availability but not pseudocode within the paper content. |
| Open Source Code | Yes | Code available at: github.com/JackBrady/interaction-asymmetry |
| Open Datasets | Yes | We test this model’s ability to disentangle concepts of visual objects on a Sprites dataset (Watters et al., 2019a) and on CLEVR6 (Johnson et al., 2017). ... We conduct additional experiments on the CLEVRTex dataset (Karazija et al., 2021). |
| Dataset Splits | Yes | For Sprites, we use 5,000 images for validation, 5,000 for testing, and the rest for training, while for CLEVR6, we use 2,000 images for validation and 2,000 for testing. ... CLEVRTex dataset (Karazija et al., 2021). This dataset... We use 40,000 images for training and 5,000 for validation and testing, respectively. |
| Hardware Specification | No | The paper mentions using "compute resources at the Tübingen Machine Learning Cloud" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using "Adam optimizer (Kingma and Ba, 2015)" and "PyTorch code" but does not specify version numbers for these or other software libraries or frameworks. |
| Experiment Setup | Yes | We train all models across 3 random seeds using batches of 32. In all cases, we use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 5e-4 which we warm-up for the first 10,000 training iterations and then decay by a factor of 10 throughout training. We also warm-up the value of α for the first 25,000 training iterations. ... For CLEVR6, we use batches of 32 and train for 400,000 iterations. ... We train all models on Spriteworld across 3 random seeds using batches of 64 for 500,000 iterations. |
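The learning-rate schedule quoted above (warm-up over the first 10,000 iterations, then decay by a factor of 10 over the rest of training) can be sketched as a step-dependent multiplier. This is a minimal reconstruction, not the authors' code: the quote does not specify the warm-up or decay shapes, so linear warm-up and exponential decay are assumed here, and the 400,000-iteration horizon is taken from the CLEVR6 setting.

```python
def lr_at_step(step: int,
               base_lr: float = 5e-4,
               warmup_steps: int = 10_000,
               total_steps: int = 400_000) -> float:
    """Sketch of the reported schedule (shapes assumed, not stated in the paper).

    - Linear warm-up from 0 to base_lr over the first `warmup_steps` iterations.
    - Exponential decay from base_lr down to base_lr / 10 over the remainder.
    """
    if step < warmup_steps:
        # Linear warm-up: fraction of base_lr proportional to progress.
        return base_lr * step / warmup_steps
    # Exponential decay by a total factor of 10 across the rest of training.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 10 ** (-progress)


# Example values at the schedule's key points:
print(lr_at_step(0))        # start of warm-up: 0.0
print(lr_at_step(10_000))   # end of warm-up: 5e-4
print(lr_at_step(400_000))  # end of training: 5e-5
```

In PyTorch this multiplier could be attached to the Adam optimizer via `torch.optim.lr_scheduler.LambdaLR`; the α warm-up over the first 25,000 iterations mentioned in the quote would be a separate, analogous ramp on the regularizer weight.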