Compositional Generalization via Forced Rendering of Disentangled Latents
Authors: Qiyao Liang, Daoyuan Qian, Liu Ziyin, Ila R Fiete
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we investigate a controlled 2D Gaussian bump generation task with fully disentangled (x, y) inputs, demonstrating that standard generative architectures still fail in OOD regions when trained with partial data, by re-entangling latent representations in subsequent layers. By examining the model's learned kernels and manifold geometry, we show that this failure reflects a memorization strategy for generation via data superposition rather than via composition of the true factorized features. We show that when models are forced, through architectural modifications with regularization or curated training data, to render the disentangled latents into the full-dimensional representational (pixel) space, they can be highly data-efficient and effective at composing in OOD regions. We provide a detailed empirical investigation into why disentangled representations often fail to achieve robust compositional generalization. Fig. 1(b-c) shows the MSE error contour plots and sample generated ID/OOD images. |
| Researcher Affiliation | Collaboration | 1Massachusetts Institute of Technology, Cambridge MA, USA 02139 2University of Cambridge, Cambridge CB2 1EW, U.K 3NTT Research. Correspondence to: Qiyao Liang <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses in prose and mathematical notation; there are no explicitly labeled pseudocode blocks or algorithm sections. |
| Open Source Code | Yes | Code available at github.com/qiyaoliang/Disentangled Comp Gen |
| Open Datasets | Yes | Concretely, we focus on a synthetic 2D Gaussian bump generation task, where a network learns to decode given (x, y) coordinates into a spatial image. In a set of follow-up experiments on MNIST image rotation, we observe that this same superposition of memorized activation patterns drawn from related in-distribution data emerges as a general OOD generalization strategy in neural networks that have learned to memorize ID data. |
| Dataset Splits | Yes | The training set covers a square-donut-shaped region of (x, y) with a large OOD region in the center of the training distribution (Fig. 1(a)); alternatively, the OOD region is an equivalent area in the bottom-left corner of the image. In both cases, the model sees all x and y values in training, but many combinations are held out. We generate a dataset of this rotated image for various rotation angles θ ∈ [0°, 360°] while leaving out an OOD range θ_OOD ∈ (160°, 200°). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and types of architectures (CNN, MLP) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Task setup. We consider a generative model that must output an N × N grayscale image containing a Gaussian bump at a specified 2D (x, y) location within some bounding box (e.g., [0, N]²). ... Architecture. Concretely, we focus on a CNN-based decoder-only architecture that maps disentangled latent inputs to image outputs. ... Disentangled input encodings. ... Population-based coding, ... Ramp coding, ... In each case, the input representation is disentangled w.r.t. x and y ... Architectural regularization for low-rank embeddings ... This design promotes interpretability and compositional disentanglement in the learned features. ... L_total = L_MSE + L_ent + L_var. Appendix D (Table 1) provides "Optimizer AdamW (Learning Rate: 1 × 10⁻³)". |
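The task and split described above are simple enough to sketch directly. The following is a minimal illustration of the 2D Gaussian bump target and the square-donut training mask, not the authors' code: the image size `N`, bump width `sigma`, and held-out fraction `hole_frac` are assumed values for illustration only.

```python
import numpy as np

def gaussian_bump(x, y, N=32, sigma=2.0):
    """Render an N x N grayscale image with a Gaussian bump centered at (x, y).

    sigma is an assumed bump width; the paper's exact value may differ.
    """
    # Pixel coordinate grids over the bounding box [0, N]^2.
    xs, ys = np.meshgrid(np.arange(N), np.arange(N), indexing="xy")
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def in_training_region(x, y, N=32, hole_frac=0.5):
    """True if (x, y) lies in the square-donut training region, i.e. outside
    the central held-out (OOD) square of side hole_frac * N.

    hole_frac is a hypothetical parameter; the paper's OOD area may differ.
    """
    lo = N * (1 - hole_frac) / 2
    hi = N * (1 + hole_frac) / 2
    return not (lo <= x <= hi and lo <= y <= hi)

# Example: a bump at (10, 20); the center of the box is held out as OOD.
img = gaussian_bump(10.0, 20.0)
```

Under this split, every individual x and y value appears in training (along the donut's arms), but their central combinations do not, which is what makes the center a compositional OOD test.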