Towards a Formal Theory of Representational Compositionality
Authors: Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, Guillaume Lajoie
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our definition on both real and synthetic data, and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We hope that our definition can inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought. We make our code available here. |
| Researcher Affiliation | Academia | ¹Mila – Quebec AI Institute ²Université de Montréal. Correspondence to: Eric Elmoznino <EMAIL>, Guillaume Lajoie <EMAIL>. |
| Pseudocode | Yes | Figure 1. Form of the shortest program outputting a compositional representation Z. a. Pseudocode of the program, which describes the representation using sentences W (sequences of discrete tokens) that are compressed using a prior pw(w), and then maps these sentences to high-dimensional vectors in representation-space using a function f(w) that outputs the sufficient statistics of a Normal distribution. |
| Open Source Code | Yes | We make our code available here. |
| Open Datasets | Yes | COCO. sentence-transformers/coco-captions Datasets at Hugging Face, July 2024. URL https://huggingface.co/datasets/sentence-transformers/coco-captions. |
| Dataset Splits | Yes | We reserved 400 datapoints for a separate validation set that was used for early stopping at each iteration of prequential coding. |
| Hardware Specification | No | No specific hardware details (like CPU, GPU models, or memory) are provided in the paper. The text mentions "modern machine learning hardware" but lacks specific specifications. |
| Software Dependencies | No | The paper mentions "Lark Python package" for parsing but does not provide a specific version number. Other mentions like "Adam optimizer" are algorithms, and platforms like "Hugging Face" are not specific versioned software dependencies for replication. |
| Experiment Setup | Yes | The model architecture used for prequential coding was an MLP with 2 hidden layers of size 256. Each word in W was embedded into a 64-dimensional vector, and these concatenated embeddings were the input to the MLP. The MLP output logits over object values for each attribute. ... We used the Adam optimizer with a learning rate of 1 × 10⁻³ to train the model at each iteration of prequential coding. |
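The experiment-setup row above pins down only a few architectural facts: a 64-dimensional embedding per word in W, concatenated embeddings fed to an MLP with 2 hidden layers of size 256, and per-attribute logits as output. A minimal NumPy sketch of that forward pass is below; the vocabulary size, sentence length, and attribute/value counts (`VOCAB_SIZE`, `SENT_LEN`, `N_ATTRS`, `N_VALUES`) are hypothetical placeholders not stated in the quoted text, and the Adam training loop (lr 1 × 10⁻³) is omitted.

```python
import numpy as np

# Dimensions from the paper: 64-d word embeddings, hidden width 256.
# All other sizes below are ASSUMED for illustration only.
VOCAB_SIZE = 50      # hypothetical token vocabulary size
SENT_LEN = 5         # hypothetical number of words per sentence W
EMB_DIM = 64         # per-word embedding size (stated in the paper)
HIDDEN = 256         # hidden layer width (stated in the paper)
N_ATTRS = 4          # hypothetical number of object attributes
N_VALUES = 8         # hypothetical number of values per attribute

rng = np.random.default_rng(0)

# Parameters: an embedding table plus a 2-hidden-layer MLP head.
E = rng.normal(0, 0.02, (VOCAB_SIZE, EMB_DIM))
W1 = rng.normal(0, 0.02, (SENT_LEN * EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.02, (HIDDEN, HIDDEN)); b2 = np.zeros(HIDDEN)
W3 = rng.normal(0, 0.02, (HIDDEN, N_ATTRS * N_VALUES)); b3 = np.zeros(N_ATTRS * N_VALUES)

def forward(tokens):
    """Map a sentence (sequence of token ids) to per-attribute logits."""
    x = E[tokens].reshape(-1)                 # concatenate the word embeddings
    h = np.maximum(0, x @ W1 + b1)            # ReLU hidden layer 1
    h = np.maximum(0, h @ W2 + b2)            # ReLU hidden layer 2
    logits = h @ W3 + b3
    return logits.reshape(N_ATTRS, N_VALUES)  # logits over values, per attribute

logits = forward(np.array([3, 1, 4, 1, 5]))
print(logits.shape)  # (4, 8): one row of value-logits per attribute
```

In prequential coding, such a model would be retrained at each iteration on a growing prefix of the data, with the 400-datapoint validation set quoted above used for early stopping.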