Towards a Formal Theory of Representational Compositionality

Authors: Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, Guillaume Lajoie

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our definition on both real and synthetic data, and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We hope that our definition can inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought. We make our code available here.
Researcher Affiliation | Academia | ¹Mila – Quebec AI Institute, ²Université de Montréal. Correspondence to: Eric Elmoznino <EMAIL>, Guillaume Lajoie <EMAIL>.
Pseudocode | Yes | Figure 1. Form of the shortest program outputting a compositional representation Z. (a) Pseudocode of the program, which describes the representation using sentences W (sequences of discrete tokens) that are compressed using a prior p_w(w), and then maps these sentences to high-dimensional vectors in representation space using a function f(w) that outputs the sufficient statistics of a Normal distribution.
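The program form described in Figure 1 can be sketched in a few lines of Python. This is a toy illustration, not the paper's code: the vocabulary, the uniform token prior, and the hash-based mapping f(w) are all illustrative assumptions standing in for the learned components.

```python
import math
import random

VOCAB = ["red", "blue", "circle", "square"]  # assumed toy vocabulary
DIM = 4  # representation dimensionality (the paper uses high-dimensional vectors)

def sample_sentence(max_len=3):
    """Sample a sentence w token-by-token from a (toy, uniform) prior p_w(w)."""
    length = random.randint(1, max_len)
    return [random.choice(VOCAB) for _ in range(length)]

def f(w):
    """Map a sentence w to the sufficient statistics (mu, sigma) of a Normal.
    Here a deterministic hash-seeded toy mapping stands in for a learned f."""
    seed = sum(hash(tok) % 1000 for tok in w)
    rng = random.Random(seed)
    mu = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    sigma = [math.exp(rng.uniform(-2.0, 0.0)) for _ in range(DIM)]
    return mu, sigma

def sample_representation():
    """Output z ~ Normal(f(w)) for a sentence w drawn from the prior."""
    w = sample_sentence()
    mu, sigma = f(w)
    z = [random.gauss(m, s) for m, s in zip(mu, sigma)]
    return w, z
```

The key structural point the sketch preserves is that the representation Z is described in two stages: a compressed discrete sentence W, then a mapping from W to the parameters of a distribution over continuous vectors.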
Open Source Code | Yes | We make our code available here.
Open Datasets | Yes | COCO: sentence-transformers/coco-captions, Datasets at Hugging Face, July 2024. URL: https://huggingface.co/datasets/sentence-transformers/coco-captions.
Dataset Splits | Yes | We reserved 400 datapoints for a separate validation set that was used for early stopping at each iteration of prequential coding.
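The split described above can be pictured with a minimal prequential-coding loop: at each iteration a model is trained on the data seen so far, early-stopped on the fixed held-out validation set, and then pays the codelength of the next chunk before that chunk is revealed. The model interface (`fit_with_early_stopping`, `nll`) is a hypothetical assumption, not the paper's API.

```python
def prequential_codelength(data, val_data, make_model, chunk_bounds):
    """Total description length (in nats) of `data` under prequential coding.

    `chunk_bounds` are increasing prefix lengths; for each chunk, a fresh
    model is trained on the prefix seen so far (early-stopped on `val_data`)
    and then scored on the unseen chunk via its negative log-likelihood."""
    total_nll = 0.0
    prev = 0
    for bound in chunk_bounds:
        model = make_model()
        if prev > 0:
            # early stopping on the reserved validation set, as in the paper
            model.fit_with_early_stopping(data[:prev], val_data)
        total_nll += sum(model.nll(x) for x in data[prev:bound])
        prev = bound
    return total_nll
```

The reserved validation set is held fixed across iterations so that early stopping never leaks information from the yet-unseen chunks being scored.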
Hardware Specification | No | No specific hardware details (such as CPU or GPU models, or memory) are provided in the paper. The text mentions "modern machine learning hardware" but lacks specific specifications.
Software Dependencies | No | The paper mentions the Lark Python package for parsing but does not provide a specific version number. Other mentions such as the Adam optimizer are algorithms, and platforms such as Hugging Face are not specific versioned software dependencies for replication.
Experiment Setup | Yes | The model architecture used for prequential coding was an MLP with 2 hidden layers of size 256. Each word in W was embedded into a 64-dimensional vector, and these concatenated embeddings were the input to the MLP. The MLP output logits over object values for each attribute. ... We used the Adam optimizer with a learning rate of 1 × 10⁻³ to train the model at each iteration of prequential coding.
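The forward pass of the described architecture can be sketched in pure Python. The embedding dimension (64) and hidden size (256) come from the setup above; the vocabulary size, sentence length, and attribute/value counts are illustrative assumptions, and the random weight initialization stands in for training (which the paper does with Adam at lr 1e-3).

```python
import random

EMBED_DIM = 64     # from the paper
HIDDEN = 256       # from the paper
SENT_LEN = 3       # assumed words per sentence
VOCAB_SIZE = 10    # assumed
N_ATTRIBUTES = 2   # assumed
N_VALUES = 5       # assumed object values per attribute

def init_matrix(rows, cols, scale=0.05):
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

# Untrained parameters (training would use Adam, lr 1e-3, per the paper)
embedding = init_matrix(VOCAB_SIZE, EMBED_DIM)
W1 = init_matrix(HIDDEN, SENT_LEN * EMBED_DIM)
W2 = init_matrix(HIDDEN, HIDDEN)
W_out = init_matrix(N_ATTRIBUTES * N_VALUES, HIDDEN)

def forward(word_ids):
    """Map a sentence (list of word ids) to per-attribute logits."""
    # concatenate the 64-d embedding of each word into one input vector
    x = [e for wid in word_ids for e in embedding[wid]]
    h = relu(matvec(W1, x))      # hidden layer 1 (size 256)
    h = relu(matvec(W2, h))      # hidden layer 2 (size 256)
    logits = matvec(W_out, h)    # logits over object values
    # split into one logit vector per attribute
    return [logits[a * N_VALUES:(a + 1) * N_VALUES]
            for a in range(N_ATTRIBUTES)]
```

A framework like PyTorch would express the same architecture more compactly; the pure-Python version is only meant to make the input/output shapes of the described MLP explicit.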