Cross-Entropy Is All You Need To Invert the Data Generating Process

Authors: Patrik Reizinger, Alice Bizeul, Attila Juhos, Julia E. Vogt, Randall Balestriero, Wieland Brendel, David Klindt

ICLR 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We corroborate our theoretical contribution with a series of empirical studies. First, using simulated data matching our theoretical assumptions, we demonstrate successful disentanglement of latent factors. Second, we show that on DisLib, a widely-used disentanglement benchmark, simple classification tasks recover latent structures up to linear transformations. Finally, we reveal that models trained on ImageNet encode representations that permit linear decoding of proxy factors of variation. Together, our theoretical findings and experiments offer a compelling explanation for recent observations of linear representations, such as superposition in neural networks.
Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen AI Center, ELLIS Institute, Tübingen, Germany; 2Department of Computer Science, ETH Zürich and ETH AI Center, ETH Zürich, Zürich, Switzerland; 3Department of Computer Science, Brown University, Rhode Island, USA; 4Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
Pseudocode | No | The paper describes theoretical frameworks and algorithms but does not include any structured pseudocode or algorithm blocks. For instance, in Section 3 and Appendix B, the paper details Generalized Contrastive Learning (GCL) and InfoNCE but presents them through textual descriptions and mathematical equations rather than pseudocode.
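Since the paper presents the InfoNCE objective only through equations, the following is a minimal NumPy sketch of the batch-level loss for orientation. The `info_nce` helper and the toy data are hypothetical, not the authors' implementation.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch: each anchor's positive is the same-index
    row of `positives`; every other row in the batch acts as a negative."""
    # L2-normalize so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pair) as the target class
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
loss_aligned = info_nce(z, z)                       # positives equal anchors
loss_random = info_nce(z, rng.normal(size=(8, 4)))  # unrelated positives
```

With perfectly aligned pairs the loss is close to zero, while random positives push it toward log N, which is the connection to cross-entropy that the paper's title alludes to.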
Open Source Code | Yes | We made our code publicly available on GitHub: https://github.com/klindtlab/csi
Open Datasets | Yes | First, using simulated data matching our theoretical assumptions... Second, we show that on DisLib, a widely-used disentanglement benchmark (Locatello et al., 2019)... Finally, we reveal that models trained on ImageNet encode representations... using ImageNet-X (Idrissi et al., 2022)... on the full ImageNet dataset (Deng et al., 2009).
Dataset Splits | Yes | We randomly split the data into 70% training and 30% testing data.
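The 70/30 random split quoted above can be sketched as follows; the array shapes and seed are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 16))          # placeholder features
y = rng.integers(0, 10, size=n)       # placeholder labels

# Shuffle indices once, then cut at the 70% mark
perm = rng.permutation(n)
cut = int(0.7 * n)
train_idx, test_idx = perm[:cut], perm[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Splitting by a single shuffled index array (rather than sampling train and test independently) guarantees the two sets are disjoint and together cover every example.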
Hardware Specification | No | This research utilized compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG.
Software Dependencies | No | The paper mentions using "PyTorch", the "Adam optimizer", and the "Logistic Regression module from sklearn" but does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | Both models were trained for 100 epochs, with the Adam optimizer, a learning rate of 0.001 and a batch size of 4096.