Cross-Entropy Is All You Need To Invert the Data Generating Process
Authors: Patrik Reizinger, Alice Bizeul, Attila Juhos, Julia E. Vogt, Randall Balestriero, Wieland Brendel, David Klindt
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical contribution with a series of empirical studies. First, using simulated data matching our theoretical assumptions, we demonstrate successful disentanglement of latent factors. Second, we show that on DisLib, a widely-used disentanglement benchmark, simple classification tasks recover latent structures up to linear transformations. Finally, we reveal that models trained on ImageNet encode representations that permit linear decoding of proxy factors of variation. Together, our theoretical findings and experiments offer a compelling explanation for recent observations of linear representations, such as superposition in neural networks. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen AI Center, ELLIS Institute, Tübingen, Germany; 2Department of Computer Science, ETH Zürich and ETH AI Center, ETH Zürich, Zürich, Switzerland; 3Department of Computer Science, Brown University, Rhode Island, USA; 4Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA |
| Pseudocode | No | The paper describes theoretical frameworks and algorithms but does not include any structured pseudocode or algorithm blocks. For instance, in Section 3 and Appendix B, the paper details Generalized Contrastive Learning (GCL) and InfoNCE but presents them through textual descriptions and mathematical equations rather than pseudocode. |
| Open Source Code | Yes | We made our code publicly available on GitHub: https://github.com/klindtlab/csi |
| Open Datasets | Yes | First, using simulated data matching our theoretical assumptions... Second, we show that on DisLib, a widely-used disentanglement benchmark (Locatello et al., 2019)... Finally, we reveal that models trained on ImageNet encode representations... using ImageNet-X (Idrissi et al., 2022)... on the full ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | We randomly split the data into 70% training and 30% testing data. |
| Hardware Specification | No | This research utilized compute resources at the Tübingen Machine Learning Cloud, DFG FKZ INST 37/1057-1 FUGG. |
| Software Dependencies | No | The paper mentions using "PyTorch", the "Adam optimizer", and the "Logistic Regression module from sklearn" but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Both models were trained for 100 epochs, with the Adam optimizer, a learning rate of 0.001 and a batch size of 4096. |
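The reported setup (a 70%/30% random train/test split; training with the Adam optimizer at a learning rate of 0.001, batch size 4096, for 100 epochs) can be sketched as a minimal cross-entropy classification loop. This is an illustrative assumption-laden sketch, not the authors' code: the synthetic data, input dimensionality, class count, and linear model are placeholders; only the split ratio and the quoted hyperparameters come from the paper.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 16-dimensional inputs, 10 classes (illustrative only).
X = torch.randn(10_000, 16)
y = torch.randint(0, 10, (10_000,))

# 70% / 30% random train/test split, as reported in the paper.
perm = torch.randperm(len(X))
split = int(0.7 * len(X))
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Linear(16, 10)  # placeholder classifier, not the paper's encoder
opt = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the paper
loss_fn = nn.CrossEntropyLoss()  # the cross-entropy objective of the title

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X[train_idx], y[train_idx]),
    batch_size=4096,  # batch size from the paper
    shuffle=True,
)

for epoch in range(100):  # 100 epochs, as reported
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# Held-out accuracy on the 30% test split.
with torch.no_grad():
    acc = (model(X[test_idx]).argmax(dim=1) == y[test_idx]).float().mean()
```

Since the stand-in labels are random, the test accuracy here hovers near chance; the point of the sketch is the reproducible training configuration, not the result.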