Comparing the information content of probabilistic representation spaces

Authors: Kieran A. Murphy, Sam Dillavou, Danielle Bassett

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the utility of these measures in three case studies. First, in the context of unsupervised disentanglement, we identify recurring information fragments within individual latent dimensions of VAE and InfoGAN ensembles. Second, we compare the full latent spaces of models and reveal consistent information content across datasets and methods, despite variability during training. Finally, we leverage the differentiability of our measures to perform model fusion, synthesizing the information content of weak learners into a single, coherent representation. Across these applications, the direct comparison of information content offers a natural basis for characterizing the processing of information." (Sec. 4, Experiments)
Researcher Affiliation | Academia | Kieran A. Murphy (EMAIL), Dept. of Bioengineering, University of Pennsylvania; Sam Dillavou (EMAIL), Dept. of Physics & Astronomy, University of Pennsylvania; Dani S. Bassett (EMAIL), Depts. of Bioengineering, Electrical & Systems Engineering, Physics & Astronomy, Neurology, Psychiatry, University of Pennsylvania; The Santa Fe Institute; The Neuro, Montreal Neurological Institute, McGill University
Pseudocode | No | The paper describes methods narratively and does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce the experiments of Sec. 4 can be found at the following repository: https://github.com/murphyka/representation-space-info-comparison. The heart of the codebase is in utils.py, containing the Bhattacharyya and Monte Carlo calculations of I(U; X) and the NMI/VI calculations."
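The quoted repository description mentions Bhattacharyya-based calculations over probabilistic (Gaussian) posteriors. As a point of reference, here is a minimal numpy sketch of the standard Bhattacharyya distance and coefficient between two diagonal-covariance Gaussians; this is an illustration of the textbook quantity, not the authors' `utils.py` implementation, and the function names are our own.

```python
import numpy as np

def bhattacharyya_distance(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    mu1, mu2: 1-D arrays of means; var1, var2: 1-D arrays of variances.
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    var_avg = 0.5 * (var1 + var2)
    # Mean term: (1/8) (mu1 - mu2)^T Sigma^{-1} (mu1 - mu2), Sigma = avg cov.
    term_mean = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    # Covariance term: (1/2) ln( det(Sigma) / sqrt(det(S1) det(S2)) ).
    term_cov = 0.5 * np.sum(np.log(var_avg)
                            - 0.5 * (np.log(var1) + np.log(var2)))
    return term_mean + term_cov

def bhattacharyya_coefficient(mu1, var1, mu2, var2):
    # Overlap measure in [0, 1]; equals 1 iff the two Gaussians coincide.
    return np.exp(-bhattacharyya_distance(mu1, var1, mu2, var2))
```

The coefficient quantifies posterior overlap, which is the kind of pairwise quantity one can aggregate when comparing channels of a probabilistic representation space.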
Open Datasets | Yes | "In Fig. 3a, the regularization of a β-VAE is increased for the dsprites dataset (Higgins et al., 2017). Fig. 3b shows remarkable consistency of information fragmentation by an ensemble of β-VAEs trained on the cars3d dataset (Reed et al., 2015). ... Finally, we studied the manner of information fragmentation on datasets which are not simply an exhaustive set of combinations of generative factors (Fig. 3d,e), for β-VAE ensembles trained on fashion-mnist (Xiao et al., 2017) and CelebA (Liu et al., 2015). ... For the MNIST and Fashion-MNIST ensembles, we trained 50 β-VAEs with a 10-dimensional latent space. The encoder had the following architecture: ... (LeCun et al., 2010)"
Dataset Splits | No | The paper uses several standard datasets such as MNIST, Fashion-MNIST, dsprites, cars3d, smallnorb, and CelebA. However, it does not explicitly provide specific details on how these datasets were split into training, validation, or test sets for its experiments. For some datasets, it mentions using pre-trained models from other published works.
Hardware Specification | Yes | "All experiments were implemented in TensorFlow and run on a single computer with a 12 GB GeForce RTX 3060 GPU."
Software Dependencies | No | "All experiments were implemented in TensorFlow and run on a single computer with a 12 GB GeForce RTX 3060 GPU. We used the OPTICS implementation from sklearn with precomputed distance metric and min_samples=20 (and all other parameters their default values)."
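The quoted setting — sklearn's OPTICS with a precomputed distance matrix and min_samples=20 — can be reproduced in a few lines. Below is a hedged sketch on synthetic 1-D data standing in for whatever distance matrix the authors precompute between latent channels; the data and cluster geometry here are invented purely for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.metrics import pairwise_distances

# Synthetic stand-in: two well-separated groups of 30 points each,
# so that min_samples=20 still admits dense clusters.
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.1, 30),
                         rng.normal(10.0, 0.1, 30)]).reshape(-1, 1)
dist = pairwise_distances(points)  # symmetric precomputed distance matrix

# metric="precomputed" tells OPTICS that `dist` is already a distance
# matrix; all other parameters are left at their sklearn defaults,
# matching the paper's stated configuration.
clusterer = OPTICS(min_samples=20, metric="precomputed")
labels = clusterer.fit_predict(dist)  # label -1 marks noise points
```

With a precomputed metric, the same call works for any notion of distance between representations, not just Euclidean distance between points.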
Experiment Setup | Yes | "For the MNIST and Fashion-MNIST ensembles, we trained 50 β-VAEs with a 10-dimensional latent space. The encoder had the following architecture:
    Conv2D: 32 4×4 ReLU kernels, stride 2, padding same
    Conv2D: 64 4×4 ReLU kernels, stride 2, padding same
    Reshape([-1])
    Dense: 256 ReLU
The decoder had the following architecture:
    Dense: 7×7×32 ReLU
    Reshape([7, 7, 32])
    Conv2DTranspose: 64 4×4 ReLU kernels, stride 2, padding same
    Conv2DTranspose: 32 4×4 ReLU kernels, stride 2, padding same
    Conv2DTranspose: 1 4×4 ReLU kernels, stride 1, padding same
The models were trained for 2×10⁵ steps, with a Bernoulli loss on the pixels, the Adam optimizer with a learning rate of 10⁻⁴, and a batch size of 64."
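A quick sanity check that the quoted encoder and decoder shapes are mutually consistent: with TensorFlow's "same" padding, a stride-2 convolution halves the spatial size (out = ceil(in / stride), independent of kernel size). Assuming 28×28 inputs, as is standard for MNIST and Fashion-MNIST, the arithmetic below shows why the decoder begins with a Dense layer reshaped to a 7×7 feature map.

```python
import math

def conv_out(size, stride, padding="same"):
    # TF convention for 'same' padding: out = ceil(in / stride),
    # regardless of kernel size.
    assert padding == "same"
    return math.ceil(size / stride)

h = w = 28                 # assumed 28x28 MNIST / Fashion-MNIST input
h = w = conv_out(h, 2)     # Conv2D 32 4x4, stride 2  -> 14x14
h = w = conv_out(h, 2)     # Conv2D 64 4x4, stride 2  -> 7x7
flat = h * w * 64          # size after Reshape([-1]), before Dense(256)
```

The 7×7 result matches the decoder's Dense 7×7×32 + Reshape([7, 7, 32]) entry point, and the two stride-2 transposed convolutions upsample back to 28×28.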