Controllable Generative Modeling via Causal Reasoning

Authors: Joey Bose, Ricardo Pio Monti, Aditya Grover

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a series of large-scale synthetic and human evaluations, we demonstrate that generating counterfactual samples which respect the underlying causal relationships inferred via CAGE leads to subjectively more realistic images."
Researcher Affiliation | Collaboration | Avishek Joey Bose (McGill University and Mila), Ricardo Pio Monti (Meta), Aditya Grover (UCLA)
Pseudocode | Yes | Algorithm 1: Obtaining CAGE scores τ
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is released, nor does it link to a code repository. It mentions using existing libraries and pretrained models but does not provide its own implementation.
Open Datasets | Yes | "We study the consistency of the inferred causal directions for two high-dimensional image datasets, Morpho-MNIST (Pawlowski et al., 2020) and CelebA-HQ (Karras et al., 2017), with known or biologically guessed prior cause-effect relationships." "For our experiments, whenever possible, we used the default settings found in the original papers of all chosen models. In particular, we used the default settings for both MintNet (Song et al., 2019) and StyleGAN2 (Karras et al., 2020), which were pretrained on MNIST and Flickr-Faces-HQ respectively."
Dataset Splits | No | The paper mentions using 'test set samples' for Morpho-MNIST and finetuning on '10% of CelebA-HQ', but does not specify the exact percentages, sample counts, or methodology for creating training, validation, and test splits for the datasets used in the experiments.
Hardware Specification | No | "Finally, the authors would also like to thank Meta AI for the computing resources that made this work possible." The paper acknowledges computing resources provided by Meta AI but does not give specific hardware details such as GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | "For latent linear classifiers we use an SVM based classifier as found in the widely used scikit-learn library (Pedregosa et al., 2011)." The paper mentions scikit-learn and several models (MintNet, StyleGAN2, Masked Autoregressive Flow) but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
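To make the "latent linear classifier" setup concrete, a minimal sketch using scikit-learn's `LinearSVC` is shown below. The latent codes and binary attribute here are synthetic placeholders, not the paper's actual representations (which come from pretrained generative models such as StyleGAN2); the dimensions and the attribute rule are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Illustrative stand-in for latent codes extracted from a generative model.
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 64))  # 1000 samples in a 64-dim latent space (assumed sizes)
# Hypothetical attribute that is (noisily) linearly decodable from the latents.
labels = (latents[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    latents, labels, test_size=0.25, random_state=0
)

# A linear SVM classifier fit on the latent features, as scikit-learn provides it.
clf = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Because the synthetic attribute is nearly a linear function of one latent coordinate, the linear SVM recovers it with high accuracy; with real model latents the same pipeline measures how linearly decodable an attribute is.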
Experiment Setup | Yes | "For MintNet we finetuned on the Morpho-MNIST dataset for 250 epochs using the Adam optimizer with default settings. Similarly, we also finetuned StyleGAN2 on CelebA-HQ for 2000 iterations using 10% of CelebA-HQ. Our synthetic experiments, on the other hand, required us to train a Masked Autoregressive Flow (Papamakarios et al., 2017) that consisted of 10 layers with 4 blocks per layer. The Masked Autoregressive Flow was trained for 5000 iterations using 5000 data samples."
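For context on the Masked Autoregressive Flow architecture mentioned above, a single affine autoregressive transform can be sketched in NumPy. This is a toy layer with random, untrained weights and a small dimensionality chosen for illustration; the paper's model stacks 10 such layers with 4 blocks per layer on higher-dimensional data. The defining property is that strictly lower-triangular masks make output j depend only on inputs with index below j, which keeps the Jacobian triangular and its log-determinant cheap.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # toy dimensionality (assumed for illustration)

# Strictly lower-triangular masks enforce the autoregressive dependency:
# the shift mu_j and log-scale alpha_j may only read x_1, ..., x_{j-1}.
mask = np.tril(np.ones((D, D)), k=-1)
W_mu = rng.normal(scale=0.5, size=(D, D)) * mask     # masked shift weights
W_alpha = rng.normal(scale=0.5, size=(D, D)) * mask  # masked log-scale weights

def maf_layer(x):
    """One affine autoregressive transform: z_j = (x_j - mu_j(x_<j)) * exp(-alpha_j(x_<j))."""
    mu = x @ W_mu.T
    alpha = np.tanh(x @ W_alpha.T)  # bound log-scales for numerical stability
    z = (x - mu) * np.exp(-alpha)
    # The Jacobian dz/dx is lower-triangular with diagonal exp(-alpha_j),
    # so log|det| is just the negative sum of the log-scales.
    log_det = -alpha.sum(axis=-1)
    return z, log_det

x = rng.normal(size=(3, D))
z, log_det = maf_layer(x)
```

A full MAF composes many such layers (in the paper, via the MADE-style blocks of Papamakarios et al., 2017) and trains the mask-respecting weights by maximum likelihood, which this single random-weight layer only hints at.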