Desiderata for Representation Learning: A Causal Perspective
Authors: Yixin Wang, Michael I. Jordan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 2.5 Empirical Studies of CAUSAL-REP: We study CAUSAL-REP in both image and text datasets. We study the following questions: 1. How well do probabilities of causation measure efficiency and non-spuriousness of features? 2. Does supervised CAUSAL-REP produce non-spurious representations for synthetic (Section 2.5.2), image (Section 2.5.3), and text (Section 2.5.4) data? 3. How well does unsupervised CAUSAL-REP perform on instance discrimination? (Section 2.5.5). We find that probabilities of causation (POC) are effective in distinguishing efficient/inefficient and non-spurious/spurious representations. Moreover, CAUSAL-REP finds non-spurious features in both supervised and unsupervised settings; it also outperforms existing unsupervised representation learning algorithms in downstream prediction. 3.4 Empirical Studies of IOSS: We study IOSS in unsupervised image datasets. We find that IOSS is more effective at distinguishing between disentangled and entangled representations than other unsupervised disentanglement metrics. Unsupervised representation learning with the IOSS penalty results in representations with better disentanglement. |
| Researcher Affiliation | Academia | Yixin Wang EMAIL Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA Michael I. Jordan EMAIL EECS and Statistics, University of California, Berkeley, CA 94720, USA INRIA, 75013 Paris, France |
| Pseudocode | Yes | Algorithm 1: Calculating the (lower bound of) efficiency and non-spuriousness of a representation, Algorithm 2: CAUSAL-REP (Supervised), Algorithm 3: Disentangled representation learning with IOSS, Algorithm 4: CAUSAL-REP (Unsupervised) |
| Open Source Code | Yes | To aid reproducibility, we have included source code that can reproduce the results at https://github.com/yixinwang/representation-causal-public. |
| Open Datasets | Yes | The Colored MNIST study. We focus on the colored MNIST data with the digits 3 and 8 and colors red and green; see Appendix G.2 for the detailed experimental setup. The CelebA study. We next study CAUSAL-REP on the CelebA dataset (Liu et al., 2015). The reviews corpora study. We begin with the raw reviews datasets from Amazon, Tripadvisor (http://times.cs.uiuc.edu/~wang296/Data/), and Yelp (https://www.yelp.com/dataset/documentation/main). Finally, we study CAUSAL-REP in the unsupervised setting. We focus on image datasets in the unsupervised setting because non-spurious features that distinguish different images are more readily defined in the image domain than in the text domain. We evaluate the non-spuriousness of the unsupervised CAUSAL-REP again by its predictive performance on non-spuriousness test sets. Given the unsupervised CAUSAL-REP representation, we fit a prediction model to the target label and test its predictive performance on non-spuriousness test sets. |
| Dataset Splits | Yes | We create training and test sets, each containing 5,000 data points, by subsampling the CelebA datasets. We create two test datasets: one is an in-distribution held-out test set with the spurious words present as in the training set; the other is a non-spuriousness test set without the randomly added spurious words. To create a training set, we color the 3 images in red with probability p and in green with probability (1 − p). Next, we color the 8 images in red with probability (1 − p) and in green with probability p. To create a test set, we color the images such that the color and images are correlated oppositely. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It lacks specific GPU models, CPU types, or other detailed specifications. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and the 'NLTK package' but does not specify version numbers for these software components. Without version information, the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | Across all experiments, we used the Adam optimizer and learning rate 0.01. For text experiments, we consider a bag-of-words representation with standard stop words removal with the NLTK package. We consider an implementation of the supervised CAUSAL-REP algorithm which adopts a VAE with 64 latent dimensions as the probabilistic factor model for pinpointing. For representation functions, we consider a two-layer neural network with 20-dimensional outputs. We further add noise to the ground truth digit label by randomly flipping the labels with a probability of 0.25. We then fit VAEs with an increasingly strong regularization with the IOSS penalty. |
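The probabilities of causation (POC) that the Research Type row cites admit simple closed-form bounds under exogeneity: the Tian–Pearl bounds give max(0, P(y|x) − P(y|x′)) ≤ PNS ≤ min(P(y|x), P(y′|x′)) for the probability of necessity and sufficiency. A minimal sketch of that calculation, with illustrative numbers that are not from the paper:

```python
def pns_bounds(p_y_given_x, p_y_given_not_x):
    """Tian-Pearl bounds on the probability of necessity and sufficiency
    (PNS) under exogeneity:
        max(0, P(y|x) - P(y|x')) <= PNS <= min(P(y|x), P(y'|x')).
    Inputs are the two conditional probabilities of the outcome.
    """
    lower = max(0.0, p_y_given_x - p_y_given_not_x)
    upper = min(p_y_given_x, 1.0 - p_y_given_not_x)
    return lower, upper


# Toy example: a feature strongly predictive of the label gives a
# high PNS lower bound, i.e. it scores well as a non-spurious feature.
lo, hi = pns_bounds(0.9, 0.2)
```

A spurious feature whose presence barely changes P(y) would instead yield a lower bound near zero, which is how POC can separate non-spurious from spurious representations.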
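The colored-MNIST construction in the Dataset Splits row (3s red with probability p, 8s with the opposite probabilities, and the correlation reversed at test time) can be sketched as follows. The helper name and the grayscale-image input convention are assumptions for illustration, not from the paper:

```python
import numpy as np

def color_digits(images, labels, p, flip_correlation=False, rng=None):
    """Color grayscale digit images red or green.

    Training set: digit-3 images are red with probability p, green
    otherwise; digit-8 images use the opposite probabilities. Setting
    flip_correlation=True reverses the color/digit correlation, as in
    the paper's test set. (Hypothetical helper, not the paper's code.)
    """
    rng = np.random.default_rng(rng)
    n, h, w = images.shape
    colored = np.zeros((n, h, w, 3), dtype=images.dtype)
    for i, (img, y) in enumerate(zip(images, labels)):
        p_red = p if y == 3 else 1.0 - p
        if flip_correlation:
            p_red = 1.0 - p_red
        channel = 0 if rng.random() < p_red else 1  # 0 = red, 1 = green
        colored[i, :, :, channel] = img
    return colored
```

With p close to 1, color is a strong spurious predictor of the digit on the training set but anti-correlated on the test set, which is exactly the distribution shift the non-spuriousness evaluation relies on.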
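Two ingredients of the Experiment Setup row are easy to sketch: the label noise (flip labels with probability 0.25) and a two-layer representation network with 20-dimensional outputs. The hidden width and ReLU activation below are assumptions, since the quoted setup does not specify them:

```python
import numpy as np

def flip_labels(labels, flip_prob=0.25, rng=None):
    """Randomly flip binary (0/1) labels with probability flip_prob
    (0.25 in the paper's setup)."""
    rng = np.random.default_rng(rng)
    flip = rng.random(len(labels)) < flip_prob
    return np.where(flip, 1 - labels, labels)

def two_layer_rep(x, in_dim, hidden_dim=64, out_dim=20, rng=None):
    """Forward pass of a two-layer network with 20-dimensional outputs,
    as in the paper's representation function. Random weights and a
    ReLU hidden layer stand in for a trained model (illustrative only)."""
    rng = np.random.default_rng(rng)
    W1 = rng.normal(size=(in_dim, hidden_dim)) / np.sqrt(in_dim)
    W2 = rng.normal(size=(hidden_dim, out_dim)) / np.sqrt(hidden_dim)
    return np.maximum(x @ W1, 0.0) @ W2
```

In the paper these pieces sit inside the CAUSAL-REP training loop (Adam, learning rate 0.01); the sketch only pins down the shapes and the noise model.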
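The IOSS penalty mentioned in the Research Type and Experiment Setup rows scores how far a representation's joint support is from the product of its per-dimension marginal supports. Below is one plausible finite-sample approximation of that idea, not the paper's estimator: draw samples from the product of marginals by independently permuting each coordinate, then measure the average distance from those product samples to the nearest observed joint sample.

```python
import numpy as np

def ioss_approx(z, rng=None):
    """Rough IOSS-style penalty for a representation matrix z (n x d).

    Approximates the gap between the empirical joint support of z and
    the product of its marginal supports. Illustrative only; the paper
    defines its own estimator.
    """
    rng = np.random.default_rng(rng)
    n, d = z.shape
    # Sample from the product of marginals by permuting each column.
    prod = np.column_stack([rng.permutation(z[:, j]) for j in range(d)])
    # Distance from each product sample to its nearest joint sample.
    dists = np.linalg.norm(prod[:, None, :] - z[None, :, :], axis=-1).min(axis=1)
    return float(dists.mean())
```

A disentangled representation with independent dimensions has (approximately) rectangular support, so permuted samples stay near the data and the penalty is small; an entangled representation concentrated on a lower-dimensional set scores higher.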