reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift

Authors: Gina Wong, Joshua Gleason, Rama Chellappa, Yoav Wald, Anqi Liu

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our WRI implementation on synthetic and real-world datasets with distribution shifts, particularly focusing on cases of covariate shift in the invariant features. In all of the datasets, we select our model based on a validation set from the test environment, and we report the test set accuracy averaged over 5 random seeds with standard errors. Training validation selection assumes that the training and test data are drawn from similar distributions, which is not the case we aim to be robust to (Gulrajani & Lopez-Paz, 2020); test validation selection is more useful for demonstrating OOD capabilities (Ruan et al., 2021). Our major baselines are ERM (Vapnik, 1991), IRM (Arjovsky et al., 2019), and VREx (Krueger et al., 2021). IRM and VREx are two other causally-motivated works that also search for conditional invariance as a signature of an underlying causal structure. Because WRI shares a similar theoretical grounding, we find it particularly important to compare our empirical performance with these works. Appendix E includes comparisons with other non-causal baselines, as well as additional experiments and details.
Researcher Affiliation	Academia	Gina Wong, Department of Computer Science, Johns Hopkins University; Joshua Gleason, Department of Electrical and Computer Engineering, University of Maryland, College Park; Rama Chellappa, Department of Electrical and Computer Engineering, Johns Hopkins University; Yoav Wald, Center for Data Science, New York University; Anqi Liu, Department of Computer Science, Johns Hopkins University
Pseudocode	Yes	Algorithm D.1: WRI with model-based density
Open Source Code	Yes	The code for generating the figures and empirical results can be found at https://github.com/ginawong/weighted_risk_invariance/.
Open Datasets	Yes	We evaluate our WRI implementation on synthetic and real-world datasets with distribution shifts, particularly focusing on cases of covariate shift in the invariant features. In all of the datasets, we select our model based on a validation set from the test environment, and we report the test set accuracy averaged over 5 random seeds with standard errors. ... We evaluate our method on 5 real-world datasets that are part of the DomainBed suite, namely VLCS (Fang et al., 2013), PACS (Li et al., 2017), OfficeHome (Venkateswara et al., 2017), TerraIncognita (Beery et al., 2018), and DomainNet (Peng et al., 2019). ... Finally, we create ideal versions of these datasets where we simplify the image data to two-dimensional features of digit value and color. (More details on how these datasets were generated can be found in Appendix E.3.)
Dataset Splits	Yes	We create a heteroskedastic variant on this dataset, HCMNIST, where we vary the label flip probability with the digit. We also create HCMNIST-CS, as a version of HCMNIST with invariant covariate shift, by enforcing different distributions of digits in each environment. ... We place 65% of the digits 0, 1, 5, and 6 into the first training environment and 5% into the second environment (with the remaining 30% in test). The remaining digits are distributed so that all environments have the same number of samples.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It mentions using a ResNet50 model and ERM-trained features for computational efficiency, but not the specific hardware on which these computations were performed.
Software Dependencies	No	The paper mentions 'Adam (Kingma & Ba, 2017)' and 'scikit-learn (Pedregosa et al., 2011)', and discusses 'Python 3.8, PyTorch 1.9, and CUDA 11.1' as examples in Appendix D.2. However, it does not explicitly state the specific version numbers of these or other software dependencies used in their own experimental setup. For instance, for scikit-learn, only the citation is given, not the specific version used.
Experiment Setup	Yes	We specified the following DomainBed hyperparameters and selected them according to the following distributions. Note that in DomainBed, the default hyperparameters are used during the first hyperparameter seed, and random hyperparameters are selected for subsequent seeds. For any hyperparameters not listed here, we use the defaults provided by DomainBed. Learning-rate default: 1e-2, random: log-uniform over [5e-3, 1e-1]. IRM-anneal iters default: 50, random: log-uniform over [1, 500]. Number of steps before penalty weight is increased. VREx-anneal iters default: 50, random: log-uniform over [1, 500]. Number of steps before penalty weight is increased. Featurizer dimensions default: 64, random: discrete uniform from {32, 64}. The dimensionality of the pretrained features used. WRI-λ default: 1, random: log-uniform over [1e-1, 5e1]. The penalty weight used when computing L in the predictor optimization step. WRI-annealing default: 0, random: discrete uniform from {0, 10}. Number of steps before WRI regularization term is included in loss function. WRI-density update freq (ω) default: 1, random: discrete uniform from {1, 2, 4, 8}. Number of predictor optimization steps between density optimization steps. WRI-density learning rate default: 2e-2, random: log-uniform over [1e-2, 5e-2]. Learning rate used for the density estimate optimizer. WRI-density weight decay default: 1e-5, random: log-uniform over [1e-6, 1e-2]. The weight decay used for the density estimate optimizer. WRI-density batch size default: 256, random: discrete uniform from {128, 256}. The batch size used when optimizing for density estimates. WRI-density λ default: 5, random: log-uniform from [5, 50]. The penalty weight used when computing L in the density optimization step. WRI-density β default: 2e2, random: log-uniform from [5e-2, 5]. The negative log penalty weight used when computing L in the density optimization step. WRI-density steps (nd) default: 4, random: discrete uniform from {4, 16, 32}. The number of optimization steps taken each time the density estimators are updated. WRI-min density default: 0.05, random: uniform from [0.01, 0.2]. The minimum density imposed via scaled shifted sigmoid activation on density estimator models. WRI-max density default: 1, random: uniform from [0.4, 2]. The maximum density imposed via scaled shifted sigmoid activation on density estimator models.
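
The hyperparameter selection rule quoted in the Experiment Setup row (defaults on the first hyperparameter seed, random draws on later seeds) can be illustrated with a minimal Python sketch. It is not the authors' code or the DomainBed API: the dictionary keys, the function names `sample_hparams` and `bounded_density`, and the exact form of the scaled shifted sigmoid are assumptions made for illustration, and only a subset of the listed hyperparameters is included.

```python
import math
import random


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform over [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# (default, sampler) pairs for a subset of the hyperparameters quoted above.
# The key names are hypothetical; only the defaults and ranges come from the text.
HPARAM_SPECS = {
    "lr": (1e-2, lambda rng: log_uniform(rng, 5e-3, 1e-1)),
    "wri_lambda": (1.0, lambda rng: log_uniform(rng, 1e-1, 5e1)),
    "wri_anneal_iters": (0, lambda rng: rng.choice([0, 10])),
    "density_update_freq": (1, lambda rng: rng.choice([1, 2, 4, 8])),
    "density_steps": (4, lambda rng: rng.choice([4, 16, 32])),
    "min_density": (0.05, lambda rng: rng.uniform(0.01, 0.2)),
    "max_density": (1.0, lambda rng: rng.uniform(0.4, 2.0)),
}


def sample_hparams(hparam_seed):
    """Return defaults for seed 0 and random draws for any later seed."""
    if hparam_seed == 0:
        return {name: default for name, (default, _) in HPARAM_SPECS.items()}
    rng = random.Random(hparam_seed)
    return {name: sampler(rng) for name, (_, sampler) in HPARAM_SPECS.items()}


def bounded_density(raw_output, min_density, max_density):
    """Map a real-valued output into (min_density, max_density).

    Assumed form of a scaled, shifted sigmoid:
    min + (max - min) * sigmoid(raw_output).
    """
    # Numerically stable sigmoid that avoids overflow for large |raw_output|.
    if raw_output >= 0:
        sig = 1.0 / (1.0 + math.exp(-raw_output))
    else:
        e = math.exp(raw_output)
        sig = e / (1.0 + e)
    return min_density + (max_density - min_density) * sig


if __name__ == "__main__":
    print(sample_hparams(0))  # the default configuration
    print(sample_hparams(1))  # one random configuration
```

Seeding a separate `random.Random` instance per hyperparameter seed keeps each configuration reproducible without touching global random state, which is one plausible way to mirror the seed-indexed behavior described in the row above.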