Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization

Authors: Yuanchao Wang, Zhao-Rong Lai, Tianqi Zhong

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that OOD-TV-IRM outperforms IRM-TV in most situations. Table 2: Accuracies of different methods on simulation data (left) and CelebA (right).
Researcher Affiliation | Academia | Yuanchao Wang^{1,2}, Zhao-Rong Lai^1, Tianqi Zhong^3. 1: Guangdong Key Laboratory of Data Security and Privacy Preserving, College of Cyber Security, Jinan University; 2: Pratt School of Engineering, Duke University; 3: Sino-French Engineer School, Beihang University. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | We develop a primal-dual algorithm to solve OOD-TV-IRM (15) or OOD-TV-Minimax (16). To do this, we further assume that $R(w \circ \Phi)$ and $\lambda(\Psi, \Phi)$ are differentiable w.r.t. their corresponding arguments $w$, $\Phi$, and $\Psi$. As indicated in (Lai & Wang, 2024), $\|\nabla_w R(w \circ \Phi)\|$ in (15) or (16) is nondifferentiable w.r.t. $\Phi$. Hence we adopt the subgradient descent method to update these parameters, as illustrated in Appendix A.2. Then the primal and dual updates for (15) or (16) are:
$$\begin{cases} \Phi^{(k+1)} = \Phi^{(k)} - \eta_1^{(k)} \nabla_\Phi g(\Psi^{(k)}, \Phi^{(k)}), \\ \Psi^{(k+1)} = \Psi^{(k)} + \eta_2^{(k)} \nabla_\Psi g(\Psi^{(k)}, \Phi^{(k+1)}); \end{cases} \quad (17)$$
$$\begin{cases} \Phi^{(k+1)} = \Phi^{(k)} - \eta_1^{(k)} \nabla_\Phi h(\rho^{(k)}, \Psi^{(k)}, \Phi^{(k)}), \\ (\rho^{(k+1)}, \Psi^{(k+1)}) = (\rho^{(k)}, \Psi^{(k)}) + \eta_2^{(k)} \nabla_{(\rho,\Psi)} h(\rho^{(k)}, \Psi^{(k)}, \Phi^{(k+1)}). \end{cases} \quad (18)$$
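A minimal sketch of one alternating primal-dual iteration in the style of (17): gradient descent on the primal variable Φ, then gradient ascent on the dual variable Ψ evaluated at the updated Φ. The scalar variables and the caller-supplied gradient functions are illustrative assumptions, not the paper's implementation (which uses subgradients of network parameters).

```python
def primal_dual_step(phi, psi, grad_phi_g, grad_psi_g, eta1, eta2):
    """One primal-dual iteration: gradient descent on the primal
    variable phi, then gradient ascent on the dual variable psi,
    with the dual gradient evaluated at the updated phi.
    grad_phi_g / grad_psi_g are caller-supplied partial gradients of g
    (hypothetical stand-ins for the autograd subgradients)."""
    phi_next = phi - eta1 * grad_phi_g(psi, phi)       # Phi^{(k+1)}
    psi_next = psi + eta2 * grad_psi_g(psi, phi_next)  # Psi^{(k+1)}
    return phi_next, psi_next
```

For a bilinear toy objective g(Ψ, Φ) = Ψ·Φ, the step reduces to Φ − η₁Ψ followed by Ψ + η₂Φ^{(k+1)}, making the descent/ascent asymmetry easy to check by hand.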
Open Source Code | Yes | The code is available at https://github.com/laizhr/OOD-TV-IRM.
Open Datasets | Yes | The CelebA data set (Liu et al., 2015) contains face images of celebrities. The Landcover data set consists of time series data and the corresponding land cover types derived from satellite images (Gislason et al., 2006; Russwurm et al., 2020; Xie et al., 2021). This task uses the Adult data set (https://archive.ics.uci.edu/dataset/2/adult) to predict whether an individual's income exceeds $50K per year based on census data. We also perform a regression task with the House Prices data set (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data). We use the Colored MNIST data set (https://www.kaggle.com/datasets/youssifhisham/colored-mnist-dataset) to evaluate the performance of our approach in multi-group classification of a more general scenario. The NICO data set (He et al., 2021) is a widely-used benchmark in Non-Independent and Identically Distributed (Non-I.I.D.) image classification with contexts.
Dataset Splits | Yes | Two-thirds of the data from the Black Male and Non-Black Female subgroups are randomly selected for training, and the compared methods are verified across all four subgroups using the remaining data. (Adult) The training and test sets consist of samples with built years in periods [1900, 1950] and (1950, 2000], respectively. (House Price) We randomly split each context into 80% training samples and 20% test samples. (NICO) All methods are trained on non-African data, and then tested on both non-African (from regions not overlapping with the training data) and African regions. (Landcover) The training data is generated with parameter settings $(p_s^-, p_s^+, p_v)$, where $p_s^-$ and $p_s^+$ denote the $p_s(t)$ setting for $t \in [0, 0.5)$ and $t \in [0.5, 1]$, respectively. As for the test data, we set $p_s \in \{0.999, 0.8, 0.2, 0.001\}$ and keep the same $p_v$. (Simulation)
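The 80%/20% per-context split quoted for NICO can be sketched as follows; the function name, seed, and use of Python's standard `random` module are illustrative assumptions, not the authors' code.

```python
import random

def split_context(samples, train_frac=0.8, seed=0):
    """Randomly split one context's samples into training and test
    subsets (default 80%/20%), leaving the input list unmodified."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Splitting per context (rather than pooling all samples first) preserves the context structure that these Non-I.I.D. benchmarks rely on.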
Hardware Specification | No | No specific hardware details are provided in the paper. The paper describes neural network architectures and experimental results but does not mention the GPU models, CPU types, or other hardware specifications used for training or inference.
Software Dependencies | No | The Adam scheme (Kingma & Ba, 2015) is adopted as the optimizer. (Section 4.8) and mainstream learning architectures (like PyTorch). (Appendix A.2) However, no specific version numbers for Adam, PyTorch, or other key software components are provided.
Experiment Setup | Yes | Each experiment is repeated 10 times to record the mean and standard deviation (STD) of the results for each compared method. (Section 4) The Adam scheme (Kingma & Ba, 2015) is adopted as the optimizer. (Section 4.8) We follow the annealing strategy of (Lin et al., 2022) in the early epochs, thus the adversarial learning starts from the 2001st epoch. (Section 4.8) The adversarial learning process effectively converges after 600 epochs for OOD-TV-Minimax-ℓ1 and after 300 epochs for OOD-TV-Minimax-ℓ2. (Section 4.8)
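The quoted schedule (annealing in the early epochs, adversarial learning only from the 2001st epoch onward) can be sketched as a simple training loop; the step callbacks are hypothetical placeholders for the paper's primal and dual updates.

```python
def run_schedule(n_epochs, primal_step, dual_step, anneal_epochs=2000):
    """Run the primal update every epoch; enable the adversarial (dual)
    update only after the annealing phase, i.e. from epoch
    anneal_epochs + 1 onward (epoch 2001 in the quoted setup)."""
    for epoch in range(1, n_epochs + 1):
        primal_step(epoch)
        if epoch > anneal_epochs:
            dual_step(epoch)
```

Gating the dual (adversarial) update behind an annealing phase lets the feature extractor stabilize before the penalty weight starts being adapted adversarially.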