Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization

Authors: Yuanchao Wang, Zhao-Rong Lai, Tianqi Zhong

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that OOD-TV-IRM outperforms IRM-TV in most situations. Table 2: Accuracies of different methods on simulation data (left) and CelebA (right).
Researcher Affiliation | Academia | Yuanchao Wang^{1,2}, Zhao-Rong Lai^1, Tianqi Zhong^3. 1: Guangdong Key Laboratory of Data Security and Privacy Preserving, College of Cyber Security, Jinan University; 2: Pratt School of Engineering, Duke University; 3: Sino-French Engineer School, Beihang University. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | We develop a primal-dual algorithm to solve OOD-TV-IRM (15) or OOD-TV-Minimax (16). To do this, we further assume that $R(w \circ \Phi)$ and $\lambda(\Psi, \Phi)$ are differentiable w.r.t. their corresponding arguments $w$, $\Phi$, and $\Psi$. As indicated in (Lai & Wang, 2024), $\|\nabla_w R(w \circ \Phi)\|$ in (15) or (16) is nondifferentiable w.r.t. $\Phi$. Hence we adopt the subgradient descent method to update these parameters, as illustrated in Appendix A.2. Then the primal and dual updates for (15) or (16) are:
$$\begin{cases} \Phi^{(k+1)} = \Phi^{(k)} - \eta_1^{(k)} \nabla_\Phi g(\Psi^{(k)}, \Phi^{(k)}), \\ \Psi^{(k+1)} = \Psi^{(k)} + \eta_2^{(k)} \nabla_\Psi g(\Psi^{(k)}, \Phi^{(k+1)}); \end{cases} \quad (17)$$
$$\begin{cases} \Phi^{(k+1)} = \Phi^{(k)} - \eta_1^{(k)} \nabla_\Phi h(\rho^{(k)}, \Psi^{(k)}, \Phi^{(k)}), \\ (\rho^{(k+1)}, \Psi^{(k+1)}) = (\rho^{(k)}, \Psi^{(k)}) + \eta_2^{(k)} \nabla_{(\rho,\Psi)} h(\rho^{(k)}, \Psi^{(k)}, \Phi^{(k+1)}). \end{cases} \quad (18)$$
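A minimal sketch of one alternating primal-dual iteration in the style of (17): gradient descent on the primal variable Φ, then gradient ascent on the dual variable Ψ evaluated at the updated Φ. The scalar variables and the caller-supplied gradient functions are illustrative assumptions, not the paper's implementation (which uses subgradients of network parameters).

```python
def primal_dual_step(phi, psi, grad_phi_g, grad_psi_g, eta1, eta2):
    """One primal-dual iteration: gradient descent on the primal
    variable phi, then gradient ascent on the dual variable psi,
    with the dual gradient evaluated at the updated phi.
    grad_phi_g / grad_psi_g are caller-supplied partial gradients of g
    (hypothetical stand-ins for the autograd subgradients)."""
    phi_next = phi - eta1 * grad_phi_g(psi, phi)       # Phi^{(k+1)}
    psi_next = psi + eta2 * grad_psi_g(psi, phi_next)  # Psi^{(k+1)}
    return phi_next, psi_next
```

For a bilinear toy objective g(Ψ, Φ) = Ψ·Φ, the step reduces to Φ − η₁Ψ followed by Ψ + η₂Φ^{(k+1)}, making the descent/ascent asymmetry easy to check by hand.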
Open Source Code | Yes | The code is available at https://github.com/laizhr/OOD-TV-IRM.
Open Datasets | Yes | The CelebA data set (Liu et al., 2015) contains face images of celebrities. The Landcover data set consists of time series data and the corresponding land cover types derived from satellite images (Gislason et al., 2006; Russwurm et al., 2020; Xie et al., 2021). This task uses the Adult data set (https://archive.ics.uci.edu/dataset/2/adult) to predict whether an individual's income exceeds $50K per year based on census data. We also perform a regression task with the House Prices data set (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data). We use the Colored MNIST data set (https://www.kaggle.com/datasets/youssifhisham/colored-mnist-dataset) to evaluate the performance of our approach in multi-group classification of a more general scenario. The NICO data set (He et al., 2021) is a widely-used benchmark in Non-Independent and Identically Distributed (Non-I.I.D.) image classification with contexts.
Dataset Splits | Yes | Two-thirds of the data from the Black Male and Non-Black Female subgroups are randomly selected for training, and the compared methods are verified across all four subgroups using the remaining data. (Adult) The training and test sets consist of samples with built years in periods [1900, 1950] and (1950, 2000], respectively. (House Price) We randomly split each context into 80% training samples and 20% test samples. (NICO) All methods are trained on non-African data, and then tested on both non-African (from regions not overlapping with the training data) and African regions. (Landcover) The training data is generated with parameter settings $(p_s^-, p_s^+, p_v)$, where $p_s^-$ and $p_s^+$ denote the $p_s(t)$ setting for $t \in [0, 0.5)$ and $t \in [0.5, 1]$, respectively. As for the test data, we set $p_s \in \{0.999, 0.8, 0.2, 0.001\}$ and keep the same $p_v$. (Simulation)
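The 80%/20% per-context split quoted for NICO can be sketched as follows; the function name, seed, and use of Python's standard `random` module are illustrative assumptions, not the authors' code.

```python
import random

def split_context(samples, train_frac=0.8, seed=0):
    """Randomly split one context's samples into training and test
    subsets (default 80%/20%), leaving the input list unmodified."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Splitting per context (rather than pooling all samples first) preserves the context structure that these Non-I.I.D. benchmarks rely on.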
Hardware Specification | No | No specific hardware details are provided in the paper. The paper describes neural network architectures and experimental results but does not mention the GPU models, CPU types, or other hardware specifications used for training or inference.
Software Dependencies | No | The Adam scheme (Kingma & Ba, 2015) is adopted as the optimizer. (Section 4.8) and mainstream learning architectures (like PyTorch). (Appendix A.2) However, no specific version numbers for Adam, PyTorch, or other key software components are provided.
Experiment Setup | Yes | Each experiment is repeated 10 times to record the mean and standard deviation (STD) of the results for each compared method. (Section 4) The Adam scheme (Kingma & Ba, 2015) is adopted as the optimizer. (Section 4.8) We follow the annealing strategy of (Lin et al., 2022) in the early epochs, thus the adversarial learning starts from the 2001st epoch. (Section 4.8) The adversarial learning process effectively converges after 600 epochs for OOD-TV-Minimax-ℓ1 and after 300 epochs for OOD-TV-Minimax-ℓ2. (Section 4.8)
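The quoted schedule (annealing in the early epochs, adversarial learning only from the 2001st epoch onward) can be sketched as a simple training loop; the step callbacks are hypothetical placeholders for the paper's primal and dual updates.

```python
def run_schedule(n_epochs, primal_step, dual_step, anneal_epochs=2000):
    """Run the primal update every epoch; enable the adversarial (dual)
    update only after the annealing phase, i.e. from epoch
    anneal_epochs + 1 onward (epoch 2001 in the quoted setup)."""
    for epoch in range(1, n_epochs + 1):
        primal_step(epoch)
        if epoch > anneal_epochs:
            dual_step(epoch)
```

Gating the dual (adversarial) update behind an annealing phase lets the feature extractor stabilize before the penalty weight starts being adapted adversarially.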