Structure-informed Risk Minimization for Robust Ensemble Learning
Authors: Fengchun Qiao, Yanlin Chen, Xi Peng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SRM on two common OoD generalization benchmarks, DomainBed (Gulrajani & Lopez-Paz, 2020) and WILDS (Koh et al., 2021). Following standard practice, we use a held-out validation set from training distributions on the DomainBed benchmark and validation distributions on the WILDS benchmark for model selection. We provide implementation details and additional results in the Appendix. We provide the source code in the supplementary material. Baselines. We compare SRM with the following methods: (1) Uniform Ensemble; (2) Greedy Selection; (3) Empirical Risk Minimization (ERM) (Vapnik & Vapnik, 1998); (4) Uniform Prior; (5) Laplacian Prior; (6) Group Distributionally Robust Optimization (DRO) (Sagawa et al., 2019). These methods can be grouped into two categories: (1) Non-optimization-based, where the ensemble weight is obtained without optimization (Uniform Ensemble and Greedy Selection); (2) Optimization-based, where the ensemble weight is learned through an optimization process (ERM, Uniform Prior, Laplacian Prior, and DRO). 4.1. DomainBed Benchmark. Datasets. We conduct experiments on five datasets: Terra Incognita (Beery et al., 2018), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017), PACS (Li et al., 2017), and DomainNet (Peng et al., 2019). |
| Researcher Affiliation | Academia | Deep REAL Lab, Department of Computer and Information Sciences, University of Delaware, DE, USA. Correspondence to: Xi Peng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Structure-informed Risk Minimization (SRM)<br>Input: Data of E_train, step sizes η_w and η_q<br>Output: Learned ensemble weights w<br>// Construct graph G and compute prior p<br>for i, j ∈ {1, ..., n} do<br>&nbsp;&nbsp;D(P_i, P_j) ← ‖μ_i − μ_j‖²₂ + ‖Σ_i^{1/2} − Σ_j^{1/2}‖²_F<br>&nbsp;&nbsp;A_ij ← D(P_i, P_j)<br>end<br>c(P_e) ← [Σ_{j=1}^{n} d(P_e, P_j)]⁻¹ // Closeness centrality<br>p_e ← c(P_e) / Σ_{j=1}^{n} c(P_j) // Prior distribution<br>// Optimize weights<br>Initialize w_0 ← (1/n)·1<br>while not converged do<br>&nbsp;&nbsp;Calculate L(w, q) via Eq. 9<br>&nbsp;&nbsp;Update ensemble weights w_{t+1} via Eq. 10<br>&nbsp;&nbsp;Update mixture weights q_{t+1} via Eq. 11<br>end |
| Open Source Code | Yes | Code is available at: https://github.com/deep-real/SRM. |
| Open Datasets | Yes | We evaluate SRM on two common OoD generalization benchmarks, DomainBed (Gulrajani & Lopez-Paz, 2020) and WILDS (Koh et al., 2021). ... We conduct experiments on five datasets: Terra Incognita (Beery et al., 2018), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017), PACS (Li et al., 2017), and DomainNet (Peng et al., 2019). ... We evaluate SRM on the FMoW-WILDS (Koh et al., 2021) dataset, which comprises satellite images collected from different geographical regions across five continents at different times. |
| Dataset Splits | Yes | Following standard practice, we use a held-out validation set from training distributions on the DomainBed benchmark and validation distributions on the WILDS benchmark for model selection. ... For each dataset, we hold one distribution out for test and train on the remaining ones, and report the average accuracies over all test distributions. ... Apart from the original train-test split scheme (Test After 2016), where training distributions consist of years 2002 to 2013, test distributions consist of years 2016 and 2017, and years 2013 to 2016 are reserved for validation, we further propose two train-test split schemes which cover more diverse distribution shift scenarios: (1) Test Before 2004, where years 2007 to 2018 are for training, 2002 to 2004 are for testing, and 2004 to 2007 are for validation; (2) Test Middle, where years 2002 to 2008 and years 2012 to 2018 are for training, 2009 to 2011 are for testing, and years 2008 and 2011 are for validation. |
| Hardware Specification | No | The paper does not explicitly provide details about the specific hardware used for running its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | We use DiWA (Rame et al., 2022) to train the models in the ensemble pool. Each model in the ensemble pool is a ResNet50 (He et al., 2016) model trained with ERM (Vapnik & Vapnik, 1998) using different hyper-parameter settings. ... For optimizing w and q, we use the SGD optimizer. The paper mentions software tools and frameworks but does not provide specific version numbers for them. |
| Experiment Setup | Yes | The number of models (n) used in the experiments is 10. A random model in the ensemble pool is chosen to construct the distribution graph. For optimizing w and q, we use the SGD optimizer. For the experiments on DomainBed, we set ηw = 0.1 and ηq = 0.1, and for WILDS, we set ηw = 3e-2 and ηq = 0.1. λ is selected from [0.0, 2.0] for each dataset. We use the in-distribution validation set to optimize w and q, and the number of optimization steps is 100 and 50 for DomainBed and WILDS, respectively. |
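The graph-construction and prior steps of Algorithm 1 can be sketched in NumPy. This is a minimal illustration, not the paper's released code: it assumes the pairwise distances D(P_i, P_j) are computed from per-distribution Gaussian feature statistics (μ_e, Σ_e) extracted by one pool model, following the distance formula quoted in the Pseudocode row; all function names here are ours.

```python
import numpy as np

def psd_sqrt(cov):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(cov)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def pairwise_distance(mu_i, cov_i, mu_j, cov_j):
    """Distance from Algorithm 1:
    D(P_i, P_j) = ||mu_i - mu_j||_2^2 + ||cov_i^{1/2} - cov_j^{1/2}||_F^2."""
    return float(np.sum((mu_i - mu_j) ** 2)
                 + np.sum((psd_sqrt(cov_i) - psd_sqrt(cov_j)) ** 2))

def structure_prior(mus, covs):
    """Build the distance matrix A over the n training distributions,
    then return the closeness-centrality prior p_e = c(P_e) / sum_j c(P_j)."""
    n = len(mus)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = pairwise_distance(mus[i], covs[i], mus[j], covs[j])
    c = 1.0 / A.sum(axis=1)  # closeness centrality (diagonal of A is zero)
    return c / c.sum()       # normalize into a prior distribution
```

Note that the prior weights distributions by how central they are in the graph: a training distribution close to all others receives higher prior mass.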
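The weight-optimization loop of Algorithm 1 refers to Eqs. 9–11, which are not reproduced in this report, so the following is only a plausible sketch: it assumes a KL-regularized group-DRO objective L(w, q) = Σ_e q_e R_e(w) − λ·KL(q‖p), alternating projected gradient descent on the ensemble weights w with mirror ascent on the mixture weights q, with the structural prior p entering through the KL term. The objective form, update rules, and all function names are our assumptions, not the paper's.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.clip(v + theta, 0.0, None)

def srm_optimize(risk, risk_grad_w, p, n_models,
                 eta_w=0.1, eta_q=0.1, lam=1.0, steps=100):
    """Hypothetical min-max loop: descend on w against the q-weighted risk,
    ascend on q toward high-risk mixtures while the KL term pulls q to p.
    risk(w) -> per-environment risks, shape (n_env,);
    risk_grad_w(w) -> Jacobian dR_e/dw, shape (n_env, n_models)."""
    w = np.full(n_models, 1.0 / n_models)  # w_0 = (1/n) 1, as in Algorithm 1
    q = p.copy()
    for _ in range(steps):
        R = risk(w)
        J = risk_grad_w(w)
        # Gradient step on w, projected back to the simplex.
        w = project_simplex(w - eta_w * (J.T @ q))
        # Mirror-ascent step on q for sum_e q_e R_e - lam * KL(q || p).
        g = R - lam * np.log(q / p)
        q = q * np.exp(eta_q * g)
        q /= q.sum()
    return w, q
```

With λ large, q stays near the structural prior p; with λ = 0 the update reduces to standard multiplicative-weights group DRO, which matches how the report positions SRM relative to the Uniform Prior and DRO baselines.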