Compositional Risk Minimization
Authors: Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, Pascal Vincent
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirm the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts. |
| Researcher Affiliation | Collaboration | Work done at Meta. Joint last author. ¹Meta FAIR, ²Mila, Université de Montréal. Correspondence to: Divyat Mahajan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Compositional Risk Minimization (CRM). Input: Training set Dtrain = {(x, z)}. Output: Classifier parameters θ̂, Ŵ, B ... Algorithm 2 Compositional Risk Minimization (CRM) for the 2-Attribute Case. Input: training set Dtrain with examples (x, y, a), where y is the class to predict and a is an attribute spuriously correlated with y. Output: classifier parameters θ, W, B. |
| Open Source Code | Yes | A practical method: CRM is a simple algorithm for training classifiers, which first trains an additive energy classifier and then adjusts the trained classifier for tackling compositional shifts. We empirically validate the superiority of CRM over other methods previously proposed for addressing subpopulation shifts. Our code repository can be accessed via the link in footnote 1. Footnote 1: GitHub: facebookresearch/compositional-risk-minimization |
| Open Datasets | Yes | Following this procedure, we adapted Waterbirds (Wah et al., 2011), CelebA (Liu et al., 2015), MetaShift (Liang & Zou, 2022), MultiNLI (Williams et al., 2017), and CivilComments (Borkan et al., 2019) for experiments. We also experiment with the NICO++ dataset (Zhang et al., 2023), where we already have Ztrain ⊂ Ztest = Z as some groups were not present in the train dataset. |
| Dataset Splits | Yes | We repurpose these benchmarks for compositional shifts by discarding samples from one of the groups (z) in the train (and validation) dataset; but we don't change the test dataset, i.e., z ∉ Ztrain but z ∈ Ztest. Let us denote the data splits from the standard benchmarks as (Dtrain, Dval, Dtest). Then we generate multiple variants of compositional shifts {(D^{-z}_train, D^{-z}_val, Dtest) \| z ∈ Z}, where D^{-z}_train and D^{-z}_val are generated by discarding samples from Dtrain and Dval that belong to the group z. Table 3. Statistics for the different benchmarks used in our experiments. (Contains columns for Train Size, Val Size, Test Size) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instance specifications. It mentions using ResNet50 and BERT as backbones but not the underlying hardware. |
| Software Dependencies | No | The paper mentions Python-based tools such as PyTorch (implied by a PyTorch implementation snippet; Paszke et al., 2017) and the AdamW optimizer, but it does not specify version numbers for Python, PyTorch, or other libraries. For example, it lists 'import torch' and 'import torchvision' but without versions. |
| Experiment Setup | Yes | Hyperparameter Selection. We rely on the group balanced accuracy on the validation set to determine the optimal hyperparameters. We specify the grids for each hyperparameter in Table 4, and train each method with 5 randomly drawn hyperparameters. The grid sizes for hyperparameter selection were designed following Pezeshki et al. (2023). Table 4. Details about the grids for hyperparameter selection. The choices for grid sizes were taken from Pezeshki et al. (2023). (Contains columns for Learning Rate, Weight Decay, Batch Size, Total Epochs with specific uniform ranges). |
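The split-construction procedure quoted in the Dataset Splits row (drop one group z from train and validation, leave test untouched, one variant per group) can be sketched in a few lines. The function names and the `"z"` group key below are illustrative assumptions, not taken from the paper's released code.

```python
# Sketch of the compositional-shift split construction described in the
# Dataset Splits row. Each example is a dict carrying its group label "z";
# these names are hypothetical, for illustration only.
def make_shift_splits(train, val, test, held_out_z):
    """Discard one group z from train/val; keep the test split unchanged."""
    train_z = [ex for ex in train if ex["z"] != held_out_z]
    val_z = [ex for ex in val if ex["z"] != held_out_z]
    return train_z, val_z, test

def all_variants(train, val, test, groups):
    """One compositional-shift variant per group z in Z."""
    return {z: make_shift_splits(train, val, test, z) for z in groups}

# Tiny demo on toy data.
toy_train = [{"x": 0, "z": "g0"}, {"x": 1, "z": "g1"}]
toy_val = [{"x": 2, "z": "g1"}]
toy_test = [{"x": 3, "z": "g0"}, {"x": 4, "z": "g1"}]
variants = all_variants(toy_train, toy_val, toy_test, ["g0", "g1"])
```

In each variant the held-out group is absent from train and validation but still present in test, which is exactly the z ∉ Ztrain, z ∈ Ztest shift the report quotes.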
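The hyperparameter-selection setup quoted in the Experiment Setup row (5 randomly drawn configurations per method, drawn from per-parameter grids) can be sketched as follows. The specific ranges and the log-uniform sampling choice are placeholder assumptions; the actual grids are in the paper's Table 4.

```python
import math
import random

# Placeholder grids standing in for the paper's Table 4 (assumed values).
GRID = {
    "lr": (1e-5, 1e-3),              # continuous range, sampled log-uniformly
    "weight_decay": (1e-6, 1e-2),    # continuous range, sampled log-uniformly
    "batch_size": [16, 32, 64, 128], # discrete choices, sampled uniformly
    "epochs": [5, 10, 20],           # discrete choices, sampled uniformly
}

def draw_config(grid, rng):
    """Draw one random hyperparameter configuration from the grid."""
    cfg = {}
    for name, space in grid.items():
        if isinstance(space, tuple):  # continuous range -> log-uniform draw
            lo, hi = space
            cfg[name] = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
        else:                          # discrete choices -> uniform pick
            cfg[name] = rng.choice(space)
    return cfg

rng = random.Random(0)
configs = [draw_config(GRID, rng) for _ in range(5)]  # 5 draws per method
```

Each drawn configuration would then be used to train one run of a method, with the group-balanced validation accuracy deciding which of the 5 runs is kept.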