OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?

Authors: Liangze Jiang, Damien Teney

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the learned selector identifies high-performing algorithms across synthetic, vision, and language tasks. |
| Researcher Affiliation | Academia | ¹EPFL, Switzerland; ²Idiap Research Institute, Switzerland. Correspondence to: Liangze Jiang <EMAIL>. |
| Pseudocode | No | The paper describes the methodology only in natural language and flowcharts (e.g., Figure 1), without a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code: https://github.com/LiangzeJiang/OOD-Chameleon |
| Open Datasets | Yes | We evaluate OOD-Chameleon on seven applications from three domains: synthetic (Sagawa et al., 2020), vision (CelebA (Liu et al., 2015), MetaShift (Liang & Zou, 2022), OfficeHome (Venkateswara et al., 2017), Colored MNIST (Arjovsky et al., 2020)), and language (CivilComments (Borkan et al., 2019), MultiNLI (Williams et al., 2017)). |
| Dataset Splits | Yes | We randomly split these datasets 4:1 into the dataset of datasets for training the algorithm selector and the unseen datasets for evaluation. ... We randomly split the generated CelebA tasks into 4:1 training and test splits, and use the training part for the meta-dataset D. |
| Hardware Specification | No | The paper mentions specific model architectures such as ResNet18, CLIP, and BERT, but does not provide hardware details such as GPU models, CPU types, or cloud instance specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'SubpopBench (Yang et al., 2023)' but does not provide version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use 4-layer MLPs to parameterize the algorithm selector... We use the Adam optimizer with default hyperparameters, along with L2 regularization. We train for 1000 epochs to ensure convergence. ... We use basic data augmentations (resize, crop, etc.). |
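The 4:1 "dataset of datasets" split quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the task representation (plain indices), `train_frac`, and the seed are all hypothetical.

```python
import random

def split_meta_dataset(tasks, train_frac=0.8, seed=0):
    """Randomly split a list of generated tasks 4:1 into
    selector-training tasks and held-out evaluation tasks."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = list(tasks)           # copy so the input list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 100 generated tasks identified by index.
train_tasks, test_tasks = split_meta_dataset(range(100))
print(len(train_tasks), len(test_tasks))  # 80 20
```

The training portion would then serve as the meta-dataset D on which the 4-layer MLP selector is trained; the held-out portion measures selection on unseen datasets.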