OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable?

Authors: Liangze Jiang, Damien Teney

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the learned selector identifies high-performing algorithms across synthetic, vision, and language tasks. |
| Researcher Affiliation | Academia | ¹EPFL, Switzerland; ²Idiap Research Institute, Switzerland. Correspondence to: Liangze Jiang <EMAIL>. |
| Pseudocode | No | The paper describes the methodology only in natural language and flowcharts (e.g., Figure 1), without a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code: https://github.com/LiangzeJiang/OOD-Chameleon |
| Open Datasets | Yes | We evaluate OOD-Chameleon on seven applications from three domains: synthetic (Sagawa et al., 2020), vision (CelebA (Liu et al., 2015), MetaShift (Liang & Zou, 2022), OfficeHome (Venkateswara et al., 2017), Colored MNIST (Arjovsky et al., 2020)), and language (CivilComments (Borkan et al., 2019), MultiNLI (Williams et al., 2017)). |
| Dataset Splits | Yes | We randomly split these datasets 4:1 into the dataset of datasets for training the algorithm selector and the unseen datasets for evaluation. ... We randomly split the generated CelebA tasks into 4:1 training and test splits, and use the training part for the meta-dataset D. |
| Hardware Specification | No | The paper mentions specific model architectures such as ResNet18, CLIP, and BERT, but does not provide hardware details such as GPU models, CPU types, or cloud instance specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'SubpopBench (Yang et al., 2023)' but does not provide version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use 4-layer MLPs to parameterize the algorithm selector... We use the Adam optimizer with default hyperparameters, along with L2 regularization. We train for 1000 epochs to ensure convergence. ... We use basic data augmentations (resize, crop, etc.). |
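The 4:1 "dataset of datasets" split quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the task representation (plain indices), `train_frac`, and the seed are all hypothetical.

```python
import random

def split_meta_dataset(tasks, train_frac=0.8, seed=0):
    """Randomly split a list of generated tasks 4:1 into
    selector-training tasks and held-out evaluation tasks."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = list(tasks)           # copy so the input list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 100 generated tasks identified by index.
train_tasks, test_tasks = split_meta_dataset(range(100))
print(len(train_tasks), len(test_tasks))  # 80 20
```

The training portion would then serve as the meta-dataset D on which the 4-layer MLP selector is trained; the held-out portion measures selection on unseen datasets.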