reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Combine and Conquer: A Meta-Analysis on Data Shift and Out-of-Distribution Detection

Authors: Eduardo Dadalto Câmara Gomes, Florence Alberge, Pierre Duhamel, Pablo Piantanida

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through empirical investigation, we explore different types of shifts, each exerting varying degrees of impact on data. Our results demonstrate that our approach significantly improves overall robustness and performance across diverse OOD detection scenarios. Notably, our framework is easily extensible for future developments in detection scores and stands as the first to combine decision boundaries in this context. The code and artifacts associated with this work are publicly available (footnote 1).
Researcher Affiliation	Academia	(1) Laboratoire des Signaux et Systèmes (L2S), Université Paris-Saclay; (2) SATIE Laboratory, Université Paris-Saclay, ENS Paris-Saclay, CNRS; (3) ILLS International Laboratory on Learning Systems, Mila Quebec AI Institute; (4) CNRS, CentraleSupélec
Pseudocode	Yes	Algorithm 1 Offline preparation algorithm for combining multiple detectors for OOD detection.
Open Source Code	Yes	The code and artifacts associated with this work are publicly available (footnote 1). Footnote 1: https://github.com/edadaltocg/detectors
Open Datasets	Yes	For all our main experiments, we set as in-distribution dataset ImageNet-1K (=ILSVRC2012; Deng et al., 2009) on ResNet (He et al., 2016) and Vision Transformers (Dosovitskiy et al., 2021) models. ...far-OOD datasets: SSB-Easy (Vaze et al., 2022) ... OpenImage-O (OI-O) (Wang et al., 2022) ... Places (Zhou et al., 2017) ... iNaturalist (Horn et al., 2017) ... Textures (Cimpoi et al., 2014); and the near-OOD datasets: SSB-Hard (Vaze et al., 2022) ... Species (Hendrycks et al., 2022) ... NINCO (Bitterwolf et al., 2023) ...we ran experiments with the ImageNet-R (IN-R) (Hendrycks et al., 2021) dataset. ...we ran experiments with the corrupted ImageNet (IN-C) (Hendrycks & Dietterich, 2019) dataset.
Dataset Splits	Yes	We evaluate the performance of the detectors by mixing the 50,000 testing samples from ImageNet with the curated datasets from Bitterwolf et al. (2023). To simulate a novelty shift at test time, we fabricate fully ID windows and corrupted windows formed by a mixture of ID and OOD data from the OpenImage-O (OI-O) (Wang et al., 2022) dataset with mixing parameter β as defined in Equation (2). To do so, each test window is compared to a fixed reference window of size r = 1000 extracted from a clean validation set. In a window-based detection scenario, we make the assumptions that (1) there are multiple reference samples available, (2) the instance's class labels are not available right after prediction, and (3) the model is not updated. So, given a reference window $W_1^r \sim p_{XY}$ with r samples and test window $W_2^m = \{x_1, \dots, x_m\} \sim q_X$ with sample size m, our task is to determine whether they are both sampled from the source distribution or, equivalently, whether $p_{XY}(x, y)$ equals $q_{X\hat{Y}}(x, \hat{y})$ where $\hat{y} = f(x)$. Unlike the independent window-based detection setting introduced in Section 5.2, in this setup, we implement a sliding window of size 64 with a stride of one, so that the resulting windows contain overlapping data samples.
Hardware Specification	No	granted access to the HPC/AI resources of IDRIS under the allocation 2023 AD011012803R2 made by GENCI.
Software Dependencies	No	The paper mentions using ResNet and Vision Transformers models, but does not specify any software libraries or their version numbers (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup	No	For all our main experiments, we set as in-distribution dataset ImageNet-1K (=ILSVRC2012; Deng et al., 2009) on ResNet (He et al., 2016) and Vision Transformers (Dosovitskiy et al., 2021) models. We followed the hyperparameter selection procedure suggested in the original papers when needed. We ran experiments with β ∈ [0, 1] and with window sizes |W| ∈ {1, ..., 1000}.
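
Illustration (not from the paper or its repository): the window construction quoted under Dataset Splits can be sketched roughly as follows. This sketch assumes that β is the fraction of OOD samples in a test window, which is one plausible reading since Equation (2) is not reproduced on this page. The function names `mixed_window` and `sliding_windows` are hypothetical, and the 1-D arrays stand in for per-sample detection scores.

```python
import numpy as np


def mixed_window(id_scores, ood_scores, m, beta, rng):
    """Build one test window of size m with about a fraction beta of OOD samples.

    Assumption: beta is the OOD proportion in the window; the paper's
    Equation (2) may define the mixture differently.
    """
    n_ood = int(round(beta * m))
    ood_part = rng.choice(ood_scores, size=n_ood, replace=False)
    id_part = rng.choice(id_scores, size=m - n_ood, replace=False)
    window = np.concatenate([id_part, ood_part])
    rng.shuffle(window)  # in-place shuffle so OOD samples are not grouped at the end
    return window


def sliding_windows(stream, size=64, stride=1):
    """Return overlapping windows over a 1-D stream, one window per row."""
    return np.lib.stride_tricks.sliding_window_view(stream, size)[::stride]


rng = np.random.default_rng(0)
id_scores = np.zeros(1000)   # placeholder in-distribution scores
ood_scores = np.ones(1000)   # placeholder OOD scores

window = mixed_window(id_scores, ood_scores, m=64, beta=0.25, rng=rng)
overlapping = sliding_windows(np.arange(100), size=64, stride=1)
```

With a stride of one, consecutive rows of `overlapping` share all but one sample, which matches the quoted remark that the resulting windows contain overlapping data. Setting `beta=0` yields a fully in-distribution window.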