Improving Out-of-Distribution Detection via Dynamic Covariance Calibration
Authors: Kaiyu Guo, Zijian Wang, Tan Pan, Brian C. Lovell, Mahsa Baktashmotlagh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach significantly enhances OOD detection across various models. We evaluate our method on two pre-trained models for the CIFAR dataset and five pre-trained models for ImageNet-1k, including the self-supervised DINO model. |
| Researcher Affiliation | Academia | 1School of Electrical Engineering and Computer Science, University of Queensland, Brisbane, Australia 2Shanghai Academy of AI for Science (SAIS), Shanghai, China 3Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, China. Correspondence to: Mahsa Baktashmotlagh <EMAIL>. |
| Pseudocode | Yes | C. Algorithm Details We present the pseudocode for our method in Algorithm 1. |
| Open Source Code | Yes | The code is released at https://github.com/workerbcd/ooddcc. |
| Open Datasets | Yes | In the first benchmark, we use the official test split of CIFAR-10/CIFAR-100 as the in-distribution (ID) datasets, with six datasets serving as OOD data: SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), Places365 (Zhou et al., 2017), LSUN (Yu et al., 2015), LSUN-Resize (Yu et al., 2015) and iSUN (Xu et al., 2015). For the second benchmark, we follow (Huang & Li, 2021) to adopt ImageNet-1k (Deng et al., 2009) as the ID dataset. There are six OOD datasets selected, including Texture (Cimpoi et al., 2014), SUN (Xiao et al., 2010), Places (Zhou et al., 2017), iNaturalist (Van Horn et al., 2018), ImageNet-O (Hendrycks et al., 2021) and OpenImage-O (Wang et al., 2022). |
| Dataset Splits | Yes | In the first benchmark, we use the official test split of CIFAR-10/CIFAR-100 as the in-distribution (ID) datasets, with six datasets serving as OOD data: SVHN (Netzer et al., 2011)... There are no overlapping categories between the OOD datasets and the ID dataset. |
| Hardware Specification | Yes | It takes around 0.002 seconds to calculate the score from each feature on a 16GB V100 GPU. |
| Software Dependencies | No | The paper implies the use of PyTorch through function calls like 'torch.einsum' and 'torch.linalg.inv' in Algorithm 1, but it does not explicitly list specific software dependencies with version numbers. |
| Experiment Setup | No | The main text describes the models and datasets used (e.g., DenseNet, WideResNet, ViT, ResNet-50, DINO, CIFAR, ImageNet) and some general procedures like L2 normalization, but does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs, optimizer settings) or detailed training configurations in the main text. |
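The `torch.einsum` and `torch.linalg.inv` calls quoted from Algorithm 1 suggest a Mahalanobis-style score built from an inverse covariance matrix. As an illustration only (not the authors' released code; the function name, shapes, and the use of NumPy analogues of the torch calls are all assumptions), a minimal sketch of such a score:

```python
import numpy as np

def mahalanobis_ood_score(features, class_means, cov):
    """Score each feature by its minimum Mahalanobis distance to any class mean.
    Lower scores suggest more in-distribution-like samples.
    Hypothetical sketch; shapes: features (n, d), class_means (c, d), cov (d, d)."""
    # Invert the covariance; analogous to the quoted torch.linalg.inv call.
    precision = np.linalg.inv(cov)
    # Pairwise differences between samples and class means: (n, c, d).
    diffs = features[:, None, :] - class_means[None, :, :]
    # Quadratic form diff^T * precision * diff per (sample, class) pair;
    # analogous to the quoted torch.einsum call.
    dists = np.einsum("ncd,de,nce->nc", diffs, precision, diffs)
    # Minimum over classes gives the per-sample score.
    return dists.min(axis=1)

# Usage: features are L2-normalized, as the paper describes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
means = rng.normal(size=(3, 8))
# Ridge term keeps the sample covariance invertible.
cov = np.cov(rng.normal(size=(100, 8)), rowvar=False) + 1e-3 * np.eye(8)
scores = mahalanobis_ood_score(feats, means, cov)
print(scores.shape)  # (4,)
```

In the actual repository (`https://github.com/workerbcd/ooddcc`) the same operations run in PyTorch on GPU, which is what the paper's 0.002-second-per-feature timing refers to.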