Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift
Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted with various architectures on diverse distribution shifts demonstrate that our method significantly outperforms current state-of-the-art approaches. The code is available at https://github.com/Renchunzi-Xie/GdScore. ... Section 5 Experiments |
| Researcher Affiliation | Collaboration | Renchunzi Xie EMAIL College of Computing and Data Science, Nanyang Technological University; Ambroise Odonnat EMAIL Huawei Noah's Ark Lab, Inria Paris, France |
| Pseudocode | Yes | B Pseudo-code of GdScore: Our proposed GdScore for unsupervised accuracy estimation can be calculated as shown in Algorithm 1. Algorithm 1: Unsupervised Accuracy Estimation via GdScore |
| Open Source Code | Yes | The code is available at https://github.com/Renchunzi-Xie/GdScore. |
| Open Datasets | Yes | For pre-training the neural network, we use CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), Tiny ImageNet (Le & Yang, 2015), ImageNet (Deng et al., 2009), Office-31 (Saenko et al., 2010), Office-Home (Venkateswara et al., 2017), Camelyon17-WILDS (Koh et al., 2021), and BREEDS (Santurkar et al., 2020) ... we use CIFAR-10C, CIFAR-100C, and ImageNet-C (Hendrycks & Dietterich, 2019) ... Tiny ImageNet-C (Hendrycks & Dietterich, 2019) |
| Dataset Splits | No | The paper mentions using specific datasets like CIFAR-10C, CIFAR-100C, and ImageNet-C, which span 19 types of corruption across 5 severity levels, and Tiny ImageNet-C with 15 types of corruption and 5 severity levels, but does not specify explicit train/validation/test splits. |
| Hardware Specification | No | The paper states: "To show the versatility of our approach across different architectures, we perform all our experiments on ResNet18, ResNet50 (He et al., 2016) and WRN-50-2 (Zagoruyko & Komodakis, 2016) models." However, no specific hardware (e.g., GPU, CPU models, or memory) used for these experiments is mentioned. |
| Software Dependencies | No | The paper mentions "SGD with a learning rate of 10⁻³, cosine learning rate decay (Loshchilov & Hutter, 2016), a momentum of 0.9, and a batch size of 128." These are algorithmic parameters. No specific software libraries or frameworks with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) are provided, which would be necessary for reproducibility. |
| Experiment Setup | Yes | Training details. To show the versatility of our approach across different architectures, we perform all our experiments on ResNet18, ResNet50 (He et al., 2016) and WRN-50-2 (Zagoruyko & Komodakis, 2016) models. We train them for 20 epochs for CIFAR-10 (Krizhevsky & Hinton, 2009) and 50 epochs for the other datasets. In all cases, we use SGD with a learning rate of 10⁻³, cosine learning rate decay (Loshchilov & Hutter, 2016), a momentum of 0.9, and a batch size of 128. For all experiments, we used p = 0.3 to compute GdScore. |
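The quoted training recipe is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch reconstruction of the stated optimizer configuration only (SGD, learning rate 10⁻³, momentum 0.9, cosine learning-rate decay, batch size 128, 20 epochs for CIFAR-10); the tiny linear model and random batches are placeholders for the ResNet-scale models and datasets the paper actually uses, and nothing here reproduces the GdScore computation itself.

```python
import torch
from torch import nn, optim

# Placeholder model and data: the paper trains ResNet18, ResNet50, and
# WRN-50-2 on CIFAR-10/100 and other datasets; a linear layer on random
# tensors stands in so the recipe below is runnable in isolation.
torch.manual_seed(0)
model = nn.Linear(32, 10)
criterion = nn.CrossEntropyLoss()

# Settings quoted from the paper: SGD, lr 1e-3, momentum 0.9.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# 20 epochs for CIFAR-10 (50 for the other datasets), with cosine decay.
epochs = 20
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    x = torch.randn(128, 32)            # one stand-in batch of size 128
    y = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                    # decay the learning rate per epoch
```

Because the paper names no framework or version, this sketch is one plausible instantiation rather than the authors' setup; the released repository would be the authoritative source for the exact training code.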