ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models

Authors: Guangtao Zheng, Wenqian Ye, Aidong Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically analyze the effectiveness of the framework and empirically demonstrate that it is an efficient and practical tool for improving a model's robustness to spurious bias on diverse datasets. Through extensive experiments, we show that our method successfully trains models robust to spurious biases without prior knowledge about these biases. Section 5: Experiments, Section 5.1: Datasets, Section 5.2: Experimental Setup, Section 5.3: Analysis of Probe Set, Section 5.4: Main Results (Tables 1, 2, 3), Section 5.5: Ablation Studies (Figure 3).
Researcher Affiliation | Academia | Guangtao Zheng, Wenqian Ye, and Aidong Zhang, University of Virginia. EMAIL
Pseudocode | No | The paper describes the methodology using prose and mathematical equations. It states that "Details of the training algorithm are provided in Appendix," but the appendix content is not included in the analyzed text, so no structured pseudocode or algorithm blocks are present in the provided paper text.
Open Source Code | Yes | Code is available at https://github.com/gtzheng/ShortcutProbe.
Open Datasets | Yes | Waterbirds [Sagawa et al., 2019], CelebA [Liu et al., 2015], CheXpert [Irvin et al., 2019], ImageNet-9 [Ilyas et al., 2019] (a subset of ImageNet [Deng et al., 2009]), ImageNet-A [Hendrycks et al., 2021], NICO [He et al., 2021], MultiNLI [Williams et al., 2017], CivilComments [Borkan et al., 2019].
Dataset Splits | Yes | From the chosen data source, such as the training or validation set, we sorted the samples within each class by their prediction losses and divided them into two equal halves: a high-loss set and a low-loss set. ... Then, we retrained the model on half of the validation set using various bias mitigation methods. For our method, we first constructed the probe set using the same half of the validation set and used the probe set for shortcut detection and mitigation. The remaining half of the validation set was used for model selection and hyperparameter tuning. ... We prepared the training and validation data as in [Kim et al., 2022] and [Bahng et al., 2020].
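The per-class split quoted above (sort samples within each class by prediction loss, then divide into equal high-loss and low-loss halves) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `split_by_loss` and the `(sample_id, class_label, loss)` input format are assumptions.

```python
from collections import defaultdict


def split_by_loss(samples):
    """Split samples within each class into high-loss and low-loss halves.

    samples: iterable of (sample_id, class_label, loss) tuples.
    Returns two dicts mapping class_label -> list of sample_ids.
    """
    per_class = defaultdict(list)
    for sample_id, label, loss in samples:
        per_class[label].append((sample_id, loss))

    high_loss, low_loss = {}, {}
    for label, items in per_class.items():
        items.sort(key=lambda pair: pair[1], reverse=True)  # highest loss first
        mid = len(items) // 2  # equal halves per class
        high_loss[label] = [sid for sid, _ in items[:mid]]
        low_loss[label] = [sid for sid, _ in items[mid:]]
    return high_loss, low_loss
```

For instance, four samples of one class with losses 0.9, 0.1, 0.5, 0.2 would be split into a high-loss half {0.9, 0.5} and a low-loss half {0.2, 0.1}.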
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or detailed computing environments used for the experiments.
Software Dependencies | No | The paper mentions using "ResNet-50 as the backbone network", "ResNet-18", and a "pretrained BERT model [Kenton and Toutanova, 2019]" but does not provide specific version numbers for these or any other core software libraries/frameworks.
Experiment Setup | Yes | We first trained a base model initialized with pretrained weights using empirical risk minimization (ERM) on the training dataset. Then, we retrained the model on half of the validation set... The remaining half of the validation set was used for model selection and hyperparameter tuning. ... ψ* = arg min_ψ L_det + η L_reg, where η > 0 represents the regularization strength. ... θ₂* = arg min_{θ₂} L_ori + λ L_spu, where λ > 0 is the regularization strength. ... We retrain only the final classification layer of the model while keeping the feature extractor frozen.
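The last step quoted above, retraining only the final classification layer while the feature extractor stays frozen, can be sketched in plain Python as fitting a softmax classifier on precomputed (frozen) features by gradient descent. This is a minimal illustration under that reading, not the authors' implementation; `retrain_last_layer` and its hyperparameters are assumptions.

```python
import math


def retrain_last_layer(features, labels, n_classes, lr=0.5, steps=300):
    """Fit a linear softmax classifier W (d x n_classes) on frozen features.

    features: list of feature vectors produced by the frozen extractor.
    labels: list of integer class labels.
    Only W is updated; the features never change, mirroring a frozen backbone.
    """
    d = len(features[0])
    W = [[0.0] * n_classes for _ in range(d)]
    n = len(features)
    for _ in range(steps):
        grad = [[0.0] * n_classes for _ in range(d)]
        for x, y in zip(features, labels):
            logits = [sum(x[i] * W[i][c] for i in range(d)) for c in range(n_classes)]
            m = max(logits)
            exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
            total = sum(exps)
            for c in range(n_classes):
                err = exps[c] / total - (1.0 if c == y else 0.0)  # dL/dlogit_c
                for i in range(d):
                    grad[i][c] += x[i] * err
        for i in range(d):
            for c in range(n_classes):
                W[i][c] -= lr * grad[i][c] / n  # update classifier weights only
    return W
```

In a deep-learning framework the same effect is typically achieved by setting the backbone parameters to non-trainable and passing only the final layer's parameters to the optimizer.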