Finding and Fixing Spurious Patterns with Explanations

Authors: Gregory Plumb, Marco Tulio Ribeiro, Ameet Talwalkar

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our method identifies a diverse set of spurious patterns (SPs) and mitigates them by producing a model that is both more accurate on a distribution where the spurious pattern is not helpful and more robust to distribution shift. We divide our experiments into three groups: in Section 5.1, we induce SPs of varying strength by sub-sampling COCO in order to understand how mitigation methods work in a controlled setting, and we show that SPIRE is more effective at mitigating these SPs than prior methods.
Researcher Affiliation | Collaboration | Gregory Plumb (CMU), Marco Tulio Ribeiro (Microsoft Research), Ameet Talwalkar (CMU). Carnegie Mellon University is an academic institution and Microsoft Research is an industry research lab, indicating an academia-industry collaboration.
Pseudocode | Yes | Algorithm 1 details the process that we use for adding or removing Spurious from a model's representation (found in Appendix H.1, SPIRE-R Projection Pseudocode).
Open Source Code | Yes | Code is available at https://github.com/GDPlumb/SPIRE
Open Datasets | Yes | A model trained to detect tennis rackets on the COCO dataset (Lin et al., 2014); UnRel (Peyre et al., 2017) and SpatialSense (Yang et al., 2019); the ISIC dataset (Codella et al., 2019).
Dataset Splits | Yes | Because the test set for this dataset is not publicly available, we used its validation set as our test set and divided its training set into 90-10 training and validation splits. We created a series of controlled training sets of size 2,000 by sampling images from the full training set such that P(Main) = P(Spurious) = 0.5 and p = P(Main | Spurious) ranges between 0.025 and 0.975.
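The controlled sampling described in this row can be sketched as follows. The function name `controlled_sample` and the pre-computed per-cell id lists are illustrative assumptions; the paper's released code operates on the actual COCO index. The only substance taken from the paper is the cell arithmetic: with P(Main) = P(Spurious) = 0.5 and p = P(Main | Spurious), the joint probabilities are P(Main & Spurious) = p/2 and P(Main & not Spurious) = 0.5 - p/2, with the negative cells mirroring them by symmetry.

```python
import random

def controlled_sample(both, main_only, spurious_only, neither, n=2000, p=0.5, seed=0):
    """Sample a training set of size n with P(Main) = P(Spurious) = 0.5 and
    P(Main | Spurious) = p, mirroring the paper's controlled sub-sampling of COCO.
    Each argument is a list of image ids for one co-occurrence cell
    (hypothetical pre-computed index lists)."""
    rng = random.Random(seed)
    n_both = round(n * p / 2)              # P(Main & Spurious) = p * P(Spurious)
    n_main_only = round(n * (0.5 - p / 2)) # P(Main & not Spurious)
    n_spurious_only = n_main_only          # symmetry keeps P(Main) = P(Spurious) = 0.5
    n_neither = n - n_both - n_main_only - n_spurious_only
    return (rng.sample(both, n_both)
            + rng.sample(main_only, n_main_only)
            + rng.sample(spurious_only, n_spurious_only)
            + rng.sample(neither, n_neither))
```

Sweeping p from 0.025 to 0.975 with this sampler yields the spectrum of spurious-pattern strengths, from strongly anti-correlated to strongly correlated, that the experiments evaluate.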
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU/CPU models or processor types used for its experiments. It only mentions the model architecture (ResNet18) and software framework (PyTorch).
Software Dependencies | No | All of our experiments started with the pretrained ResNet18 (He et al., 2016) that is available from PyTorch (Paszke et al., 2019). We minimized the binary cross-entropy loss using Adam (Kingma & Ba, 2014). The paper names PyTorch and Adam but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | We minimized the binary cross-entropy loss using Adam (Kingma & Ba, 2014) with a batch size of 64. For transfer-learning we used a learning rate of 0.001 and for fine-tuning a learning rate of 0.0001; we explored other options during early experiments but found no benefit to doing so. If the training loss failed to decrease sufficiently after a set number of epochs, we lowered the learning rate. For SPIRE, we considered removing objects both by covering them with a grey box and by in-painting them; we found that transfer-learning while covering objects with a grey box was the most effective (see Table 9). RRR, CDEP, and GS all have regularization weights that can be tuned, and FS has a tunable minimum weight for images of objects out of context. For these methods, we considered values that are powers of 10 ranging from 0.1 to 10,000; no method chose either extreme value.
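The grey-box object removal mentioned in this row can be sketched as below. This is a minimal illustration under stated assumptions: the function name `cover_with_grey_box`, the nested-list RGB image representation, the rectangular box, and the grey value (128, 128, 128) are all hypothetical; the released SPIRE code works on real COCO images and their segmentation masks rather than axis-aligned rectangles.

```python
def cover_with_grey_box(image, box, grey=(128, 128, 128)):
    """Replace the pixels inside `box` with a flat grey patch, sketching the
    object-removal variant the paper found most effective.

    `image` is an H x W list of RGB tuples and `box` is (x0, y0, x1, y1)
    in pixel coordinates with exclusive upper bounds."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            image[y][x] = grey
    return image
```

Applying this to the spurious object in each image, then transfer-learning on the edited images, corresponds to the counterfactual-augmentation step that SPIRE uses for mitigation.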