Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition

Authors: Barproda Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we also perform empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure across 6 benchmark datasets under various experimental settings. We observe an agreement between our preemptive measure of dataset spuriousness and post-training model generalization metrics such as worst-group accuracy, further supporting our proposition.
Researcher Affiliation | Collaboration | Barproda Halder EMAIL Department of Electrical and Computer Engineering, University of Maryland, College Park [...] Qiuyi Zhang EMAIL Google Research [...] Ilia Sucholutsky EMAIL Department of Computer Science, Princeton University
Pseudocode | Yes | Algorithm 1: Spuriousness Disentangler: An Autoencoder-Based Explainability Framework
Open Source Code | Yes | The code is available at https://github.com/Barproda/spuriousness-disentangler.
Open Datasets | Yes | Our evaluation spans six datasets: Waterbird (Wah et al., 2011), Adult (Becker & Kohavi, 1996), CelebA (Lee et al., 2020), Dominoes (Shah et al., 2020), Spawrious (Lynch et al., 2023), and Colored MNIST (Arjovsky et al., 2019).
Dataset Splits | Yes | Table 5: Summary of the datasets. Waterbird: Train 3,498 / 184 / 56 / 1,057; Validation 467 / 466 / 133 / 133; Test 2,255 / 2,255 / 642 / 642.
Hardware Specification | Yes | All experiments are executed on an NVIDIA RTX A4500 GPU.
Software Dependencies | No | The paper mentions the 'DIT package (James et al., 2018)' but does not specify a version number for it or for any other software component.
Experiment Setup | Yes | The hyperparameters are as follows: a batch size of 64, a learning rate of 0.001, a Cosine Annealing LR scheduler, an Adam optimizer with a weight decay of 0.0001, 50 pretraining epochs, followed by 100 epochs of additional training. When fine-tuning ResNet-50 we use the following hyperparameters: batch size of 64, learning rate of 0.0001, Cosine Annealing LR scheduler, stochastic gradient descent (SGD) optimizer with a weight decay of 0.0001, binary cross-entropy as the loss function, and 100 epochs.
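The paper decomposes information about a target into unique, redundant, and synergistic parts (Partial Information Decomposition), computed with the `dit` package. As a self-contained illustration of what a PID redundancy term measures, the sketch below implements the classic Williams-Beer `I_min` redundancy in pure Python. Note this is not necessarily the measure the paper uses (their spuriousness measure is built on unique information); the joint distributions `dup` and `xor` are illustrative toy examples, not data from the paper.

```python
import math
from collections import defaultdict

def williams_beer_redundancy(joint):
    """Williams-Beer redundant information I_min(Y; A, B), in bits.

    joint: dict mapping (a, b, y) -> probability (must sum to 1).
    Redundancy is the expected minimum, over the two sources A and B,
    of the specific information each source carries about Y = y.
    """
    p_y = defaultdict(float)
    p_ay = defaultdict(float)  # joint of (a, y)
    p_by = defaultdict(float)  # joint of (b, y)
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b, y), p in joint.items():
        p_y[y] += p
        p_ay[(a, y)] += p
        p_by[(b, y)] += p
        p_a[a] += p
        p_b[b] += p

    def specific_info(y, p_xy, p_x):
        # I(Y=y; X) = sum_x p(x|y) * log2( p(y|x) / p(y) )
        total = 0.0
        for (x, yy), pxy in p_xy.items():
            if yy != y or pxy == 0.0:
                continue
            p_x_given_y = pxy / p_y[y]
            p_y_given_x = pxy / p_x[x]
            total += p_x_given_y * math.log2(p_y_given_x / p_y[y])
        return total

    return sum(
        p_y[y] * min(specific_info(y, p_ay, p_a), specific_info(y, p_by, p_b))
        for y in p_y
    )

# Duplicated sources: B = A and Y = A, so both sources carry the same bit.
dup = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# XOR: Y = A ^ B with independent uniform sources -> purely synergistic.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(williams_beer_redundancy(dup))  # 1.0 bit of redundancy
print(williams_beer_redundancy(xor))  # 0.0 (all information is synergistic)
```

The two toy cases bracket the behaviors the paper studies: a fully redundant pair of sources versus a fully synergistic one, with a spurious-feature dataset sitting somewhere in between.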
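The training setup above names a cosine-annealing learning-rate schedule but leaves its horizon implicit. As a minimal sketch, the closed form below mirrors PyTorch's `CosineAnnealingLR` using the reported base learning rate of 0.001; `T_MAX = 100` (the number of training epochs) and `ETA_MIN = 0.0` are assumptions, since the paper's excerpt does not state them.

```python
import math

BASE_LR = 1e-3   # learning rate reported in the paper
ETA_MIN = 0.0    # assumed floor of the schedule
T_MAX = 100      # assumed annealing horizon (the 100 training epochs)

def cosine_annealing_lr(epoch, base_lr=BASE_LR, eta_min=ETA_MIN, t_max=T_MAX):
    """Closed-form cosine annealing (Loshchilov & Hutter, 2017):
    decays from base_lr at epoch 0 to eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

print(cosine_annealing_lr(0))    # 0.001 -> starts at the base learning rate
print(cosine_annealing_lr(100))  # 0.0   -> fully annealed to eta_min
```

With these assumptions, the schedule sweeps smoothly from 0.001 to 0, which matches the common pattern of pairing cosine annealing with a fixed epoch budget.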