Common Sense Bias Modeling for Classification Tasks
Authors: Miao Zhang, Zee Fryer, Ben Colman, Ali Shahriyari, Gaurav Bharaj
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Downstream experiments show that our method uncovers novel model biases in multiple image benchmark datasets. Furthermore, the discovered bias can be mitigated by simple data re-weighting to de-correlate the features, outperforming state-of-the-art unsupervised bias mitigation methods. |
| Researcher Affiliation | Collaboration | 1New York University 2Reality Defender Inc. |
| Pseudocode | No | The paper states: "Detailed algorithm steps are in the appendix (Zhang et al. 2024a)." However, no pseudocode or algorithm blocks are provided within the main body of the paper. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | CelebA-Dialog (Jiang et al. 2021) is a visual-language dataset including captions for images in CelebA (Liu et al. 2015), a popular face recognition benchmark dataset containing 40 facial attributes. MS-COCO 2014 (Lin et al. 2014) is a large-scale scene image dataset including annotations for 80 objects, and each image has 5 descriptive captions generated independently (Chen et al. 2015). |
| Dataset Splits | Yes | For the downstream training, we use the same training, validation, and testing split as in (Zhang et al. 2022b) for the CelebA dataset, and as in (Misra et al. 2016) for the MS-COCO dataset. However, because of the label scarcity of individual objects relative to the entire dataset (for example, only 3% of images in MS-COCO contain Cat and 2% of images contain Kite), to avoid the class imbalance problem introducing confounding bias to the classifier, we randomly take subsets of the dataset splits which include 50% of each target label. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using the "spaCy en_core_web_sm model" and the "Universal Sentence Encoder" but does not specify their version numbers or other software dependencies with version numbers. |
| Experiment Setup | Yes | For correlation reasoning, the noun chunks from the description corpus are encoded as vectors in 512 dimensions using the Universal Sentence Encoder. Dimension reduction is performed using PCA before clustering, ensuring that the sum of the PCA components accounts for a high amount of variance (>90%). The distance criterion for generating feature clusters is set to z = 1.0; see the ablation sections for a sensitivity analysis. We use the Chi-square test (Pearson 1900) to verify the significance of derived correlation coefficients. For the downstream training, we use the same training, validation, and testing split as in (Zhang et al. 2022b) for the CelebA dataset, and as in (Misra et al. 2016) for the MS-COCO dataset. Experiments are based on three independent runs, which provide the mean and standard deviation results. |
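The correlation-reasoning steps quoted above (embed noun chunks in 512 dimensions, reduce with PCA to >90% retained variance, cluster with a distance criterion of z = 1.0, then test significance with a Chi-square test) can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for Universal Sentence Encoder embeddings, the `average` linkage method is an assumption (the paper does not state which linkage it uses), and the two binary feature indicators in the Chi-square step are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Stand-in for Universal Sentence Encoder output: one 512-d vector
# per noun chunk extracted from the description corpus.
embeddings = rng.normal(size=(200, 512))

# PCA dimension reduction, keeping enough components for >90% variance.
pca = PCA(n_components=0.90, svd_solver="full")
reduced = pca.fit_transform(embeddings)

# Hierarchical clustering of features with a distance criterion
# (z = 1.0 in the paper); linkage method is an assumption here.
Z = linkage(reduced, method="average")
clusters = fcluster(Z, t=1.0, criterion="distance")

# Chi-square test of independence between two hypothetical binary
# feature indicators, to check a derived correlation for significance.
feat_a = rng.integers(0, 2, size=200)
feat_b = rng.integers(0, 2, size=200)
contingency = np.array(
    [[np.sum((feat_a == a) & (feat_b == b)) for b in (0, 1)] for a in (0, 1)]
)
chi2, p_value, dof, _ = chi2_contingency(contingency)
print(f"retained variance: {pca.explained_variance_ratio_.sum():.3f}, "
      f"clusters: {clusters.max()}, chi-square p-value: {p_value:.3f}")
```

With real caption text, the embeddings would instead come from encoding spaCy noun chunks, and the Chi-square test would be applied to co-occurrence counts between discovered feature clusters and target labels.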