An Effective Theory of Bias Amplification

Authors: Arjun Subramonian, Samuel Bell, Levent Sagun, Elvis Dohmatob

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we observe that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be differences in test error between groups that are not alleviated with increased parameterization. Importantly, our theoretical predictions align with empirical observations reported in the literature on machine learning bias. We extensively empirically validate our theory on synthetic and semi-synthetic datasets.
Researcher Affiliation | Collaboration | 1 UCLA, 2 Meta FAIR, 3 Concordia University, 4 Mila
Pseudocode | No | The paper contains mathematical derivations and theoretical frameworks but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository.
Open Datasets | Yes | We extensively empirically validate our theory on synthetic and semi-synthetic datasets. Specifically, we show that our theory aligns with practice in the cases of: (1) bias amplification with synthetic data generated from isotropic covariance matrices and the semi-synthetic dataset Colored MNIST (Arjovsky et al., 2019)
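For the synthetic setting, data with an isotropic covariance matrix can be generated as in this minimal sketch. The dimensions, per-group scales, and variable names here are illustrative assumptions, not the paper's exact setup:

```python
import random

# Two groups, each with features drawn i.i.d. from N(0, sigma_g^2),
# i.e. a per-group isotropic covariance matrix sigma_g^2 * I.
random.seed(0)
d = 50                      # feature dimension (illustrative)
n_per_group = 100           # samples per group (illustrative)
sigmas = {0: 1.0, 1: 0.5}   # per-group standard deviations (illustrative)

X, groups = [], []
for g, sigma in sigmas.items():
    for _ in range(n_per_group):
        X.append([random.gauss(0.0, sigma) for _ in range(d)])
        groups.append(g)
```

Any covariance structure of the form sigma^2 * I can be sampled this way, coordinate by coordinate, since the coordinates are independent under an isotropic Gaussian.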
Dataset Splits | Yes | Train-test split. Colored MNIST has a total of 60k instances. Each image is 28 × 28 × 3 pixels. We use the prescribed 0.67-0.33 train-test split.
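The split arithmetic reported above can be checked directly (a minimal sketch; the variable names are ours, not the paper's):

```python
# Colored MNIST split check: 60k images, prescribed 0.67-0.33 train-test split.
n_total = 60_000
train_frac = 0.67
n_train = round(n_total * train_frac)   # 40,200 training images
n_test = n_total - n_train              # 19,800 test images

image_shape = (28, 28, 3)               # height x width x RGB channels
n_values_per_image = 28 * 28 * 3        # 2,352 values per image
```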
Hardware Specification | Yes | We run all experiments on a single NVIDIA L40S.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify any software libraries or versions used for implementation.
Experiment Setup | Yes | We train each model with a batch size of 250 for a single epoch with respect to groups (i.e., 80 training steps given there are two groups). We use a cross-entropy loss and the Adam optimizer with learning rate 0.01. We report our results over 10 random seeds.
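The reported hyperparameters can be captured in a small configuration sketch; this is a pure-Python check of the step arithmetic, with variable names of our choosing (the paper does not specify the training framework):

```python
# Reported training configuration (values quoted from the report).
batch_size = 250
learning_rate = 0.01   # Adam
n_groups = 2
n_seeds = 10
total_steps = 80       # one epoch with respect to groups

# At batch size 250, the 80 steps process 20,000 instances in total,
# i.e. 40 steps (10,000 instances) per group across the two groups.
instances_seen = total_steps * batch_size
steps_per_group = total_steps // n_groups
```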