An Effective Theory of Bias Amplification

Authors: Arjun Subramonian, Samuel Bell, Levent Sagun, Elvis Dohmatob

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we observe that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be differences in test error between groups that are not alleviated with increased parameterization. Importantly, our theoretical predictions align with empirical observations reported in the literature on machine learning bias. We extensively empirically validate our theory on synthetic and semi-synthetic datasets.
Researcher Affiliation | Collaboration | 1 UCLA, 2 Meta FAIR, 3 Concordia University, 4 Mila
Pseudocode | No | The paper contains mathematical derivations and theoretical frameworks but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository.
Open Datasets | Yes | We extensively empirically validate our theory on synthetic and semi-synthetic datasets. Specifically, we show that our theory aligns with practice in the cases of: (1) bias amplification with synthetic data generated from isotropic covariance matrices and the semi-synthetic dataset Colored MNIST (Arjovsky et al., 2019)
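For the synthetic setting, data with an isotropic covariance matrix can be generated as in this minimal sketch. The dimensions, per-group scales, and variable names here are illustrative assumptions, not the paper's exact setup:

```python
import random

# Two groups, each with features drawn i.i.d. from N(0, sigma_g^2),
# i.e. a per-group isotropic covariance matrix sigma_g^2 * I.
random.seed(0)
d = 50                      # feature dimension (illustrative)
n_per_group = 100           # samples per group (illustrative)
sigmas = {0: 1.0, 1: 0.5}   # per-group standard deviations (illustrative)

X, groups = [], []
for g, sigma in sigmas.items():
    for _ in range(n_per_group):
        X.append([random.gauss(0.0, sigma) for _ in range(d)])
        groups.append(g)
```

Any covariance structure of the form sigma^2 * I can be sampled this way, coordinate by coordinate, since the coordinates are independent under an isotropic Gaussian.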
Dataset Splits | Yes | Train-test split. Colored MNIST has a total of 60k instances. Each image is 28 × 28 × 3 pixels. We use the prescribed 0.67-0.33 train-test split.
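The split arithmetic reported above can be checked directly (a minimal sketch; the variable names are ours, not the paper's):

```python
# Colored MNIST split check: 60k images, prescribed 0.67-0.33 train-test split.
n_total = 60_000
train_frac = 0.67
n_train = round(n_total * train_frac)   # 40,200 training images
n_test = n_total - n_train              # 19,800 test images

image_shape = (28, 28, 3)               # height x width x RGB channels
n_values_per_image = 28 * 28 * 3        # 2,352 values per image
```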
Hardware Specification | Yes | We run all experiments on a single NVIDIA L40S.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify any software libraries or versions used for implementation.
Experiment Setup | Yes | We train each model with a batch size of 250 for a single epoch with respect to groups (i.e., 80 training steps given there are two groups). We use a cross-entropy loss and the Adam optimizer with learning rate 0.01. We report our results over 10 random seeds.
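The reported hyperparameters can be captured in a small configuration sketch; this is a pure-Python check of the step arithmetic, with variable names of our choosing (the paper does not specify the training framework):

```python
# Reported training configuration (values quoted from the report).
batch_size = 250
learning_rate = 0.01   # Adam
n_groups = 2
n_seeds = 10
total_steps = 80       # one epoch with respect to groups

# At batch size 250, the 80 steps process 20,000 instances in total,
# i.e. 40 steps (10,000 instances) per group across the two groups.
instances_seen = total_steps * batch_size
steps_per_group = total_steps // n_groups
```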