Navigating Towards Fairness with Data Selection

Authors: Yixuan Zhang, Zhidong Li, Yang Wang, Fang Chen, Xuhui Fan, Feng Zhou

AAAI 2025

Reproducibility Assessment

Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive empirical evaluations on several benchmark datasets. The experiments on the image classification tasks demonstrate the effectiveness of our proposed data selection principle, which can adaptively select fair instances that are less impacted by label bias. [...] In the subsequent sections, we first describe our experimental setup, covering datasets, baselines, and evaluation metrics. Next, we compare our methods against existing state-of-the-art data selection techniques across various image classification tasks (CelebFaces Attributes (CelebA) (Liu et al. 2015) and modified Labeled Faces in the Wild Home (LFW+a) (Wolf, Hassner, and Taigman 2011)), considering different amounts of label bias. We examine our selection criteria through detailed ablation studies. Benchmark Datasets. We evaluate the performance of our proposed method using two image datasets: CelebA and LFW+a. [...] Results are displayed in Table 1 for the 20% and 40% bias amounts. Meanwhile, we report the fairness measures using the p%-rule and DEO in Table 2.
Researcher Affiliation | Academia | 1 School of Statistics and Data Science, Southeast University, China; 2 Data Science Institute, University of Technology Sydney, Australia; 3 School of Computing, Macquarie University, Australia; 4 Center for Applied Statistics and School of Statistics, Renmin University of China, China; 5 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, China. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Fair data selection to address the label bias issue
1: Input: training set D, Nb, NB, T, α, γ, zero-shot predictor f, and a target model fθ.
2: Initialize θ0.
3: for t in 0, …, T do
4:   Randomly select NB instances to construct Bt+1;
5:   For each sample (xi, yi, si) in Bt+1, estimate and compute the objective in Eq. (13);
6:   Select the top-Nb samples to construct bt+1;
7:   Drop instances from bt+1 if Cs,z > E[Cs,z]; otherwise, bootstrap;
8:   Perform gradient descent and update θ with the resampled data.
9: end for
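The selection step of Algorithm 1 (steps 6–7) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the Eq. (13) objective and the group-conditional cost Cs,z are replaced by precomputed stand-in arrays, and `select_fair_batch` is a hypothetical helper name.

```python
import random
from statistics import mean

def select_fair_batch(scores, costs, nb):
    """One selection step loosely following Algorithm 1.

    scores: per-sample objective values (stand-in for Eq. (13));
    costs:  per-sample group-conditional costs (stand-in for Cs,z);
    nb:     small-batch size Nb.
    Returns nb resampled indices for the gradient update.
    """
    # Step 6: keep the top-Nb candidates by the selection objective.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:nb]
    # Step 7: drop instances whose cost exceeds the mean cost E[Cs,z],
    # then bootstrap (resample with replacement) back up to nb samples.
    avg_cost = mean(costs[i] for i in top)
    kept = [i for i in top if costs[i] <= avg_cost]
    return random.choices(kept, k=nb)
```

In this reading, the bootstrap keeps the update batch at a fixed size Nb even after high-cost instances are discarded, so the gradient step in line 8 always sees the same number of samples.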
Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository.
Open Datasets | Yes | Benchmark Datasets. We evaluate the performance of our proposed method using two image datasets: CelebA and LFW+a. [...] CelebFaces Attributes (CelebA) (Liu et al. 2015) and modified Labeled Faces in the Wild Home (LFW+a) (Wolf, Hassner, and Taigman 2011)
Dataset Splits | No | Each dataset is divided into training, validation, and test sets. The paper mentions this division but does not specify the exact percentages, sample counts, or the methodology used for creating these splits. It also does not reference a standard or predefined split.
Hardware Specification | Yes | All experiments are performed with GPUs (NVIDIA GeForce RTX 3090 with 86GB memory).
Software Dependencies | No | The paper mentions optimizers (AdamW) and model architectures (ResNet-18, ResNet-50, DenseNet-121, CLIP-RN50, ViT-B/16) but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Setup. In our experiments addressing label bias, we introduce symmetrical label biases of 20% and 40%. We use the AdamW optimizer (learning rate 0.001, weight decay 0.01) and set a batch size of Nb = 32 and a batch ratio Nb/NB = 0.1, consistent with the RHO-LOSS setup. For the LFW+a dataset, we employ ResNet-18 (He et al. 2016), and for CelebA, we use ResNet-50 across all methods, along with a zero-shot predictor based on CLIP-RN50. We vary α and γ within the set {0.1, 0.3, 0.5, 0.7, 0.9}. Results are averaged over three random trials.
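A minimal sketch of how the reported hyperparameters fit together (the variable names and layout here are assumptions, not taken from the paper's code):

```python
from itertools import product

# Reported setup: AdamW with lr 0.001 and weight decay 0.01; a small
# batch of Nb = 32 is selected from a larger candidate batch NB at a
# batch ratio Nb/NB = 0.1.
config = {"optimizer": "AdamW", "lr": 1e-3, "weight_decay": 0.01,
          "Nb": 32, "batch_ratio": 0.1}
config["NB"] = round(config["Nb"] / config["batch_ratio"])  # 320 candidates per step

# alpha and gamma are each swept over {0.1, 0.3, 0.5, 0.7, 0.9},
# giving a 5 x 5 grid of (alpha, gamma) settings.
grid = list(product([0.1, 0.3, 0.5, 0.7, 0.9], repeat=2))
```

Under this reading, each training step scores NB = 320 candidate instances and retains the Nb = 32 selected by the fairness-aware criterion.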