A Decade's Battle on Dataset Bias: Are We There Yet?

Authors: Zhuang Liu, Kaiming He

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We revisit the dataset classification experiment suggested by Torralba & Efros (2011) a decade ago, in the new era of large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures. Surprisingly, we observe that modern neural networks can achieve excellent accuracy in classifying which dataset an image is from: e.g., we report 84.7% accuracy on held-out validation data for the three-way classification problem consisting of the YFCC, CC, and DataComp datasets.
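The dataset classification protocol quoted above can be sketched at toy scale: pool samples from several sources, label each by its source dataset, and measure held-out classification accuracy. The synthetic Gaussian "features" and the nearest-centroid classifier below are hypothetical stand-ins for real images and the neural network used in the paper; only the train-on-source-labels, evaluate-held-out structure mirrors the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for features of images drawn from three sources
# (the paper uses real images from YFCC, CC, and DataComp).
def make_features(center, n):
    return rng.normal(loc=center, scale=1.0, size=(n, 16))

datasets = {0: make_features(0.0, 200),   # "YFCC"
            1: make_features(2.0, 200),   # "CC"
            2: make_features(4.0, 200)}   # "DataComp"

# The classification target is simply which dataset each sample came from.
X = np.vstack(list(datasets.values()))
y = np.concatenate([np.full(len(v), k) for k, v in datasets.items()])

# Held-out split, mirroring the paper's train/validation protocol at toy scale.
idx = rng.permutation(len(X))
train, val = idx[:500], idx[500:]

# Nearest-centroid classifier as a stand-in for the neural network.
centroids = np.stack([X[train][y[train] == k].mean(axis=0) for k in range(3)])
pred = np.argmin(((X[val][:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
acc = (pred == y[val]).mean()
print(f"held-out dataset-classification accuracy: {acc:.1%}")
```

In this toy setup the sources are separable by construction; the paper's surprising finding is that real web-scale datasets are separable too.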
Researcher Affiliation | Industry | Zhuang Liu and Kaiming He, Meta AI Research (FAIR)
Pseudocode | No | The paper describes methods and processes in paragraph form and tables, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Work done at Meta; the first author is now at MIT. Code: github.com/liuzhuang13/bias
Open Datasets | Yes | Examples include YFCC100M (Thomee et al., 2016), CC12M (Changpinyo et al., 2021), and DataComp-1B (Gadre et al., 2023), the main datasets studied in the paper, among many others (Sun et al., 2017; Desai et al., 2021; Srinivasan et al., 2021; Schuhmann et al., 2022).
Dataset Splits | Yes | By default, we randomly sample 1M and 10K images from each dataset as training and validation sets, respectively.
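A minimal sketch of that split construction: draw disjoint train and validation subsets at random from each dataset's pool of images. The id format and the sizes here are hypothetical, scaled down from the paper's 1M train / 10K val per dataset.

```python
import random

def split_dataset(image_ids, n_train, n_val, seed=0):
    """Randomly sample disjoint train/val subsets from one dataset's image ids."""
    rng = random.Random(seed)
    sampled = rng.sample(image_ids, n_train + n_val)  # without replacement
    return sampled[:n_train], sampled[n_train:]

# Toy ids standing in for one source dataset (hypothetical scale).
ids = [f"img_{i:06d}" for i in range(10_000)]
train_ids, val_ids = split_dataset(ids, n_train=1_000, n_val=100)
print(len(train_ids), len(val_ids))  # 1000 100
```

Sampling train and val in one draw guarantees the two subsets never overlap.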
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, RandAugment ('randaug'), mixup, and cutmix with their respective citations, as well as ViT-B and MAE (He et al., 2022), but does not provide version numbers for any software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The complete training recipe is shown in Table 10:

config | value
optimizer | AdamW
learning rate | 1e-3
weight decay | 0.3
optimizer momentum | β1, β2 = 0.9, 0.95
batch size | 4096
learning rate schedule | cosine decay
warmup epochs | 20 (ImageNet-1K)
training epochs | 300 (ImageNet-1K)
randaug (Cubuk et al., 2020) | (9, 0.5)
label smoothing | 0.1
mixup (Zhang et al., 2018b) | 0.8
cutmix (Yun et al., 2019) | 1.0
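The "cosine decay" schedule with warmup from the recipe above can be sketched as a function of the epoch; the base learning rate and epoch counts are taken from Table 10, while the linear-warmup form and the decay-to-zero floor are common conventions assumed here, not spelled out in the paper.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

print(lr_at_epoch(19))   # end of warmup: peak LR 1e-3
print(lr_at_epoch(160))  # halfway through decay: 5e-4
```

Frameworks such as PyTorch provide equivalent built-in schedulers; this stand-alone form just makes the shape of the schedule explicit.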