REDUCR: Robust Data Downsampling using Class Priority Reweighting
Authors: William Bankes, George Hughes, Ilija Bogunovic, Zi Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present empirical results to showcase the performance of REDUCR on large-scale vision and text classification tasks. REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%. |
| Researcher Affiliation | Collaboration | William Bankes, Department of Computer Science, University College London, EMAIL; George Hughes, Department of Computer Science, University College London; Ilija Bogunovic, Department of Electrical Engineering, University College London, EMAIL; Zi Wang, Google DeepMind, EMAIL |
| Pseudocode | Yes | Algorithm 1 REDUCR for robust online batch selection |
| Open Source Code | Yes | Code available at: https://github.com/williambankes/REDUCR. |
| Open Datasets | Yes | We use CIFAR10 [Krizhevsky et al., 2012], CINIC10 [Darlow et al., 2018], Clothing1M [Xiao et al., 2015], the Multi-Genre Natural Language Inference (MNLI), and the Quora Question Pairs (QQP) datasets from the GLUE NLP benchmark [Wang et al., 2019]. The image datasets were sourced from PyTorch via the torchvision datasets package (https://pytorch.org/vision/stable/datasets.html); the NLP datasets were sourced from Hugging Face (https://huggingface.co/datasets/nyu-mll/glue). |
| Dataset Splits | Yes | Each dataset is split into a labelled training, validation and test dataset (for details see Appendix A.5); the validation dataset is used to train the class-irreducible loss models and to evaluate the class-holdout loss during training. |
| Hardware Specification | Yes | All models were trained on GCP NVIDIA Tesla T4 GPUs. |
| Software Dependencies | No | The networks are optimised with AdamW [Loshchilov and Hutter, 2019] and the default PyTorch hyperparameters are used for all methods, except CINIC10 for which the weight decay is set to a value of 0.1. For the NLP datasets we use the bert-base-uncased [Devlin et al., 2019] model from Hugging Face [Wolf et al., 2020]. |
| Experiment Setup | Yes | Unless stated otherwise, 10% of the batch B_t is selected as the small batch b_t, and we set η = 1e-4. γ = 9 is used when training each of the amortised class-irreducible loss models on the vision datasets and γ = 4 for the NLP datasets. For full details of the experimental setup see Appendix A.5. |
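
The Open Datasets row above points to standard torchvision and Hugging Face sources. A rough illustration of how those sources are typically accessed is given below; the root path, splits, and preprocessing are placeholders, not the paper's actual pipeline.

```python
# Hypothetical loading of the public datasets named in the Open Datasets row.
# Paths, transforms, and splits are placeholders, not the paper's setup.
from torchvision import datasets, transforms
from datasets import load_dataset  # Hugging Face `datasets` package

transform = transforms.ToTensor()

# CIFAR10 ships with torchvision; CINIC10 and Clothing1M are distributed
# separately and are typically loaded with ImageFolder after a manual download.
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transform)

# GLUE text tasks (MNLI, QQP) via the Hugging Face hub.
mnli = load_dataset("nyu-mll/glue", "mnli")
qqp = load_dataset("nyu-mll/glue", "qqp")
```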
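
The Software Dependencies row names AdamW with default PyTorch hyperparameters (weight decay overridden to 0.1 for CINIC10 only) and the bert-base-uncased model from Hugging Face. A minimal sketch of that setup, assuming illustrative defaults for the learning rate and label count:

```python
# Hypothetical optimiser/model setup matching the Software Dependencies row.
# Learning rate, label count, and the dataset selector are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Text model cited in the row (3 labels shown here for MNLI).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=3)

dataset_name = "CINIC10"  # hypothetical selector for illustration
# AdamW with PyTorch defaults; only CINIC10 overrides the weight decay to 0.1
# (0.01 is the PyTorch default).
weight_decay = 0.1 if dataset_name == "CINIC10" else 0.01
optimizer = torch.optim.AdamW(model.parameters(), weight_decay=weight_decay)
```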
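
The Pseudocode and Experiment Setup rows describe Algorithm 1, an online batch-selection loop that keeps 10% of each large batch B_t and maintains per-class priority weights updated with step size η = 1e-4. The sketch below is one illustrative reading of such a loop; the scoring rule, the multiplicative weight update, and all function names are assumptions, not a transcription of the paper's Algorithm 1.

```python
# Illustrative sketch of online batch selection with class priority
# reweighting. The scoring function and weight update are placeholders
# consistent with the quoted setup (keep 10% of B_t, eta = 1e-4).
import torch

def select_small_batch(scores, labels, class_weights, frac=0.1):
    """Pick the top `frac` fraction of the batch by class-weighted score."""
    weighted_scores = class_weights[labels] * scores
    k = max(1, int(frac * len(labels)))
    return torch.topk(weighted_scores, k).indices

def update_class_weights(class_weights, class_holdout_losses, eta=1e-4):
    """Exponentiated-gradient style update that prioritises worst-off classes."""
    new_weights = class_weights * torch.exp(eta * class_holdout_losses)
    return new_weights / new_weights.sum()
```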