LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Authors: Jifan Zhang, Yifang Chen, Gregory Canal, Arnav Mohanty Das, Gantavya Bhatt, Stephen Mussmann, Yinglun Zhu, Jeff Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak
DMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates significantly better label-efficiencies than previously reported in active learning. ... To showcase the power of our framework, we conduct experiments that benchmark multiple deep active learning algorithms in combination with semi-supervised learning and large pretrained models; our experiments reveal especially strong label-efficiency gains from active learning, demonstrating a significant difference from conventional beliefs in existing literature. To highlight some of our results, we observe a more than four-fold reduction (75% savings) in annotation cost over random sampling on CIFAR-10 (Figure 1(a)), a dataset that was believed to be particularly challenging for active learning. This improvement is further demonstrated in our experiments on ImageNet (Figure 1(b, c)). |
| Researcher Affiliation | Academia | Jifan Zhang EMAIL University of Wisconsin, Madison, WI Yifang Chen EMAIL University of Washington, Seattle, WA Gregory Canal EMAIL University of Wisconsin, Madison, WI Arnav M. Das EMAIL University of Washington, Seattle, WA Gantavya Bhatt EMAIL University of Washington, Seattle, WA Stephen Mussmann EMAIL Georgia Institute of Technology, Atlanta, GA Yinglun Zhu EMAIL University of California, Riverside, CA Jeffrey Bilmes EMAIL University of Washington, Seattle, WA Simon S. Du EMAIL University of Washington, Seattle, WA Kevin Jamieson EMAIL University of Washington, Seattle, WA Robert D. Nowak EMAIL University of Wisconsin, Madison, WI |
| Pseudocode | No | The paper describes the proposed framework and active learning strategies in detail in natural language (Sections 3, 4, and Appendices B, C), but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, figures, or formally structured steps resembling code. |
| Open Source Code | Yes | LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench. |
| Open Datasets | Yes | We first test on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009), all of which are standard datasets used in previous AL and Semi-SL papers. To further evaluate LabelBench on more realistic datasets, we also test on iWildCam (Beery et al., 2021) and fMoW (Christie et al., 2018), parts of the WILDS benchmark (Koh et al., 2021). |
| Dataset Splits | Yes | 2. Initial batch of labels. We collect the first batch of labels by sampling uniformly at random. ... Appendix D. Hyper-parameter tuning: For each dataset, we utilize a separate validation set, typically with size around 10% of the training pool. We begin the process by adjusting the hyper-parameters on a subset of the training data, which is randomly queried and constitutes around 10% of the total training pool. |
| Hardware Specification | Yes | Table 1: Estimated cost of neural network training for ImageNet experiments when collecting 600,000 labels with 20 iterations (batches of 30,000 labels per iteration). Here we display the total cost of running 12 trials with CLIP ViT-B32 and FlexMatch Semi-SL training (Zhang et al., 2021a). All AWS dollars are based on on-demand rates of EC2 P3 instances. |
| Software Dependencies | No | The paper mentions various active learning algorithms and semi-supervised learning methods, as well as pre-trained models like CLIP and CoCa, but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | 4.1 Experiment Setup: Here we detail our benchmark's specific choices of AL strategies, large pretrained models, and Semi-SL methods. ... 1. Initial large pretrained model. We use pretrained CLIP (Radford et al., 2021) and CoCa (Yu et al., 2022) with the ViT-B32 architecture as image encoders. For end-to-end fine-tuning, we attach the image encoder with a zero-shot prediction linear head. On the other hand, proxy models are initialized with random weights. Throughout our experiments, shallow networks have a single hidden layer with the same dimension as the embeddings. 2. Initial batch of labels. We collect the first batch of labels by sampling uniformly at random. 3. Adaptive annotation loop. We iterate over the following steps to annotate batches of examples. Model training. At the beginning of each iteration, the dataset is partially labeled. We use Semi-SL techniques to fine-tune the vision transformer or train the proxy model from scratch. ... Appendix D. Hyper-parameter tuning: For each dataset, we utilize a separate validation set, typically with size around 10% of the training pool. We begin the process by adjusting the hyper-parameters on a subset of the training data, which is randomly queried and constitutes around 10% of the total training pool. The selection of hyper-parameters is mainly based on the criterion of achieving the highest accuracy on the validation set. |
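The Experiment Setup row above describes a three-step adaptive annotation loop (random initial batch, Semi-SL model training, batched querying). The sketch below is a minimal, hedged illustration of that loop's control flow, not the LabelBench implementation: `train_model` and `acquisition_scores` are hypothetical placeholders standing in for the paper's Semi-SL fine-tuning and AL strategies.

```python
# Minimal sketch of a batched active learning loop, as outlined in the
# Experiment Setup row. All function names here are illustrative
# placeholders, not the LabelBench API.
import random


def train_model(labeled_indices):
    """Placeholder for Semi-SL fine-tuning on the partially labeled pool."""
    return {"labeled": set(labeled_indices)}


def acquisition_scores(model, unlabeled):
    """Placeholder AL scores; random here, standing in for a real
    strategy such as margin or confidence sampling."""
    return {i: random.random() for i in unlabeled}


def annotation_loop(pool_size=1000, batch_size=100, num_iterations=5, seed=0):
    random.seed(seed)
    pool = set(range(pool_size))
    # 1. Initial batch of labels: sampled uniformly at random.
    labeled = set(random.sample(sorted(pool), batch_size))
    for _ in range(num_iterations):
        # 2. Model training at the start of each iteration.
        model = train_model(labeled)
        # 3. Query the next batch from the unlabeled remainder,
        #    taking the top-scoring examples under the AL strategy.
        unlabeled = pool - labeled
        scores = acquisition_scores(model, unlabeled)
        batch = sorted(unlabeled, key=lambda i: scores[i], reverse=True)[:batch_size]
        labeled.update(batch)
    return labeled


selected = annotation_loop()
print(len(selected))  # 600 labels: initial batch + 5 iterations of 100
```

The loop terminates once the labeling budget (here, six batches of 100) is spent; in the paper the budgets are far larger (e.g. 600,000 ImageNet labels over 20 iterations).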