Uncertainty-Based Active Learning for Reading Comprehension

Authors: Jing Wang, Jie Shen, Xiaofei Ma, Andrew Arnold

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate on benchmark datasets that 25% fewer labeled samples suffice to guarantee comparable, or even improved, performance. Our results show strong evidence that for label-demanding scenarios, the proposed approach offers a practical guide on data collection and model training. Section 4 is titled "Experiments" and contains detailed empirical results, tables, and figures comparing performance on the datasets.
Researcher Affiliation | Collaboration | Jing Wang (Amazon), Jie Shen (Stevens Institute of Technology), Xiaofei Ma (Amazon Web Services), Andrew O. Arnold (Delphia)
Pseudocode | Yes | Algorithm 1 Albus: Active Learning By Uncertainty-based Sampling
Require: a set of unlabeled instances U = {x1, . . . , xn}; an initial MRC model w0; a maximum iteration number T; thresholds {τ1, . . . , τT}; the number of instances to be labeled per iteration, n0.
Ensure: a new MRC model wT.
1: U1 ← U
2: for t = 1, . . . , T do
3:     Compute wt−1(x) for all x ∈ Ut
4:     Bt ← {x ∈ Ut : wt−1(x) ≤ τt}
5:     Compute the sampling probability Pr(x) for all x ∈ Bt
6:     St ← randomly choose n0 instances from Bt according to the distribution {Pr(x)} over x ∈ Bt, and query their labels
7:     Update the model: wt ← arg min_w L(w; St)
8:     Ut+1 ← Ut \ St
9: end for
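The reported Algorithm 1 can be sketched as a short Python loop. This is a minimal illustration, not the authors' implementation: `fit`, `score`, and `query_labels` are hypothetical stand-ins for model training, per-instance uncertainty scoring, and oracle labeling, and uniform sampling over Bt is used as a placeholder for the paper's sampling probability Pr(x).

```python
import random

def albus(unlabeled, model, T, taus, n0, fit, score, query_labels):
    """Sketch of the uncertainty-based sampling loop from Algorithm 1 (Albus).

    `fit`, `score`, and `query_labels` are hypothetical stand-ins for
    model updating, uncertainty scoring, and human annotation.
    """
    U = set(unlabeled)                       # U_1 <- U
    for t in range(T):
        # Keep instances whose score falls at or below the threshold tau_t.
        scores = {x: score(model, x) for x in U}
        B = [x for x in U if scores[x] <= taus[t]]
        if not B:
            break
        # Placeholder: uniform sampling instead of the paper's Pr(x).
        S = random.sample(B, min(n0, len(B)))
        labeled = query_labels(S)            # query labels for S_t
        model = fit(model, labeled)          # w_t <- argmin_w L(w; S_t)
        U -= set(S)                          # U_{t+1} <- U_t \ S_t
    return model
```

With a trivial `fit` that increments a counter, two iterations of the loop perform two model updates and shrink the unlabeled pool by n0 each round.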
Open Source Code | No | The paper mentions "BERT-base is used as the pretrained model and fine-tuned for 2 epochs with a learning rate of 3e−5 and a batch size of 12, the default setting of Huggingface", with a footnote linking to "https://github.com/huggingface/transformers/tree/master/examples/question-answering". This refers to a third-party library, not the authors' implementation of their proposed algorithm.
Open Datasets | Yes | We focus on span-based datasets, namely the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016) and NewsQA (Trischler et al., 2017).
Dataset Splits | Yes | SQuAD consists of over 100,000 questions posed by crowdworkers on a set of 536 Wikipedia articles. We use the original split: 87,599 questions for training and 10,570 questions for testing. NewsQA is a machine comprehension dataset of over 100,000 human-generated question-answer pairs from over 10,000 CNN news articles. The dataset is composed of 74,160 questions for training and 4,212 questions for validation.
Hardware Specification | No | The paper mentions training parameters and software (BERT-base, Huggingface) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions "BERT-base" as the pretrained model and "Huggingface" for fine-tuning, but does not provide specific version numbers for these or any other software libraries or programming languages used in the implementation.
Experiment Setup | Yes | BERT-base is used as the pretrained model and fine-tuned for 2 epochs with a learning rate of 3e−5 and a batch size of 12, the default setting of Huggingface. To ensure a comprehensive comparison among state-of-the-art approaches, we simulate the annotation process with human experts in the loop by selecting a fixed number of examples n0 from the training set to query their labels in each iteration (we set n0 = 2,000 for SQuAD and n0 = 5,000 for NewsQA). The MRC model is initialized with 1,000 labeled samples for SQuAD and 10,000 for NewsQA. The parameter τ0 is chosen from the range [0.01, 0.1] based on the training set and decreases at the rate of 1.1.
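The annotation budget and threshold schedule described above can be expressed with two small helper functions. This is an illustrative sketch of the reported setup, not the authors' code; in particular, interpreting "decreases at the rate of 1.1" as division by 1.1 per iteration is an assumption.

```python
def annotation_budget(n_init, n0, T):
    """Labeled-set size after each querying round, given n_init seed
    labels and n0 labels queried per iteration (per the reported setup)."""
    return [n_init + n0 * t for t in range(T + 1)]

def threshold_schedule(tau0, T, decay=1.1):
    """Threshold tau_t per iteration, assumed to decrease by a factor
    of 1.1 each round (interpretation of "the rate of 1.1")."""
    return [tau0 / decay ** t for t in range(T)]
```

For SQuAD (1,000 seed labels, n0 = 2,000), three querying rounds give labeled-set sizes of 1,000, 3,000, 5,000, and 7,000.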