Uncertainty-Based Active Learning for Reading Comprehension
Authors: Jing Wang, Jie Shen, Xiaofei Ma, Andrew Arnold
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate on benchmark datasets that 25% fewer labeled samples suffice to guarantee comparable or even improved performance. Our results show strong evidence that, for label-demanding scenarios, the proposed approach offers a practical guide on data collection and model training. Section 4 is titled "Experiments" and contains detailed empirical results, tables, and figures comparing performance on both datasets. |
| Researcher Affiliation | Collaboration | Jing Wang EMAIL Amazon Jie Shen EMAIL Stevens Institute of Technology Xiaofei Ma EMAIL Amazon Web Services Andrew O. Arnold EMAIL Delphia |
| Pseudocode | Yes | Algorithm 1 Albus: Active Learning By Uncertainty-based Sampling. Require: a set of unlabeled instances U = {x_1, ..., x_n}, an initial MRC model w_0, a maximum iteration number T, thresholds {τ_1, ..., τ_T}, and the number of instances to be labeled n_0. Ensure: a new MRC model w_T. 1: U_1 ← U. 2: for t = 1, ..., T do 3: Compute w_{t-1}(x) for all x ∈ U_t. 4: B_t ← {x ∈ U_t : w_{t-1}(x) ≥ τ_t}. 5: Compute the sampling probability Pr(x) for all x ∈ B_t. 6: S_t ← randomly choose n_0 instances from B_t according to the distribution {Pr(x)}_{x ∈ B_t}, and query their labels. 7: Update the model w_t ← argmin_w L(w; S_t). 8: U_{t+1} ← U_t \ S_t. 9: end for |
| Open Source Code | No | The paper mentions "BERT-base is used as the pretrained model and fine-tuned for 2 epochs with a learning rate of 3e-5 and a batch size of 12, the default setting of Huggingface", with a footnote linking to "https://github.com/huggingface/transformers/tree/master/examples/question-answering". This refers to a third-party library, not the authors' own implementation of their proposed algorithm. |
| Open Datasets | Yes | We focus on the span-based datasets, namely the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016) and NewsQA (Trischler et al., 2017). |
| Dataset Splits | Yes | SQuAD consists of over 100,000 questions posed by crowdworkers on a set of 536 Wikipedia articles. We use the original split: 87,599 questions for training and 10,570 questions for testing. NewsQA is a machine comprehension dataset of over 100,000 human-generated question-answer pairs from over 10,000 news articles from CNN. The dataset is composed of 74,160 questions for training and 4,212 questions for validation. |
| Hardware Specification | No | The paper mentions training parameters and software (BERT-base, Huggingface) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions "BERT-base" as the pretrained model and "Huggingface" for fine-tuning, but does not provide specific version numbers for these or any other software libraries or programming languages used in their implementation. |
| Experiment Setup | Yes | BERT-base is used as the pretrained model and fine-tuned for 2 epochs with a learning rate of 3e-5 and a batch size of 12, the default setting of Huggingface. To ensure a comprehensive comparison among state-of-the-art approaches, we simulate the annotation process with human experts in the loop by selecting a fixed number of examples n_0 from the training set to query their labels in each iteration (we set n_0 = 2,000 for SQuAD and n_0 = 5,000 for NewsQA). The MRC model is initialized with 1,000 labeled samples for SQuAD and 10,000 for NewsQA. The parameter τ_0 is chosen from the range of [0.01, 0.1] based on the training set and decreases at the rate of 1.1. |
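The iteration structure quoted above (threshold-filtered pool, probability-weighted sampling, retraining, and a threshold decaying at rate 1.1) can be sketched as a generic loop. This is a minimal illustration, not the authors' implementation: `uncertainty`, `label_fn`, and `train_fn` are hypothetical callables standing in for the MRC model's uncertainty score, the human annotators, and the BERT fine-tuning step.

```python
import random

def albus_loop(unlabeled, uncertainty, label_fn, train_fn, model,
               T, tau0, decay=1.1, n0=2):
    """Sketch of an uncertainty-based active-learning loop (Algorithm 1 style).

    uncertainty(model, x) -> float : placeholder uncertainty score
    label_fn(x)           -> label : placeholder annotation oracle
    train_fn(model, data) -> model : placeholder model update
    """
    U = set(unlabeled)
    tau = tau0
    for _ in range(T):
        # Steps 3-4: keep instances whose uncertainty exceeds the threshold.
        B = [x for x in U if uncertainty(model, x) >= tau]
        if not B:
            break
        # Step 5: sampling probability proportional to uncertainty.
        weights = [uncertainty(model, x) for x in B]
        # Step 6: draw n0 distinct instances and query their labels.
        S = set()
        while len(S) < min(n0, len(B)):
            S.add(random.choices(B, weights=weights)[0])
        # Step 7: update the model on the newly labeled batch.
        model = train_fn(model, [(x, label_fn(x)) for x in S])
        # Step 8: remove the queried instances from the pool.
        U -= S
        # Threshold decreases at the stated rate (1.1 per iteration).
        tau /= decay
    return model, U
```

With n_0 instances removed per iteration, the pool shrinks by T * n_0 over a full run, matching the paper's fixed-budget annotation simulation.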