Active Learning for Cost-Sensitive Classification

Authors: Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford

JMLR 2019

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets. Keywords: Active Learning, Cost-sensitive Learning, Structured Prediction, Statistical Learning Theory, Oracle-based Algorithms. ... Experimentally, we show that COAL substantially outperforms the passive learning baseline with orders of magnitude savings in the labeling effort on a number of hierarchical classification datasets (see Figure 1 for a comparison between passive learning and COAL on Reuters text categorization). ... We now turn to an empirical evaluation of COAL. For further computational efficiency, we implemented an approximate version of COAL using: 1) a relaxed version space G_i(y) ← {g ∈ G : R̂_i(g; y) ≤ R̂_i(g_{i,y}; y) + Δ_i}, which does not enforce monotonicity, and 2) online optimization, based on online linear least-squares regression. The algorithm processes the data in one pass, and the idea is to (1) replace g_{i,y}, the ERM, with an approximation g°_{i,y} obtained by online updates, and (2) compute the minimum and maximum costs via a sensitivity analysis of the online update. We describe this algorithm in detail in Subsection 7.1. Then, we present our experimental results, first for simulated active learning (Subsection 7.2) and then for learning to search for joint prediction (Subsection 7.3).
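The relaxed version-space construction quoted above can be illustrated with a minimal sketch. This is not the Vowpal Wabbit implementation: the finite regressor class, the `delta` value, and all names here are illustrative assumptions, and the actual algorithm uses online updates with a sensitivity analysis rather than explicit enumeration.

```python
import numpy as np

def in_relaxed_version_space(risks, delta):
    """Relaxed version space (monotonicity not enforced): keep every
    regressor whose empirical risk is within delta of the best risk.
    `risks` holds empirical squared-loss risks, one per candidate
    regressor in a toy finite class (hypothetical setup)."""
    return risks <= risks.min() + delta

# Toy example: four candidate regressors predict the cost of one label.
preds = np.array([0.1, 0.3, 0.5, 0.9])  # predicted costs, one per regressor
observed = 0.2                           # realized cost for this label
risks = (preds - observed) ** 2          # empirical squared-loss risk
mask = in_relaxed_version_space(risks, delta=0.05)

# The spread of predicted costs over surviving regressors drives the
# query rule: a label is informative when this max-min gap is large.
spread = preds[mask].max() - preds[mask].min()
```

With these toy numbers, only the two low-risk regressors survive the threshold, and the query decision would be based on their ≈0.2 cost spread.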
Researcher Affiliation | Collaboration | Akshay Krishnamurthy EMAIL Microsoft Research New York, NY 10011 ... Alekh Agarwal EMAIL Microsoft Research Redmond, WA 98052 ... Tzu-Kuo Huang EMAIL Uber Advanced Technology Center Pittsburgh, PA 15201 ... Hal Daumé III EMAIL Microsoft Research New York, NY 10011 ... John Langford EMAIL Microsoft Research New York, NY 10011
Pseudocode | Yes | Algorithm 1 Cost Overlapped Active Learning (COAL) ... Algorithm 2 Max Cost
Open Source Code | Yes | Our code is publicly available as part of the Vowpal Wabbit machine learning library.3 ... 3. http://hunch.net/~vw
Open Datasets | Yes | We performed simulated active learning experiments with three datasets. ImageNet 20 and 40 are sub-trees of the ImageNet hierarchy covering the 20 and 40 most frequent classes... The third, RCV1-v2 (Lewis et al., 2004), is a multilabel text-categorization dataset... 3. http://hunch.net/~vw ... RCV1-v2 (Lewis et al., 2004). Data available at http://www.jmlr.org/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm.
Dataset Splits | No | The paper mentions: "We randomly permute the training data 100 times and make one pass through the training set with each parameter setting." This describes how the training data was processed, but it does not specify how the original datasets were split into training, validation, and test sets, nor does it provide absolute counts or percentages for these splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It discusses experimental procedures and mentions using Vowpal Wabbit but provides no details on specific CPU, GPU, or other computational resources.
Software Dependencies | No | Our code is publicly available as part of the Vowpal Wabbit machine learning library.3 ... We use the cost-sensitive one-against-all (csoaa) implementation in Vowpal Wabbit... The paper mentions "Vowpal Wabbit" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | There are two tuning parameters in our implementation. First, instead of Δ_i, we set the radius of the version space to Δ_i = κ·ν_{i−1}/(i−1) (i.e., the log(n) term in the definition of ν_n is replaced with log(i)), and we instead tune the constant κ. This alternate mellowness parameter controls how aggressive the query strategy is. The second parameter is the learning rate used by online linear regression. For all experiments, we show the results obtained by the best learning rate for each mellowness on each dataset, which is tuned as follows. We randomly permute the training data 100 times and make one pass through the training set with each parameter setting. ... The best learning rates for different datasets and mellowness settings are in Table 2. ... We choose the mellowness by visual inspection for the baselines and use 0.01 for COAL.
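The tuning protocol quoted in this row (randomly permute the training data, make one pass per parameter setting, report the best learning rate for each mellowness) can be sketched as follows. `run_once`, the grids, and the loss are hypothetical placeholders, not the paper's actual values or implementation.

```python
import random

def tune(data, mellowness_grid, lr_grid, run_once, n_perms=100, seed=0):
    """For each (mellowness, learning-rate) pair, randomly permute the
    training data and make one pass, averaging the loss over permutations;
    report the best learning rate per mellowness setting (sketch only)."""
    rng = random.Random(seed)
    best = {}
    for kappa in mellowness_grid:
        avg_loss = {}
        for lr in lr_grid:
            total = 0.0
            for _ in range(n_perms):
                perm = list(data)
                rng.shuffle(perm)
                # run_once: hypothetical callable doing one online pass
                # over one permutation and returning a loss (lower = better)
                total += run_once(perm, kappa, lr)
            avg_loss[lr] = total / n_perms
        best[kappa] = min(avg_loss, key=avg_loss.get)
    return best
```

For example, with a toy loss minimized at a learning rate of 0.5, `tune` returns 0.5 as the best learning rate for every mellowness in the grid.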