Active Learning for Cost-Sensitive Classification
Authors: Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets. Keywords: Active Learning, Cost-sensitive Learning, Structured Prediction, Statistical Learning Theory, Oracle-based Algorithms. ... Experimentally, we show that COAL substantially outperforms the passive learning baseline with orders of magnitude savings in the labeling effort on a number of hierarchical classification datasets (see Figure 1 for comparison between passive learning and COAL on Reuters text categorization). ... We now turn to an empirical evaluation of COAL. For further computational efficiency, we implemented an approximate version of COAL using: 1) a relaxed version space G_i(y) = {g ∈ G : R̂_i(g; y) ≤ R̂_i(ĝ_{i,y}; y) + Δ_i}, which does not enforce monotonicity, and 2) online optimization, based on online linear least-squares regression. The algorithm processes the data in one pass, and the idea is to (1) replace ĝ_{i,y}, the ERM, with an approximation ĝ^o_{i,y} obtained by online updates, and (2) compute the minimum and maximum costs via a sensitivity analysis of the online update. We describe this algorithm in detail in Subsection 7.1. Then, we present our experimental results, first for simulated active learning (Subsection 7.2) and then for learning to search for joint prediction (Subsection 7.3). |
| Researcher Affiliation | Collaboration | Akshay Krishnamurthy EMAIL Microsoft Research New York, NY 10011 ... Alekh Agarwal EMAIL Microsoft Research Redmond, WA 98052 ... Tzu-Kuo Huang EMAIL Uber Advanced Technology Center Pittsburgh, PA 15201 ... Hal Daumé III EMAIL Microsoft Research New York, NY 10011 ... John Langford EMAIL Microsoft Research New York, NY 10011 |
| Pseudocode | Yes | Algorithm 1 Cost Overlapped Active Learning (COAL) ... Algorithm 2 Max Cost |
| Open Source Code | Yes | Our code is publicly available as part of the Vowpal Wabbit machine learning library.3 ... 3. http://hunch.net/~vw |
| Open Datasets | Yes | We performed simulated active learning experiments with three datasets. ImageNet 20 and 40 are sub-trees of the ImageNet hierarchy covering the 20 and 40 most frequent classes... The third, RCV1-v2 (Lewis et al., 2004), is a multilabel text-categorization dataset... RCV1-v2 (Lewis et al., 2004). Data available at http://www.jmlr.org/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm. |
| Dataset Splits | No | The paper mentions: "We randomly permute the training data 100 times and make one pass through the training set with each parameter setting." This describes how the training data was processed, but it does not specify how the original datasets were split into training, validation, and test sets, nor does it provide absolute counts or percentages for these splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It discusses experimental procedures and mentions using Vowpal Wabbit but provides no details on specific CPU, GPU, or other computational resources. |
| Software Dependencies | No | Our code is publicly available as part of the Vowpal Wabbit machine learning library. ... We use the cost-sensitive one-against-all (csoaa) implementation in Vowpal Wabbit... The paper mentions "Vowpal Wabbit" but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | There are two tuning parameters in our implementation. First, instead of Δ_i, we set the radius of the version space to Δ_i = κν_{i−1}/(i−1) (i.e., the log(n) term in the definition of ν_n is replaced with log(i)) and instead tune the constant κ. This alternate mellowness parameter controls how aggressive the query strategy is. The second parameter is the learning rate used by online linear regression. For all experiments, we show the results obtained by the best learning rate for each mellowness on each dataset, which is tuned as follows. We randomly permute the training data 100 times and make one pass through the training set with each parameter setting. ... The best learning rates for different datasets and mellowness settings are in Table 2. ... We choose the mellowness by visual inspection for the baselines and use 0.01 for COAL. |
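The relaxed version space and range-based query rule quoted in the excerpts above can be sketched in a few lines. This is an illustrative simplification, not the Vowpal Wabbit implementation: the function names, the toy risk and cost values, and the zero-threshold query rule are all assumptions for the sake of the example.

```python
def version_space_radius(kappa, nu_prev, i):
    """Radius Delta_i = kappa * nu_{i-1} / (i-1), where kappa is the tuned
    mellowness constant (hypothetical helper; names are not from the paper's code)."""
    return kappa * nu_prev / (i - 1)

def relaxed_version_space(risks, delta):
    """Indices of regressors whose empirical risk for a label is within delta
    of the best, i.e. {g : R_i(g; y) <= R_i(g_hat; y) + Delta_i}.
    Monotonicity across rounds is not enforced, matching the relaxation."""
    best = min(risks)
    return [g for g, r in enumerate(risks) if r <= best + delta]

def should_query(predicted_costs, risks, delta):
    """Query a label's cost iff the surviving regressors still disagree on it
    (simplified: any nonzero max-minus-min cost range triggers a query)."""
    survivors = relaxed_version_space(risks, delta)
    costs = [predicted_costs[g] for g in survivors]
    return max(costs) - min(costs) > 0

# Toy numbers: three regressors, their empirical risks so far, and their
# predicted costs for one (example, label) pair.
risks = [0.10, 0.12, 0.40]
preds = [0.2, 0.8, 0.5]
delta = version_space_radius(kappa=0.05, nu_prev=2.0, i=3)  # -> 0.05
print(relaxed_version_space(risks, delta))  # [0, 1]
print(should_query(preds, risks, delta))    # True: survivors disagree
```

In the paper the min and max costs over the version space are obtained via a sensitivity analysis of the online update rather than by enumerating regressors as done here; enumeration is used only to keep the sketch self-contained.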