Sample-Optimal Agnostic Boosting with Unlabeled Data
Authors: Udaya Ghai, Karan Singh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the empirical viability of our approach. Table 1 showcases the results from our initial experiments comparing Algorithm 1 with the agnostic boosting method introduced by Kanade & Kalai (2009), herein referred to as the Potential-based Agnostic Booster (PAB). These evaluations were performed on various UCI classification datasets (Sigillito et al., 1989; Hopkins et al., 1999; Smith et al., 1988; Hofmann, 1994; Sejnowski & Gorman, 1988; Breiman & Stone, 1984), employing decision stumps (Pedregosa et al., 2011) as the weak learners. |
| Researcher Affiliation | Collaboration | 1Amazon, NYC 2Tepper School of Business, Carnegie Mellon University. Correspondence to: Karan Singh <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Agnostic Boosting with Unlabeled Data |
| Open Source Code | No | The paper does not provide any specific statement or link regarding the availability of source code for the methodology described. |
| Open Datasets | Yes | These evaluations were performed on various UCI classification datasets (Sigillito et al., 1989; Hopkins et al., 1999; Smith et al., 1988; Hofmann, 1994; Sejnowski & Gorman, 1988; Breiman & Stone, 1984) |
| Dataset Splits | Yes | Table 1. 50-fold cross-validated accuracies of the Potential based Agnostic Booster (PAB) (Kanade & Kalai, 2009) and our proposed boosting algorithm on six datasets with 0%, 5%, 10%, and 20% added label noise (during training). Sonar and Ionosphere have 50% of labels dropped while the remaining datasets have 90% of labels dropped. |
| Hardware Specification | Yes | Experiments are all run on an M1 Macbook Pro and complete within an hour. |
| Software Dependencies | No | employing decision stumps (Pedregosa et al., 2011) as the weak learners. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011. The paper mentions scikit-learn for decision stumps but does not specify a version number. |
| Experiment Setup | Yes | For PAB, the number of samples that can be fed to a weak learner in a round scales inversely with the number of boosting rounds, as the algorithm requires fresh samples each round. As such, we perform a grid search on the number of boosting rounds with T ∈ {25, 50, 100}, while we just use 100 for our implementation of Algorithm 1. In both algorithms we search over the parameter m, the number of samples we feed to the weak learner each round, with a grid of {5, 20, 50, 100}, though if such a setting is invalid for PAB, we continue until all samples are used. |
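The experimental setup quoted above (label noise injected during training, a fraction of labels dropped, decision stumps as weak learners) can be sketched in a minimal form. This is an illustrative assumption-laden sketch on synthetic data, not the paper's Algorithm 1 or the PAB baseline: the stump fitting, the synthetic dataset, and the 10% noise / 50% drop settings (the Sonar/Ionosphere configuration) are stand-ins.

```python
import numpy as np

def add_label_noise(y, noise_rate, rng):
    """Flip each +/-1 label independently with probability noise_rate."""
    flips = rng.random(len(y)) < noise_rate
    return np.where(flips, -y, y)

def drop_labels(n, drop_frac, rng):
    """Boolean mask of examples that KEEP their labels; the rest are unlabeled."""
    return rng.random(n) >= drop_frac

def fit_decision_stump(X, y):
    """Fit a one-split threshold classifier (decision stump) by exhaustive search
    over (feature, threshold, sign), maximizing training accuracy."""
    best = (0, 0.0, 1, -np.inf)  # (feature index, threshold, sign, train accuracy)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                acc = np.mean(pred == y)
                if acc > best[3]:
                    best = (j, t, s, acc)
    return best

def stump_predict(stump, X):
    j, t, s, _ = stump
    return np.where(X[:, j] <= t, s, -s)

rng = np.random.default_rng(0)
# Synthetic binary-classification data standing in for a UCI dataset.
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0.2, 1, -1)

y_train = add_label_noise(y, 0.10, rng)   # 10% label noise during training
labeled = drop_labels(len(y), 0.50, rng)  # 50% of labels dropped (Sonar/Ionosphere setting)

# A single weak-learner call sees only the labeled examples.
stump = fit_decision_stump(X[labeled], y_train[labeled])
acc = np.mean(stump_predict(stump, X) == y)
```

A full reproduction would wrap calls like `fit_decision_stump` inside the boosting loop (with the grid over T and m described above) and average accuracies over 50 cross-validation folds; here only the data-corruption and weak-learning steps are shown.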