Sample-Optimal Agnostic Boosting with Unlabeled Data

Authors: Udaya Ghai, Karan Singh

ICML 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM response: "In this section, we demonstrate the empirical viability of our approach. Table 1 showcases the results from our initial experiments comparing Algorithm 1 with the agnostic boosting method introduced by Kanade & Kalai (2009), herein referred to as the Potential-based Agnostic Booster (PAB). These evaluations were performed on various UCI classification datasets (Sigillito et al., 1989; Hopkins et al., 1999; Smith et al., 1988; Hofmann, 1994; Sejnowski & Gorman, 1988; Breiman & Stone, 1984), employing decision stumps (Pedregosa et al., 2011) as the weak learners."
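The decision-stump weak learners cited above are, in scikit-learn terms, depth-1 decision trees. A minimal sketch (the synthetic dataset is illustrative, not from the paper):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Labels determined by a single-feature threshold, which a stump can fit exactly.
y = (X[:, 0] > 0).astype(int)

# A decision stump is a depth-1 decision tree: one feature, one threshold.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
acc = stump.score(X, y)
print(acc)  # -> 1.0 on this threshold-separable data
```

In boosting, many such stumps are trained on reweighted or relabeled samples and combined; the stump itself only needs to beat chance by a margin.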
Researcher Affiliation: Collaboration
LLM response: Amazon, NYC; Tepper School of Business, Carnegie Mellon University. Correspondence to: Karan Singh <EMAIL>.
Pseudocode: Yes
LLM response: "Algorithm 1 Agnostic Boosting with Unlabeled Data"
Open Source Code: No
LLM response: The paper does not provide any statement or link regarding the availability of source code for the described methodology.
Open Datasets: Yes
LLM response: "These evaluations were performed on various UCI classification datasets (Sigillito et al., 1989; Hopkins et al., 1999; Smith et al., 1988; Hofmann, 1994; Sejnowski & Gorman, 1988; Breiman & Stone, 1984)"
Dataset Splits: Yes
LLM response: "Table 1. 50-fold cross-validated accuracies of the Potential-based Agnostic Booster (PAB) (Kanade & Kalai, 2009) and our proposed boosting algorithm on six datasets with 0%, 5%, 10%, and 20% added label noise (during training). Sonar and Ionosphere have 50% of labels dropped while the remaining datasets have 90% of labels dropped."
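The corruption protocol in that caption (flip a fraction of training labels, then drop a fraction to form the unlabeled pool) can be sketched as follows; the function name and NumPy implementation are our own, not from the paper:

```python
import numpy as np

def corrupt_labels(y, noise_frac, drop_frac, seed=0):
    """Flip `noise_frac` of binary (0/1) labels, then mark `drop_frac` of
    examples as unlabeled by setting their label to -1."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    n = len(y)
    flip = rng.choice(n, size=int(noise_frac * n), replace=False)
    y[flip] = 1 - y[flip]            # symmetric label noise
    drop = rng.choice(n, size=int(drop_frac * n), replace=False)
    y_obs = y.copy()
    y_obs[drop] = -1                 # -1 marks an unlabeled example
    return y_obs

y = np.zeros(100, dtype=int)
y_obs = corrupt_labels(y, noise_frac=0.10, drop_frac=0.90)
print((y_obs == -1).sum())  # -> 90 labels dropped
```

Under this reading, the 50-fold cross-validation would be applied after corruption of the training folds only, matching the caption's "during training".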
Hardware Specification: Yes
LLM response: "Experiments are all run on an M1 MacBook Pro and complete within an hour."
Software Dependencies: No
LLM response: The paper mentions scikit-learn for the decision stumps ("employing decision stumps (Pedregosa et al., 2011) as the weak learners"; Pedregosa et al., Scikit-learn: Machine learning in Python, JMLR 12:2825-2830, 2011) but does not specify a version number.
Experiment Setup: Yes
LLM response: "For PAB, the number of samples that can be fed to a weak learner in a round scales inversely with the number of boosting rounds, as the algorithm requires fresh samples each round. As such, we perform a grid search over the number of boosting rounds T ∈ {25, 50, 100}, while we use T = 100 for our implementation of Algorithm 1. In both algorithms we search over m, the number of samples fed to the weak learner each round, on the grid {5, 20, 50, 100}; if such a setting is invalid for PAB, we continue until all samples are used."
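The sweep described above can be expressed as a small configuration grid. A sketch under the assumption that PAB's fresh-sample requirement makes a setting valid only when T * m does not exceed the labeled-sample count (our reading of the quoted text; the function name and budget are illustrative):

```python
from itertools import product

def pab_grid(n_labeled, rounds_grid=(25, 50, 100), m_grid=(5, 20, 50, 100)):
    """Enumerate (T, m) settings; for PAB, running T rounds with m fresh
    labeled samples per round requires T * m <= n_labeled."""
    configs = []
    for T, m in product(rounds_grid, m_grid):
        if T * m <= n_labeled:
            configs.append((T, m))
    return configs

# With 1000 labeled samples, e.g. (50, 20) fits the budget but (100, 20) does not.
print(pab_grid(n_labeled=1000))
```

The paper's fallback for invalid settings ("continue until all samples are used") would instead truncate T to n_labeled // m rather than discard the configuration.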