Rethinking Confidence Scores and Thresholds in Pseudolabeling-based SSL

Authors: Harit Vishwakarma, Yi Chen, Satya Sai Srinath Namburi Gnvv, Sui Jiet Tay, Ramya Korlakai Vinayak, Frederic Sala

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that, by integrating this framework with modern SSL methods, we achieve significant improvements in accuracy and training efficiency. In addition, we provide novel insights on the tradeoffs between the choices of the error parameter and the end model's performance. ... 5. Experiments: We conduct empirical evaluation over several settings to: C1. Verify that the adaptations of popular pseudolabeling-based SSL methods with PabLO output models with better test accuracy. C2. Study the effects of the choice of error tolerance ε on the test accuracy of the final model. C3. Understand the role of pseudolabel accumulation in our method and baselines.
Researcher Affiliation | Collaboration | 1Department of Computer Sciences, University of Wisconsin-Madison, WI, USA; 2Department of Electrical and Computer Engineering, University of Wisconsin-Madison, WI, USA; 3GE HealthCare; 4NYU Courant Institute. Correspondence to: Harit Vishwakarma <EMAIL>, Yi Chen <EMAIL>.
Pseudocode | Yes | The detailed steps are outlined in Algorithm 3 in the Appendix. ... B. Detailed Algorithms: Algorithm 1, Estimate Pseudolabeling Thresholds Classwise; Algorithm 2, Estimate Pseudolabeling Threshold Jointly for All Classes; Algorithm 3, Pseudolabeling-Based SSL with PabLO.
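The paper's algorithms themselves are only named here, but the general technique they describe (picking per-class confidence thresholds subject to an error tolerance ε on held-out data) can be sketched generically. The following is a minimal illustration of that idea, not a reproduction of the paper's Algorithm 1; the function name, the grid search, and the use of a labeled calibration set are assumptions for the sketch.

```python
import numpy as np

def classwise_thresholds(scores, preds, labels, eps, grid=None):
    """Generic sketch: for each class, pick the smallest confidence
    threshold at which the accepted pseudolabels on a held-out
    calibration set have error at most eps."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    k = int(max(preds.max(), labels.max())) + 1
    thresholds = np.ones(k)  # default: accept nothing for this class
    for c in range(k):
        mask = preds == c
        if not mask.any():
            continue
        s, correct = scores[mask], (labels[mask] == c)
        for t in grid:
            accepted = s >= t
            if accepted.any() and (1.0 - correct[accepted].mean()) <= eps:
                thresholds[c] = t
                break  # smallest threshold meeting the tolerance
    return thresholds
```

A joint (all-classes) variant, as in the paper's Algorithm 2, would run the same search over a single pooled threshold instead of one per class.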
Open Source Code | Yes | First, we briefly describe the experimental setup, with details deferred to Appendix C. The code is available on GitHub: https://github.com/harit7/PabLO-SSL
Open Datasets | Yes | We experiment with three datasets: CIFAR-10 (Krizhevsky et al., 2009) is an image dataset with 10 classes. CIFAR-100 (Krizhevsky et al., 2009) is an extended version of CIFAR-10 with 100 classes. SVHN (Netzer et al., 2011) is a 10-class image dataset of digits from Google Street View.
Dataset Splits | Yes | Table 1. Details of the datasets we use in our experiments. k is the number of classes. N_l is the number of labeled data points used for training the backbone model h. N_u is the number of unlabeled data points used for consistency regularization and pseudolabeling for all the methods. N_val is the number of points used for model selection in all methods. N_test is the number of test data points. N_cal is the number of points used for learning the g function. N_th is the number of data points used for threshold estimation. ... Unless otherwise mentioned, we use N_l = 250 for CIFAR-10 and SVHN and 2500 for CIFAR-100 in our experiments.
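The split scheme above (disjoint labeled, unlabeled, validation, calibration, and threshold-estimation sets) is easy to mis-implement with overlapping indices. A minimal sketch of disjoint partitioning follows; only N_l = 250 for CIFAR-10/SVHN is stated in the source, so the other split sizes and the function name are hypothetical placeholders.

```python
import numpy as np

def make_splits(n_total, sizes, seed=0):
    """Partition dataset indices into named, disjoint splits.
    `sizes` maps split name -> number of points; sizes here other
    than the labeled split are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)  # shuffle once, then slice
    splits, start = {}, 0
    for name, n in sizes.items():
        splits[name] = idx[start:start + n]
        start += n
    return splits

# Example with CIFAR-10-scale data; N_l = 250 is from the paper,
# the remaining sizes are made up for illustration.
splits = make_splits(50_000, {"labeled": 250, "calibration": 1_000,
                              "threshold": 1_000, "validation": 1_000})
```

Slicing a single permutation guarantees the splits are disjoint by construction, which is the property the table's N_cal / N_th / N_val bookkeeping relies on.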
Hardware Specification | Yes | We ran all of our experiments on a high-throughput system with various GPUs. Therefore, each individual experiment task may be scheduled among NVIDIA A100 SXM4-40GB, NVIDIA A100 SXM4-80GB, NVIDIA L40, and NVIDIA H100 80GB HBM3. We measured the runtime of our algorithm on a desktop with a single NVIDIA RTX 4090.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies. It mentions that "the code is available on GitHub" but does not list explicit software versions.
Experiment Setup | Yes | Table 10. Hyperparameters used for our method. Learning the g function: optimizer SGD, learning rate 0.01, batch size 64, max epochs 500, weight decay 0.01, momentum 0.9. Estimating t: optimizer SGD, learning rate 0.01, batch size 64, max epochs 500, weight decay 0.01, momentum 0.9.
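To make the Table 10 hyperparameters concrete, here is a minimal sketch of one SGD update with momentum and L2 weight decay using exactly those values. This assumes PyTorch-style SGD semantics (weight decay folded into the gradient, momentum buffer updated before the step); it is an illustration, not the authors' training loop.

```python
# Hyperparameters from Table 10 (identical for learning g and estimating t).
CFG = dict(optimizer="SGD", lr=0.01, batch_size=64,
           max_epochs=500, weight_decay=0.01, momentum=0.9)

def sgd_step(w, grad, buf,
             lr=CFG["lr"], momentum=CFG["momentum"],
             weight_decay=CFG["weight_decay"]):
    """One SGD update on a scalar parameter (PyTorch-style convention)."""
    g = grad + weight_decay * w   # L2 weight decay folded into the gradient
    buf = momentum * buf + g      # momentum buffer accumulates gradients
    return w - lr * buf, buf      # parameter step, plus updated buffer
```

With a zero gradient, a weight of 1.0 still shrinks slightly per step (by lr * weight_decay = 1e-4), which is the regularizing effect of the 0.01 weight decay in the table.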