On Pseudo-Labeling for Class-Mismatched Semi-Supervised Learning

Authors: Lu Han, Han-Jia Ye, De-Chuan Zhan

TMLR 2022

Reproducibility variables, results, and the LLM's supporting responses:
Research Type: Experimental
"Experiments show that our method achieves steady improvement over supervised baseline and state-of-the-art performance under all class mismatch ratios on different benchmarks. ... Experiments on different SSL benchmarks empirically validate the effectiveness of our model."
Researcher Affiliation: Academia
"Lu Han (EMAIL), State Key Laboratory for Novel Software Technology, Nanjing University; Han-Jia Ye (EMAIL), State Key Laboratory for Novel Software Technology, Nanjing University; De-Chuan Zhan (EMAIL), State Key Laboratory for Novel Software Technology, Nanjing University"
Pseudocode: Yes
"Algorithm 1: Υ-Model algorithm"
Open Source Code: No
"The paper does not provide concrete access to source code. It only mentions 'Reviewed on OpenReview: https://openreview.net/forum?id=tLG26QxoD8', which is a review platform and does not host code."
Open Datasets: Yes
"CIFAR10 (6/4): created from CIFAR10 (Krizhevsky & Hinton, 2009). ... CIFAR100 (50/50): created from CIFAR100 (Krizhevsky & Hinton, 2009). ... Tiny ImageNet (100/100): created from Tiny ImageNet, which is a subset of ImageNet (Deng et al., 2009). ... ImageNet100 (50/50): created from the 100-class subset of ImageNet (Deng et al., 2009)."
Dataset Splits: Yes
"CIFAR10 (6/4): ... We select 400 labeled samples for each ID class and totally 20,000 unlabeled samples from ID and OOD classes. SVHN (6/4): ... We select 100 labeled samples for each ID class and totally 20,000 unlabeled samples. CIFAR100 (50/50): ... We select 100 labeled samples for each ID class and a total of 20,000 unlabeled samples. Tiny ImageNet (100/100): ... We select 100 labeled samples for each ID class and 40,000 unlabeled samples. ImageNet100 (50/50): ... We select 100 labeled samples for each ID class and a total of 20,000 unlabeled samples."
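The CIFAR10 (6/4) split quoted above can be sketched in a few lines. This is a minimal, hedged reconstruction: the excerpt does not say which 6 classes are in-distribution (ID) or how the unlabeled pool is sampled, so the choices below (first 6 classes as ID, unlabeled pool drawn uniformly from all 10 classes) are assumptions for illustration.

```python
import numpy as np

# Sketch of the CIFAR10 (6/4) class-mismatched split described above.
# Assumptions (not specified in the report): the first 6 classes are ID,
# the remaining 4 are OOD, and the unlabeled pool is drawn uniformly
# from all 10 classes.
rng = np.random.default_rng(0)

# Stand-in for the CIFAR10 training labels (50,000 images, 10 classes).
labels = rng.integers(0, 10, size=50_000)

ID_CLASSES = list(range(6))       # 6 in-distribution classes
N_LABELED_PER_CLASS = 400         # 400 labeled samples per ID class
N_UNLABELED = 20_000              # 20,000 unlabeled samples (ID + OOD)

labeled_idx = []
for c in ID_CLASSES:
    cls_idx = np.flatnonzero(labels == c)
    labeled_idx.extend(rng.choice(cls_idx, N_LABELED_PER_CLASS, replace=False))
labeled_idx = np.array(labeled_idx)

# Unlabeled pool: everything not labeled, subsampled to 20,000.
pool = np.setdiff1d(np.arange(len(labels)), labeled_idx)
unlabeled_idx = rng.choice(pool, N_UNLABELED, replace=False)

print(len(labeled_idx), len(unlabeled_idx))  # 2400 labeled, 20000 unlabeled
```

The same recipe, with different per-class counts, reproduces the other splits in the table (e.g. 100 labeled per ID class for SVHN, CIFAR100, Tiny ImageNet, and ImageNet100).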
Hardware Specification: No
"The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. It only mentions using 'WideResNet-28-2' as the backbone."
Software Dependencies: No
"The paper does not provide specific software dependencies with version numbers. It mentions 'Adam as the optimization algorithm' but does not name the framework or library used (e.g., PyTorch, TensorFlow, scikit-learn), let alone its version."
Experiment Setup: Yes
"For each epoch, we iterate over the unlabeled set and randomly sample labeled data; each unlabeled and labeled mini-batch contains 128 samples. We adopt Adam as the optimization algorithm with an initial learning rate of 3×10^-3 and train for 400 epochs. ... We first train a classification model only on labeled data for 100 epochs without RPL and SEC. We update pseudo-labels every 2 epochs. For both datasets, we set τ = 0.95, γ = 0.3, K = 4."
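The hyperparameters quoted in this row can be collected into a single configuration object, which makes the reported setup easy to audit at a glance. This is only a sketch: the field names and the dataclass packaging are mine, and framework-level details (optimizer implementation, any learning-rate schedule) are not given in the excerpt; only the numeric values below come from the report.

```python
from dataclasses import dataclass

@dataclass
class UpsilonModelConfig:
    """Hyperparameters from the experiment-setup row above (values only;
    field names and structure are illustrative assumptions)."""
    batch_size: int = 128            # per labeled and per unlabeled mini-batch
    optimizer: str = "adam"          # the paper names Adam, no version/framework
    lr: float = 3e-3                 # initial learning rate
    epochs: int = 400                # total training epochs
    warmup_epochs: int = 100         # supervised-only warm-up, without RPL/SEC
    pseudo_label_interval: int = 2   # update pseudo-labels every 2 epochs
    tau: float = 0.95                # confidence threshold τ
    gamma: float = 0.3               # γ
    k: int = 4                       # K

cfg = UpsilonModelConfig()
print(cfg)
```

A dataclass like this is a common way to make a reproduction attempt self-documenting: every value the paper reports lives in one place, and anything the paper omits (here, the framework and any schedule) is conspicuously absent.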