Probability-Density-aware Semi-supervised Learning
Authors: Shuyang Liu, Ruiqiu Zheng, Yunhang Shen, Zhou Yu, Ke Li, Xing Sun, Shaohui Lin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to validate the effectiveness of PMLP, demonstrating outstanding performance compared with other recent methods. For example, on CIFAR100 with 400 labeled samples, PMLP surpasses the baseline model by 3.52% and the second-best algorithm by 1.21%. |
| Researcher Affiliation | Collaboration | 1 East China Normal University 2 Youtu Lab, Tencent, Shanghai, China 3 Laboratory of Advanced Theory and Application in Statistics and Data Science MOE, China |
| Pseudocode | Yes | Algorithm 1: The algorithm for density-aware label propagation |
| Open Source Code | Yes | 1 Code: https://github.com/sdagfgaf/Probability-Densityaware-Semi-supervised-Learning |
| Open Datasets | Yes | We present comprehensive experiments of PMLP across extensive datasets, including SVHN (Netzer et al. 2011), CIFAR10 (Krizhevsky and Hinton 2009), CIFAR100 (Krizhevsky and Hinton 2009), and STL-10 (Coates, Ng, and Lee 2011). |
| Dataset Splits | Yes | Following the standard protocol of SSL (Zhao et al. 2022; Zheng et al. 2022; Wang et al. 2022; Chen et al. 2023; Yang et al. 2023), we randomly select 40 and 250 labeled samples from SVHN and CIFAR10. For CIFAR100, 400 and 2,500 labeled samples are randomly selected. For STL-10, we select 40 and 1,000 labeled samples. |
| Hardware Specification | No | The KDE from Sklearn runs only on the CPU, leading to slower calculations. We use the exponential kernel K(·) in KDE to promote the divergence and design a GPU-based KDE, which can reach a 5× acceleration (Tab. 2). Explanation: The paper mentions a "GPU-based KDE" but does not specify any particular GPU model, CPU, or other hardware specifications. |
| Software Dependencies | No | The KDE from Sklearn runs only on the CPU, leading to slower calculations. Explanation: The paper mentions "Sklearn" but does not provide a specific version number. No other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | For SVHN and CIFAR10, we use WideResNet-28-2 as the encoder to generate representations. For CIFAR100, the encoder is WideResNet-28-8; for STL-10, WideResNet-37-2. The predictor is a single linear layer. In the CACL, we calculate zi with a 2-layer MLP projector. The batch size includes 64 unlabeled and 448 labeled points. In the label-propagation process, we select 1.5N nearest neighbors, where N represents the number of dataset categories. In PMLP, we use α = 0.8, η = 0.2, and τ = 0.95. In CACL, ϵ is set to 0.7. Optimization is performed using the SGD optimizer with a momentum of 0.9 and a weight decay of 5 × 10⁻⁴. The learning rate follows a cosine decay schedule. For KDE, we chose a bandwidth of h = 5 for SVHN, CIFAR10, and CIFAR100, and h = 3 for STL-10. In KDE, we select 512 points for CIFAR10 and SVHN, and the nearest 45 points for CIFAR100 and STL-10. |
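The KDE settings quoted above (exponential kernel K(·), bandwidth h, density evaluated over a fixed number of nearest reference points) can be sketched as follows. This is a minimal NumPy illustration of that computation, not the paper's GPU implementation; the function name, the unnormalized mean-of-kernels estimate, and the brute-force neighbor selection are assumptions made here for clarity — the paper's version batches the same arithmetic on a GPU for its reported speedup.

```python
import numpy as np

def exponential_kde(queries, points, h=5.0, k=None):
    """Exponential-kernel KDE: density(q) = mean over points p of exp(-||q - p|| / h).

    queries: (m, d) array of query representations
    points:  (n, d) array of reference representations
    h:       kernel bandwidth (e.g. h=5 for CIFAR, h=3 for STL-10 per the setup above)
    k:       if given, average only over the k nearest reference points per query
    Returns an (m,) array of unnormalized density estimates.
    """
    # Pairwise Euclidean distances, shape (m, n)
    diff = queries[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    if k is not None:
        # Restrict each query to its k smallest distances (nearest neighbors)
        dist = np.sort(dist, axis=1)[:, :k]
    return np.exp(-dist / h).mean(axis=1)
```

A query that sits inside a cluster of reference representations receives a higher density estimate than one far away, which is the signal the density-aware label propagation relies on; the same expression maps directly onto a batched GPU tensor computation.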