Bi-Level Optimization for Semi-Supervised Learning with Pseudo-Labeling

Authors: Marzi Heidari, Yuhong Guo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on multiple SSL benchmarks. The experimental results show the proposed BOPL outperforms the state-of-the-art SSL techniques.
Researcher Affiliation | Academia | Marzi Heidari¹, Yuhong Guo¹,² (¹School of Computer Science, Carleton University, Ottawa, Canada; ²CIFAR AI Chair, Amii, Canada). EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Training Algorithm for BOPL
Open Source Code | No | The paper does not explicitly provide a link to source code or state that code has been made available in supplementary materials or a public repository.
Open Datasets | Yes | We conducted comprehensive experiments on four commonly used image classification benchmarks: CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009), SVHN (Netzer et al. 2011) and STL-10 (Coates, Ng, and Lee 2011).
Dataset Splits | Yes | We conducted experiments on CIFAR-10 with 250, 1,000, 2,000, and 4,000 labeled samples, on CIFAR-100 with 2,500, 4,000, and 10,000 labeled samples, on SVHN with 1,000 and 500 labeled samples, and on STL-10 with 1,000 images as the labeled data.
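The labeled-set sizes quoted above can be collected into a small configuration sketch. The dataset names and counts come from the quoted text; the mapping structure itself is an illustrative assumption, not something the paper provides.

```python
# Labeled-sample counts per benchmark, as reported in the quoted text.
# The dict layout is illustrative only; the paper does not specify a format.
LABELED_SPLITS = {
    "CIFAR-10": [250, 1_000, 2_000, 4_000],
    "CIFAR-100": [2_500, 4_000, 10_000],
    "SVHN": [1_000, 500],
    "STL-10": [1_000],
}

# Total number of (dataset, labeled-set-size) experimental settings implied.
num_settings = sum(len(sizes) for sizes in LABELED_SPLITS.values())
```

Counting the entries gives ten distinct experimental settings across the four benchmarks.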
Hardware Specification | No | The paper mentions different backbone networks (e.g., WRN-28-2, WRN-28-8) and optimizers, but does not specify any particular hardware (e.g., GPU, CPU models) used for the experiments.
Software Dependencies | No | The paper mentions using optimizers like SGD and techniques like cosine learning rate annealing, but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with their version numbers.
Experiment Setup | Yes | For training CNN-13, we employed the SGD optimizer with a Nesterov momentum (Nesterov 1983) of 0.9, an L2 regularization coefficient of 1e-4 for CIFAR-10 and CIFAR-100 datasets and 5e-5 for SVHN, and an initial learning rate α of 0.1. ... For the WRN-28-2 model, the training configuration includes the SGD optimizer, an L2 regularization coefficient of 5e-4, and an initial learning rate of 0.01. ... Specifically for BOPL, we set the batch size to 128, λ = 1e-2, ϵ = 1e-2, γ = 0.5, β = 0.999, and η = 1. We pre-train the model for 50 epochs using the Mean-Teacher algorithm and then proceed to train BOPL for 400 epochs. Finally, we fine-tune the model for 100 epochs using both the labeled data and the unlabeled data with learned pseudo-labels.
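The reported setup can be sketched as a plain-Python configuration plus the cosine learning rate annealing rule the paper cites. The hyperparameter values below are taken from the quoted text (WRN-28-2 variant); the dict layout and the exact annealing formula are assumptions, since the paper does not give the schedule equation explicitly.

```python
import math

# Reported hyperparameters for the WRN-28-2 BOPL setup (values from the
# quoted text; the dict structure itself is an illustrative assumption).
CONFIG = {
    "optimizer": "SGD",
    "nesterov_momentum": 0.9,
    "weight_decay": 5e-4,    # L2 regularization coefficient
    "init_lr": 0.01,
    "batch_size": 128,
    "lambda": 1e-2,
    "epsilon": 1e-2,
    "gamma": 0.5,
    "beta": 0.999,
    "eta": 1.0,
    "pretrain_epochs": 50,   # Mean-Teacher warm-up
    "train_epochs": 400,     # BOPL bi-level training
    "finetune_epochs": 100,  # labeled + pseudo-labeled fine-tuning
}

def cosine_annealed_lr(epoch: int, total_epochs: int, init_lr: float) -> float:
    """Standard cosine annealing: decay init_lr smoothly toward 0.

    This is the textbook rule; the paper only names the technique, so the
    exact formula here is an assumption.
    """
    return init_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Learning rate trajectory over the full 50 + 400 + 100 = 550 epoch schedule.
total = (CONFIG["pretrain_epochs"] + CONFIG["train_epochs"]
         + CONFIG["finetune_epochs"])
lrs = [cosine_annealed_lr(e, total, CONFIG["init_lr"]) for e in range(total)]
```

Under this schedule the learning rate starts at the reported 0.01 and decays monotonically toward zero by the final fine-tuning epoch.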