Bi-Level Optimization for Semi-Supervised Learning with Pseudo-Labeling
Authors: Marzi Heidari, Yuhong Guo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on multiple SSL benchmarks. The experimental results show that the proposed BOPL outperforms the state-of-the-art SSL techniques. |
| Researcher Affiliation | Academia | Marzi Heidari¹, Yuhong Guo¹,² — ¹School of Computer Science, Carleton University, Ottawa, Canada; ²CIFAR AI Chair, Amii, Canada. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Training Algorithm for BOPL |
| Open Source Code | No | The paper does not explicitly provide a link to source code or state that code has been made available in supplementary materials or a public repository. |
| Open Datasets | Yes | We conducted comprehensive experiments on four commonly used image classification benchmarks: CIFAR-10, CIFAR-100 (Krizhevsky, Hinton et al. 2009), SVHN (Netzer et al. 2011) and STL-10 (Coates, Ng, and Lee 2011). |
| Dataset Splits | Yes | We conducted experiments on CIFAR-10 with 250, 1,000, 2,000, and 4,000 labeled samples, on CIFAR-100 with 2,500, 4,000, and 10,000 labeled samples, on SVHN with 1,000 and 500 labeled samples, and on STL-10 with 1,000 images as the labeled data. |
| Hardware Specification | No | The paper mentions different backbone networks (e.g., WRN-28-2, WRN-28-8) and optimizers, but does not specify any particular hardware (e.g., GPU, CPU models) used for the experiments. |
| Software Dependencies | No | The paper mentions using optimizers like SGD and techniques like cosine learning rate annealing, but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with their version numbers. |
| Experiment Setup | Yes | For training CNN-13, we employed the SGD optimizer with a Nesterov momentum (Nesterov 1983) of 0.9, an L2 regularization coefficient of 1e-4 for CIFAR-10 and CIFAR-100 datasets and 5e-5 for SVHN, and an initial learning rate α of 0.1. ... For the WRN-28-2 model, the training configuration includes the SGD optimizer, an L2 regularization coefficient of 5e-4, and an initial learning rate of 0.01. ... Specifically for BOPL, we set the batch size to 128, λ = 1e-2, ϵ = 1e-2, γ = 0.5, β = 0.999, and η = 1. We pre-train the model for 50 epochs using the Mean-Teacher algorithm and then proceed to train BOPL for 400 epochs. Finally, we fine-tune the model for 100 epochs using both the labeled data and the unlabeled data with learned pseudo-labels. |
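The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal, framework-agnostic illustration assuming plain Python: the dictionary names (`CONFIGS`, `BOPL`, `SCHEDULE`) and the `cosine_lr` helper are hypothetical, since the paper specifies neither a software framework nor its exact annealing formula; only the numeric values are taken from the paper.

```python
import math

# Per-backbone optimizer settings as reported in the paper.
# (SVHN with CNN-13 uses weight_decay=5e-5 instead of 1e-4.)
CONFIGS = {
    "cnn13": dict(optimizer="sgd", nesterov=True, momentum=0.9,
                  weight_decay=1e-4, lr=0.1),
    "wrn28_2": dict(optimizer="sgd", weight_decay=5e-4, lr=0.01),
}

# BOPL-specific hyperparameters as stated in the paper
# (lam = λ, eps = ϵ, gamma = γ, beta = β, eta = η).
BOPL = dict(batch_size=128, lam=1e-2, eps=1e-2, gamma=0.5, beta=0.999, eta=1)

# Training schedule: Mean-Teacher pre-training, BOPL training, fine-tuning.
SCHEDULE = dict(pretrain_epochs=50, bopl_epochs=400, finetune_epochs=100)

def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    """One common form of cosine learning-rate annealing: decays
    base_lr to 0 over total_epochs. The paper names the technique
    but not the exact formula, so this is an assumption."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

For example, `cosine_lr(0.1, 0, 400)` returns the initial rate 0.1 and the rate decays smoothly to 0 at epoch 400, matching the "cosine learning rate annealing" mentioned alongside the SGD settings.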