Weakly-Supervised Contrastive Learning for Imprecise Class Labels

Authors: Zi-Hao Zhou, Jun-Jie Wang, Tong Wei, Min-Ling Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically validate the efficacy of our framework across two paradigms of weakly-supervised learning, namely noisy label learning (NLL) and partial label learning (PLL). For implementation, our weakly-supervised contrastive (WSC) loss is combined only with the simplest baseline. Detailed implementation can be found in Appendix E. [...] Table 1 shows the classification accuracy for each comparative approach. Most of the experimental results are the same as those reported in their original papers, except for ProMix, which we reproduce due to differences in our settings.
Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Southeast University, Nanjing, China ²Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. Correspondence to: Tong Wei <EMAIL>.
Pseudocode | Yes | The pseudocode for computing Lwsc in Equation 7 with batch data under a uniform class distribution is given in Algorithm 1, with the general version (without the uniform class assumption) provided in Appendix E. We can intuitively interpret Lwsc [...] Algorithm 1: The calculation of Lwsc in the uniform class distribution case [...] Algorithm 2: Batch data estimation of Lwsc under the general scenario
Open Source Code | Yes | The implementation code is available at https://github.com/Speechless-10308/WSC.
Open Datasets | Yes | For the case of noisy label learning, following the settings in Li et al. (2020) and Liu et al. (2020), we verify the effectiveness of our method on CIFAR-10 and CIFAR-100 (Krizhevsky, 2012) with two types of label noise: symmetric and asymmetric. [...] For the case of partial label learning, following Wang et al. (2022), we verify the effectiveness of our method on CIFAR-10 (Krizhevsky, 2012), CIFAR-100 (Krizhevsky, 2012) and CUB-200 (Wah et al., 2011b) with different partial ratios.
Dataset Splits | Yes | On CIFAR-10N, we used three noise types: Aggregate, Random1, and Worst, with corresponding noise rates of 9.03%, 17.23%, and 40.21%. For CIFAR-100N, we compared the model performance under Clean and Noisy settings with a noise rate of 40.21%. We also include a full comparison on Clothing1M which has more realistic and large-scale instance noise (Xiao et al., 2015). [...] Furthermore, we extend our experiments on CUB-200, as presented in the main paper, to include a wider range of partial label ratios, specifically 0.01, 0.05, 0.1.
Hardware Specification | No | The paper mentions model architectures (e.g., "an 18-layer PreAct ResNet", "a 34-layer ResNet", "a 50-layer ResNet") but does not specify any hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions the use of "stochastic gradient descent (SGD)" and "cosine learning rate scheduler" but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | For CIFAR-10 and CIFAR-100, an 18-layer PreAct ResNet is employed and stochastic gradient descent (SGD) is utilized for training with a momentum value of 0.9, a weight decay factor of 0.001, and a batch size of 128 over a total of 250 epochs. The initial learning rate is set to 0.02 and adjusted by a cosine learning rate scheduler. In our framework, both the projection head and the classification head are configured as a two-layer Multilayer Perceptron (MLP), with output dimensions of 256 and the number of classes, respectively. The parameters are set to α = 1, β = 12 for CIFAR-10 and α = 2, β = 300 for CIFAR-100. Additionally, for CIFAR-100, the last term of our loss function is scaled by a factor of 3. [...] The batch size is 256 and the weight decay is 0.0001 across all the PLL settings. We set the initial learning rate to 0.1 and adjust it with MultiStep and cosine schedulers on the CIFAR datasets and CUB-200, respectively. On CIFAR-10 and CIFAR-100 (CIFAR-100-H), we train our model for 200 epochs using parameters α = 1, β = 12 for CIFAR-10 and α = 2, β = 300 for CIFAR-100. For the CUB dataset, we train for 300 epochs with parameters α = 2 and β linearly increasing from 0 to 400. The detailed choice of hyper-parameters is provided in Table 6.
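The Pseudocode row above notes that the paper gives a batch-data algorithm for the WSC loss. The paper's exact Equation 7 is not reproduced in this excerpt, so as a rough illustration of the kind of computation involved, here is a minimal sketch of a supervised-contrastive-style loss in which positive pairs are weighted by agreement between soft (weak) labels; the function name, the temperature value, and the pair-weighting scheme are all assumptions for illustration, not the paper's Lwsc:

```python
import math

def weighted_contrastive_loss(embeddings, soft_labels, temperature=0.5):
    """Illustrative batch contrastive loss with soft (weak) labels.

    embeddings: list of L2-normalized feature vectors, one per sample.
    soft_labels: list of class-probability vectors (the weak supervision).
    Instead of requiring exact label equality, each pair (i, j) is weighted
    by the probability that the two samples share a class under the soft
    labels. This is a generic sketch, NOT the paper's Lwsc from Equation 7.
    """
    n = len(embeddings)
    # pairwise dot-product similarities, scaled by temperature
    sim = [[sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
            for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        # softmax denominator over all other samples in the batch
        denom = sum(math.exp(sim[i][j]) for j in range(n) if j != i)
        for j in range(n):
            if j == i:
                continue
            # probability that samples i and j belong to the same class
            w = sum(p * q for p, q in zip(soft_labels[i], soft_labels[j]))
            loss += -w * (sim[i][j] - math.log(denom))
    return loss / n
```

Pulling embeddings of same-class samples together lowers this loss, while misaligned embeddings raise it, which is the behavior a contrastive objective of this family should exhibit.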
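The Experiment Setup row mentions a cosine learning-rate scheduler starting at 0.02 over 250 epochs for the NLL experiments. A minimal sketch of the standard cosine-annealing formula with those numbers follows; the function name and the choice of annealing to exactly zero are assumptions, since the paper's final learning rate is not stated in this excerpt:

```python
import math

def cosine_lr(epoch, total_epochs=250, base_lr=0.02, min_lr=0.0):
    """Cosine-annealed learning rate: decays from base_lr to min_lr.

    Uses the common half-cosine formulation: at epoch 0 the rate equals
    base_lr, at total_epochs it equals min_lr, and at the midpoint it is
    exactly halfway between the two.
    """
    cos = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos
```

With the defaults above, the rate is 0.02 at epoch 0, 0.01 at epoch 125, and 0 at epoch 250.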