Partial Label Clustering

Authors: Yutong Xie, Fuchao Yang, Yuheng Jia

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments demonstrate our method realizes superior performance when comparing with state-of-the-art constrained clustering methods, and outperforms PLL and semi-supervised PLL methods when only limited samples are annotated. The code and appendix are publicly available at https://github.com/xyt-ml/PLC." (Section 5 Experiments; 5.1 Experimental Setup; 5.2 Experimental Results; 5.3 Further Analysis)
Researcher Affiliation | Academia | Yutong Xie (1), Fuchao Yang (2), Yuheng Jia (3,4). (1) Chien-Shiung Wu College, Southeast University, Nanjing 210096, China; (2) College of Software Engineering, Southeast University, Nanjing 210096, China; (3) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (4) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | "Algorithm 1: Pseudo-code of PLC"
Open Source Code | Yes | "The code and appendix are publicly available at https://github.com/xyt-ml/PLC."
Open Datasets | Yes | "To conduct a comprehensive evaluation of our proposed method, we compare our PLC method with other methods on both controlled UCI datasets and real-world datasets. The characteristics of controlled UCI datasets and real-world datasets can be found in Appendix D. Following the widely-used partial label data generation protocol [Cour et al., 2011], we generate the artificial partial label datasets under the controlling parameter r, which controls the number of false-positive labels. For each example, we randomly select r other labels as false-positive labels." (Table 1: Experimental results on ACC when compared with constrained clustering methods under different proportions of partial label training examples on real-world datasets, where bold and underlined indicate the best and second best results, respectively. Column headers: Compared Method, Lost, MSRCv2, Mirflickr, Bird Song.)
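The quoted generation protocol can be sketched as follows. This is a minimal illustration, not the authors' code (which is in the linked repository); the function name and seeding are illustrative.

```python
import random

def generate_candidate_labels(true_labels, num_classes, r, seed=0):
    """For each example, build a candidate label set containing the true
    label plus r randomly chosen false-positive labels, following the
    protocol of Cour et al. (2011) as described in the paper."""
    rng = random.Random(seed)
    candidate_sets = []
    for y in true_labels:
        # All labels other than the ground-truth label are eligible
        # to be drawn as false positives.
        others = [c for c in range(num_classes) if c != y]
        false_positives = rng.sample(others, r)
        candidate_sets.append(sorted([y] + false_positives))
    return candidate_sets
```

Each returned candidate set has exactly r + 1 labels and always contains the ground-truth label.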
Dataset Splits | Yes | "For constrained clustering methods, we randomly sample the partial label examples based on the proportion ρ ∈ {0.05, 0.10, 0.15, 0.20, 0.30, 0.40}, and the remaining samples are used as test data. For PLL and semi-supervised PLL methods, we randomly sample the partial label examples based on the proportion ρ ∈ {0.01, 0.02, 0.05, 0.10}."
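The split procedure described above (sample a proportion ρ of examples as partial-label training data, use the rest as test data) can be sketched as below; the function name and seeding are assumptions for illustration.

```python
import random

def split_by_proportion(n_samples, rho, seed=0):
    """Randomly select a proportion rho of example indices as the
    partial-label training set; the remaining indices are the test set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(round(rho * n_samples))
    return indices[:n_train], indices[n_train:]
```

For example, with 100 samples and ρ = 0.10 the split yields 10 training and 90 test indices with no overlap.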
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | "Parameters for our PLC method are set as α, β ∈ {0.01, 0.1, 1}, γ = 10, and k ∈ {10, 15, 20, 25, 30, 40}. Each compared method is implemented with the default hyper-parameter setup suggested in the respective literature. For each experiment, we implemented 10 times with random partitions and reported the average performance with the standard deviation."
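The evaluation protocol quoted above (a hyper-parameter grid, plus averaging over 10 random partitions) can be sketched as follows. This is a hypothetical harness, not the paper's code: `run_fn` stands in for one full train/evaluate cycle of PLC, and only the parameter ranges come from the paper.

```python
import itertools
import statistics

def evaluate_over_partitions(run_fn, n_repeats=10, seed=0):
    """Run an experiment n_repeats times with different random partitions
    and report the mean and standard deviation of the accuracy.
    run_fn(seed) -> float is a placeholder for one full train/test run."""
    accs = [run_fn(seed + i) for i in range(n_repeats)]
    return statistics.mean(accs), statistics.stdev(accs)

# Hyper-parameter grid over the ranges stated in the paper
# (alpha, beta in {0.01, 0.1, 1}; k in {10, 15, 20, 25, 30, 40}; gamma fixed at 10).
GRID = list(itertools.product([0.01, 0.1, 1],
                              [0.01, 0.1, 1],
                              [10, 15, 20, 25, 30, 40]))
```

The grid contains 3 × 3 × 6 = 54 configurations, each of which would be scored with `evaluate_over_partitions`.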