Wrapped Partial Label Dimensionality Reduction via Dependence Maximization

Authors: Xiang-Ru Yu, Deng-Bao Wang, Min-Ling Zhang

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments over a broad range of synthetic and real-world partial label data sets validate that the performance of well-established partial label learning algorithms can be significantly improved by the proposed WPLDR. To evaluate the effectiveness of the proposed WPLDR, we couple five state-of-the-art partial label learning algorithms with four partial label dimensionality reduction approaches: DELIN, CENDA, PLDA and the proposed WPLDR. For each partial label learning method L, the coupled version is denoted as L-DELIN, L-CENDA, L-PLDA and L-WPLDR respectively. In this paper, we instantiate L with five well-established partial label learning algorithms, and their parameter configurations are set based on the recommendations provided in the corresponding literature. In the following subsections, for each dataset, we perform ten-fold cross-validation, and the mean and standard deviation of classification results are reported.
Researcher Affiliation Academia Xiang-Ru Yu (1,3), Deng-Bao Wang (2,3), Min-Ling Zhang (2,3) — 1: School of Cyber Science and Engineering, Southeast University, China; 2: School of Computer Science and Engineering, Southeast University, China; 3: Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China
Pseudocode Yes Table 1: The pseudo-code of WPLDR.
Inputs:
D: partial label training data set {(x_i, S_i) | 1 ≤ i ≤ m} (X ⊆ R^d, Y = {l_1, l_2, . . ., l_q}, x_i ∈ X, S_i ⊆ Y)
d′: the number of retained dimensions after dimensionality reduction
k: the number of nearest neighbors used to update the label confidence matrix
α: the feature space trade-off parameter
β: the label space trade-off parameter
µ: the constraints trade-off parameter
Outputs:
P: the d × d′ projection matrix learned via WPLDR
D′: the transformed lower-dimensional partial label training set {(x′_i, S_i) | 1 ≤ i ≤ m}
Process:
1: Initialize the m × q label confidence matrix F_0 as shown in Eq. (2);
2: Cascade the training data into the instance matrix X = [x_1, x_2, . . ., x_m];
3: Initialize the d × d′ projection matrix P_0 via dependence maximization between embedded feature and label information as shown in Eq. (7);
4: repeat
5: Calculate the similarity matrix S according to the embedded feature vectors and label confidence matrix via Eq. (8);
6: Calculate H = I − (1/m)ee^T;
7: Update the label confidence matrix F as shown in Eq. (12), a transformed problem of WPLDR in Eq. (1);
8: Update the projection matrix P by solving the transformed problem in Eq. (13): given the generalized eigenvalue problem in Eq. (16), the projection matrix is obtained by concatenating the eigenvectors w.r.t. the top d′ eigenvalues;
9: until convergence
10: Derive the lower-dimensional partial label training data set D′ with d′ features via the projection matrix P: X′ = P^T X;
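The alternating scheme in Table 1 can be sketched in Python. The sketch below illustrates the loop's structure only: the paper's Eqs. (2), (7), (8), (12), (13) and (16) are not reproduced here, so the confidence-update and projection-update steps use generic HSIC-style stand-ins, and the function name `wpldr_sketch` is our own.

```python
import numpy as np

def wpldr_sketch(X, S, d_prime, n_iter=20):
    """Illustrative sketch of the WPLDR alternating scheme in Table 1.

    X: (d, m) instance matrix, columns are instances (step 2).
    S: (m, q) binary candidate-label mask (1 = label in candidate set).
    d_prime: number of retained dimensions after reduction.

    The updates below are stand-ins with the same structure as the
    paper's steps, not its exact equations.
    """
    d, m = X.shape

    # Step 1: initialise label confidences uniformly over candidates (Eq. (2)).
    F = S / S.sum(axis=1, keepdims=True)

    # Step 6: centering matrix H = I - (1/m) e e^T.
    H = np.eye(m) - np.ones((m, m)) / m

    P = None
    for _ in range(n_iter):
        # Stand-in for steps 3/8: dependence maximisation between embedded
        # features XH and confidences FH leads to an eigenvalue problem;
        # here we use the HSIC-style surrogate M = X H F F^T H X^T and
        # keep the eigenvectors of the top-d' eigenvalues.
        M = X @ H @ F @ F.T @ H @ X.T
        w, V = np.linalg.eigh(M)               # eigenvalues in ascending order
        P = V[:, np.argsort(w)[::-1][:d_prime]]  # d x d' projection

        # Stand-in for the confidence update (Eq. (12)): propagate
        # confidences through similarities in the embedded space, then
        # re-mask by candidates and renormalise per instance.
        Z = P.T @ X                             # d' x m embedding
        sim = Z.T @ Z
        F = np.maximum(sim @ F, 0.0) * S
        F = F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)

    # Step 10: derive the lower-dimensional data X' = P^T X.
    return P, P.T @ X
```

In a faithful implementation, step 8 would solve the generalized eigenvalue problem of Eq. (16) rather than the plain symmetric one used here.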
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide any links to a code repository.
Open Datasets Yes Seven real-world partial label data sets have been collected from different tasks and domains. Due to the page limit, Table 3 reports the experimental results on real-world as well as synthetic partial label data sets with different configurations. As shown in Table 3, the experimental results illustrate the classification accuracy of partial label algorithms before and after employing the four dimensionality reduction approaches DELIN, CENDA, PLDA and WPLDR. In the following subsections, for each dataset, we perform ten-fold cross-validation, and the mean and standard deviation of classification results are reported. Synthetic Data Sets: Following the widely used controlling protocol in partial label learning, synthetic partial label data sets are generated from UCI multi-class data sets with a controlling parameter r, which indicates the number of false positive labels added to the candidate label set. For each synthetic data set, we set r as {1, 2, 3} to evaluate the performance under different ambiguity levels. The detailed experimental results with r = 2 are reported in Table 2. In addition, the pairwise t-test at the 0.05 significance level is conducted to show whether the performance difference between two comparison methods is statistically significant, and the win/tie/loss counts with r = 1/2/3 are reported in Table 3. Based on these comparative results, the following observations can be made: Compared with partial label learning algorithms L, across the 105 statistical comparison cases (7 synthetic data sets × 3 configurations × 5 algorithms), the proposed WPLDR achieves superior or comparable classification performance in 98 cases. Compared with the existing partial label dimensionality reduction method DELIN, WPLDR achieves comparable or better performance in all cases, and achieves significant performance improvement in 97 cases under the pairwise t-test at the 0.05 significance level.
Compared with the existing PLDA, WPLDR achieves comparable or better performance in 95 cases, and the improvement is more impressive in most cases. Compared with the existing CENDA, among all the 105 cases, L-WPLDR achieves comparable or better classification performance in 95 cases. For the high-dimensional dataset amazon, where the dimension of the feature vector exceeds 1,300, compared with L, the classification performance has been improved with WPLDR by more than 0.3 in 14 of 15 cases (3 configurations × 5 algorithms). These results indicate the superior performance of WPLDR in difficult settings. [Table 2 header: Data Set | Amazon | Enron | Dermatology | Winerate | Zoo | Segm-2500 | Segm-3000; r = 2 (two false positive labels)]
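The controlling protocol quoted above (adding r false positive labels per instance to form the candidate set) can be sketched as follows; the function name and the uniform sampling of distractor labels are our own illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def make_partial_labels(y, q, r, rng=None):
    """Generate an (m, q) candidate-label mask from true labels y.

    Each instance keeps its true label and gains r false positive
    labels drawn uniformly without replacement from the other q-1
    classes, matching the controlling parameter r in the protocol.
    """
    rng = np.random.default_rng(rng)
    m = len(y)
    S = np.zeros((m, q), dtype=int)
    for i, yi in enumerate(y):
        S[i, yi] = 1                    # the true label is always a candidate
        others = [c for c in range(q) if c != yi]
        for c in rng.choice(others, size=r, replace=False):
            S[i, c] = 1                 # r false positive labels
    return S
```

Under this protocol each candidate set has exactly r + 1 labels, so r ∈ {1, 2, 3} yields the three ambiguity levels evaluated in Tables 2 and 3.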
Dataset Splits Yes In the following subsections, for each dataset, we perform ten-fold cross-validation, while the mean and standard deviation of classification results are reported. For each synthetic data set, we set r as {1, 2, 3} to evaluate the performance under different ambiguity levels.
Hardware Specification No The paper does not contain specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies No The paper mentions that a standard Quadratic Programming (QP) problem can be efficiently solved by off-the-shelf QP tools and that the Convex-Concave Procedure (CCCP) is used to solve a problem. However, no specific software names or version numbers for these tools or any other libraries are provided.
Experiment Setup Yes To evaluate the effectiveness of the proposed WPLDR, we couple five state-of-the-art partial label learning algorithms with four partial label dimensionality reduction approaches: DELIN, CENDA, PLDA and the proposed WPLDR. For each partial label learning method L, the coupled version is denoted as L-DELIN, L-CENDA, L-PLDA and L-WPLDR respectively. In this paper, we instantiate L with five well-established partial label learning algorithms, and their parameter configurations are set based on the recommendations provided in the corresponding literature.
- PL-KNN [Hüllermeier and Beringer, 2006]: an averaging-based partial label learning algorithm, which makes predictions by weighted voting on candidate labels from kNN instances [suggested configuration: k = 10].
- PL-SVM [Nguyen and Caruana, 2008]: an identification-based partial label learning approach which induces the classification model by adapting maximum margin [suggested configuration: regularization parameter pool {10⁻³, . . ., 10³}].
- IPAL [Zhang and Yu, 2015]: a disambiguation-based partial label learning method, which determines the valid label via label propagation on a weighted graph [suggested configuration: k = 10, balancing parameter α = 0.95].
- SURE [Feng and An, 2019]: a self-training partial label learning algorithm which, under proper constraints, unifies model training and pseudo-label identification into one formulation [suggested configuration: regularization parameters λ = 0.3, β = 0.05].
- PL-AGGD [Wang et al., 2021]: an adaptive graph guided disambiguation algorithm, which jointly performs graph construction, model training and partial label disambiguation in one framework [suggested configuration: k = 10, µ = 1 and γ = 0.05].
For WPLDR, k (the number of nearest neighbors) is an important parameter. Fig. 1 illustrates how the classification accuracy of each partial label learning algorithm changes as k increases from 3 to 10 with interval 1.
As shown, on these four datasets, the classification accuracy of all partial label learning algorithms coupled with WPLDR is very stable across different settings of k. Furthermore, the trade-off factors α and β also serve as important parameters. In Fig. 2, the values of α and β increase from 0.0001 to 100. As shown, when coupled with WPLDR, the classification accuracy of each partial label learning algorithm is relatively stable across different values of α and β. Based on these empirical studies, we suggest that both α and β can simply be set to 0.01 in practice.
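The sensitivity study described above amounts to a grid sweep over k ∈ {3, . . ., 10} and log-spaced α, β between 0.0001 and 100. A minimal sketch follows, where `evaluate` is a hypothetical stand-in for running a WPLDR-coupled learner with ten-fold cross-validation and returning its mean accuracy.

```python
from itertools import product

# Grids matching the ranges reported for Figs. 1 and 2.
ks = range(3, 11)                          # k = 3, ..., 10 with interval 1
grid = [10.0 ** p for p in range(-4, 3)]   # 1e-4, 1e-3, ..., 1e2

def sweep(evaluate):
    """Record one accuracy per (k, alpha, beta) configuration.

    `evaluate` is a caller-supplied callable, e.g. a closure that trains
    L-WPLDR with the given hyperparameters under ten-fold CV.
    """
    results = {}
    for k, alpha, beta in product(ks, grid, grid):
        results[(k, alpha, beta)] = evaluate(k=k, alpha=alpha, beta=beta)
    return results
```

Plotting accuracy against each axis of `results` reproduces the kind of stability check shown in Figs. 1 and 2, which motivates the suggested default α = β = 0.01.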