Wrapped Partial Label Dimensionality Reduction via Dependence Maximization

Authors: Xiang-Ru Yu, Deng-Bao Wang, Min-Ling Zhang

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments over a broad range of synthetic and real-world partial label data sets validate that the performance of well-established partial label learning algorithms can be significantly improved by the proposed WPLDR. To evaluate the effectiveness of the proposed WPLDR, we couple five state-of-the-art partial label learning algorithms with four partial label dimensionality reduction approaches: DELIN, CENDA, PLDA and the proposed WPLDR. For each partial label learning method L, the coupled version is denoted as L-DELIN, L-CENDA, L-PLDA and L-WPLDR respectively. In this paper, we instantiate L with five well-established partial label learning algorithms, and their parameter configurations are set based on the recommendations provided in the corresponding literature. In the following subsections, for each dataset, we perform ten-fold cross-validation, and the mean and standard deviation of classification results are reported.
Researcher Affiliation Academia Xiang-Ru Yu (1,3), Deng-Bao Wang (2,3), Min-Ling Zhang (2,3) — 1: School of Cyber Science and Engineering, Southeast University, China; 2: School of Computer Science and Engineering, Southeast University, China; 3: Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China
Pseudocode Yes Table 1: The pseudo-code of WPLDR.
Inputs:
D: partial label training data set {(x_i, S_i) | 1 ≤ i ≤ m} (X ⊆ R^d, Y = {l_1, l_2, . . ., l_q}, x_i ∈ X, S_i ⊆ Y)
d′: the number of retained dimensions after dimensionality reduction
k: the number of nearest neighbors used to update the label confidence matrix
α: the feature space trade-off parameter
β: the label space trade-off parameter
µ: the constraints trade-off parameter
Outputs:
P: the d × d′ projection matrix learned via WPLDR
D′: the transformed lower-dimensional partial label training set {(x′_i, S_i) | 1 ≤ i ≤ m}
Process:
1: Initialize the m × q label confidence matrix F_0 as shown in Eq. (2);
2: Cascade the training data into the instance matrix X = [x_1, x_2, . . ., x_m];
3: Initialize the d × d′ projection matrix P_0 via dependence maximization between embedded feature and label information as shown in Eq. (7);
4: repeat
5: Calculate the similarity matrix S according to the embedded feature vectors and label confidence matrix via Eq. (8);
6: Calculate H = I − (1/m)ee^T;
7: Update the label confidence matrix F as shown in Eq. (12), a transformed problem of WPLDR in Eq. (1);
8: Update the projection matrix P by solving the transformed problem in Eq. (13): given the generalized eigenvalue problem in Eq. (16), the projection matrix is obtained by concatenating the eigenvectors w.r.t. the top d′ eigenvalues;
9: until convergence
10: Derive the lower-dimensional partial label training data set D′ with d′ features via the projection matrix P: X′ = P^T X;
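The alternating scheme in Table 1 can be sketched in Python. The sketch below illustrates the loop's structure only: the paper's Eqs. (2), (7), (8), (12), (13) and (16) are not reproduced here, so the confidence-update and projection-update steps use generic HSIC-style stand-ins, and the function name `wpldr_sketch` is our own.

```python
import numpy as np

def wpldr_sketch(X, S, d_prime, n_iter=20):
    """Illustrative sketch of the WPLDR alternating scheme in Table 1.

    X: (d, m) instance matrix, columns are instances (step 2).
    S: (m, q) binary candidate-label mask (1 = label in candidate set).
    d_prime: number of retained dimensions after reduction.

    The updates below are stand-ins with the same structure as the
    paper's steps, not its exact equations.
    """
    d, m = X.shape

    # Step 1: initialise label confidences uniformly over candidates (Eq. (2)).
    F = S / S.sum(axis=1, keepdims=True)

    # Step 6: centering matrix H = I - (1/m) e e^T.
    H = np.eye(m) - np.ones((m, m)) / m

    P = None
    for _ in range(n_iter):
        # Stand-in for steps 3/8: dependence maximisation between embedded
        # features XH and confidences FH leads to an eigenvalue problem;
        # here we use the HSIC-style surrogate M = X H F F^T H X^T and
        # keep the eigenvectors of the top-d' eigenvalues.
        M = X @ H @ F @ F.T @ H @ X.T
        w, V = np.linalg.eigh(M)               # eigenvalues in ascending order
        P = V[:, np.argsort(w)[::-1][:d_prime]]  # d x d' projection

        # Stand-in for the confidence update (Eq. (12)): propagate
        # confidences through similarities in the embedded space, then
        # re-mask by candidates and renormalise per instance.
        Z = P.T @ X                             # d' x m embedding
        sim = Z.T @ Z
        F = np.maximum(sim @ F, 0.0) * S
        F = F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)

    # Step 10: derive the lower-dimensional data X' = P^T X.
    return P, P.T @ X
```

In a faithful implementation, step 8 would solve the generalized eigenvalue problem of Eq. (16) rather than the plain symmetric one used here.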
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide any links to a code repository.
Open Datasets Yes Seven real-world partial label data sets have been collected from different tasks and domains. Due to the page limit, Table 3 reports the experimental results on real-world as well as synthetic partial label data sets with different configurations. As shown in Table 3, the experimental results illustrate the classification accuracy of partial label algorithms before and after employing the four dimensionality reduction approaches DELIN, CENDA, PLDA and WPLDR. In the following subsections, for each dataset, we perform ten-fold cross-validation, and the mean and standard deviation of classification results are reported. Synthetic Data Sets: Following the widely used controlling protocol in partial label learning, synthetic partial label data sets are generated from UCI multi-class data sets with a controlling parameter r, which indicates the number of false positive labels added to the candidate label set. For each synthetic data set, we set r as {1, 2, 3} to evaluate the performance under different ambiguity levels. The detailed experimental results with r = 2 are reported in Table 2. In addition, the pairwise t-test at the 0.05 significance level is conducted to show whether the performance difference between two comparison methods is statistically significant, and the win/tie/loss counts with r = 1/2/3 are reported in Table 3. Based on these comparative results, the following observations can be made: Compared with partial label learning algorithms L, across the 105 statistical comparison cases (7 synthetic data sets × 3 configurations × 5 algorithms), the proposed WPLDR achieves superior or comparable classification performance in 98 cases. Compared with the existing partial label dimensionality reduction method DELIN, WPLDR achieves comparable or better performance in all cases, and achieves significant performance improvement in 97 cases under the pairwise t-test at the 0.05 significance level.
Compared with the existing PLDA, WPLDR achieves comparable or better performance in 95 cases, and the improvement is more impressive in most cases. Compared with the existing CENDA, among all the 105 cases, L-WPLDR achieves comparable or better classification performance in 95 cases. For the high-dimensional dataset amazon, where the dimension of the feature vector exceeds 1,300, compared with L, the classification performance has been improved with WPLDR by more than 0.3 in 14 of 15 cases (3 configurations × 5 algorithms). These results indicate the superior performance of WPLDR in difficult settings. [Table 2 header: Data Set | Amazon | Enron | Dermatology | Winerate | Zoo | Segm-2500 | Segm-3000; r = 2 (two false positive labels)]
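The controlling protocol quoted above (adding r false positive labels per instance to form the candidate set) can be sketched as follows; the function name and the uniform sampling of distractor labels are our own illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def make_partial_labels(y, q, r, rng=None):
    """Generate an (m, q) candidate-label mask from true labels y.

    Each instance keeps its true label and gains r false positive
    labels drawn uniformly without replacement from the other q-1
    classes, matching the controlling parameter r in the protocol.
    """
    rng = np.random.default_rng(rng)
    m = len(y)
    S = np.zeros((m, q), dtype=int)
    for i, yi in enumerate(y):
        S[i, yi] = 1                    # the true label is always a candidate
        others = [c for c in range(q) if c != yi]
        for c in rng.choice(others, size=r, replace=False):
            S[i, c] = 1                 # r false positive labels
    return S
```

Under this protocol each candidate set has exactly r + 1 labels, so r ∈ {1, 2, 3} yields the three ambiguity levels evaluated in Tables 2 and 3.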
Dataset Splits Yes In the following subsections, for each dataset, we perform ten-fold cross-validation, while the mean and standard deviation of classification results are reported. For each synthetic data set, we set r as {1, 2, 3} to evaluate the performance under different ambiguity levels.
Hardware Specification No The paper does not contain specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies No The paper mentions that a standard Quadratic Programming (QP) problem can be efficiently solved by off-the-shelf QP tools and that the Convex-Concave Procedure (CCCP) is used to solve a problem. However, no specific software names or version numbers for these tools or any other libraries are provided.
Experiment Setup Yes To evaluate the effectiveness of the proposed WPLDR, we couple five state-of-the-art partial label learning algorithms with four partial label dimensionality reduction approaches: DELIN, CENDA, PLDA and the proposed WPLDR. For each partial label learning method L, the coupled version is denoted as L-DELIN, L-CENDA, L-PLDA and L-WPLDR respectively. In this paper, we instantiate L with five well-established partial label learning algorithms, and their parameter configurations are set based on the recommendations provided in the corresponding literature.
- PL-KNN [Hüllermeier and Beringer, 2006]: an averaging-based partial label learning algorithm, which makes predictions by weighted voting on candidate labels from kNN instances [suggested configuration: k = 10].
- PL-SVM [Nguyen and Caruana, 2008]: an identification-based partial label learning approach which induces the classification model by adapting maximum margin [suggested configuration: regularization parameter pool {10⁻³, . . ., 10³}].
- IPAL [Zhang and Yu, 2015]: a disambiguation-based partial label learning method, which determines the valid label via label propagation on a weighted graph [suggested configuration: k = 10, balancing parameter α = 0.95].
- SURE [Feng and An, 2019]: a self-training partial label learning algorithm which, under proper constraints, unifies model training and pseudo-label identification into one formulation [suggested configuration: regularization parameters λ = 0.3, β = 0.05].
- PL-AGGD [Wang et al., 2021]: an adaptive graph guided disambiguation algorithm, which jointly performs graph construction, model training and partial label disambiguation in one framework [suggested configuration: k = 10, µ = 1 and γ = 0.05].
For WPLDR, k (the number of nearest neighbors) is an important parameter. Fig. 1 illustrates how the classification accuracy of each partial label learning algorithm changes as k increases from 3 to 10 with interval 1.
As shown, on these four datasets, the classification accuracy of all partial label learning algorithms coupled with WPLDR is very stable across different settings of k. Furthermore, the trade-off factors α and β also serve as important parameters. In Fig. 2, the values of α and β increase from 0.0001 to 100. As shown, when coupled with WPLDR, the classification accuracy of each partial label learning algorithm is relatively stable across different values of α and β. Based on these empirical studies, we suggest that both α and β can simply be set to 0.01 in practice.
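The sensitivity study described above amounts to a grid sweep over k ∈ {3, . . ., 10} and log-spaced α, β between 0.0001 and 100. A minimal sketch follows, where `evaluate` is a hypothetical stand-in for running a WPLDR-coupled learner with ten-fold cross-validation and returning its mean accuracy.

```python
from itertools import product

# Grids matching the ranges reported for Figs. 1 and 2.
ks = range(3, 11)                          # k = 3, ..., 10 with interval 1
grid = [10.0 ** p for p in range(-4, 3)]   # 1e-4, 1e-3, ..., 1e2

def sweep(evaluate):
    """Record one accuracy per (k, alpha, beta) configuration.

    `evaluate` is a caller-supplied callable, e.g. a closure that trains
    L-WPLDR with the given hyperparameters under ten-fold CV.
    """
    results = {}
    for k, alpha, beta in product(ks, grid, grid):
        results[(k, alpha, beta)] = evaluate(k=k, alpha=alpha, beta=beta)
    return results
```

Plotting accuracy against each axis of `results` reproduces the kind of stability check shown in Figs. 1 and 2, which motivates the suggested default α = β = 0.01.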