Label Distribution Propagation-based Label Completion for Crowdsourcing

Authors: Tong Wu, Liangxiao Jiang, Wenjun Zhang, Chaoqun Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both real-world and simulated crowdsourced datasets show that LDPLC significantly outperforms WSLC in enhancing the performance of label integration algorithms. Our codes and datasets are available at https://github.com/jiangliangxiao/LDPLC.
Researcher Affiliation | Academia | School of Computer Science, China University of Geosciences, Wuhan 430074, China; School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China. Correspondence to: Liangxiao Jiang <EMAIL>.
Pseudocode | Yes | Algorithm 1 LDPLC
1: Input: D = {(xi, Li)} for i = 1, ..., N, a crowdsourced dataset
2: Output: D, the completed crowdsourced dataset
3: for r = 1 to R do
4:   Construct a dataset Dr for ur
5:   for m = 1 to M do
6:     Learn the feature value vrm of ur by Eqs. (1)-(6)
7:   end for
8: end for
9: for r = 1 to R do
10:   for r' = 1 to R do
11:     Estimate the worker similarity s(ur, ur') by Eqs. (7) and (8)
12:   end for
13: end for
14: for i = 1 to N do
15:   for r = 1 to R do
16:     if lir != -1 then
17:       Initialize the label distribution Pir by Eq. (9)
18:     else
19:       Initialize the label distribution Pir by Eq. (10)
20:     end if
21:   end for
22: end for
23: for i = 1 to N do
24:   Query the neighbors Ni for xi
25:   Optimize the neighbor weights wi by Eqs. (11) and (12)
26: end for
27: for t = 1 to T do
28:   for i = 1 to N do
29:     for r = 1 to R do
30:       Propagate the neighbors' label distributions to Pir by Eq. (13)
31:     end for
32:   end for
33: end for
34: for i = 1 to N do
35:   for r = 1 to R do
36:     if lir == -1 then
37:       Complete the missing label lir by Eq. (14)
38:     end if
39:   end for
40: end for
41: return D
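A simplified sketch of the propagation loop in Algorithm 1, assuming missing entries in the label matrix are marked with -1. Uniform neighbor weights stand in for the optimized weights of Eqs. (11)-(12), and the worker-feature and worker-similarity steps of Eqs. (1)-(8) are omitted; this is an illustrative reconstruction, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N, R, C = 30, 5, 3           # instances, workers, classes (toy sizes)
X = rng.normal(size=(N, 4))  # synthetic feature matrix

# Sparse label matrix: -1 marks a missing label, as in Algorithm 1.
L = rng.integers(0, C, size=(N, R))
L[rng.random((N, R)) < 0.6] = -1

# Initialize per-(instance, worker) label distributions:
# one-hot for observed labels, uniform for missing ones.
P = np.full((N, R, C), 1.0 / C)
for i in range(N):
    for r in range(R):
        if L[i, r] != -1:
            P[i, r] = np.eye(C)[L[i, r]]

# K nearest neighbors by Euclidean distance.
K, T = 5, 5
d = np.linalg.norm(X[:, None] - X[None], axis=-1)
np.fill_diagonal(d, np.inf)
nbrs = np.argsort(d, axis=1)[:, :K]

# Propagate neighbor label distributions for T iterations,
# keeping observed labels clamped to their one-hot distributions.
for _ in range(T):
    P_new = P[nbrs].mean(axis=1)   # average over the K neighbors
    observed = (L != -1)
    P_new[observed] = P[observed]  # clamp known labels
    P = P_new

# Complete each missing label with the arg-max class of its distribution.
L_completed = L.copy()
for i in range(N):
    for r in range(R):
        if L_completed[i, r] == -1:
            L_completed[i, r] = int(P[i, r].argmax())
```

Clamping observed labels while averaging neighbors is the standard label-propagation design choice; the paper's Eq. (13) additionally weights each neighbor's contribution.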
Open Source Code | Yes | Our codes and datasets are available at https://github.com/jiangliangxiao/LDPLC.
Open Datasets | Yes | Our codes and datasets are available at https://github.com/jiangliangxiao/LDPLC. To evaluate the performance of LDPLC, we conduct experiments on the widely used real-world crowdsourced dataset Label Me. Besides Label Me, we also conduct experiments on two other widely used real-world crowdsourced datasets, Reuters and Leaves, to validate the effectiveness of LDPLC. In this subsection, we conduct a series of experiments on all 34 simulated datasets published on the CEKA platform.
Dataset Splits | No | The paper mentions simulating crowd workers and states: "As a result, approximately 95% of the labels in the label matrix are missing, reflecting the sparsity observed in real-world crowdsourced datasets." However, it does not provide specific training/test/validation splits for the instances themselves, which is typically what the question refers to.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions using the "Crowd Environment and its Knowledge Analysis (CEKA) (Zhang et al., 2015a) platform" and the "Waikato Environment and Knowledge Analysis (WEKA) (Witten et al., 2011) platform." However, it does not specify version numbers for these platforms or any other libraries or software dependencies.
Experiment Setup | Yes | For simplicity, we set both the number of neighbors K and the number of iterations T to 5. All experiments are independently repeated ten times, and the average results of the ten runs are used as the final results. The parameter settings of all label integration algorithms are consistent with those specified in their original papers. For each simulated crowd worker ur, we randomly sample its annotation ratio and annotation quality from the uniform distributions [0, 0.1] and [0.6, 0.9], respectively.
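The worker simulation described in the setup can be sketched as below. The dataset size, class count, and the -1 missing-label marker are illustrative assumptions; only the sampling ranges for annotation ratio and quality come from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

N, R, C = 1000, 10, 2  # instances, simulated workers, classes (assumed sizes)
true_labels = rng.integers(0, C, size=N)

L = np.full((N, R), -1)  # -1 marks a missing (unannotated) label

for r in range(R):
    ratio = rng.uniform(0.0, 0.1)    # annotation ratio ~ U[0, 0.1]
    quality = rng.uniform(0.6, 0.9)  # annotation quality ~ U[0.6, 0.9]
    annotated = rng.random(N) < ratio
    # A worker gives the true label with probability `quality`,
    # otherwise a uniformly random wrong label.
    correct = rng.random(N) < quality
    wrong = (true_labels + rng.integers(1, C, size=N)) % C
    given = np.where(correct, true_labels, wrong)
    L[annotated, r] = given[annotated]

missing_rate = (L == -1).mean()
print(f"missing rate: {missing_rate:.2%}")  # around 95%, matching the reported sparsity
```

With the annotation ratio drawn from U[0, 0.1], each worker labels about 5% of the instances on average, which is why roughly 95% of the label matrix ends up missing.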