Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing
Authors: Yuchen Zhang, Xi Chen, Dengyong Zhou, Michael I. Jordan
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods. |
| Researcher Affiliation | Collaboration | Yuchen Zhang EMAIL Department of Electrical Engineering and Computer Science University of California, Berkeley, Berkeley, CA 94720, USA. Xi Chen EMAIL Stern School of Business New York University, New York, NY 10012, USA. Dengyong Zhou EMAIL Microsoft Research 1 Microsoft Way, Redmond, WA 98052, USA. Michael I. Jordan EMAIL Department of Electrical Engineering and Computer Science and Department of Statistics University of California, Berkeley, Berkeley, CA 94720, USA. |
| Pseudocode | Yes | Algorithm 1: Estimating confusion matrices. Algorithm 2: Estimating one-coin model. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not mention any repository links, explicit code release statements, or code in supplementary materials. |
| Open Datasets | Yes | For real data experiments, we compare crowdsourcing algorithms on five datasets: three binary tasks and two multi-class tasks. Binary tasks include labeling bird species (Welinder et al., 2010) (Bird dataset), recognizing textual entailment (Snow et al., 2008) (RTE dataset) and assessing the quality of documents in the TREC 2011 crowdsourcing track (Lease and Kazai, 2011) (TREC dataset). Multi-class tasks include labeling the breed of dogs from ImageNet (Deng et al., 2009) (Dog dataset) and judging the relevance of web search results (Zhou et al., 2012) (Web dataset). |
| Dataset Splits | No | The paper describes how synthetic data was generated but does not specify explicit training, validation, or test splits for either synthetic or real datasets to enable reproduction of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the Opt-D&S algorithm and the MV-D&S estimator, the estimate is output after ten EM iterates. For the group partitioning step involved in the Opt-D&S algorithm, the workers are randomly and evenly partitioned into three groups. ... For the MV-D&S estimator and the Opt-D&S algorithm, we iterate their EM steps until convergence. ... The default choice of the thresholding parameter is 10^-6. |
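Since the paper releases no code (see the Open Source Code row), the setup above has to be reconstructed from the text. As a point of reference, the MV-D&S baseline it describes is the classical Dawid-Skene EM estimator initialized by majority vote and run for a fixed number of EM iterates. The sketch below is our own minimal NumPy implementation of that generic scheme, not the authors' code; the function name, array layout (`-1` marking missing labels), and the `1e-6` smoothing constant are illustrative assumptions.

```python
import numpy as np

def mv_dawid_skene(labels, n_classes, n_iter=10):
    """Sketch of a majority-vote-initialized Dawid-Skene EM estimator.
    labels: (n_items, n_workers) int array of worker labels in
    {0..n_classes-1}, with -1 marking items a worker did not label."""
    n_items, n_workers = labels.shape

    # Initialization: posterior over true labels from the majority vote.
    q = np.zeros((n_items, n_classes))
    for i in range(n_items):
        votes = labels[i][labels[i] >= 0]
        counts = np.bincount(votes, minlength=n_classes)
        q[i] = counts / counts.sum()

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices,
        # with a small additive smoothing constant (our assumption).
        pi = q.mean(axis=0)
        C = np.full((n_workers, n_classes, n_classes), 1e-6)
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            for c in range(n_classes):
                C[w, :, c] += q[mask][labels[mask, w] == c].sum(axis=0)
            C[w] /= C[w].sum(axis=1, keepdims=True)

        # E-step: recompute posteriors given the confusion matrices.
        logq = np.log(pi)[None, :].repeat(n_items, axis=0)
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            logq[mask] += np.log(C[w][:, labels[mask, w]].T)
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)

    return q.argmax(axis=1), C
```

The Opt-D&S algorithm studied in the paper differs only in its initialization: instead of majority vote, it seeds the confusion matrices with a spectral (tensor-decomposition) estimate computed over three random worker groups before running the same EM iterates.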