Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing
Authors: Yuchen Zhang, Xi Chen, Dengyong Zhou, Michael I. Jordan
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods. |
| Researcher Affiliation | Collaboration | Yuchen Zhang EMAIL Department of Electrical Engineering and Computer Science University of California, Berkeley, Berkeley, CA 94720, USA. Xi Chen EMAIL Stern School of Business New York University, New York, NY 10012, USA. Dengyong Zhou EMAIL Microsoft Research 1 Microsoft Way, Redmond, WA 98052, USA. Michael I. Jordan EMAIL Department of Electrical Engineering and Computer Science and Department of Statistics University of California, Berkeley, Berkeley, CA 94720, USA. |
| Pseudocode | Yes | Algorithm 1: Estimating confusion matrices. Algorithm 2: Estimating one-coin model. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not mention any repository links, explicit code release statements, or code in supplementary materials. |
| Open Datasets | Yes | For real data experiments, we compare crowdsourcing algorithms on five datasets: three binary tasks and two multi-class tasks. Binary tasks include labeling bird species (Welinder et al., 2010) (Bird dataset), recognizing textual entailment (Snow et al., 2008) (RTE dataset) and assessing the quality of documents in the TREC 2011 crowdsourcing track (Lease and Kazai, 2011) (TREC dataset). Multi-class tasks include labeling the breed of dogs from ImageNet (Deng et al., 2009) (Dog dataset) and judging the relevance of web search results (Zhou et al., 2012) (Web dataset). |
| Dataset Splits | No | The paper describes how synthetic data was generated but does not specify explicit training, validation, or test splits for either synthetic or real datasets to enable reproduction of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the Opt-D&S algorithm and the MV-D&S estimator, the estimate is output after ten EM iterates. For the group partitioning step involved in the Opt-D&S algorithm, the workers are randomly and evenly partitioned into three groups. ... For the MV-D&S estimator and the Opt-D&S algorithm, we iterate their EM steps until convergence. ... The default choice of the thresholding parameter is 10^-6. |
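Since the paper releases no code (see the Open Source Code row), the setup above has to be reconstructed from the text. As a point of reference, the MV-D&S baseline it describes is the classical Dawid-Skene EM estimator initialized by majority vote and run for a fixed number of EM iterates. The sketch below is our own minimal NumPy implementation of that generic scheme, not the authors' code; the function name, array layout (`-1` marking missing labels), and the `1e-6` smoothing constant are illustrative assumptions.

```python
import numpy as np

def mv_dawid_skene(labels, n_classes, n_iter=10):
    """Sketch of a majority-vote-initialized Dawid-Skene EM estimator.
    labels: (n_items, n_workers) int array of worker labels in
    {0..n_classes-1}, with -1 marking items a worker did not label."""
    n_items, n_workers = labels.shape

    # Initialization: posterior over true labels from the majority vote.
    q = np.zeros((n_items, n_classes))
    for i in range(n_items):
        votes = labels[i][labels[i] >= 0]
        counts = np.bincount(votes, minlength=n_classes)
        q[i] = counts / counts.sum()

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices,
        # with a small additive smoothing constant (our assumption).
        pi = q.mean(axis=0)
        C = np.full((n_workers, n_classes, n_classes), 1e-6)
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            for c in range(n_classes):
                C[w, :, c] += q[mask][labels[mask, w] == c].sum(axis=0)
            C[w] /= C[w].sum(axis=1, keepdims=True)

        # E-step: recompute posteriors given the confusion matrices.
        logq = np.log(pi)[None, :].repeat(n_items, axis=0)
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            logq[mask] += np.log(C[w][:, labels[mask, w]].T)
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)

    return q.argmax(axis=1), C
```

The Opt-D&S algorithm studied in the paper differs only in its initialization: instead of majority vote, it seeds the confusion matrices with a spectral (tensor-decomposition) estimate computed over three random worker groups before running the same EM iterates.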