Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Doubly Robust Crowdsourcing
Authors: Chong Liu, Yu-Xiang Wang
JAIR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need much lower label cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points. In this section, we report our experimental results, including both synthetic and real-world experiments. |
| Researcher Affiliation | Academia | Chong Liu EMAIL Yu-Xiang Wang EMAIL Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106, USA |
| Pseudocode | No | The paper describes algorithms such as Direct Method (DM), Importance Sampling (IS), Doubly Robust Crowdsourcing (DRC), Adaptive Item Selection (AIS), and Adaptive Worker Selection (AWS) using mathematical formulations and descriptive text, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We use five classification datasets, Segment, Satimage, Usps (Hull, 1994), Pendigits, and Mnist (Le Cun et al., 1998), collected by Libsvm (Chang and Lin, 2011), which are all publicly available. We do experiments on three real-world datasets: Music Genre (Rodrigues et al., 2013), Dog (Zhou et al., 2012), and Rotten Tomatoes (Rodrigues et al., 2013). |
| Dataset Splits | Yes | In step 2, we uniformly sample the dataset into two equal parts, one for training and the other for test. In order to remove the randomness of this splitting process, the sampling index is fixed and saved for all further experiments. In step 3, we uniformly sample the test part of step 2 into two equal parts, one working as source part and the other working as target part. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions software like 'sklearn package' and 'Libsvm' but does not specify their version numbers or any other key software components with versions needed to replicate the experiments. |
| Experiment Setup | Yes | Decision trees with maximum depth 3 are used to generate surrogate labels for DRC-DS, DRC-MV, and DM. For DRC-AIS and DRC-AWS-AIS, the confidence margin parameter ρ is set to be 0.03, 0.06, and 0.09. For DRC-AWS and DRC-AWS-AIS, the multiplier parameter λ is set to be 1, 2, 3, 4, 5, 7, 10, 15, 25, and 50. Poisson sampling over workers with π going from 0.1 to 1.0 with interval of 0.1. Also, depth of decision trees is set to be 100. |
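The rows above reference the doubly robust (DR) estimator that combines a surrogate model (direct method) with importance-weighted corrections under Poisson sampling of worker labels. As a rough illustration only, the sketch below applies the generic DR form `f(x) + (z / π)(y − f(x))` to a toy crowdsourcing setup; all sizes, accuracies, and the aggregation rule are hypothetical stand-ins, not the paper's actual DRC, AIS, or AWS procedures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (all values hypothetical): n items, m workers, binary labels.
n, m = 200, 10
true_labels = rng.integers(0, 2, size=n)

# Each worker labels correctly with probability 0.7 (assumption).
worker_labels = np.where(rng.random((m, n)) < 0.7, true_labels, 1 - true_labels)

# Poisson sampling: each (worker, item) label is observed with probability pi,
# mirroring the "Poisson sampling over workers" described in the setup row.
pi = 0.3
observed = rng.random((m, n)) < pi

# Surrogate labels f(x): a stand-in for the paper's decision-tree predictions,
# here simulated as 80%-accurate guesses.
surrogate = np.where(rng.random(n) < 0.8, true_labels, 1 - true_labels)

# Generic doubly robust estimate per (worker, item):
#   f(x) + (z / pi) * (y - f(x))
# which stays unbiased if either the surrogate model or the sampling
# probability pi is correct.
dr_scores = surrogate[None, :] + (observed / pi) * (worker_labels - surrogate[None, :])

# Aggregate over workers and threshold to recover item labels.
agg = dr_scores.mean(axis=0)
pred = (agg >= 0.5).astype(int)
accuracy = (pred == true_labels).mean()
```

Even with only ~30% of labels observed, the DR correction typically recovers accuracy close to the full-label setting in this toy example, which is the cost-saving effect the Research Type row describes.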