Adversarial Attacks on Crowdsourcing Quality Control
Authors: Alessandro Checco, Jo Bates, Gianluca Demartini
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement and experimentally validate the gold question detection system, using real-world data from a popular crowdsourcing platform. Our experimental results show that crowdworkers using the proposed system spend more time on signalled gold questions but do not neglect the others, thus achieving an increased overall work quality. |
| Researcher Affiliation | Academia | Alessandro Checco, Information School, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom; Jo Bates, Information School, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom; Gianluca Demartini, School of Information Technology and Electrical Engineering, University of Queensland, GP South Building, Staff House Road, St Lucia QLD 4072, Australia |
| Pseudocode | No | The paper describes the system architecture and workflows (e.g., Client Workflow Simhash, Server Workflow Clustering) using figures and detailed textual descriptions of steps, but it does not include a distinct, structured pseudocode or algorithm block. |
| Open Source Code | Yes | The core functionalities of the plugin to replicate the following experiments are available at https://github.com/AlessandroChecco/all-that-glitters-is-gold. |
| Open Datasets | Yes | We use the CSTA datasets and task logs described in (Benoit, Conway, Lauderdale, Laver, & Mikhaylov, 2016), consisting of crowdsourced annotations of political data. ...available from https://github.com/kbenoit/CSTA-APSR. |
| Dataset Splits | No | The paper mentions characteristics of the datasets such as the percentage of gold questions (e.g., "12.4 % of them are gold questions") and the number of judgements per non-gold question (e.g., "each non-gold question had been answered by 10 workers"). It also describes sub-sampling to vary these parameters. However, it does not specify explicit training/test/validation splits for a model in a typical machine learning context that would be needed for direct reproduction of data partitioning. |
| Hardware Specification | No | The paper describes a browser plug-in and an external server but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments or the server infrastructure. |
| Software Dependencies | No | The paper mentions technologies like "browser plug-in", "JavaScript bookmarklet", "OrbitDB" (with a GitHub link), and statistical models like "Gaussian mixture model". However, it does not specify version numbers for any programming languages, libraries, or operating systems used in their implementation or experimental setup. |
| Experiment Setup | Yes | Regarding the parameters of the system, we consider a realistic scenario: a job of 2000 tasks with an additional 5% (100 tasks) of gold questions. We consider the default automatic behaviour of Figure Eight: 10 gold questions are used at the beginning to train and test the ability of the worker (i.e. a quiz page). After that, pages of 10 tasks are shown to the worker, of which 9 are requested tasks and one is a gold question. To be considered trusted, workers are required, by default, to judge a minimum of four gold questions and to reach an accuracy threshold of 70%. ... Confidence: The worker will consider as gold all questions with signalled probability of being gold of at least 50%. |
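The quoted experiment setup can be sketched as a minimal simulation. The constants (task counts, page layout, trust thresholds, confidence cut-off) come from the excerpt above; the function names and the page-count arithmetic are my own assumptions, not the authors' code:

```python
# Sketch of the Figure Eight-style default quality-control setup quoted
# in the table. All names are hypothetical; thresholds are from the paper.

N_TASKS = 2000            # requested (non-gold) tasks in the job
GOLD_FRACTION = 0.05      # additional 5% gold questions (100 tasks)
QUIZ_GOLD = 10            # gold questions on the initial quiz page
PAGE_SIZE = 10            # tasks per page after the quiz
GOLD_PER_PAGE = 1         # one gold question hidden in each page
MIN_GOLD_JUDGED = 4       # minimum gold judgements to become trusted
ACCURACY_THRESHOLD = 0.70 # required accuracy on gold questions
CONFIDENCE_CUTOFF = 0.50  # worker treats a question as gold above this

def job_layout(n_tasks=N_TASKS, gold_fraction=GOLD_FRACTION):
    """Return (number of gold questions, number of work pages after the quiz)."""
    n_gold = int(n_tasks * gold_fraction)            # 100 gold questions
    pages = n_tasks // (PAGE_SIZE - GOLD_PER_PAGE)   # 9 real tasks per page
    return n_gold, pages

def is_trusted(gold_judged, gold_correct):
    """Trusted once enough gold is judged at sufficient accuracy."""
    if gold_judged < MIN_GOLD_JUDGED:
        return False
    return gold_correct / gold_judged >= ACCURACY_THRESHOLD

def worker_flags_as_gold(signalled_probability):
    """Adversarial worker treats the question as gold above the cut-off."""
    return signalled_probability >= CONFIDENCE_CUTOFF

print(job_layout())        # → (100, 222)
print(is_trusted(4, 3))    # 3/4 = 75% accuracy → True
```

With the defaults above, the job contains 100 gold questions and roughly 222 pages of 9 requested tasks each; a worker becomes trusted after as few as four gold judgements at 75% accuracy.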