Adversarial Attacks on Crowdsourcing Quality Control

Authors: Alessandro Checco, Jo Bates, Gianluca Demartini

JAIR 2020

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "We implement and experimentally validate the gold question detection system, using real-world data from a popular crowdsourcing platform. Our experimental results show that crowd workers using the proposed system spend more time on signalled gold questions but do not neglect the others, thus achieving an increased overall work quality."
Researcher Affiliation: Academia — Alessandro Checco, Information School, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom; Jo Bates, Information School, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom; Gianluca Demartini, School of Information Technology and Electrical Engineering, University of Queensland, GP South Building, Staff House Road, St Lucia QLD 4072, Australia
Pseudocode: No — The paper describes the system architecture and workflows (e.g., the client workflow using Simhash and the server workflow using clustering) through figures and detailed textual descriptions of steps, but it does not include a distinct, structured pseudocode or algorithm block.
Open Source Code: Yes — "The core functionalities of the plugin to replicate the following experiments are available at https://github.com/AlessandroChecco/all-that-glitters-is-gold."
Open Datasets: Yes — "We use the CSTA datasets and task logs described in (Benoit, Conway, Lauderdale, Laver, & Mikhaylov, 2016), consisting of crowdsourced annotations of political data. ... available from https://github.com/kbenoit/CSTA-APSR."
Dataset Splits: No — The paper reports dataset characteristics such as the percentage of gold questions (e.g., "12.4% of them are gold questions") and the number of judgements per non-gold question (e.g., "each non-gold question had been answered by 10 workers"), and it describes sub-sampling to vary these parameters. However, it does not specify explicit training/validation/test splits of the kind needed to directly reproduce a machine-learning data partitioning.
Hardware Specification: No — The paper describes a browser plug-in and an external server but does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments or the server infrastructure.
Software Dependencies: No — The paper mentions technologies such as a browser plug-in, a JavaScript bookmarklet, OrbitDB (with a GitHub link), and statistical models such as a Gaussian mixture model. However, it does not specify version numbers for any programming languages, libraries, or operating systems used in the implementation or experimental setup.
Experiment Setup: Yes — "Regarding the parameters of the system, we consider a realistic scenario: a job of 2000 tasks with an additional 5% (100 tasks) of gold questions. We consider the default automatic behaviour of Figure Eight: 10 gold questions are used at the beginning to train and test the ability of the worker (i.e. a quiz page). After that, pages of 10 tasks are shown to the worker, of which 9 are requested tasks and one is a gold question. To be considered trusted, workers are required, by default, to judge a minimum of four gold questions and to reach an accuracy threshold of 70%. ... Confidence: The worker will consider as gold all questions with signalled probability of being gold of at least 50%."
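The experiment setup quoted above can be made concrete with a small sketch. The following Python snippet is illustrative only, not the authors' code: it encodes the default Figure Eight trust rule (a worker is trusted after judging at least four gold questions with at least 70% accuracy) and the job composition described in the paper (2000 tasks plus 5% gold questions, shown in pages of 9 requested tasks and 1 gold question). Function and constant names are hypothetical.

```python
# Illustrative sketch of the paper's experimental parameters;
# names and structure are assumptions, not the authors' implementation.

MIN_GOLD_JUDGED = 4        # default minimum gold questions judged
ACCURACY_THRESHOLD = 0.70  # default accuracy threshold to be "trusted"

def is_trusted(gold_results):
    """gold_results: list of booleans, True if a gold question was answered correctly.
    Returns True if the worker meets Figure Eight's default trust rule as described."""
    if len(gold_results) < MIN_GOLD_JUDGED:
        return False
    accuracy = sum(gold_results) / len(gold_results)
    return accuracy >= ACCURACY_THRESHOLD

def job_layout(num_tasks=2000, gold_fraction=0.05, page_size=10):
    """Job composition from the paper: 5% gold questions, pages of 10 tasks
    containing 9 requested tasks and 1 gold question (after the initial quiz page)."""
    num_gold = int(num_tasks * gold_fraction)
    num_pages = num_tasks // (page_size - 1)  # 9 requested tasks per page
    return num_gold, num_pages

print(is_trusted([True, True, True, False]))  # 3/4 correct = 75% >= 70% -> True
print(job_layout())                           # (100, 222)
```

Under this rule, a worker who answers three of their first four gold questions correctly is trusted, while one who answers two of four (50%) is not, matching the 70% threshold quoted from the paper.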