Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
Authors: Matteo Venanzi, John Guiver, Pushmeet Kohli, Nicholas R. Jennings
JAIR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using two real-world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a task's duration compared to state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Matteo Venanzi EMAIL Microsoft, 2 Waterhouse Square, London EC1N 2ST UK John Guiver EMAIL Microsoft Research, 21 Station Road, Cambridge CB1 2FB UK Pushmeet Kohli EMAIL Microsoft Research, One Microsoft Way, Redmond WA 98052-6399 US Nicholas R. Jennings EMAIL Imperial College, South Kensington, London SW7 2AZ UK |
| Pseudocode | No | The paper includes a factor graph (Figure 5) and describes the probabilistic inference process mathematically, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using Infer.NET for its implementation: "Using Infer.NET, we are able to train BCCTime on our largest dataset of 12,190 judgments within seconds using approximately 80MB of RAM on a standard laptop." However, it does not provide any statement about releasing their own source code for BCCTime or a link to a repository. |
| Open Datasets | Yes | Using two real-world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a task's duration compared to state-of-the-art methods. Zen Crowd India (ZC-IN): contains a set of links between the names of entities extracted from news articles and uniform resource identifiers (URIs) describing the entity in Freebase and DBpedia (Demartini et al., 2012). Zen Crowd USA (ZC-US): This dataset was also provided by Demartini et al. (2012) and contains judgements for the same set of tasks as ZC-IN... Weather Sentiment AMT (WS-AMT): The Weather Sentiment dataset was provided by CrowdFlower for the 2013 Crowdsourcing at Scale shared task challenge. It includes 300 tweets with 1,720 judgements from 461 workers and has been used in several experimental evaluations of crowdsourcing models (Simpson et al., 2015; Venanzi et al., 2014; Venanzi, Teacy, Rogers, & Jennings, 2015b). In detail, the workers were asked to classify the sentiment of tweets with respect to the weather into the following categories: negative (0), neutral (1), positive (2), tweet not related to weather (3) and can't tell (4). As a result, this dataset pertains to a multi-class classification problem. However, the original dataset used in the shared task challenge did not contain any time information about the collected judgments. Therefore, a new dataset (WS-AMT) was recollected for the same tasks as in the CrowdFlower shared task dataset using the AMT platform, acquiring exactly 20 judgements per task and recording the elapsed time for each judgment (Venanzi, Rogers, & Jennings, 2015a). |
| Dataset Splits | No | The paper evaluates performance over sub-samples of judgments (Figure 8) and reports accuracy metrics such as AUC and average recall, which imply evaluation on held-out labels. However, it does not explicitly specify how the datasets were split into training, validation, or test sets for reproducibility (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | Using Infer.NET, we are able to train BCCTime on our largest dataset of 12,190 judgments within seconds using approximately 80MB of RAM on a standard laptop. This mentions a "standard laptop" and 80MB of RAM, but lacks specific details such as CPU or GPU models, or exact processor types to be considered a specific hardware specification. |
| Software Dependencies | Yes | In particular, we use the well-known EP algorithm (Minka, 2001) that has been shown to provide good quality approximations for BCC models (Venanzi et al., 2014). This method leverages a factorised distribution of the joint probability to approximate the marginal posterior distributions through an iterative message passing scheme implemented on the factor graph. Specifically, we use the EP implementation provided by Infer.NET (Minka, Winn, Guiver, & Knowles, 2014), which is a standard framework for running Bayesian inference in probabilistic models. Bibliography: Minka, T., Winn, J., Guiver, J., & Knowles, D. (2014). Infer.NET 2.6. Microsoft Research Cambridge. |
| Experiment Setup | Yes | Therefore, the workers' confusion matrices are initialised with a slightly higher value on the diagonal (0.6) and lower values on the rest of the matrix. Then, the Dirichlet priors for p and s are set uninformatively with uniform counts. The priors of the confusion matrices were initialised with a higher diagonal value (0.7), meaning that a priori the workers are assumed to be better than random. The Gaussian priors for the tasks' time durations are set with means σ0 = 10 and λ0 = 50 and precisions γ0 = δ0 = 10⁻¹, meaning that a priori each entity linking task is expected to be completed within 10 and 50 seconds. Furthermore, we initialise the Beta prior of ψk as a function of the number of tasks with α0 = 0.7N and β0 = 0.3N to represent the fact that a priori the worker is considered reliable if she makes valid labelling attempts for 70% of the tasks. Importantly, given the shape of the distribution of the workers' time completion data observed in the datasets (see Figure 2), we apply a logarithmic transformation to τ_i^(k) in order to obtain a more uniform distribution of workers' completion times in the training data. Finally, the priors of all the benchmarks were set equivalently to BCCTime. |
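The Experiment Setup row lists the paper's prior hyperparameters (diagonal confusion-matrix mass 0.7, Gaussian duration priors with precision 10⁻¹, a Beta(0.7N, 0.3N) reliability prior, and a log transform of completion times). The sketch below is a minimal, hypothetical illustration of assembling those numbers; the function name `make_priors` and all data structures are assumptions for exposition only, since the paper's actual model is implemented in Infer.NET, not Python.

```python
import numpy as np

def make_priors(num_tasks, num_classes=2, diag=0.7):
    """Illustrative prior setup using the hyperparameters quoted above.

    All names here are hypothetical; only the numeric values come
    from the paper's description of the BCCTime experiment setup.
    """
    # Dirichlet pseudo-counts for each row of a worker's confusion matrix:
    # extra mass on the diagonal encodes "better than random" a priori.
    off = (1.0 - diag) / (num_classes - 1)
    confusion_prior = np.full((num_classes, num_classes), off)
    np.fill_diagonal(confusion_prior, diag)

    # Gaussian priors over task time durations: means of 10 and 50 seconds,
    # each with precision 10**-1, as quoted in the setup.
    duration_prior = {"means": (10.0, 50.0), "precisions": (0.1, 0.1)}

    # Beta prior on worker reliability psi_k, scaled by the number of tasks N:
    # a priori a worker makes valid labelling attempts for 70% of tasks.
    reliability_prior = {"alpha": 0.7 * num_tasks, "beta": 0.3 * num_tasks}
    return confusion_prior, duration_prior, reliability_prior

# Completion times are log-transformed before inference to reduce the
# heavy right skew observed in the raw timing data.
times = np.array([3.0, 8.0, 21.0, 55.0])
log_times = np.log(times)
```

Each row of `confusion_prior` sums to 1, so it can be read directly as Dirichlet pseudo-counts favouring the correct label; scaling the Beta prior by N keeps its strength proportional to the amount of evidence a worker could in principle supply.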