CEKA: A Tool for Mining the Wisdom of Crowds
Authors: Jing Zhang, Victor S. Sheng, Bryce A. Nicholson, Xindong Wu
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It makes the entire knowledge discovery procedure much easier, including analyzing qualities of workers, simulating labeling behaviors, inferring true class labels of instances, filtering and correcting mislabeled instances (noise), building learning models and evaluating them. [...] Figure 2 demonstrates a simple experiment including the ground truth inference, noise correction and performance evaluation. In this sample code, like DS, all inference algorithms provide a uniform interface function doInference, which assigns every instance an integrated label. [...] The statistical information of the performance will be obtained when the class PerformanceStatistic is applied to a Dataset object with the ground truth provided. |
| Researcher Affiliation | Academia | Jing Zhang EMAIL School of Computer Science and Information Engineering Hefei University of Technology (HFUT), Hefei 230009, China Department of Software Engineering, School of Computer Science and Engineering Nanjing University of Science and Technology (NJUST), Nanjing 210094, China Victor S. Sheng EMAIL Bryce A. Nicholson EMAIL Department of Computer Science, University of Central Arkansas, Conway, AR 72035, USA Xindong Wu EMAIL School of Computer Science and Information Engineering, HFUT, Hefei 230009, China Department of Computer Science, University of Vermont, Burlington, VT 05405, USA |
| Pseudocode | Yes | Figure 2: A sample code for a basic usage String respPath="D:/adult.response.txt"; // labels obtained from crowd String arffPath="D:/adult.arffx"; // ground truth and features Dataset data = loadFile(respPath, null, arffPath); // infer the ground truth by Dawid & Skene's algorithm DawidSkene dsAlgo = new DawidSkene(50); dsAlgo.doInference(data); // noise filtering with the CF algorithm Classifier[] classifiers = new Classifier[1]; Classifiers[0] = new SMO(); // SMO Classifier in WEKA ClassificationFilter noiseFilter = new ClassificationFilter(10); Dataset[] subData = null; // cleansed and noise data sets cf.FilterNoise(data, classifiers[0]); // conduct noise filtering subData[0] = noiseFilter.getCleansedDataset(); subData[1] = noiseFilter.getNoiseDataset(); // noise correction with STC algorithm SelfTrainCorrection stc = new SelfTrainCorrection(subData[0], subData[1], 1.0); stc.correction(classifiers[0]); // correct mislabeled data // combining two data sets and then evaluate performance DatasetManipulator.addAllExamples(subData[0], subData[1]); PerformanceStatistic perfStat = new PerformanceStatistic(); perfStat.stat(subData[0]); |
| Open Source Code | Yes | CEKA is written in Java and completely open source. Therefore, many new ideas and methods, such as noise correction for crowdsourcing, are easily integrated. The project CEKA is available at: http://ceka.sourceforge.net/. |
| Open Datasets | No | The paper mentions using a dataset via local file paths (e.g., "D:/adult.response.txt" and "D:/adult.arffx") in the usage example, but does not provide any specific link, DOI, repository name, or formal citation for public access to this dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information such as exact percentages for training, validation, and test sets, or references to predefined splits needed for reproducibility. It only shows an example of creating 'cleansed and noise datasets'. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or processing speeds used for running experiments. |
| Software Dependencies | No | The paper mentions that "CEKA is written in Java with core classes being compatible with the well-known machine learning tool WEKA" and uses a "SMO Classifier in WEKA", but it does not specify any version numbers for Java or WEKA. |
| Experiment Setup | Yes | The sample code provides specific parameters for algorithm initialization: "DawidSkene dsAlgo = new DawidSkene(50);", "ClassificationFilter noiseFilter = new ClassificationFilter(10);", and "SelfTrainCorrection stc = new SelfTrainCorrection(subData[0], subData[1], 1.0);". These are concrete hyperparameter values used in the experimental setup. |
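
The Pseudocode row quotes a pipeline that first infers one integrated label per instance from crowd responses and then evaluates those labels against the ground truth. The sketch below illustrates those two steps in plain Java under stated assumptions. It is not CEKA's API, and it uses simple majority voting rather than the Dawid & Skene algorithm named in the paper. The class `MajorityVoteSketch` and its methods `integrate` and `accuracy` are hypothetical names introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of CEKA): integrates crowd labels by majority vote. */
final class MajorityVoteSketch {

    /**
     * Returns the most frequent label for each instance.
     * Ties go to the smallest label value, an arbitrary choice made for this sketch.
     * An instance with no labels is assigned -1.
     */
    static Map<String, Integer> integrate(Map<String, List<Integer>> crowdLabels) {
        Map<String, Integer> integrated = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : crowdLabels.entrySet()) {
            // Count how many workers voted for each class label.
            Map<Integer, Integer> votes = new TreeMap<>();
            for (int label : entry.getValue()) {
                votes.merge(label, 1, Integer::sum);
            }
            // Keep the label with the highest count; TreeMap ordering makes ties deterministic.
            int best = -1;
            int bestCount = 0;
            for (Map.Entry<Integer, Integer> vote : votes.entrySet()) {
                if (vote.getValue() > bestCount) {
                    best = vote.getKey();
                    bestCount = vote.getValue();
                }
            }
            integrated.put(entry.getKey(), best);
        }
        return integrated;
    }

    /** Fraction of ground-truth instances whose integrated label matches the true label. */
    static double accuracy(Map<String, Integer> integrated, Map<String, Integer> truth) {
        if (truth.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, Integer> entry : truth.entrySet()) {
            if (entry.getValue().equals(integrated.get(entry.getKey()))) {
                correct++;
            }
        }
        return (double) correct / truth.size();
    }
}
```

Calling `MajorityVoteSketch.integrate` on a map from instance IDs to worker labels, then passing the result to `accuracy` with the true labels, mirrors the inference and evaluation steps of the quoted sample at a much smaller scale. Noise filtering and correction are left out.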