reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CEKA: A Tool for Mining the Wisdom of Crowds

Authors: Jing Zhang, Victor S. Sheng, Bryce A. Nicholson, Xindong Wu

JMLR 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	It makes the entire knowledge discovery procedure much easier, including analyzing qualities of workers, simulating labeling behaviors, inferring true class labels of instances, filtering and correcting mislabeled instances (noise), building learning models and evaluating them. [...] Figure 2 demonstrates a simple experiment including the ground truth inference, noise correction and performance evaluation. In this sample code, like DS, all inference algorithms provide a uniform interface function doInference, which assigns every instance an integrated label. [...] The statistical information of the performance will be obtained when the class PerformanceStatistic is applied to a Dataset object with the ground truth provided.
Researcher Affiliation	Academia	Jing Zhang EMAIL School of Computer Science and Information Engineering Hefei University of Technology (HFUT), Hefei 230009, China Department of Software Engineering, School of Computer Science and Engineering Nanjing University of Science and Technology (NJUST), Nanjing 210094, China Victor S. Sheng EMAIL Bryce A. Nicholson EMAIL Department of Computer Science, University of Central Arkansas, Conway, AR 72035, USA Xindong Wu EMAIL School of Computer Science and Information Engineering, HFUT, Hefei 230009, China Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
Pseudocode	Yes	Figure 2: A sample code for a basic usage String respPath="D:/adult.response.txt"; // labels obtained from crowd String arffPath="D:/adult.arffx"; // ground truth and features Dataset data = loadFile(respPath, null, arffPath); // infer the ground truth by Dawid & Skene's algorithm DawidSkene dsAlgo = new DawidSkene(50); dsAlgo.doInference(data); // noise filtering with the CF algorithm Classifier[] classifiers = new Classifier[1]; Classifiers[0] = new SMO(); // SMO Classifier in WEKA ClassificationFilter noiseFilter = new ClassificationFilter(10); Dataset[] subData = null; // cleansed and noise data sets cf.FilterNoise(data, classifiers[0]); // conduct noise filtering subData[0] = noiseFilter.getCleansedDataset(); subData[1] = noiseFilter.getNoiseDataset(); // noise correction with STC algorithm SelfTrainCorrection stc = new SelfTrainCorrection(subData[0], subData[1], 1.0); stc.correction(classifiers[0]); // correct mislabeled data // combining two data sets and then evaluate performance DatasetManipulator.addAllExamples(subData[0], subData[1]); PerformanceStatistic perfStat = new PerformanceStatistic(); perfStat.stat(subData[0]);
Open Source Code	Yes	CEKA is written in Java and completely open source. Therefore, many new ideas and methods, such as noise correction for crowdsourcing, are easily integrated. The project CEKA is available at: http://ceka.sourceforge.net/.
Open Datasets	No	The paper mentions using a dataset via local file paths (e.g., "D:/adult.response.txt" and "D:/adult.arffx") in the usage example, but does not provide any specific link, DOI, repository name, or formal citation for public access to this dataset.
Dataset Splits	No	The paper does not provide specific dataset split information such as exact percentages for training, validation, and test sets, or references to predefined splits needed for reproducibility. It only shows an example of creating 'cleansed and noise datasets'.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU or CPU models, memory, or processing speeds used for running experiments.
Software Dependencies	No	The paper mentions that "CEKA is written in Java with core classes being compatible with the well-known machine learning tool WEKA" and uses a "SMO Classifier in WEKA", but it does not specify any version numbers for Java or WEKA.
Experiment Setup	Yes	The sample code provides specific parameters for algorithm initialization: "DawidSkene dsAlgo = new DawidSkene(50);", "ClassificationFilter noiseFilter = new ClassificationFilter(10);", and "SelfTrainCorrection stc = new SelfTrainCorrection(subData[0], subData[1], 1.0);". These are concrete hyperparameter values used in the experimental setup.
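
Illustrative sketch (not from the paper): the Figure 2 workflow quoted above first infers one integrated label per instance from the crowd's responses and later evaluates the result against the ground truth. The self-contained Java example below shows those two steps in miniature. It deliberately uses plain majority voting as a simpler stand-in for the Dawid & Skene algorithm and does not call the CEKA or WEKA APIs. The class MajorityVoteSketch, the Response type, the methods integrate and accuracy, and the toy data are all hypothetical names and values introduced here for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: majority voting as a stand-in for ground truth inference.
// None of these names belong to CEKA or WEKA.
public class MajorityVoteSketch {

    /** One crowd response: a worker assigns a class label to an instance. */
    static final class Response {
        final String workerId;
        final String instanceId;
        final int label;

        Response(String workerId, String instanceId, int label) {
            this.workerId = workerId;
            this.instanceId = instanceId;
            this.label = label;
        }
    }

    /** Integrates labels per instance by majority vote; ties go to the smaller label value. */
    static Map<String, Integer> integrate(List<Response> responses) {
        // instanceId -> (label -> number of votes)
        Map<String, Map<Integer, Integer>> votes = new HashMap<>();
        for (Response r : responses) {
            votes.computeIfAbsent(r.instanceId, k -> new HashMap<>())
                 .merge(r.label, 1, Integer::sum);
        }
        Map<String, Integer> integrated = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : votes.entrySet()) {
            int bestLabel = -1;
            int bestCount = -1;
            for (Map.Entry<Integer, Integer> c : e.getValue().entrySet()) {
                int label = c.getKey();
                int count = c.getValue();
                if (count > bestCount || (count == bestCount && label < bestLabel)) {
                    bestLabel = label;
                    bestCount = count;
                }
            }
            integrated.put(e.getKey(), bestLabel);
        }
        return integrated;
    }

    /** Fraction of ground-truth instances whose integrated label matches the truth. */
    static double accuracy(Map<String, Integer> integrated, Map<String, Integer> truth) {
        if (truth.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, Integer> e : truth.entrySet()) {
            Integer predicted = integrated.get(e.getKey());
            if (predicted != null && predicted.equals(e.getValue())) {
                correct++;
            }
        }
        return (double) correct / truth.size();
    }

    public static void main(String[] args) {
        // Toy data (made up): three workers each label three instances.
        List<Response> responses = Arrays.asList(
            new Response("w1", "x1", 1), new Response("w2", "x1", 1), new Response("w3", "x1", 0),
            new Response("w1", "x2", 0), new Response("w2", "x2", 0), new Response("w3", "x2", 1),
            new Response("w1", "x3", 1), new Response("w2", "x3", 0), new Response("w3", "x3", 0));

        Map<String, Integer> truth = new HashMap<>();
        truth.put("x1", 1);
        truth.put("x2", 0);
        truth.put("x3", 1);

        Map<String, Integer> integrated = integrate(responses);
        System.out.println("Integrated labels: " + integrated);
        System.out.println("Accuracy: " + accuracy(integrated, truth));
    }
}
```

Majority voting weights every worker equally, whereas Dawid & Skene's algorithm also estimates each worker's reliability, so this sketch only illustrates the shape of the "infer, then evaluate" steps and is not a substitute for the paper's method.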