reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SMART: An Open Source Data Labeling Platform for Supervised Learning

Authors: Rob Chew, Michael Wenger, Caroline Kery, Jason Nance, Keith Richards, Emily Hadley, Peter Baumgartner

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The active learning model view provides information on the performance of a classifier trained on the labeled project data. The purpose of this view is to help provide insight on when the data labeling has hit a natural saturation point (additional labels are not improving model performance) and to add transparency to the active learning mechanism. SMART currently supports uncertainty sampling (Lewis and Gale, 1994) as the active learning algorithm... After each batch of data is labeled, the model is retrained, and its performance is updated and displayed in an interactive data visualization. Finally, the inter-rater reliability (IRR) view lets privileged users understand how consistently labelers agree on labels that are double-coded. SMART uses Cohen's kappa coefficient (Cohen, 1960) to measure IRR between two independent labelers and Fleiss' kappa (Fleiss, 1971) for more than two labelers.
Researcher Affiliation	Collaboration	Rob Chew EMAIL Michael Wenger EMAIL ... Center for Data Science RTI International Research Triangle Park, NC 27709, USA Caroline Kery EMAIL
Pseudocode	No	The paper describes the functionality and features of the SMART platform but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The project website (https://rtiinternational.github.io/SMART/) contains links to the code repository and extensive user documentation.
Open Datasets	No	The paper introduces SMART, a data labeling platform, and describes how users can upload their own unlabeled data or provide pre-labeled data. It does not use or provide access to a specific open dataset for its own evaluation or experiments.
Dataset Splits	No	The paper describes a data labeling platform and its functionalities, including active learning where models are retrained after each batch of data is labeled. However, it does not specify any fixed training/validation/test splits for experiments conducted within the paper itself.
Hardware Specification	No	The paper describes a web application and its software stack, mentioning it is designed to be platform agnostic. It does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for development or testing.
Software Dependencies	No	SMART is a web application using React, Bootstrap, d3, and webpack for front-end interactions and Django, Redis, and PostgreSQL to support back-end operations. Docker and Docker-Compose are used for OS-level virtualization to aid configuration management and ease deployment in new environments. Docker and Docker-Compose are also SMART's only dependencies; instructions to install SMART using these systems can be found online in the user documentation. Currently, several classifier types are supported through the Scikit-learn Python library (Pedregosa et al., 2011).
Experiment Setup	No	The advanced settings allow the user to customize the active learning model, set the batch size for the number of observations to label prior to re-running computations, and to enable/disable inter-rater reliability. While sensible default settings are provided, all settings can be customized to meet the specific needs of the labeling project.
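
Illustrative sketch: The Research Type row mentions two techniques, Cohen's kappa for agreement between two labelers and uncertainty sampling for choosing the next batch to label. The Python below is a minimal, standard-library sketch of both, not SMART's actual implementation. The function names (cohen_kappa, least_confident_batch) are hypothetical, and the "least confidence" score is an assumed variant of uncertainty sampling, since the excerpt does not say how SMART scores uncertainty.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who coded the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items both labelers coded identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each labeler's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both labelers used one identical label throughout; kappa is
        # undefined (0/0), so this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def least_confident_batch(probabilities, batch_size):
    """Indices of the batch_size items with the lowest top-class probability.

    Assumed scoring rule (least confidence): uncertainty = 1 - max(p).
    """
    uncertainty = [1.0 - max(p) for p in probabilities]
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: uncertainty[i],
        reverse=True,
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    coder_1 = ["pos", "pos", "neg", "neg"]
    coder_2 = ["pos", "neg", "neg", "neg"]
    print(cohen_kappa(coder_1, coder_2))

    # Hypothetical predicted class probabilities for four unlabeled items.
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
    print(least_confident_batch(probs, batch_size=2))
```

In this sketch the batch size plays the role of the "number of observations to label prior to re-running computations" described in the Experiment Setup row. For more than two labelers, Fleiss' kappa would be needed instead; it is not shown here.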