SMART: An Open Source Data Labeling Platform for Supervised Learning

Authors: Rob Chew, Michael Wenger, Caroline Kery, Jason Nance, Keith Richards, Emily Hadley, Peter Baumgartner

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The active learning model view provides information on the performance of a classifier trained on the labeled project data. The purpose of this view is to help provide insight on when the data labeling has hit a natural saturation point (additional labels are not improving model performance) and to add transparency to the active learning mechanism. SMART currently supports uncertainty sampling (Lewis and Gale, 1994) as the active learning algorithm... After each batch of data is labeled, the model is retrained, and its performance is updated and displayed in an interactive data visualization. Finally, the inter-rater reliability (IRR) view lets privileged users understand how consistently labelers agree on labels that are double-coded. SMART uses Cohen's kappa coefficient (Cohen, 1960) to measure IRR between two independent labelers and Fleiss' kappa (Fleiss, 1971) for more than two labelers.
Researcher Affiliation | Collaboration | Rob Chew EMAIL Michael Wenger EMAIL ... Center for Data Science RTI International Research Triangle Park, NC 27709, USA Caroline Kery EMAIL
Pseudocode | No | The paper describes the functionality and features of the SMART platform but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The project website contains links to the code repository and extensive user documentation (footnote 1: https://rtiinternational.github.io/SMART/).
Open Datasets | No | The paper introduces SMART, a data labeling platform, and describes how users can upload their own unlabeled data or provide pre-labeled data. It does not use or provide access to a specific open dataset for its own evaluation or experiments.
Dataset Splits | No | The paper describes a data labeling platform and its functionalities, including active learning where models are retrained after each batch of data is labeled. However, it does not specify any fixed training/validation/test splits for experiments conducted within the paper itself.
Hardware Specification | No | The paper describes a web application and its software stack, mentioning it is designed to be platform agnostic. It does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for development or testing.
Software Dependencies | No | SMART is a web application using React, Bootstrap, d3, and webpack for front-end interactions and Django, Redis, and PostgreSQL to support back-end operations. Docker and Docker-Compose are used for OS-level virtualization to aid configuration management and ease deployment in new environments. Docker and Docker-Compose are also SMART's only dependencies; instructions to install SMART using these systems can be found online in the user documentation. Currently, several classifier types are supported through the Scikit-learn Python library (Pedregosa et al., 2011).
Experiment Setup | No | The advanced settings allow the user to customize the active learning model, set the batch size for the number of observations to label prior to re-running computations, and to enable/disable inter-rater reliability. While sensible default settings are provided, all settings can be customized to meet the specific needs of the labeling project.
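
The uncertainty sampling mentioned in the Research Type row can be illustrated with a minimal sketch. It assumes the least-confidence variant (score = 1 minus the highest predicted class probability); the summary does not say which uncertainty measure SMART uses. The function names `least_confidence` and `select_batch` and the sample probabilities are hypothetical, not taken from SMART's code.

```python
from typing import List, Sequence


def least_confidence(probabilities: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_batch(
    class_probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return the indices of the `batch_size` most uncertain unlabeled items.

    `class_probabilities[i]` holds the classifier's predicted class
    probabilities for unlabeled item i (an assumed input format).
    """
    scores = [least_confidence(p) for p in class_probabilities]
    # Rank indices by descending uncertainty; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:batch_size]


# Hypothetical predictions for four unlabeled items in a two-class project.
pool_probabilities = [
    [0.95, 0.05],  # confident prediction, low labeling priority
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],  # closest to a coin flip, highest labeling priority
]
next_batch = select_batch(pool_probabilities, batch_size=2)
print(next_batch)  # [3, 1]
```

In a labeling loop, the selected items would be sent to labelers, the classifier retrained on the enlarged labeled set, and the remaining pool rescored before the next batch.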
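
For the two-labeler case, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration of that formula, not SMART's implementation. The function name `cohens_kappa`, the sample labels, and the choice to return 1.0 when both labelers use a single identical label throughout are assumptions of this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Cohen's kappa for two labelers who coded the same items in the same order."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both labelers must code the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the two labelers match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each labeler's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both labelers used one identical label everywhere (sketch convention).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical double-coded items: the labelers agree on 8 of 10.
labeler_a = ["pos"] * 5 + ["neg"] * 5
labeler_b = ["pos", "pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg"]
kappa = cohens_kappa(labeler_a, labeler_b)
print(round(kappa, 3))  # 0.6
```

Here p_o = 0.8 and p_e = 0.5, so kappa is 0.6: the labelers agree well beyond what their label frequencies alone would produce.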
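
With more than two labelers, Fleiss' kappa generalizes the same idea using a table of counts per item and category. The following is a minimal sketch of the standard formula under the assumption that every item is coded by the same number of labelers; the function name `fleiss_kappa` and the input layout are hypothetical and are not drawn from SMART's code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a table where counts[i][j] is the number of
    labelers who assigned item i to category j."""
    if len(counts) == 0:
        raise ValueError("at least one item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of labelers")
    n_categories = len(counts[0])
    # Per-item agreement: share of labeler pairs that agree on the item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_item) / n_items
    # Chance agreement from the overall share of assignments per category.
    total = n_items * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    chance = sum(p * p for p in shares)
    if chance == 1.0:
        # All assignments fall in one category (sketch convention).
        return 1.0
    return (mean_agreement - chance) / (1.0 - chance)


# Hypothetical table: 4 items, 3 labelers, 2 categories.
ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.625
```

In this example the mean per-item agreement is 5/6 and the chance agreement is 5/9, giving a kappa of 0.625.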