SMART: An Open Source Data Labeling Platform for Supervised Learning
Authors: Rob Chew, Michael Wenger, Caroline Kery, Jason Nance, Keith Richards, Emily Hadley, Peter Baumgartner
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The active learning model view provides information on the performance of a classifier trained on the labeled project data. The purpose of this view is to help provide insight on when the data labeling has hit a natural saturation point (additional labels are not improving model performance) and to add transparency to the active learning mechanism. SMART currently supports uncertainty sampling (Lewis and Gale, 1994) as the active learning algorithm... After each batch of data is labeled, the model is retrained, and its performance is updated and displayed in an interactive data visualization. Finally, the inter-rater reliability (IRR) view lets privileged users understand how consistently labelers agree on labels that are double-coded. SMART uses Cohen's kappa coefficient (Cohen, 1960) to measure IRR between two independent labelers and Fleiss' kappa (Fleiss, 1971) for more than two labelers. |
| Researcher Affiliation | Collaboration | Rob Chew EMAIL Michael Wenger EMAIL ... Center for Data Science RTI International Research Triangle Park, NC 27709, USA Caroline Kery EMAIL |
| Pseudocode | No | The paper describes the functionality and features of the SMART platform but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The project website (https://rtiinternational.github.io/SMART/) contains links to the code repository and extensive user documentation. |
| Open Datasets | No | The paper introduces SMART, a data labeling platform, and describes how users can upload their own unlabeled data or provide pre-labeled data. It does not use or provide access to a specific open dataset for its own evaluation or experiments. |
| Dataset Splits | No | The paper describes a data labeling platform and its functionalities, including active learning where models are retrained after each batch of data is labeled. However, it does not specify any fixed training/validation/test splits for experiments conducted within the paper itself. |
| Hardware Specification | No | The paper describes a web application and its software stack, mentioning it is designed to be platform agnostic. It does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for development or testing. |
| Software Dependencies | No | SMART is a web application using React, Bootstrap, d3, and webpack for front-end interactions and Django, Redis, and PostgreSQL to support back-end operations. Docker and Docker-Compose are used for OS-level virtualization to aid configuration management and ease deployment in new environments. Docker and Docker-Compose are also SMART's only dependencies; instructions to install SMART using these systems can be found online in the user documentation. Currently, several classifier types are supported through the scikit-learn Python library (Pedregosa et al., 2011). |
| Experiment Setup | No | The advanced settings allow the user to customize the active learning model, set the batch size for the number of observations to label prior to re-running computations, and to enable/disable inter-rater reliability. While sensible default settings are provided, all settings can be customized to meet the specific needs of the labeling project. |
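
The Research Type row mentions two techniques that a short example can make concrete: uncertainty sampling for choosing the next batch to label, and Cohen's kappa for agreement between two labelers. The sketch below is illustrative only and is not taken from the SMART codebase. The function names `least_confidence_batch` and `cohens_kappa` are hypothetical, the least-confidence score is one common variant of uncertainty sampling (the excerpt does not say which variant SMART uses), and the class probabilities are assumed to come from any already-trained classifier.

```python
from collections import Counter


def least_confidence_batch(probabilities, batch_size):
    """Pick the items the model is least sure about (hypothetical helper).

    `probabilities` holds one list of class probabilities per unlabeled item.
    Uncertainty is scored as 1 - max class probability, the least-confidence
    variant of uncertainty sampling. Returns the indices of the `batch_size`
    most uncertain items, most uncertain first.
    """
    uncertainty = [1.0 - max(row) for row in probabilities]
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: uncertainty[i], reverse=True)
    return ranked[:batch_size]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who coded the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each labeler's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both labelers used one identical label throughout; kappa is
        # undefined (0/0), so report full agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up probabilities for four unlabeled items and two classes.
probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]
next_batch = least_confidence_batch(probs, batch_size=2)

# Made-up double-coded labels from two labelers.
kappa = cohens_kappa(["pos", "pos", "neg", "neg"],
                     ["pos", "neg", "neg", "neg"])
```

In this toy input the two items with top probabilities 0.55 and 0.60 are selected for labeling, and the labelers agree on three of four items against a chance agreement of 0.5. For more than two labelers the excerpt says SMART switches to Fleiss' kappa, which this sketch does not cover.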