ktrain: A Low-Code Library for Augmented Machine Learning
Authors: Arun S. Maiya
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present ktrain, a low-code Python library that makes machine learning more accessible and easier to apply. To illustrate ease of use, we provide a fully complete example for text classification. More specifically, we train a Chinese-language sentiment analyzer on a dataset of hotel reviews. Fine-Tuning a BERT Text Classifier for Chinese: `import ktrain; from ktrain import text as txt`; STEP 1, load and preprocess data: `trn, val, preproc = txt.texts_from_folder('ChnSentiCorp', maxlen=75, preprocess_mode='bert')`; STEP 2, load model and wrap in Learner: `model = txt.text_classifier('bert', trn, preproc=preproc); learner = ktrain.get_learner(model, train_data=trn, val_data=val)`; STEP 3, estimate learning rate: `learner.lr_find(show_plot=True)`; STEP 4, train model: `learner.fit_onecycle(2e-5, 4)`. Table 1 compares ktrain to popular low-code and AutoML libraries in their out-of-the-box support for a variety of machine learning tasks. |
| Researcher Affiliation | Industry | Arun S. Maiya EMAIL Institute for Defense Analyses Alexandria, VA, USA |
| Pseudocode | No | The paper includes Python code examples for demonstrating the library's use, such as 'Fine-Tuning a BERT Text Classifier for Chinese:' and 'Building an End-to-End Open-Domain QA System in ktrain'. These are actual code blocks, not pseudocode or algorithm blocks. The description of steps (e.g., STEP 1: Load and Preprocess Data) is in natural language prose. |
| Open Source Code | Yes | ktrain is open-source, free to use under a permissive Apache license, and available on GitHub at: https://github.com/amaiya/ktrain. |
| Open Datasets | Yes | More specifically, we train a Chinese-language sentiment-analyzer on a dataset of hotel reviews.2 (Footnote 2: https://github.com/Tony607/Chinese_sentiment_analysis) using the well-studied 20 Newsgroups dataset.3 (Footnote 3: http://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups) |
| Dataset Splits | No | The paper mentions loading training and validation data (e.g., 'trn , val , preproc = txt . texts_from_folder(...)') for the Chinese sentiment analysis example and loading documents into a list ('docs') for the 20 Newsgroups QA system. However, it does not specify the exact percentages, sample counts, or a detailed methodology for how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper generally mentions that 'fast models such as fastText... and NBSVM... are amenable to being trained on a standard laptop CPU.' This is a general statement about capability, not a specific hardware specification used for running the experiments described in the paper. No specific CPU models, GPU models, or other detailed hardware configurations are provided. |
| Software Dependencies | No | The paper mentions several software components like 'Python library', 'TensorFlow', 'transformers', 'scikit-learn', and 'stellargraph', and provides Python code examples that import 'ktrain'. However, it does not provide specific version numbers for any of these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | The example for 'Fine-Tuning a BERT Text Classifier for Chinese:' includes the line `learner.fit_onecycle(2e-5, 4)`, which explicitly provides a learning rate (2e-5) and the number of epochs (4) for the training process. |
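The `fit_onecycle(2e-5, 4)` call quoted above follows the one-cycle learning-rate policy: the learning rate ramps up to a peak and then decays back down over training. The sketch below is an illustrative pure-Python triangular schedule, not ktrain's actual implementation; the `start_frac` parameter and the linear ramp shape are assumptions for illustration.

```python
def one_cycle_schedule(max_lr, total_steps, start_frac=0.1):
    """Illustrative one-cycle schedule (assumed shape, not ktrain's code):
    linear warm-up from start_frac * max_lr to max_lr over the first half
    of training, then linear decay back down over the second half."""
    base_lr = max_lr * start_frac
    peak = total_steps // 2
    lrs = []
    for step in range(total_steps):
        if step <= peak:
            # warm-up phase: climb toward the peak learning rate
            lr = base_lr + (max_lr - base_lr) * step / peak
        else:
            # cool-down phase: decay back to the starting learning rate
            lr = max_lr - (max_lr - base_lr) * (step - peak) / (total_steps - 1 - peak)
        lrs.append(lr)
    return lrs

# e.g. the paper's peak learning rate of 2e-5, over 100 hypothetical steps
lrs = one_cycle_schedule(2e-5, 100)
```

The schedule peaks at the supplied maximum (here the paper's 2e-5) at the midpoint and returns to its starting value by the final step.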