ktrain: A Low-Code Library for Augmented Machine Learning

Authors: Arun S. Maiya

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We present ktrain, a low-code Python library that makes machine learning more accessible and easier to apply. To illustrate ease of use, we provide a complete example for text classification. More specifically, we train a Chinese-language sentiment analyzer on a dataset of hotel reviews. Fine-Tuning a BERT Text Classifier for Chinese:

import ktrain
from ktrain import text as txt

# STEP 1: load and preprocess data
trn, val, preproc = txt.texts_from_folder('ChnSentiCorp', maxlen=75, preprocess_mode='bert')

# STEP 2: load model and wrap in Learner
model = txt.text_classifier('bert', trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val)

# STEP 3: estimate learning rate
learner.lr_find(show_plot=True)

# STEP 4: train model
learner.fit_onecycle(2e-5, 4)

Table 1 compares ktrain to popular low-code and AutoML libraries in their out-of-the-box support for a variety of machine learning tasks.
Researcher Affiliation Industry Arun S. Maiya EMAIL Institute for Defense Analyses Alexandria, VA, USA
Pseudocode No The paper includes Python code examples demonstrating the library's use, such as 'Fine-Tuning a BERT Text Classifier for Chinese:' and 'Building an End-to-End Open-Domain QA System in ktrain'. These are actual code blocks, not pseudocode or algorithm blocks. The step descriptions (e.g., 'STEP 1: Load and Preprocess Data') are in natural-language prose.
Open Source Code Yes ktrain is open-source, free to use under a permissive Apache license, and available on GitHub at: https://github.com/amaiya/ktrain.
Open Datasets Yes More specifically, we train a Chinese-language sentiment-analyzer on a dataset of hotel reviews.2 (Footnote 2: https://github.com/Tony607/Chinese_sentiment_analysis) using the well-studied 20 Newsgroups dataset.3 (Footnote 3: http://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups)
Dataset Splits No The paper mentions loading training and validation data (e.g., 'trn, val, preproc = txt.texts_from_folder(...)') for the Chinese sentiment analysis example and loading documents into a list ('docs') for the 20 Newsgroups QA system. However, it does not specify the exact percentages, sample counts, or a detailed methodology for how these datasets were split into training, validation, or test sets.
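For reference, the kind of hold-out split methodology the paper leaves unspecified can be sketched in a few lines of plain Python. This is an illustrative sketch only; the 90/10 ratio, the seed, and the helper name are assumptions, not values or code from the paper or from ktrain:

```python
import random

def holdout_split(items, val_fraction=0.1, seed=42):
    """Shuffle items reproducibly, then carve off a validation slice.

    Illustrative only: ratio and seed are arbitrary choices, not the
    paper's; a reproducible report would state both explicitly.
    """
    rng = random.Random(seed)           # fixed seed -> deterministic split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)

train, val = holdout_split(range(1000))
```

Stating the fraction and seed like this is what would let readers reconstruct the exact train/validation partition.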
Hardware Specification No The paper generally states that 'fast models such as fastText... and NBSVM... are amenable to being trained on a standard laptop CPU.' This is a general statement about capability, not a specification of the hardware used for the experiments described in the paper. No specific CPU models, GPU models, or other detailed hardware configurations are provided.
Software Dependencies No The paper mentions several software components like 'Python library', 'TensorFlow', 'transformers', 'scikit-learn', and 'stellargraph', and provides Python code examples that import 'ktrain'. However, it does not provide specific version numbers for any of these software dependencies, which are necessary for reproducible descriptions.
Experiment Setup Yes The example 'Fine-Tuning a BERT Text Classifier for Chinese:' includes the line 'learner.fit_onecycle(2e-5, 4)', which explicitly provides the maximum learning rate (2e-5) and the number of epochs (4) for the training process.
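The fit_onecycle call follows the 1cycle policy, which ramps the learning rate up to the specified maximum and back down over the course of training. A minimal sketch of a triangular 1cycle-style schedule in pure Python (the function name and exact cycle shape are illustrative assumptions; ktrain/Keras implement their own variant internally):

```python
def one_cycle_lr(max_lr, total_steps):
    """Illustrative triangular schedule: linearly ramp the learning rate
    up to max_lr over the first half of training, then back down toward
    zero over the second half. A sketch only, not ktrain's internals."""
    half = total_steps / 2.0
    lrs = []
    for step in range(total_steps):
        if step < half:
            lrs.append(max_lr * (step / half))          # warm-up phase
        else:
            lrs.append(max_lr * (1 - (step - half) / half))  # cool-down phase
    return lrs

# e.g., a peak learning rate of 2e-5, as in the paper's example call
schedule = one_cycle_lr(2e-5, 100)
```

The single peak value (here 2e-5) is what fit_onecycle's first argument controls; the second argument sets how many epochs the cycle spans.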