apricot: Submodular selection for data summarization in Python
Authors: Jacob Schreiber, Jeffrey Bilmes, William Stafford Noble
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the use of subset selection by training machine learning models to comparable accuracy using either the full data set or a representative subset thereof. To demonstrate the practical utility of the selected examples, we evaluated logistic regression models trained on subsets of examples from the two data sets. |
| Researcher Affiliation | Academia | Jacob Schreiber EMAIL Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195-4322, USA; Jeffrey Bilmes EMAIL Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195-4322, USA; William Stafford Noble EMAIL Department of Genome Science, University of Washington, Seattle, WA 98195-4322, USA |
| Pseudocode | No | The paper describes various algorithms (e.g., the greedy algorithm, accelerated greedy algorithm, stochastic greedy, sample greedy, approximate lazy greedy, bidirectional greedy, and GreeDi) but does not provide structured pseudocode or algorithm blocks for them. |
| Open Source Code | Yes | The code and tutorial Jupyter notebooks are available at https://github.com/jmschrei/apricot |
| Open Datasets | Yes | To illustrate this approach in apricot, we consider two data sets: classifying digits from images in the MNIST data set (LeCun et al., 1998) and classifying articles of clothing from images in the Fashion MNIST data set (Xiao et al., 2017). |
| Dataset Splits | Yes | The subsets were chosen solely from the training sets (of 60,000 examples each) using either a facility location function or 20 iterations of random selection. The model is evaluated on the full test set each time. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like "Python", "numba (Lam et al., 2015)", "scikit-learn transformers", and "keras (Chollet et al., 2015)", but it does not specify explicit version numbers for any of these components. |
| Experiment Setup | No | The paper mentions evaluating "logistic regression models" and using "subsets of varying sizes" but does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or other detailed training configurations for these models. |
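The facility location selection referenced in the table can be sketched as a plain greedy maximizer of the submodular function f(S) = Σᵢ maxⱼ∈S sim(xᵢ, xⱼ). The RBF-style similarity kernel and the toy data below are illustrative assumptions, not the paper's configuration; apricot itself ships an optimized implementation of this selection.

```python
import math


def similarity(a, b):
    # RBF-style similarity from squared Euclidean distance
    # (an illustrative kernel choice, not the paper's).
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def facility_location_greedy(X, k):
    """Greedily select k example indices maximizing the facility
    location function f(S) = sum_i max_{j in S} sim(x_i, x_j)."""
    n = len(X)
    sims = [[similarity(X[i], X[j]) for j in range(n)] for i in range(n)]
    selected = []
    # best[i] tracks each point's max similarity to the selected set.
    best = [0.0] * n
    for _ in range(k):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(sims[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], sims[i][best_j]) for i in range(n)]
    return selected


# Toy usage: two tight clusters; selecting k=2 picks one representative
# from each, which is what makes the subset "representative".
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
subset = facility_location_greedy(X, 2)
```

Because the marginal gains are recomputed against the running `best` coverage vector, this naive loop runs in O(k·n²) time; the accelerated and stochastic variants named in the Pseudocode row exist precisely to cut that cost on data sets the size of MNIST.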