reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Refinery: An Open Source Topic Modeling Web Platform

Authors: Daeil Kim, Benjamin F. Swanson, Michael C. Hughes, Erik B. Sudderth

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Users can interactively organize articles by topic and also refine this organization with phrase-level analysis. The results of an analysis on 500 New York Times articles that contained the keyword obama during the year 2013.
Researcher Affiliation	Academia	Daeil Kim, Benjamin F. Swanson, Michael C. Hughes, Erik B. Sudderth. Department of Computer Science, Brown University, Providence, RI 02192, USA
Pseudocode	No	The paper describes the functionality and architecture of the Refinery platform in descriptive text, without presenting formal pseudocode or algorithm blocks for its underlying methods.
Open Source Code	Yes	The project website http://daeilkim.github.io/refinery/ contains Python code and further documentation. > git clone https://github.com/daeilkim/refinery.git
Open Datasets	No	The paper refers to an 'analysis on 500 New York Times articles' but does not provide any specific link, DOI, repository, or formal citation with author/year to access this particular dataset or any other dataset used.
Dataset Splits	No	The paper mentions an 'analysis on 500 New York Times articles' but does not provide any details on how this or any other dataset was split into training, validation, or test sets.
Hardware Specification	No	The paper mentions running Refinery in a 'Unix-like command line' environment with Virtualbox and Vagrant, but it does not specify any particular CPU or GPU models, memory, or other detailed hardware specifications used for running experiments.
Software Dependencies	No	The paper states: 'To make installation simple, it has only three dependencies: the Git version-control system, Virtualbox (Oracle, 2013), and Vagrant (Hashimoto, 2013).' This text provides software names but not specific version numbers. Other mentioned tools like BNPy and Splitta also lack version numbers.
Experiment Setup	No	The paper describes the Refinery platform's features and the general approach to topic modeling (HDP), mentioning that users specify 'an upper bound on the number of inferred topics.' However, it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, epochs) or system-level training settings.