reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provable Algorithms for Inference in Topic Models

Authors: Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models.
Researcher Affiliation	Academia	Sanjeev Arora EMAIL Department of Computer Science, Princeton University Rong Ge EMAIL Computer Science Department, Duke University Frederic Koehler EMAIL Department of Mathematics, Princeton University Tengyu Ma EMAIL Department of Computer Science, Princeton University Ankur Moitra EMAIL Department of Mathematics and CSAIL, Massachusetts Institute of Technology
Pseudocode	Yes	Algorithm 1 Thresholded Linear Inverse Algorithm (TLI)
Open Source Code	Yes	Code to reproduce the results is available at: https://github.com/frytvm/topic-inference
Open Datasets	No	The paper uses 'New York Times articles', 'Enron emails', and 'NIPS papers' but does not provide explicit access information (link, DOI, repository) or a specific citation for the datasets themselves.
Dataset Splits	No	The paper describes how synthetic data was generated and evaluated, and mentions using 'a subsample of real documents', but it does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset.
Hardware Specification	No	The paper mentions 'Solving LP (3) on 16 processors' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies	No	The paper mentions using the 'Mosek LP solver' and 'MALLET (McCallum, 2002)' but does not specify version numbers for these software dependencies.
Experiment Setup	Yes	For each document, we sample r = 5 topics uniformly at random, and choose weights for these topics uniformly from the r-dimensional probability simplex.
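
The synthetic setup quoted under Experiment Setup can be sketched roughly as follows. This is a minimal illustration and is not taken from the authors' repository. It assumes the r topics are distinct (sampled without replacement) and uses the fact that a Dirichlet distribution with all parameters equal to 1 is uniform on the probability simplex. The function name `sample_topic_proportions` and the value `num_topics=50` are hypothetical.

```python
import numpy as np


def sample_topic_proportions(num_topics, r=5, rng=None):
    """Draw one document's topic-proportion vector with r active topics.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the r topics are distinct, so sample without replacement.
    topics = rng.choice(num_topics, size=r, replace=False)
    # Dirichlet(1, ..., 1) is the uniform distribution on the simplex.
    weights = rng.dirichlet(np.ones(r))
    x = np.zeros(num_topics)
    x[topics] = weights
    return x


rng = np.random.default_rng(0)
# num_topics=50 is an assumed value, not one reported in this summary.
x = sample_topic_proportions(num_topics=50, r=5, rng=rng)
print(np.flatnonzero(x), x.sum())
```

The returned vector has at most r nonzero entries that sum to 1, which is the kind of sparse topic-proportion vector an inference algorithm would then try to recover from a document's words.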