Provable Algorithms for Inference in Topic Models
Authors: Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. |
| Researcher Affiliation | Academia | Sanjeev Arora EMAIL Department of Computer Science, Princeton University Rong Ge EMAIL Computer Science Department, Duke University Frederic Koehler EMAIL Department of Mathematics, Princeton University Tengyu Ma EMAIL Department of Computer Science, Princeton University Ankur Moitra EMAIL Department of Mathematics and CSAIL, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Thresholded Linear Inverse Algorithm (TLI) |
| Open Source Code | Yes | Code to reproduce the results is available at: https://github.com/frytvm/topic-inference |
| Open Datasets | No | The paper uses 'New York Times articles', 'Enron emails', and 'NIPS papers' but does not provide explicit access information (link, DOI, repository) or a specific citation for the datasets themselves. |
| Dataset Splits | No | The paper describes how synthetic data was generated and evaluated, and mentions using 'a subsample of real documents', but it does not specify explicit train/validation/test splits, percentages, or sample counts for any dataset. |
| Hardware Specification | No | The paper mentions 'Solving LP (3) on 16 processors' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using the 'Mosek LP solver' and 'MALLET (McCallum, 2002)' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | For each document, we sample r = 5 topics uniformly at random, and choose weights for these topics uniformly from the r-dimensional probability simplex. |
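
The synthetic sampling step quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_topic_proportions`, the total of 50 topics, and the choice to draw the r topics without replacement are assumptions made for the example. It relies on the standard fact that normalized i.i.d. Exp(1) draws are uniformly distributed on the probability simplex.

```python
import random


def sample_topic_proportions(num_topics, r=5, rng=None):
    """Return a length-num_topics vector with r nonzero weights summing to 1.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    rng = rng or random.Random()
    # Assumption: the r topics are distinct (sampled without replacement).
    support = rng.sample(range(num_topics), r)
    # Normalized i.i.d. Exp(1) draws are uniform on the r-dimensional
    # probability simplex (equivalently, a Dirichlet(1, ..., 1) sample).
    draws = [rng.expovariate(1.0) for _ in range(r)]
    total = sum(draws)
    proportions = [0.0] * num_topics
    for topic, draw in zip(support, draws):
        proportions[topic] = draw / total
    return proportions


# Assumed topic count of 50, chosen only for the example.
x = sample_topic_proportions(num_topics=50, r=5, rng=random.Random(0))
```

Seeding the generator, as in the usage line, makes a synthetic document's topic proportions repeatable across runs.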