A Fast Variational Approach for Learning Markov Random Field Language Models
Authors: Yacine Jernite, Alexander Rush, David Sontag
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate the quality of the models learned by our algorithm by applying it to a language modelling task. Additionally we show that this same estimation algorithm can be effectively applied to other common sequence modelling tasks such as part-of-speech tagging. |
| Researcher Affiliation | Collaboration | Yacine Jernite EMAIL CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA Alexander M. Rush EMAIL Facebook AI Research, 770 Broadway, New York, NY 10003, USA David Sontag EMAIL CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1 Tightening the bound; Algorithm 2 Gradient ascent |
| Open Source Code | No | The paper states: 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency.' This refers to a third-party framework used, not the authors' own source code for their method. |
| Open Datasets | Yes | For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test. |
| Dataset Splits | Yes | For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test. |
| Hardware Specification | No | The paper states: 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency.' This mention of 'the GPU' is too general and does not specify any model or hardware details. |
| Software Dependencies | No | The paper mentions 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/)'. However, it does not specify a version number for Torch or any other software dependencies. |
| Experiment Setup | Yes | For model parameter optimization (the gradient step in Algorithm 2) we use L-BFGS (Liu & Nocedal, 1989) with backtracking line-search. For tightening the bound (Algorithm 1), we used 200 sub-gradient iterations, each requiring a round of belief propagation. Our sub-gradient rate parameter α was set as α = 10³/2^t, where t is the number of preceding iterations where the dual objective did not decrease. |
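The step-size rule quoted above (α = 10³/2^t, with t counting preceding iterations where the dual objective did not decrease) can be sketched as a small helper. This is a minimal illustration of that schedule only, not the authors' implementation; the function name and the driver loop are hypothetical.

```python
def subgradient_step_size(t, alpha0=1e3):
    """Step size for the bound-tightening sub-gradient iterations.

    t: number of preceding iterations in which the dual objective
       did not decrease (assumed reading of alpha = 10^3 / 2^t).
    """
    return alpha0 / (2 ** t)


def run_subgradient(dual_objective_values):
    """Hypothetical driver: halve the rate each time the dual fails to decrease."""
    t = 0
    steps = []
    prev = float("inf")
    for obj in dual_objective_values:
        if obj >= prev:  # dual objective did not decrease
            t += 1
        steps.append(subgradient_step_size(t))
        prev = obj
    return steps
```

Under this reading, the rate starts at 10³ and is halved every time the dual objective stalls, a standard heuristic for Polyak-style sub-gradient schedules.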