Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity
Authors: Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, Lyle Ungar
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we derive the basic algorithm, and the theory underlying the algorithm. In a companion paper (Cohen et al., 2013), we describe experiments using the algorithm to learn an L-PCFG for natural language parsing. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors can be estimated directly from training examples where hidden-variable values are missing; the third gives a PAC-style convergence bound for the estimation method. |
| Researcher Affiliation | Collaboration | Shay B. Cohen EMAIL School of Informatics University of Edinburgh Edinburgh, EH8 9LE, UK; Karl Stratos EMAIL Michael Collins EMAIL Department of Computer Science Columbia University New York, NY 10027, USA; Dean P. Foster EMAIL Yahoo! Labs New York, NY 10018, USA; Lyle Ungar EMAIL Department of Computer and Information Science University of Pennsylvania Philadelphia, PA 19104, USA |
| Pseudocode | Yes | Figure 7 shows an algorithm that derives estimates of the quantities in Eqs. 14, 15, and 16. As input, the algorithm takes a sequence of tuples (r(i,1), t(i,1), t(i,2), t(i,3), o(i), b(i)) for i ∈ {1 . . . M}... Figure 8: Singular value decompositions. |
| Open Source Code | No | The text does not contain any explicit statements about open-source code being provided for the methodology described in this paper, nor does it provide a link to a code repository. It mentions companion experimental work but not code release for this specific paper's algorithm. |
| Open Datasets | Yes | Unfortunately, simple vanilla PCFGs induced from treebanks such as the Penn treebank (Marcus et al., 1993) typically give very poor parsing performance. |
| Dataset Splits | No | The text does not provide specific details about training, validation, or test dataset splits, such as percentages, sample counts, or explicit splitting methodologies. |
| Hardware Specification | No | The text mentions using 'the Extreme Science and Engineering Discovery Environment (XSEDE)' in the acknowledgements, but does not provide specific hardware details such as GPU/CPU models or processor types used for experiments in this paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical, deriving algorithms and proofs, and therefore does not include specific experimental setup details, hyperparameter values, or training configurations for its own evaluation. |
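The Pseudocode row above quotes an algorithm that estimates parameters from empirical statistics and relies on singular value decompositions (Figure 8 of the paper). As an illustrative sketch only — not the authors' implementation — the rank-m truncated SVD step that such spectral methods typically use can be written in NumPy as follows (the function name and toy matrix are hypothetical):

```python
import numpy as np

def truncated_svd_projection(F, m):
    """Rank-m truncated SVD of an empirical cross-moment matrix F.

    Returns the leading m left/right singular vectors and singular
    values; spectral methods use these as projections that reduce
    feature vectors to an m-dimensional latent space.
    """
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :m], Vt[:m, :].T, s[:m]

# Toy check: a rank-2 matrix is reproduced exactly by its rank-2 truncation.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
U2, V2, s2 = truncated_svd_projection(A, 2)
approx = U2 @ np.diag(s2) @ V2.T
print(np.allclose(approx, A))  # True
```

This shows only the generic linear-algebra primitive; the paper's Figure 7 specifies how the input tuples are aggregated into the matrices being decomposed.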