Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity

Authors: Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, Lyle Ungar

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper we derive the basic algorithm, and the theory underlying the algorithm. In a companion paper (Cohen et al., 2013), we describe experiments using the algorithm to learn an L-PCFG for natural language parsing. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors can be estimated directly from training examples where hidden-variable values are missing; the third gives a PAC-style convergence bound for the estimation method.
Researcher Affiliation | Collaboration | Shay B. Cohen (EMAIL), School of Informatics, University of Edinburgh, Edinburgh, EH8 9LE, UK; Karl Stratos (EMAIL) and Michael Collins (EMAIL), Department of Computer Science, Columbia University, New York, NY 10027, USA; Dean P. Foster (EMAIL), Yahoo! Labs, New York, NY 10018, USA; Lyle Ungar (EMAIL), Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
Pseudocode | Yes | Figure 7 shows an algorithm that derives estimates of the quantities in Eqs. 14, 15, and 16. As input, the algorithm takes a sequence of tuples (r^(i,1), t^(i,1), t^(i,2), t^(i,3), o^(i), b^(i)) for i ∈ {1 . . . M}... Figure 8: Singular value decompositions.
Open Source Code | No | The text does not contain any explicit statement that open-source code is provided for the methodology described in this paper, nor a link to a code repository. It mentions companion experimental work but not a code release for this paper's algorithm.
Open Datasets | Yes | Unfortunately, simple vanilla PCFGs induced from treebanks such as the Penn treebank (Marcus et al., 1993) typically give very poor parsing performance.
Dataset Splits | No | The text does not provide specific details about training, validation, or test dataset splits, such as percentages, sample counts, or an explicit splitting methodology.
Hardware Specification | No | The text mentions use of the Extreme Science and Engineering Discovery Environment (XSEDE) in the acknowledgements, but does not give specific hardware details, such as GPU/CPU models or processor types, used for experiments in this paper.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate experiments.
Experiment Setup | No | The paper is theoretical, deriving algorithms and proofs, and therefore does not include specific experimental setup details, hyperparameter values, or training configurations for its own evaluation.