Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

Authors: Junjie Chen, Xiangheng He, Yusuke Miyao, Danushka Bollegala

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that SemInfo correlates more strongly with parsing accuracy than LL, establishing SemInfo as a better unsupervised parsing objective. As a result, our algorithm significantly improves parsing accuracy by an average of 7.85 sentence-F1 points across five PCFG variants and in four languages, achieving state-of-the-art level results in three of the four languages.
Researcher Affiliation Academia Department of Computer Science, The University of Tokyo; GLAM (Group on Language, Audio, & Music), Imperial College London; Department of Computer Science, University of Liverpool
Pseudocode Yes Algorithm 1 Tree CRF Sampler:
function CRF-Sampler(i, j, x)
    if j = i + 1 then
        return leaf node (i, j)
    else
        Sample split index k ∼ π_CRF(k | (i, j)) following Equation 14 (Johnson et al., 2007)
        T_left ← CRF-Sampler(i, k, x)
        T_right ← CRF-Sampler(k, j, x)
        return node (i, j) with children T_left and T_right
    end if
end function
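Algorithm 1 can be sketched as a short recursive Python function. The `split_weights` table below is an assumption of this sketch, standing in for the Equation-14 split posterior π_CRF (it maps each span to unnormalized weights over split points); it is not the paper's actual interface.

```python
import random

def crf_sampler(i, j, split_weights, rng=None):
    """Top-down ancestral sampler in the spirit of Algorithm 1.

    `split_weights` is a hypothetical stand-in for pi_CRF: it maps each
    span (i, j) to a dict {split point k: unnormalized weight}.
    Length-1 spans are leaves.
    """
    rng = rng or random.Random()
    if j == i + 1:
        return (i, j)  # leaf span
    weights = split_weights[(i, j)]
    points = sorted(weights)
    # Sample k proportionally to its weight (inverse-CDF sampling).
    r = rng.random() * sum(weights[k] for k in points)
    for k in points:
        r -= weights[k]
        if r <= 0:
            break
    # Recurse on the two sub-spans induced by the sampled split.
    left = crf_sampler(i, k, split_weights, rng)
    right = crf_sampler(k, j, split_weights, rng)
    return ((i, j), left, right)
```

Because each recursive call samples one split and recurses on disjoint sub-spans, any returned tree is a valid binary bracketing of the input span, mirroring the base/recursive cases of the pseudocode.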
Open Source Code Yes We release the source code at https://github.com/junjiechen-chris/Improving-Unsupervised-Constituency-Parsing-via-Maximizing-Semantic-Information.git.
Open Datasets Yes We conduct the evaluations in three datasets and four languages, namely Penn Tree Bank (PTB) (Marcus et al., 1999) for English, Chinese Treebank 5.1 (CTB) (Palmer et al., 2005) for Chinese, and SPMRL (Seddah et al., 2013) for German and French.
Dataset Splits Yes We adopt the standard data split for the PTB dataset (Sections 02-21 for training, Section 22 for validation, and Section 23 for testing) (Kim et al., 2019a). We adopt the official data split for the CTB and SPMRL datasets.
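The standard PTB split quoted above can be written down as a small config; the dict name and two-digit section labels are illustrative, not taken from the released code.

```python
# Standard PTB WSJ split used by the paper (Kim et al., 2019a):
# Sections 02-21 train, Section 22 validation, Section 23 test.
PTB_SPLIT = {
    "train": [f"{sec:02d}" for sec in range(2, 22)],
    "valid": ["22"],
    "test": ["23"],
}
```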
Hardware Specification No The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions using specific models like "gpt-4o-mini-2024-07-18 model" and tools like "snowball stemmer (Bird & Loper, 2004)", and that its "implementation is based on the source code of Yang et al. (2021b) and Liu et al. (2023)". However, it does not provide specific version numbers for general software dependencies such as programming languages, libraries (e.g., PyTorch, TensorFlow), or operating systems.
Experiment Setup Yes We use 60 NTs for NPCFG and CPCFG, and 1024 NTs for TNPCFG, SNPCFG, and SCPCFG in our experiments. We include the maximum-entropy regularization (Ziebart et al., 2008) and the traditional LL term log Z(x) in the training. The posterior optimization is similar to the method explained in the main text: (1) sampling a tree from either P_CRF(t|x) or P_PCFG(t|x); and (2) performing policy-gradient optimization in accordance with Equation 12.
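The two-step posterior optimization (sample trees, then take a policy-gradient step) can be sketched as REINFORCE with a mean-reward baseline. Everything here is illustrative: the categorical "tree" indices stand in for sampled parses, and `reward` stands in for the SemInfo-style reward; this is not the paper's Equation-12 implementation.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, reward, rng, n_samples=64, lr=0.5):
    """One policy-gradient step: sample candidates from the current
    policy, score each with `reward`, and move the logits along the
    REINFORCE gradient using the mean sampled reward as a baseline."""
    probs = softmax(logits)
    samples = rng.choices(range(len(logits)), weights=probs, k=n_samples)
    rewards = [reward(s) for s in samples]
    baseline = sum(rewards) / len(rewards)
    grads = [0.0] * len(logits)
    for s, r in zip(samples, rewards):
        adv = r - baseline
        for a in range(len(logits)):
            # d/dlogits[a] log p(s) = 1[a == s] - probs[a]
            grads[a] += adv * ((1.0 if a == s else 0.0) - probs[a])
    return [l + lr * g / n_samples for l, g in zip(logits, grads)]
```

Iterating this step shifts probability mass toward higher-reward samples; the baseline reduces gradient variance without biasing the estimate.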