Cross-domain Constituency Parsing by Leveraging Heterogeneous Data

Authors: Peiming Guo, Meishan Zhang, Yulong Chen, Jianling Li, Min Zhang, Yue Zhang

JAIR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct experiments to verify the effectiveness of the proposed model on a news-domain constituency treebank PTB (Marcus, Santorini, & Marcinkiewicz, 1993) and a multi-domain constituency treebank MCTB (Yang, Cui, Ning, Wu, & Zhang, 2022) consisting of five domains: dialogue, forum, law, literature and review. Experimental results show that both domain knowledge transfer and task knowledge transfer are effective for cross-domain constituency parsing.
Researcher Affiliation Academia Institute of Computing and Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen, China; School of Engineering, Westlake University, Hangzhou, China; School of New Media and Communication, Tianjin University, Tianjin, China
Pseudocode No The paper describes the model architecture and methods using textual explanations and mathematical equations (e.g., in Sections 3.1, 3.2, and 3.3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://github.com/guopeiming/CD_ConsParing_HeterData.
Open Datasets Yes We use PTB (Marcus et al., 1993) and MCTB (Yang et al., 2022) as the source and target constituency parsing datasets, respectively. For domain knowledge transfer, we collect 5 domain raw corpora with sources matching the target treebank in MCTB for the language modeling task, including Wizard (Dinan et al., 2019), Reddit (Völske et al., 2017), ECtHR (Stiansen & Voeten, 2019), Gutenberg, and Amazon (He & McAuley, 2016). For task knowledge transfer, we select CoNLL03 (Tjong Kim Sang & De Meulder, 2003) and restaurant (Liu et al., 2019b) for NER, CCGbank (Hockenmaier & Steedman, 2007) for CCG supertagging and the EWT treebank in Universal Dependencies v2.2 (Nivre et al., 2020) for dependency parsing.
Dataset Splits Yes We sample 10,000 sentences with lengths ranging from 8 to 256 for the corpora of auxiliary tasks. If the number of filtered sentences is less than 10,000, we include the entire dataset. For each batch, we sample examples of constituency parsing and auxiliary tasks by the 1:3 proportion. Specific tasks, domains and number of sentences are listed in Table 1. Additionally, we obtain pseudo constituency parse trees for data processing of auxiliary tasks using the basic constituency parser. Specifically, we sample 10/20/50 examples from MCTB for the few-shot setting. To avoid sample bias, we sample three times to generate different few-shot training sets by different seeds and report the average results.
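The 1:3 mixing proportion between constituency parsing and auxiliary-task examples described above can be sketched as a simple batch sampler. This is a minimal illustration, not the authors' implementation; `sample_mixed_batch` and both pool arguments are hypothetical names.

```python
import random

def sample_mixed_batch(parsing_pool, auxiliary_pool, batch_size=60, ratio=(1, 3)):
    """Draw one mixed batch: constituency-parsing and auxiliary-task
    examples in the stated 1:3 proportion (hypothetical helper)."""
    # Split the batch size according to the ratio, e.g. 60 -> 15 + 45.
    n_parsing = batch_size * ratio[0] // sum(ratio)
    n_auxiliary = batch_size - n_parsing
    batch = (random.sample(parsing_pool, n_parsing)
             + random.sample(auxiliary_pool, n_auxiliary))
    random.shuffle(batch)  # interleave the two example types
    return batch
```

With the paper's batch size of 60, this yields 15 parsing examples and 45 auxiliary-task examples per batch.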
Hardware Specification No The paper mentions using "BERT-large-uncased as pretrained language model backbone" but does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies No The paper mentions using "BERT-large-uncased" as a pretrained language model backbone and the "Adam W algorithm" for optimization, but it does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used.
Experiment Setup Yes Hyperparameters. We use BERT-large-uncased as the pretrained language model backbone (Devlin et al., 2019). The lengths l and hidden sizes d of the shared, task and domain prefixes are 25 and 1024, respectively. The weight factor of auxiliary tasks α is 0.1 for multi-task learning. Following Kitaev and Klein (2018), we set partition transformer layers to 2 for all chart-based parsers. For model training, we use the AdamW algorithm with learning rate 3e-5, batch size 60, weight decay 0.01, and linear learning rate warmup over the first 400 steps to optimize parameters. We stop training early when the F1 score does not increase on the PTB development set for 4 epochs.
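The linear warmup over the first 400 steps can be expressed as a pure function of the step index. This is a minimal sketch assuming the rate stays constant after warmup; the reported setup does not state the post-warmup decay schedule, and `warmup_lr` is a hypothetical name.

```python
def warmup_lr(step, base_lr=3e-5, warmup_steps=400):
    """Linear learning-rate warmup: scale the base rate up over the
    first `warmup_steps` optimizer steps, then hold it constant."""
    if step < warmup_steps:
        # Ramp from base_lr/warmup_steps at step 0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In a framework like PyTorch, the same schedule would typically be attached to the optimizer via a per-step LR scheduler rather than computed by hand.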