Why is constrained neural language generation particularly challenging?
Authors: Cristina Garbacea, Qiaozhu Mei
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an extensive survey on the emerging topic of constrained neural language generation in which we formally define and categorize the problems of natural language generation by distinguishing between conditions and constraints (the latter being testable conditions on the output text instead of the input), present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation. Our aim is to highlight recent progress and trends in this emerging field, informing on the most promising directions and limitations towards advancing the state-of-the-art of constrained neural language generation research. |
| Researcher Affiliation | Academia | Cristina Gârbacea (EMAIL), Data Science Institute, University of Chicago; Qiaozhu Mei (EMAIL), School of Information and Department of EECS, University of Michigan |
| Pseudocode | No | The paper describes various methods and approaches to constrained text generation in narrative form, without presenting any structured pseudocode or algorithm blocks. The descriptions of algorithms like 'beam search' or 'dynamic beam allocation' are textual rather than formatted pseudocode. |
| Open Source Code | No | This paper is a survey and review of existing methods in constrained neural language generation. It does not propose a new methodology that would require accompanying open-source code for its implementation. Therefore, no statement or link regarding source code for the paper's own methodology is present. |
| Open Datasets | Yes | The CommonGen (Lin et al., 2020) benchmark proposes the task of constrained text generation with generative commonsense reasoning... The TruthfulQA (Lin et al., 2022) benchmark is proposed for measuring the factual accuracy and truthfulness of QA systems. |
| Dataset Splits | No | This paper is a survey and does not conduct original experiments using a specific dataset. Therefore, it does not provide explicit training/test/validation dataset splits for its own work. While it discusses how other research might use splits, it does not specify splits for a new dataset introduced or experimented on within this paper. |
| Hardware Specification | No | This paper is a survey and review of existing literature, not an experimental paper that details the execution of computational experiments. As such, it does not provide any specific hardware specifications (e.g., GPU, CPU models, or cloud resources) used for conducting experiments. |
| Software Dependencies | No | This paper is a survey and does not describe a novel methodology implemented by the authors. Consequently, it does not list specific software dependencies with version numbers that would be required to reproduce any new experimental results. |
| Experiment Setup | No | This paper provides a comprehensive review of constrained neural language generation. It does not present original experimental work, and therefore does not include specific details about an experimental setup, such as hyperparameters or training configurations, for its own studies. |