reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Computational Modelling of Quantifier Use: Corpus, Models, and Evaluation

Authors: Guanyi Chen, Kees van Deemter

JAIR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To this end, we conduct a series of elicitation experiments in which human speakers were asked to perform a linguistic task that invites the use of quantified expressions. The experiments result in a corpus, called qtuna, made up of short texts that contain a large variety of quantified expressions. We analyse qtuna, summarise our findings, and explain how we design computational models of human quantifier use accordingly. Finally, we evaluate these models in accordance with qtuna. Section 4 offers evaluations of its output, based on both expert judgements and scene reconstructions.
Researcher Affiliation	Academia	Guanyi Chen EMAIL Kees van Deemter EMAIL Department of Information and Computing Sciences Utrecht University Utrecht, The Netherlands
Pseudocode	Yes	Algorithm 1 The Greedy Algorithm for Generating Quantified Descriptions. Algorithm 2 The Incremental Algorithm for Generating Quantified Descriptions.
Open Source Code	Yes	The code for our quantified description generation algorithms is available at: https://github.com/a-quei/quantified-description-generation.
Open Datasets	Yes	The qtuna dataset and the corresponding materials are available at: https://github.com/a-quei/qtuna.
Dataset Splits	Yes	For experiment A, we randomly selected 3 or 4 scenes from each of the 3 sub-corpora of qtuna to construct a set of, in total, 10 scenes... For experiment B, we focused on three new domain sizes namely N = 6, N = 10, and N = 16. For each of these, we sampled 6 scenes...
Hardware Specification	No	The paper does not provide specific hardware details (like GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper discusses various computational and linguistic concepts and refers to general system components like a 'surface realiser', but it does not specify any particular software libraries, frameworks, or their version numbers (e.g., Python 3.x, PyTorch 1.x) that were used in implementation.
Experiment Setup	Yes	Generation terminates when all distractors are removed from S or the length of the generated description D reaches an upper bound δ. ... we introduced a probability θ with which the qdg-ia performs a one-off re-ordering move; for the work reported in this paper, we set θ to 0.1.
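
The Experiment Setup entry above names two controls: a stopping rule (no distractors left in S, or the description D reaching the length bound δ) and a probability θ = 0.1 of a re-ordering move. The following is a minimal Python sketch of how such a loop could look, not the authors' implementation. The function name, the representation of candidates as `(label, rules_out)` pairs, the interpretation of the re-ordering move as a swap with a randomly chosen later candidate, and the choice to apply the θ check at every step are all assumptions made for illustration; the repository linked under Open Source Code is the authoritative reference.

```python
import random


def generate_description(candidates, distractors, delta, theta=0.1, rng=None):
    """Hypothetical sketch of an incremental generation loop.

    candidates  -- list of (label, rules_out) pairs in preference order, where
                   rules_out is the set of distractors that statement excludes
                   (an assumed representation, not taken from the paper)
    distractors -- the set S of distractors still to be ruled out
    delta       -- upper bound on the length of the description D
    theta       -- probability of a re-ordering move (0.1 in the quoted setup)
    """
    rng = rng or random.Random()
    remaining = set(distractors)   # S
    order = list(candidates)       # working copy of the preference order
    description = []               # D
    i = 0

    # Stop when S is empty, when D reaches delta, or when candidates run out.
    while remaining and len(description) < delta and i < len(order):
        # Assumed form of the re-ordering move: with probability theta,
        # swap the current candidate with a randomly chosen later one.
        if i + 1 < len(order) and rng.random() < theta:
            j = rng.randrange(i + 1, len(order))
            order[i], order[j] = order[j], order[i]

        label, rules_out = order[i]
        i += 1

        # Keep a statement only if it removes at least one distractor.
        removed = remaining & rules_out
        if removed:
            description.append(label)
            remaining -= removed

    return description, remaining


if __name__ == "__main__":
    # Toy candidates with made-up labels; theta=0.0 disables re-ordering.
    toy = [("all A", {1, 2}), ("some B", {3}), ("no C", {4})]
    print(generate_description(toy, {1, 2, 3, 4}, delta=2, theta=0.0))
    # prints (['all A', 'some B'], {4})
```

With `theta=0.0` the loop is deterministic, which makes the effect of δ easy to see: the toy run stops after two statements even though one distractor remains.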