LTLf Synthesis Under Unreliable Input

Authors: Christian Hagemeier, Giuseppe de Giacomo, Moshe Y. Vardi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | (Section 8, Empirical Evaluation) We implemented the algorithms to empirically evaluate their performance. While the direct technique is optimal in the worst case, our empirical results show that other techniques perform comparably or even better. [...] Graphs of the runtime on different instances of the examples can be found in Figures 1 and 2.
Researcher Affiliation | Academia | University of Oxford, Oxford, UK; Rice University, Houston, Texas, USA
Pseudocode | No | The paper describes its algorithms and techniques (e.g., direct automata, belief states, QLTLf synthesis, MSO encoding) but does not present them in structured pseudocode or algorithm blocks. Descriptions are purely textual.
Open Source Code | Yes | http://github.com/whitemech/ltlf-synth-unrelinput-aaai2025
Open Datasets | No | We used instances of the examples of varying sizes: hiker (trail length 10 to 50, in increments of 5), sheep (even input sizes up to n = 10), and trap graphs (multiple sizes). For sheep, we restricted our attention to even input sizes, since for odd sizes standard synthesis is already impossible, and hence synthesis under unreliable input is trivially impossible as well. The paper describes how these examples were parameterized for benchmarking, but it does not provide access information (links, DOIs, or citations) for specific publicly available datasets.
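Since the paper states the parameter ranges but (per the assessment above) publishes no dataset, the benchmark grid can be reconstructed directly from the text. The sketch below enumerates it; note that the trap-graph sizes are not specified in the paper ("multiple sizes"), so the list used here is purely a hypothetical placeholder.

```python
# Benchmark instance grid as described in the paper's experimental section.
# Hiker and sheep ranges are taken from the text; trap-graph sizes are a
# HYPOTHETICAL placeholder, since the paper only says "multiple sizes".

hiker_trail_lengths = list(range(10, 51, 5))  # trail lengths 10..50, step 5
sheep_input_sizes = list(range(2, 11, 2))     # even input sizes up to n = 10
trap_graph_sizes = [3, 5, 7]                  # placeholder, not from the paper

instances = (
    [("hiker", n) for n in hiker_trail_lengths]
    + [("sheep", n) for n in sheep_input_sizes]
    + [("trap", n) for n in trap_graph_sizes]
)
print(len(hiker_trail_lengths), len(sheep_input_sizes))  # → 9 5
```

This yields 9 hiker instances and 5 sheep instances; restricting sheep to even sizes matches the paper's remark that odd sizes are unrealizable even for standard synthesis.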
Dataset Splits | No | The paper describes experiments on instances of the examples (hiker, sheep, trap graphs) but does not mention any training, validation, or test splits, as these are problem instances rather than datasets in the machine-learning sense.
Hardware Specification | Yes | Experiments were conducted on a high-performance compute cluster running Red Hat Enterprise Linux 8.10. Tests used an Intel Xeon Platinum 8268 processor (2.9 GHz) with 256 GB RAM per test, executed on a single CPU core.
Software Dependencies | No | We built the techniques on top of the LTLf-synthesis tool Syft, reusing existing implementations for belief-state and direct automata construction (Tabajara and Vardi 2020) with some adjustments. [...] the implementation of all the techniques makes use of the tool MONA (Henriksen et al. 1995). The paper names the software tools Syft and MONA but does not provide version numbers for them.
Experiment Setup | Yes | Experimental Setup: Experiments were conducted on a high-performance compute cluster running Red Hat Enterprise Linux 8.10. Tests used an Intel Xeon Platinum 8268 processor (2.9 GHz) with 256 GB RAM per test, executed on a single CPU core. Experimental Results: [...] Results were averaged over 2 runs. [...] Striped bars reaching the red timeout line indicate tests that ran out of memory or exceeded 30 minutes.