reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Automated Strategy Invention for Confluence of Term Rewrite Systems

Authors: Liao Zhang, Fabian Mitterwallner, Jan Jakubuv, Cezary Kaliszyk

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we focus on confluence of term rewrite systems, and apply AI techniques to invent strategies for automatic confluence proving. Moreover, we randomly generate a large dataset to analyze confluence for term rewrite systems. We improve the state-of-the-art automatic confluence prover CSI: When equipped with our invented strategies, it surpasses its human-designed strategies both on the augmented dataset and on the original human-created benchmark dataset ARI-COPS, proving/disproving the confluence of several term rewrite systems for which no automated proofs were known before.
Researcher Affiliation	Academia	1University of Innsbruck 2Shanghai Jiao Tong University 3Czech Technical University in Prague 4University of Melbourne EMAIL, EMAIL, EMAIL. All listed institutions are universities, indicating academic affiliations for all authors. The email domains also include an academic domain (.ac.at) for one author, while the others use generic Gmail, but are associated with the academic institutions.
Pseudocode	Yes	Algorithm 1 Grackle Loop: an outline of the strategy portfolio invention loop. Algorithm 2 Term Generation
Open Source Code	No	The paper does not contain any explicit statement from the authors about releasing their own code or provide a link to a code repository for the methodology described.
Open Datasets	Yes	We also augment the human-built confluence problems database (ARI-COPS)1, a representative benchmark for the annual confluence competition (Co Co)2. 1https://ari-cops.uibk.ac.at/ 2https://project-coco.uibk.ac.at/
Dataset Splits	Yes	For training purposes, we arbitrarily select 283 examples from ARI-COPS and 800 examples from the randomly generated dataset. To build the test dataset, we exclude the examples in the training dataset, subsequently randomly selecting 800 examples from the randomly generated dataset and the remaining 283 examples from ARI-COPS.
Hardware Specification	Yes	Here, a CPU denotes a core of the AMD EPYC 7513 32-core processor.
Software Dependencies	No	The paper mentions several tools used, such as CSI, Grackle, Param ILS, and SMAC3, but does not provide specific version numbers for any of them. For example, 'CSI is one of the state-of-the-art automatic confluence provers that participates in Co Co.' and 'Grackle [H ula and Jakub uv, 2022] is a strategy optimization system...'
Experiment Setup	Yes	The Grackle time limit for proving a TRS is 30 seconds, employed both in the evaluation and the strategy specialization phases. During the specialization phase, Grackle launches Param ILS for parameter tuning. The overall time limit for one strategy specialization phase is 45 minutes. The total execution time of Grackle is two days. Grackle performs parallel execution in both the evaluation and specialization phases; thus, we also limit the number of CPUs it can use. For each dataset, we perform two Grackle runs, configuring the numbers of available CPUs for a single strategy run to be either one or four. When it is set to one and four, the total number of available CPUs for Grackle is set to 52 and 66, respectively.