Generating Counterfactual Explanations Under Temporal Constraints

Authors: Andrei Buliga, Chiara Di Francescomarino, Chiara Ghidini, Marco Montali, Massimiliano Ronzani

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation shows that the generated counterfactuals are temporally meaningful and more interpretable for applications involving temporal dependencies. An empirical evaluation on real-world and synthetic datasets demonstrates the effectiveness of the approach.
Researcher Affiliation | Academia | 1 Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy; 2 Free University of Bozen-Bolzano, Via Bruno Buozzi 1, 39100 Bozen-Bolzano, Italy; 3 University of Trento, Via Sommarive 9, 38123 Trento, Italy
Pseudocode | Yes | Algorithm 1: Temporal Knowledge-aware Crossover operation; Algorithm 2: Compute Safe Activities; Algorithm 3: Temporal Knowledge-aware Mutation operator
Open Source Code | Yes | Code: https://github.com/abuliga/AAAI2025-temporalconstrained-counterfactuals
Open Datasets | Yes | Experiments are conducted using three datasets commonly used in Process Mining, with details reported in Table 1: Claim Management (Rizzi, Di Francescomarino, and Maggi 2020) is a synthetic dataset pertaining to a claim management process, where accepted claims are labelled as true and rejected claims as false; BPIC2012 (van Dongen 2012) and BPIC2017 (van Dongen 2017) are two real-life datasets about a loan application process, where traces with accepted loan offers are labelled as true and those with declined offers as false.
Dataset Splits | Yes | For each dataset, LTLp formula φ, and prefix length, the data is split chronologically into 70% training, 10% validation, and 20% test sets.
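The chronological 70/10/20 split described above can be sketched as follows. This is a minimal illustration assuming the event log is held in a pandas DataFrame with a timestamp column; the function and column names are hypothetical, not taken from the paper's code.

```python
import pandas as pd

def chronological_split(df, time_col="timestamp", train=0.7, val=0.1):
    """Split a log chronologically into train/validation/test partitions."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    n_train = int(n * train)
    n_val = int(n * val)
    return (
        df.iloc[:n_train],                  # oldest 70% -> training
        df.iloc[n_train:n_train + n_val],   # next 10%  -> validation
        df.iloc[n_train + n_val:],          # newest 20% -> testing
    )
```

Sorting before slicing guarantees that every training event precedes every validation event, which in turn precedes every test event, avoiding temporal leakage.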
Hardware Specification | Yes | Experiments were run on an Apple M1 machine with 16 GB RAM.
Software Dependencies | No | The paper mentions training an XGBoost model but does not specify the version of XGBoost or of any other key software libraries used.
Experiment Setup | Yes | Regarding the coefficients in Eq. (7), after testing multiple configurations, the final values were set to α = β = γ = δ = 0.5, giving all objectives equal weight. For the GA setting, the population is initialised through a hybrid approach: selecting close points from the reference population or, if unavailable, randomly generating traces. The number of generations is set to 100, with crossover probability p_c = 0.5 and mutation probability p_mut = 0.2. In population selection, the top 50% of the population w.r.t. the fitness function moves to the next generation. Termination occurs at the maximum generation number or when no significant performance improvement occurs.
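The reported GA loop (100 generations, crossover probability 0.5, mutation probability 0.2, top-50% survival, early termination on stagnation) can be sketched generically as below. Note this is an assumed skeleton, not the authors' implementation: the paper's actual operators are the temporal knowledge-aware crossover and mutation of Algorithms 1 and 3, whereas here `crossover`, `mutate`, `fitness`, and the stagnation parameters `patience` and `tol` are placeholders.

```python
import random

def run_ga(init_population, fitness, crossover, mutate,
           n_generations=100, p_c=0.5, p_mut=0.2, elite_frac=0.5,
           patience=10, tol=1e-6):
    """Generic GA loop (minimisation): top-`elite_frac` survival,
    probabilistic crossover/mutation, early stop on stagnation."""
    population = list(init_population)
    best, stall = min(fitness(ind) for ind in population), 0
    for _ in range(n_generations):
        # keep the top 50% of the population w.r.t. the fitness function
        population.sort(key=fitness)
        survivors = population[: max(2, int(len(population) * elite_frac))]
        offspring = []
        while len(survivors) + len(offspring) < len(population):
            p1, p2 = random.sample(survivors, 2)
            child = crossover(p1, p2) if random.random() < p_c else list(p1)
            if random.random() < p_mut:
                child = mutate(child)
            offspring.append(child)
        population = survivors + offspring
        current = min(fitness(ind) for ind in population)
        if best - current < tol:
            stall += 1
            if stall >= patience:
                break  # no significant improvement -> terminate early
        else:
            best, stall = current, 0
    return min(population, key=fitness)
```

Because the top half of the population always survives intact, the best individual found so far is never lost between generations (implicit elitism), which makes the stagnation-based termination well defined.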