Most General Explanations of Tree Ensembles

Authors: Yacine Izza, Alexey Ignatiev, Sasha Rubin, Joao Marques-Silva, Peter J. Stuckey

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type — Experimental. This section presents a summary of the empirical assessment of computing maximum inflated abductive explanations for tree ensembles, in a case study of RFmv and BT models trained on several widely studied datasets. The evaluation investigates the following research questions:
RQ1: Are the hypercubes representing Max-iAXp explanations much larger than iAXps on real-world benchmarks?
RQ2: Do the proposed logical encodings scale to practical RFs and BTs?
RQ3: Does the algorithm converge quickly to deliver the optimal explanation?
Researcher Affiliation — Academia. Yacine Izza (1), Alexey Ignatiev (2), Sasha Rubin (3), Joao Marques-Silva (4), Peter J. Stuckey (2,5). (1) CREATE, National University of Singapore, Singapore; (2) Monash University, Melbourne, Australia; (3) University of Sydney, Australia; (4) ICREA, University of Lleida, Spain; (5) OPTIMA ARC Industrial Training and Transformation Centre, Melbourne, Australia. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL. All listed affiliations are academic institutions or research centers.
Pseudocode — Yes.
Algorithm 1: Computing maximum inflated AXp for TE
Input: Expl. prob. E = (M, (v, c)); WCNF H, S
Output: One Max-iAXp (X, E)
 1: repeat
 2:   (µ, s) ← MaxSAT(H, S)
 3:   E ← {⋃_{l≤j≤u} I_i^j | (y_i^{l,u} ∈ S) ∧ µ(y_i^{l,u}) = 1}
 4:   X ← {i ∈ F | |E_i| < |D_i|}
 5:   hasCEx ← WiAXp(E; X, E)
 6:   if hasCEx = true then
 7:     (Y, G) ← FindiCXp(E; F, E)
 8:     H ← H ∪ {BlockCl(Y, G)}
 9: until hasCEx = false
10: return (X, E)
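The control flow of Algorithm 1 can be sketched in Python. The five oracle routines (MaxSAT solving, model decoding, the weak-iAXp counterexample check, iCXp extraction, and blocking-clause construction) are passed in as callables; their names and signatures here are hypothetical stand-ins, since the actual implementations live in the RFxpl/XReason packages.

```python
def max_iaxp(H, S, maxsat, decode, wi_axp, find_icxp, block_cl):
    """Control-flow sketch of Algorithm 1 (Max-iAXp for tree ensembles).

    H, S are the hard/soft parts of the WCNF. The five oracle callables
    (maxsat, decode, wi_axp, find_icxp, block_cl) are hypothetical
    stand-ins for the paper's MaxSAT, model-decoding, weak-iAXp check,
    iCXp extraction and blocking-clause routines.
    """
    while True:
        mu, s = maxsat(H, S)        # line 2: optimal MaxSAT model
        X, E = decode(mu)           # lines 3-4: intervals E_i, feature set X
        has_cex = wi_axp(X, E)      # line 5: does a counterexample exist?
        if not has_cex:             # line 9: loop until no counterexample
            return X, E             # line 10: (X, E) is a Max-iAXp
        Y, G = find_icxp(X, E)      # line 7: extract a counterexample (iCXp)
        H = H + [block_cl(Y, G)]    # line 8: block it with a new hard clause
```

The loop terminates because each iteration adds a blocking clause to H, shrinking the space of MaxSAT candidates until the weak-iAXp check finds no counterexample.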
Open Source Code — Yes. The proposed approach is implemented in the RFxpl [1] and XReason [2] Python packages. [1] https://github.com/izzayacine/RFxpl [2] https://github.com/alexeyignatiev/xreason
Open Datasets — Yes. The assessment of TEs (RFs and BTs) is performed on a selection of publicly available datasets, 12 in total, originating from the UCI ML Repository [UCI, 2020] and the Penn ML Benchmarks [Olson et al., 2017].
Dataset Splits — No. For each dataset, we randomly pick 25 instances to test. This describes the selection of instances for explanation generation but does not provide specific training/validation/test splits for the models themselves.
Hardware Specification — Yes. The experiments are conducted on an Intel Core i5-10500 3.1 GHz CPU with 16 GB RAM running Ubuntu 22.04 LTS.
Software Dependencies — Yes. The PySAT toolkit [Ignatiev et al., 2018; Ignatiev et al., 2024] is used to instrument SAT and/or MaxSAT oracle calls. The RC2 MaxSAT solver [Ignatiev et al., 2019a], which is implemented in PySAT, is applied to all MaxSAT encodings (for the TE operation and hitting-set dualization). Moreover, Gurobi [Gurobi Optimization, LLC, 2023] is applied to instrument MIP oracle calls using its Python interface.
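The RC2 calls in this pipeline solve weighted partial MaxSAT problems: every hard clause must be satisfied while the total weight of falsified soft clauses is minimized. As an illustration of those semantics only (this is not the paper's encoding), a brute-force solver over a handful of variables makes the objective concrete:

```python
from itertools import product

def brute_force_maxsat(n_vars, hard, soft):
    """Tiny weighted partial MaxSAT by exhaustive search (illustration only).

    Clauses are lists of ints: positive literal = variable true, negative
    = false. soft is a list of (clause, weight) pairs. Returns
    (best_model, best_cost), where cost is the total weight of falsified
    soft clauses, or None if the hard clauses are unsatisfiable.
    """
    def sat(clause, model):
        return any(model[abs(l) - 1] == (l > 0) for l in clause)

    best = None
    for model in product([False, True], repeat=n_vars):
        if not all(sat(c, model) for c in hard):
            continue  # hard clauses are mandatory
        cost = sum(w for c, w in soft if not sat(c, model))
        if best is None or cost < best[1]:
            best = (model, cost)
    return best
```

Production solvers such as RC2 achieve the same result with core-guided search rather than enumeration; through PySAT the analogous call is roughly `RC2(wcnf).compute()` on a `pysat.formula.WCNF` object.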
Experiment Setup — No. The paper states: "When training RFs, we used Scikit-learn [Pedregosa et al., 2011] and for BTs we applied XGBoost [Chen and Guestrin, 2016]." However, it does not specify any hyperparameters (e.g., number of trees, max depth, learning rate) or other detailed training configurations used for these models.
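Since the hyperparameters are not reported, anyone attempting to reproduce the setup must choose their own. A typical Scikit-learn RF configuration might look like the following; every parameter value here is an assumption for illustration, not a setting taken from the paper, and the synthetic dataset is a stand-in for the real benchmarks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the paper uses UCI / Penn ML benchmark datasets.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# All hyperparameter values below are illustrative assumptions;
# the paper does not report the settings actually used.
rf = RandomForestClassifier(
    n_estimators=100,  # number of trees (assumed)
    max_depth=6,       # depth bound per tree (assumed)
    random_state=0,    # for reproducibility
)
rf.fit(X, y)
```

An analogous BT setup would pass `n_estimators`, `max_depth`, and `learning_rate` to `xgboost.XGBClassifier`; again, none of these values are reported in the paper.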