On Integrating Logical Analysis of Data into Random Forests

Authors: David Ing, Said Jabbour, Lakhdar Saïs

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/)."
Researcher Affiliation | Academia | "David Ing, Said Jabbour, Lakhdar Saïs, CRIL, CNRS, Université d'Artois, France"
Pseudocode | Yes | Algorithm 1: Classical Random Forest; Algorithm 2: Random Forest Based LAD (RF-LAD)
Open Source Code | No | "To derive such explanations from those RFs, we utilized the recent Random Forest explanation tool (RFxpl: https://github.com/izzayacine/RFxpl) proposed by Izza and Marques-Silva [Izza and Marques-Silva, 2021]." This describes a third-party tool used by the authors, not the release of their own code for RF-LAD.
Open Datasets | Yes | "The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/)."
Dataset Splits | Yes | "For every benchmark, a Repeated Stratified 10-fold cross-validation with 3 repetitions has been performed to maintain the class distribution (i.e., to address imbalanced datasets)."
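The evaluation protocol quoted above maps directly onto scikit-learn's `RepeatedStratifiedKFold`. A minimal sketch on a synthetic balanced dataset (the toy data and `random_state` below are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Illustrative toy data: 20 samples, 2 balanced classes (not a paper dataset).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# 10-fold stratified CV repeated 3 times, as in the paper's protocol.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
splits = list(cv.split(X, y))
# 10 folds x 3 repeats -> 30 train/test splits; stratification keeps
# each test fold at the 50/50 class ratio (here, one sample per class).
```

Averaging a model's score over all 30 splits is what yields the stability against class imbalance that the quoted passage refers to.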
Hardware Specification | Yes | "Experiments are conducted on a computer equipped with an Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz and 62 GiB of memory."
Software Dependencies | No | "In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). To enumerate the MSSes, we use the multithreaded implementation of the pMMCS algorithm [Murakami and Uno, 2014], and set the number of threads to 20 to ensure diversification in terms of the generated MSSes." Specific version numbers for these software components are not provided.
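Since pMMCS is an external tool, a brute-force sketch may help clarify what it enumerates: an MSS (maximal satisfiable subset) is a satisfiable subset of clauses that no other satisfiable subset strictly contains. The clause set and helper names below are purely illustrative (toy scale, not pMMCS's algorithm):

```python
from itertools import combinations, product

# Toy clause set over variables 0..2; a clause is a set of signed literals,
# where (v, True) means v and (v, False) means NOT v. Illustrative only.
clauses = [
    {(0, True)},                 # x0
    {(0, False)},                # NOT x0
    {(1, True), (2, False)},     # x1 OR NOT x2
]

def satisfiable(subset, n_vars=3):
    # Brute-force SAT check over all 2^n assignments (toy scale only).
    return any(
        all(any(bits[v] == sign for v, sign in c) for c in subset)
        for bits in product([False, True], repeat=n_vars)
    )

def enumerate_mss(clauses):
    # Collect every satisfiable subset (by index), then keep the maximal ones.
    sat_subsets = [
        frozenset(s)
        for r in range(len(clauses) + 1)
        for s in combinations(range(len(clauses)), r)
        if satisfiable([clauses[i] for i in s])
    ]
    return [s for s in sat_subsets if not any(s < t for t in sat_subsets)]
```

For the clause set above, the two MSSes are {x0, x1 OR NOT x2} and {NOT x0, x1 OR NOT x2}; dedicated enumerators such as pMMCS reach the same answer without exhaustive search, which is what makes the paper's 100K-MSS budget feasible.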
Experiment Setup | Yes | "The parameters of the RFs are kept at their default values (i.e., K = 100 DTs per forest), except for the maximum depth, where we set different depths d ∈ {3, 4, 5}. To control the complexity, we modified pMMCS by limiting the number of MSSes to 100K (i.e., NS = 100K) for all the considered datasets. For a fair comparison, we randomly select 100 distinct MSSes (without redundancy) from the 100K generated MSSes, resulting in 100 different DTs to build our RF-LAD."
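The baseline side of this setup is reproducible directly from the quoted parameters. A minimal sketch, assuming a synthetic stand-in dataset and an illustrative `random_state` (neither is from the paper), of building the standard scikit-learn RF at each depth d ∈ {3, 4, 5} with the default K = 100 trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one of the 23 benchmark datasets (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forests = {}
for d in (3, 4, 5):  # maximum depths considered in the paper
    # n_estimators=100 is scikit-learn's default (K = 100 DTs per forest).
    rf = RandomForestClassifier(n_estimators=100, max_depth=d, random_state=0)
    forests[d] = rf.fit(X, y)
```

Capping `max_depth` keeps every tree shallow, which matters for the paper's explainability comparison: shallower trees mean shorter paths for RFxpl to reason over.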