On Integrating Logical Analysis of Data into Random Forests

Authors: David Ing, Said Jabbour, Lakhdar Saïs

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/)."
Researcher Affiliation | Academia | "David Ing, Said Jabbour, Lakhdar Saïs, CRIL, CNRS, Université d'Artois, France"
Pseudocode | Yes | Algorithm 1: Classical Random Forest; Algorithm 2: Random Forest Based LAD (RF-LAD)
Open Source Code | No | "To derive such explanations from those RFs, we utilized the recent Random Forest explanation tool (RFxpl: https://github.com/izzayacine/RFxpl) proposed by Izza and Marques-Silva [Izza and Marques-Silva, 2021]." This describes a third-party tool used by the authors, not the release of their own code for RF-LAD.
Open Datasets | Yes | "The assessment is performed on 23 datasets, which are standard benchmarks originating from well-known repositories such as CP4IM (https://dtai-static.cs.kuleuven.be/CP4IM/datasets/), Kaggle (www.kaggle.com), OpenML (www.openml.org), and UCI (archive.ics.uci.edu/ml/)."
Dataset Splits | Yes | "For every benchmark, a Repeated Stratified 10-fold cross-validation with 3 repetitions has been performed to maintain the class distribution (i.e., to address imbalanced datasets)."
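The evaluation protocol quoted above maps directly onto scikit-learn's `RepeatedStratifiedKFold`. A minimal sketch on a synthetic balanced dataset (the toy data and `random_state` below are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Illustrative toy data: 20 samples, 2 balanced classes (not a paper dataset).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# 10-fold stratified CV repeated 3 times, as in the paper's protocol.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
splits = list(cv.split(X, y))
# 10 folds x 3 repeats -> 30 train/test splits; stratification keeps
# each test fold at the 50/50 class ratio (here, one sample per class).
```

Averaging a model's score over all 30 splits is what yields the stability against class imbalance that the quoted passage refers to.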
Hardware Specification | Yes | "Experiments are conducted on a computer equipped with an Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz and 62 GiB of memory."
Software Dependencies | No | "In this section, we conduct comparative experiments between our Random Forest using MSSes of LAD (RF-LAD) and the state-of-the-art Random Forest (RF) (from the scikit-learn Python library [Pedregosa et al., 2011]). To enumerate the MSSes, we use the multithreaded implementation of the pMMCS algorithm [Murakami and Uno, 2014], and set the number of threads to 20 to ensure diversification in terms of the generated MSSes." Specific version numbers for these software components are not provided.
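Since pMMCS is an external tool, a brute-force sketch may help clarify what it enumerates: an MSS (maximal satisfiable subset) is a satisfiable subset of clauses that no other satisfiable subset strictly contains. The clause set and helper names below are purely illustrative (toy scale, not pMMCS's algorithm):

```python
from itertools import combinations, product

# Toy clause set over variables 0..2; a clause is a set of signed literals,
# where (v, True) means v and (v, False) means NOT v. Illustrative only.
clauses = [
    {(0, True)},                 # x0
    {(0, False)},                # NOT x0
    {(1, True), (2, False)},     # x1 OR NOT x2
]

def satisfiable(subset, n_vars=3):
    # Brute-force SAT check over all 2^n assignments (toy scale only).
    return any(
        all(any(bits[v] == sign for v, sign in c) for c in subset)
        for bits in product([False, True], repeat=n_vars)
    )

def enumerate_mss(clauses):
    # Collect every satisfiable subset (by index), then keep the maximal ones.
    sat_subsets = [
        frozenset(s)
        for r in range(len(clauses) + 1)
        for s in combinations(range(len(clauses)), r)
        if satisfiable([clauses[i] for i in s])
    ]
    return [s for s in sat_subsets if not any(s < t for t in sat_subsets)]
```

For the clause set above, the two MSSes are {x0, x1 OR NOT x2} and {NOT x0, x1 OR NOT x2}; dedicated enumerators such as pMMCS reach the same answer without exhaustive search, which is what makes the paper's 100K-MSS budget feasible.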
Experiment Setup | Yes | "The parameters of the RFs are kept at their default values (i.e., K = 100 DTs per forest), except for the maximum depth, where we set different depths d ∈ {3, 4, 5}. To control the complexity, we modified pMMCS by limiting the number of MSSes to 100K (i.e., NS = 100K) for all the considered datasets. For a fair comparison, we randomly select 100 distinct MSSes (without redundancy) from the 100K generated MSSes, resulting in 100 different DTs to build our RF-LAD."
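The baseline side of this setup is reproducible directly from the quoted parameters. A minimal sketch, assuming a synthetic stand-in dataset and an illustrative `random_state` (neither is from the paper), of building the standard scikit-learn RF at each depth d ∈ {3, 4, 5} with the default K = 100 trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one of the 23 benchmark datasets (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forests = {}
for d in (3, 4, 5):  # maximum depths considered in the paper
    # n_estimators=100 is scikit-learn's default (K = 100 DTs per forest).
    rf = RandomForestClassifier(n_estimators=100, max_depth=d, random_state=0)
    forests[d] = rf.fit(X, y)
```

Capping `max_depth` keeps every tree shallow, which matters for the paper's explainability comparison: shallower trees mean shorter paths for RFxpl to reason over.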