Compressing tree ensembles through Level-wise Optimization and Pruning
Authors: Laurens Devos, Timo Martens, Deniz Can Oruc, Wannes Meert, Hendrik Blockeel, Jesse Davis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments We empirically evaluate LOP and aim to answer the following questions: (Q1) Given a learned binary classification forest, what is the effect of compression on model size and performance? (Q2) How does LOP's compression affect energy consumption, memory footprint, and verifiability of models? (Q3) How sensitive is LOP to its hyperparameters and R? |
| Researcher Affiliation | Academia | ¹KU Leuven, Department of Computer Science, Leuven, Belgium ²Leuven.AI, KU Leuven Institute for Artificial Intelligence, Leuven, Belgium. Correspondence to: Jesse Davis <EMAIL>, Hendrik Blockeel <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode for LOP. |
| Open Source Code | Yes | LOP uses logistic regression and optimizes the negative log-likelihood with L1 regularization. Code: https://github.com/ML-KULeuven/lop_compress |
| Open Datasets | Yes | We consider 14 binary classification benchmark datasets available on OpenML (Vanschoren et al., 2013): Compas, Vehicle, Spambase, Phoneme, Adult, Ijcnn1, Mnist (2 vs. 4), Dry Bean (6 vs. rest), Volkert (2 vs. 7), Credit, California, MiniBooNE, Electricity, and Jannis. |
| Dataset Splits | Yes | We use 5-fold cross-validation with 3 folds for training (both training the ensemble and compressing it), 1 for validation, and 1 for testing. |
| Hardware Specification | Yes | All experiments are run on an Intel(R) Core(TM) i7-12700 with 64GB of memory. |
| Software Dependencies | No | For GR and LOP, we use scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | In each fold, we train models on all combinations of the following hyperparameters: for XGBoost, M ∈ {10, 25, 50, 100}, D ∈ {4, 6, 8}, η ∈ {0.1, 0.25, 0.5, 1.0}; for Random Forest, M ∈ {50, 100, 250}, D ∈ {10, 15} (η not applicable), with M the number of trees, D the maximum depth of the trees, and η the learning rate in XGBoost. This yields 48 XGBoost models and 6 Random Forest models, to which we then apply the different compression algorithms. The validation set is used to tune the regularization hyperparameter for LOP, GR, LRL1 and FP. More specifically, its optimal value is the one that leads to the smallest model within a maximum drop of 0.5% on the validation set's balanced accuracy. The same is done to find the optimal number of trees in IC. Additionally, we set R = 2 for LOP. |
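The hyperparameter grid quoted in the setup row can be enumerated directly; a minimal sketch, assuming only the grid values stated above, confirms the stated model counts (48 XGBoost and 6 Random Forest configurations per fold):

```python
from itertools import product

# Grids as quoted in the experiment setup:
# M = number of trees, D = maximum tree depth, eta = XGBoost learning rate.
xgb_grid = list(product([10, 25, 50, 100],       # M
                        [4, 6, 8],               # D
                        [0.1, 0.25, 0.5, 1.0]))  # eta
rf_grid = list(product([50, 100, 250],           # M
                       [10, 15]))                # D

print(len(xgb_grid), len(rf_grid))  # 48 6
```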
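The regularization-tuning rule (pick the smallest model whose validation balanced accuracy stays within a 0.5% drop) can be sketched as follows; the `(reg_value, model_size, val_bacc)` tuple interface is a hypothetical choice for illustration, not an interface from the paper or its code:

```python
def pick_regularizer(candidates, baseline_bacc, max_drop=0.005):
    """Select the candidate with the smallest model size among those whose
    validation balanced accuracy is within `max_drop` of the baseline.

    `candidates`: hypothetical list of (reg_value, model_size, val_bacc).
    Returns None if no candidate is admissible.
    """
    admissible = [c for c in candidates if c[2] >= baseline_bacc - max_drop]
    return min(admissible, key=lambda c: c[1]) if admissible else None

# Toy example: three regularization strengths against a 0.900 baseline.
best = pick_regularizer(
    [(0.1, 500, 0.900), (1.0, 120, 0.898), (10.0, 40, 0.880)],
    baseline_bacc=0.900,
)
# best == (1.0, 120, 0.898): the 0.880 candidate exceeds the allowed drop,
# and of the remaining two, 120 is the smaller model.
```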