Generalized additive models via direct optimization of regularized decision stump forests

Authors: Magzhan Gabidolla, Miguel Á. Carreira-Perpiñán

ICML 2025

Reproducibility variables, results, and LLM responses:
Research Type: Experimental. Our experiments, detailed in Section 5, validate the proposed methods on regression and classification benchmarks, demonstrating improved results over existing state-of-the-art methods. Additionally, we highlight the inherent interpretability of a GAM model through a case study on car price prediction.
Researcher Affiliation: Academia. Dept. of Computer Science and Engineering, University of California, Merced, USA. Correspondence to: Miguel Á. Carreira-Perpiñán <EMAIL>.
Pseudocode: Yes. We provide its pseudocode in fig. 3. (Figure 3: Pseudocode for Optimized Regularized Stump Forests.)
Open Source Code: No. The paper describes its implementation details in Appendix C, including the programming languages and third-party open-source packages used (CVXPY, MOSEK, scikit-learn, interpret, pyGAM). However, it does not explicitly state that the authors release their own implementation of the proposed method, nor does it provide a direct link to a repository for their code.
Open Datasets: Yes. California Housing: a standard regression benchmark, obtained through scikit-learn's fetch_california_housing function. Wine: the task is to predict wine quality; obtained from the UCI ML repository (Lichman, 2013). Churn: a binary classification task to predict which customers will churn, with features describing various characteristics of the customer; obtained from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn. FICO: dataset from the Explainable Machine Learning Challenge; obtained from the official website: https://community.fico.com/s/explainable-machine-learning-challenge. IJCNN: from an IJCNN 2001 competition; obtained from the LIBSVM binary dataset collection: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.
Dataset Splits: Yes. For all algorithms, including ours, we tune the important hyperparameters on a holdout set, and with the best found hyperparameters perform 5 experiments on different train/test splits to report mean and standard deviation. The total number of training points is 15,200, with an additional 3,800 instances used for testing.
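The evaluation protocol quoted above (tune on a holdout set, then repeat over 5 different train/test splits and report mean and standard deviation) can be sketched as follows. The dataset and model here are synthetic stand-ins, not the paper's; only the 15,200/3,800 split sizes come from the report.

```python
# Sketch of the 5-split evaluation protocol. Ridge on synthetic data is
# a placeholder for the tuned model; split sizes match the report.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(19000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=19000)

scores = []
for seed in range(5):  # 5 experiments on different train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=15200, test_size=3800, random_state=seed)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

# report mean and standard deviation over the 5 splits
print(f"R^2: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Varying random_state across repetitions is what makes the standard deviation reflect split-to-split variability rather than model randomness.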
Hardware Specification: Yes. Except for the Neural Additive Model (which is trained on a GPU), all experiments are performed on an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128 GB RAM. Only for this baseline do we train on an NVIDIA TITAN V GPU.
Software Dependencies: Yes. We use CVXPY version 1.4.3 and MOSEK version 10.1. Gradient Boosting (GB): we use scikit-learn's GradientBoostingRegressor and GradientBoostingClassifier (Pedregosa et al., 2011); the scikit-learn version is 1.4.2. Explainable Boosting Machine (EBM): we use the official implementation from the interpret Python library, version 0.6.1. Splines: we use the pyGAM package in Python, version 0.9.1.
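The pinned versions above could be captured in a requirements file. This is a sketch, not from the paper: the PyPI names for the MOSEK, interpret, and pyGAM packages, and whether each exact patch level is installable, are assumptions.

```
# Versions as stated in the report; package names assumed from PyPI.
cvxpy==1.4.3
mosek==10.1        # report gives 10.1; a patch suffix may be required
scikit-learn==1.4.2
interpret==0.6.1
pygam==0.9.1
```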
Experiment Setup: Yes. For all algorithms, including ours, we tune the important hyperparameters on a holdout set, and with the best found hyperparameters perform 5 experiments on different train/test splits to report mean and standard deviation. ORSF (ours): we perform grid search on the following hyperparameter values: number of stumps {200, 400, 600, 800}; roughness penalty λ ∈ {2.0, 4.0, 6.0} for classification datasets and λ ∈ {20.0, 40.0, 60.0} for regression datasets. We do not tune the deviation-from-bias hyperparameter α, and use the fixed value α = 0.1. Gradient Boosting (GB): we set the maximum depth to 1 and perform grid search on the learning rate {0.01, 0.05, 0.1, 0.3}. We set the number of boosting iterations (n_estimators) to a very high number (10^6) and use early stopping based on a validation set, with n_iter_no_change equal to 100. Explainable Boosting Machine (EBM): learning_rate {0.005, 0.01, 0.05}, max_bins {512, 1024, 2048}, min_samples_leaf {2, 4, 8}. We set the interactions parameter to 0 to use only univariate terms; max_rounds is set to 25,000 with early_stopping_rounds 50.
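The Gradient Boosting baseline configuration quoted above can be sketched in scikit-learn. The depth-1 stumps, 10^6 iterations, learning-rate grid, and n_iter_no_change = 100 come from the report; the GridSearchCV wrapper, cv value, and validation_fraction are illustrative assumptions.

```python
# Sketch of the GB baseline: depth-1 trees (decision stumps, so the
# ensemble stays additive per feature), a very large iteration budget,
# and early stopping on a held-out validation fraction.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

gb = GradientBoostingRegressor(
    max_depth=1,               # each boosted tree is a single split
    n_estimators=1_000_000,    # "very high number (10^6)"
    n_iter_no_change=100,      # stop after 100 rounds without improvement
    validation_fraction=0.1,   # assumption: fraction not stated in report
)
search = GridSearchCV(gb, {"learning_rate": [0.01, 0.05, 0.1, 0.3]}, cv=3)
# search.fit(X_train, y_train) would then select the best learning rate.
```

Setting n_estimators far beyond what is ever reached and relying on early stopping lets the validation set, rather than a tuned iteration count, decide the ensemble size.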