Generalized additive models via direct optimization of regularized decision stump forests

Authors: Magzhan Gabidolla, Miguel Á. Carreira-Perpiñán

ICML 2025

Reproducibility variables, results, and LLM responses:
Research Type: Experimental. Our experiments, detailed in Section 5, validate the proposed methods on regression and classification benchmarks, demonstrating improved results over existing state-of-the-art methods. Additionally, we highlight the inherent interpretability of a GAM model through a case study on car price prediction.
Researcher Affiliation: Academia. Dept. of Computer Science and Engineering, University of California, Merced, USA. Correspondence to: Miguel Á. Carreira-Perpiñán <EMAIL>.
Pseudocode: Yes. We provide its pseudocode in fig. 3. (Figure 3: Pseudocode for Optimized Regularized Stump Forests.)
Open Source Code: No. The paper describes its implementation details in Appendix C, including the programming languages and third-party open-source packages used (CVXPY, MOSEK, scikit-learn, interpret, pyGAM). However, it does not explicitly state that the authors release their own implementation of the proposed method, nor does it provide a direct link to a repository for their code.
Open Datasets: Yes. California Housing: a standard regression benchmark, obtained through scikit-learn's fetch_california_housing function. Wine: the task is to predict wine quality; obtained from the UCI ML repository (Lichman, 2013). Churn: a binary classification task to predict which customers will churn, with features describing various characteristics of the customer; obtained from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn. FICO: dataset from the Explainable Machine Learning Challenge; obtained from the official website: https://community.fico.com/s/explainable-machine-learning-challenge. IJCNN: from an IJCNN 2001 competition; obtained from the LIBSVM binary dataset collection: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.
Dataset Splits: Yes. For all algorithms, including ours, we tune the important hyperparameters on a holdout set, and with the best found hyperparameters perform 5 experiments on different train/test splits to report mean and standard deviation. The total number of training points is 15,200, with an additional 3,800 instances used for testing.
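The evaluation protocol quoted above (tune on a holdout set, then repeat over 5 different train/test splits and report mean and standard deviation) can be sketched as follows. The dataset and model here are synthetic stand-ins, not the paper's; only the 15,200/3,800 split sizes come from the report.

```python
# Sketch of the 5-split evaluation protocol. Ridge on synthetic data is
# a placeholder for the tuned model; split sizes match the report.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(19000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=19000)

scores = []
for seed in range(5):  # 5 experiments on different train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=15200, test_size=3800, random_state=seed)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

# report mean and standard deviation over the 5 splits
print(f"R^2: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Varying random_state across repetitions is what makes the standard deviation reflect split-to-split variability rather than model randomness.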
Hardware Specification: Yes. Except for the Neural Additive Model (which is trained on a GPU), all experiments are performed on an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128 GB RAM. Only for this baseline do we train on an NVIDIA TITAN V GPU.
Software Dependencies: Yes. We use CVXPY version 1.4.3 and MOSEK version 10.1. Gradient Boosting (GB): we use scikit-learn's GradientBoostingRegressor and GradientBoostingClassifier (Pedregosa et al., 2011); the scikit-learn version is 1.4.2. Explainable Boosting Machine (EBM): we use the official implementation from the interpret Python library, version 0.6.1. Splines: we use the pyGAM package in Python, version 0.9.1.
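The pinned versions above could be captured in a requirements file. This is a sketch, not from the paper: the PyPI names for the MOSEK, interpret, and pyGAM packages, and whether each exact patch level is installable, are assumptions.

```
# Versions as stated in the report; package names assumed from PyPI.
cvxpy==1.4.3
mosek==10.1        # report gives 10.1; a patch suffix may be required
scikit-learn==1.4.2
interpret==0.6.1
pygam==0.9.1
```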
Experiment Setup: Yes. For all algorithms, including ours, we tune the important hyperparameters on a holdout set, and with the best found hyperparameters perform 5 experiments on different train/test splits to report mean and standard deviation. ORSF (ours): we perform grid search on the following hyperparameter values: number of stumps {200, 400, 600, 800}; roughness penalty λ ∈ {2.0, 4.0, 6.0} for classification datasets and λ ∈ {20.0, 40.0, 60.0} for regression datasets. We do not tune the deviation-from-bias hyperparameter α, and use the fixed value α = 0.1. Gradient Boosting (GB): we set the maximum depth to 1 and perform grid search on the learning rate {0.01, 0.05, 0.1, 0.3}. We set the number of boosting iterations (n_estimators) to a very high number (10^6) and use early stopping based on a validation set, with n_iter_no_change equal to 100. Explainable Boosting Machine (EBM): learning_rate {0.005, 0.01, 0.05}, max_bins {512, 1024, 2048}, min_samples_leaf {2, 4, 8}. We set the interactions parameter to 0 to use only univariate terms; max_rounds is set to 25,000 with early_stopping_rounds 50.
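The Gradient Boosting baseline configuration quoted above can be sketched in scikit-learn. The depth-1 stumps, 10^6 iterations, learning-rate grid, and n_iter_no_change = 100 come from the report; the GridSearchCV wrapper, cv value, and validation_fraction are illustrative assumptions.

```python
# Sketch of the GB baseline: depth-1 trees (decision stumps, so the
# ensemble stays additive per feature), a very large iteration budget,
# and early stopping on a held-out validation fraction.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

gb = GradientBoostingRegressor(
    max_depth=1,               # each boosted tree is a single split
    n_estimators=1_000_000,    # "very high number (10^6)"
    n_iter_no_change=100,      # stop after 100 rounds without improvement
    validation_fraction=0.1,   # assumption: fraction not stated in report
)
search = GridSearchCV(gb, {"learning_rate": [0.01, 0.05, 0.1, 0.3]}, cv=3)
# search.fit(X_train, y_train) would then select the best learning rate.
```

Setting n_estimators far beyond what is ever reached and relying on early stopping lets the validation set, rather than a tuned iteration count, decide the ensemble size.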