Decomposing Global Feature Effects Based on Feature Interactions
Authors: Julia Herbinger, Marvin N. Wright, Thomas Nagler, Bernd Bischl, Giuseppe Casalicchio
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness. |
| Researcher Affiliation | Academia | 1 Department of Statistics, LMU Munich, Munich, Germany 2 Munich Center for Machine Learning (MCML), Munich, Germany 3 Leibniz Institute for Prevention Research and Epidemiology, Bremen, Germany 4 Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany 5 Department of Public Health, University of Copenhagen, Copenhagen, Denmark |
| Pseudocode | Yes | Algorithm 1: Partitioning algorithm of GADGET; Algorithm 2: PINT |
| Open Source Code | Yes | All proposed methods and reproducible scripts for the experiments are available online via https://github.com/JuliaHerbinger/gadget/. |
| Open Datasets | Yes | bikesharing data set (James et al., 2022); COMPAS data set... collected by Pro Publica (Larson et al., 2016); spam data set (Hopkins et al., 1999) |
| Dataset Splits | Yes | The R² measured on a separately drawn test set of size 10000 following the same distribution is 0.94. (Section 4.3) ...measured by a 5-fold cross-validation, indicating good performance (Hopkins et al., 1999). (Section 9) |
| Hardware Specification | No | The paper describes various models (feed-forward neural network, GAM, XGBoost, random forest, SVM) and their configurations, but it does not specify the hardware (e.g., CPU, GPU models) used for training or evaluation. |
| Software Dependencies | No | The paper mentions 'R package version 1.3-2' for ISLR2 within a dataset citation, but this refers to a tool associated with a dataset, not the specific software dependencies with version numbers used for implementing the authors' methodology (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We then draw 500 observations from these random variables and fit a feed-forward neural network (NN) with a single hidden layer of size 10 and weight decay of 0.001. (Section 4.3) As stopping criteria, we choose a maximum tree depth of 6, a minimum number of observations per leaf of 40, and set the improvement parameter γ to 0.2. (Section 6.1) |
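The experiment setup quoted above (Section 4.3) can be sketched in a few lines. This is a hedged illustration only: the paper's own implementation is in R (see the linked repository), and the data-generating process below is a placeholder I invented, not the one used in the paper. Only the quoted hyperparameters are taken from the source: 500 training observations, a feed-forward network with a single hidden layer of size 10, weight decay 0.001, and a separately drawn test set of size 10000.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def draw(n):
    # Placeholder data-generating process (NOT the paper's): three uniform
    # features and a target containing a feature interaction.
    X = rng.uniform(-1, 1, size=(n, 3))
    y = X[:, 0] + X[:, 1] * X[:, 2]
    return X, y

X_train, y_train = draw(500)       # 500 observations, as in Section 4.3
X_test, y_test = draw(10_000)      # separately drawn test set of size 10000

# One hidden layer of size 10; scikit-learn's `alpha` is an L2 penalty,
# standing in for the weight decay of 0.001 mentioned in the paper.
nn = MLPRegressor(hidden_layer_sizes=(10,), alpha=0.001,
                  max_iter=5000, random_state=0).fit(X_train, y_train)

r2 = r2_score(y_test, nn.predict(X_test))
print(round(r2, 2))
```

Note that the reported test R² of 0.94 applies to the paper's setting, not to this synthetic stand-in.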