Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests
Authors: Kristin Blesch, Niklas Koenen, Jan Kapar, Pegah Golchian, Lukas Burk, Markus Loecher, Marvin N. Wright
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of cARFi on both simulated and real data. First, we demonstrate that cARFi allows for valid inference procedures and achieves high power in testing for nonzero feature importance. Next, we compare the performance of cARFi to that of CPI, its closest competitor in testing for conditional feature importance, and evaluate cARFi-based relative feature importance, both by drawing on simulation studies from previous literature. Finally, we compare cARFi's feature attributions to those of competing methods in a real data example. |
| Researcher Affiliation | Academia | ¹Leibniz Institute for Prevention Research & Epidemiology – BIPS, Germany; ²Faculty of Mathematics and Computer Science, University of Bremen, Germany; ³Department of Business and Economics, Berlin School of Economics and Law, Germany; ⁴Department of Public Health, University of Copenhagen, Denmark; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: cARFi. Input: (X_train, Y_train), (X_test, Y_test), learner f, feature (set) of interest j, conditioning set C, ARF procedure a, loss function ℓ, number of replicates R. 1: learn f̂ ← f(X_train, Y_train). 2: fit ARF â ← a(X_train) and estimate density p̂_â. 3: sample R feature values for each test instance i: for each i ∈ [N], r ∈ [R]: X̃_j^{test,i,(r)} ∼ p̂_â(x_j \| X_C^{test,i}). 4: define X̃^{test,i,(r)} := {X̃_j^{test,i,(r)}, X_{−j}^{test,i}} and calculate the instance-wise loss difference w.r.t. j: Δ_i^j ← (1/R) Σ_{r=1}^{R} [ℓ(f̂(X̃^{test,i,(r)}), Y) − ℓ(f̂(X^{test,i}), Y)]. 5: calculate cARFi_j ← (1/N) Σ_{i=1}^{N} Δ_i^j. Output: cARFi_j |
| Open Source Code | Yes | Code: https://github.com/bips-hb/cARFi |
| Open Datasets | Yes | Finally, we evaluate the behavior of cARFi under different conditioning sets in a real-world setting using the widely used bike-sharing dataset (Fanaee-T and Gama 2013). |
| Dataset Splits | Yes | We train a random forest on two-thirds of the 8,645 instances and use the remaining third as a holdout for the XAI method. |
| Hardware Specification | No | Experiments were run on the Beartooth Computing Environment (University of Wyoming Advanced Research Computing Center 2018). |
| Software Dependencies | No | An implementation of cARFi in the R programming language and code for reproducing the results of this paper is available on the corresponding GitHub repository as linked on the first page. The paper mentions the R programming language but does not specify a version or any libraries with version numbers. |
| Experiment Setup | Yes | In detail, for M = 10,000 replicates, we generate N = 1,000 instances of features X = X_1, ..., X_10 from a multivariate Gaussian distribution N(0, Σ), where Σ_ij = 0.5^{\|i−j\|}. Using effect sizes β = (0.0, 0.1, ..., 0.9) and additive noise ϵ ∼ N(0, 1), we construct the target variable Y according to two different settings: (1) linear setting: Y = βX + ϵ; (2) non-linear setting: Y = βX̃ + ϵ, where x̃_ij = 1 if Φ⁻¹(0.25) ≤ x_ij ≤ Φ⁻¹(0.75), else x̃_ij = −1. We fit several prediction models f̂ to this data, including a (feedforward) neural network, support vector machine, random forest, and linear model. Subsequently, we use the mean squared error to assess ℓ and thus obtain test statistics. For cARFi, we use a minimum node size of 20, R = 5, and the root mean squared error (RMSE) as the loss function. |
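The loop structure of the quoted Algorithm 1 can be sketched in a few lines. This is a minimal Python illustration, not the authors' R implementation; `sample_conditional` is a hypothetical stand-in for the ARF density-estimation and conditional-sampling steps (steps 2–3), and `carfi` is an illustrative name rather than an API from the paper's repository.

```python
import numpy as np

def carfi(model, X_test, y_test, j, sample_conditional, loss, R=5):
    """Sketch of the cARFi estimator (Algorithm 1, steps 3-5).

    model: fitted predictor f_hat with a .predict(X) method.
    sample_conditional(X, j): stand-in for the ARF step, returning a copy
        of X with column j re-sampled from p(x_j | x_C).
    loss(y, y_hat): instance-wise loss, e.g. squared error.
    """
    # per-instance loss on the unmodified test data: l(f_hat(X), Y)
    base = loss(y_test, model.predict(X_test))
    deltas = np.zeros(len(y_test))
    for _ in range(R):
        # replace feature j with a conditional draw and re-evaluate the loss
        X_rep = sample_conditional(X_test, j)
        deltas += loss(y_test, model.predict(X_rep)) - base
    delta_i = deltas / R          # instance-wise importance Delta_i^j
    return float(delta_i.mean())  # cARFi_j
```

With an informative feature, the replacement draws degrade the predictions, so the averaged loss difference comes out positive; for an irrelevant feature it fluctuates around zero, which is what the paper's testing procedure exploits.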
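The simulation design in the setup row is straightforward to reproduce. The sketch below uses NumPy rather than the paper's R code, and the 0.75 standard-normal quantile is hard-coded as a constant instead of being computed; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1_000, 10

# Toeplitz covariance: Sigma_ij = 0.5^|i - j|
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)

beta = np.arange(p) / 10.0             # effect sizes 0.0, 0.1, ..., 0.9
eps = rng.standard_normal(N)

# (1) linear setting
y_lin = X @ beta + eps

# (2) non-linear setting: +1 inside the central 50% of N(0, 1), else -1
q75 = 0.6744897501960817               # Phi^{-1}(0.75); Phi^{-1}(0.25) = -q75
X_tilde = np.where((X >= -q75) & (X <= q75), 1.0, -1.0)
y_nonlin = X_tilde @ beta + eps
```

Because the marginals are standard normal, the dichotomization splits each feature roughly in half, so the non-linear setting keeps the effect sizes comparable to the linear one while removing any linear signal in the raw features.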