Prediction via Shapley Value Regression
Authors: Amr Alkhatib, Roman Bresson, Henrik Boström, Michalis Vazirgiannis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate both the predictive performance of ViaSHAP and the accuracy of its feature importance attributions with respect to the true Shapley values. This section begins by outlining the experimental setup. Then, the predictive performance of ViaSHAP is evaluated. Afterwards, we benchmark the similarity between the feature importance scores obtained by ViaSHAP and the ground-truth Shapley values. We also evaluate the predictive performance and the accuracy of the Shapley values on image data. Finally, we summarize the findings of the ablation study. |
| Researcher Affiliation | Academia | ¹Örebro University, School of Science and Technology, Sweden; ²KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science, Sweden; ³École Polytechnique, IP Paris, France. Correspondence to: Amr Alkhatib <EMAIL>. |
| Pseudocode | Yes | **Algorithm 1 ViaSHAP.** **Data:** training data X, labels Y, scalar β. **Result:** model parameters θ.<br>Initialize V: ViaSHAP(ϕ^Via(x; θ))<br>**while** not converged **do**<br>&nbsp;&nbsp;L ← 0<br>&nbsp;&nbsp;**for each** x ∈ X and y ∈ Y **do**<br>&nbsp;&nbsp;&nbsp;&nbsp;sample S ∼ p(S)<br>&nbsp;&nbsp;&nbsp;&nbsp;ŷ ← V(x)<br>&nbsp;&nbsp;&nbsp;&nbsp;L_pred ← prediction_loss(ŷ, y)<br>&nbsp;&nbsp;&nbsp;&nbsp;L_ϕ ← ‖V_y(x_S) − V_y(0) − 1_S^⊤ ϕ_y^Via(x; θ)‖²<br>&nbsp;&nbsp;&nbsp;&nbsp;L ← L + L_pred + β·L_ϕ<br>&nbsp;&nbsp;**end**<br>&nbsp;&nbsp;Compute gradients ∇_θ L<br>&nbsp;&nbsp;Update θ ← θ − ∇_θ L<br>**end** |
| Open Source Code | Yes | The source code is available here: https://github.com/amrmalkhatib/ViaSHAP |
| Open Datasets | Yes | We employ 25 publicly available datasets in the experiments, each divided into training, validation, and test subsets. ... The details of the datasets are available in Table 19 ... For the image experiments, we use the CIFAR-10 dataset (Krizhevsky et al., 2014). |
| Dataset Splits | Yes | We employ 25 publicly available datasets in the experiments, each divided into training, validation, and test subsets. The training set is used to train the model, the validation set is used to detect overfitting and determine early stopping, and the test set is used to evaluate the model's performance. ... Table 19: Dataset information, with columns: Dataset, # Features, # Classes, Dataset Size, Train. Set, Val. Set, Test Set, OpenML ID. |
| Hardware Specification | Yes | The experiments were conducted using an NVIDIA Tesla V100 GPU and 16 cores of an Intel Xeon Gold 6338 processor. |
| Software Dependencies | No | The paper mentions various frameworks and models used (e.g., KANs, MLPs, ResNet50, U-Net) and references implementations like efficient-kan and fast-kan, but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The four implementations were trained with the β of (7) set to 10 and used 32 sampled coalitions per instance. The above hyperparameters were determined in a quasi-random manner. ... During data preprocessing, categorical feature categories are tokenized with numbers starting from one, reserving zero for missing values. We use standard normalization so the feature values are centered around 0. ViaSHAP can be trained using the baseline-removal approach or marginal expectations as the value function. |
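The training objective in Algorithm 1 combines a prediction loss with a β-weighted consistency term that ties sampled-coalition outputs to the predicted Shapley values. The sketch below illustrates that objective in NumPy under stated assumptions; it is not the authors' implementation. The `phi_fn` interface (returning per-feature attributions of shape `(batch, n_features, n_classes)`), the convention that the prediction is the sum of attributions, and the zero-baseline masking of removed features are all illustrative choices; β = 10 and 32 coalitions follow the experiment setup above.

```python
import numpy as np

def viashap_loss(phi_fn, x, y, beta=10.0, n_coalitions=32, rng=None):
    """Sketch of the Algorithm-1 objective: L_pred + beta * L_phi.

    Assumptions (hypothetical interface, not the paper's code):
      - phi_fn(x) returns per-feature attributions, shape (batch, d, k)
      - the model prediction is V(x) = phi_fn(x).sum(axis=1)
      - removed features are set to zero (baseline-removal value function)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    batch, d = x.shape
    phi = phi_fn(x)                                   # (batch, d, k)
    logits = phi.sum(axis=1)                          # V(x)

    # softmax cross-entropy prediction loss on the true labels y
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_pred = -log_p[np.arange(batch), y].mean()

    v0 = phi_fn(np.zeros_like(x)).sum(axis=1)         # V(0)
    l_phi = 0.0
    for _ in range(n_coalitions):
        # sample a random coalition S as a 0/1 feature mask
        s = rng.integers(0, 2, size=(batch, d)).astype(x.dtype)
        v_s = phi_fn(x * s).sum(axis=1)               # V(x_S)
        lhs = (v_s - v0)[np.arange(batch), y]         # V_y(x_S) - V_y(0)
        rhs = (phi * s[:, :, None]).sum(axis=1)[np.arange(batch), y]
        l_phi += np.mean((lhs - rhs) ** 2)            # squared residual

    return l_pred + beta * l_phi / n_coalitions
```

In a real training loop this scalar would be computed with an autodiff framework so that gradients flow to the parameters of `phi_fn`; the NumPy version only makes the structure of the loss explicit.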