[Re] Classwise-Shapley values for data valuation
Authors: Markus Semmler, Miguel de Benito Delgado
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CS-Shapley, a data valuation method introduced in Schoch et al. (2022) for classification problems. We repeat the experiments in the paper, including two additional methods, the Least Core (Yan & Procaccia, 2021) and Data Banzhaf (Wang & Jia, 2023), a comparison not found in the literature. We include more conservative error estimates and additional metrics, like rank stability, and a variance-corrected version of Weighted Accuracy Drop, originally introduced in Schoch et al. (2022). We conclude that while CS-Shapley helps in the scenarios it was originally tested in, in particular for the detection of corrupted labels, it is outperformed by the conceptually simpler Data Banzhaf in the task of detecting highly influential points, except for highly imbalanced multi-class problems. |
| Researcher Affiliation | Industry | Markus Semmler EMAIL appliedAI Initiative GmbH Miguel de Benito Delgado EMAIL appliedAI Institute gGmbH |
| Pseudocode | No | The paper describes valuation methods and their equations (e.g., Equation 1, 2, 3, 4, 5, 6) and general procedures in narrative text, but does not include any distinct, structured pseudocode blocks or sections labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Code for all our experiments is available in (Semmler, 2024), including both setups and instructions on running them. URL https://github.com/aai-institute/re-classwise-shapley. |
| Open Datasets | Yes | Datasets are from openml (Vanschoren et al., 2013). All but Covertype and MNIST-multi are for binary classification. ... Table 1: Datasets used. Tabular: Diabetes, Click, Covertype, CPU, Phoneme. Image: FMNIST, CIFAR10, MNIST-binary, MNIST-multi. |
| Dataset Splits | Yes | Stratified sampling was used for the splits to maintain label distribution. ... Table 1: Datasets used (Training / Valuation / Test): Diabetes 128 / 128 / 512; Click, Covertype, CPU, Phoneme, FMNIST, CIFAR10, MNIST-binary, and MNIST-multi each 500 / 500 / 2000. |
| Hardware Specification | No | We ran all experiments with the method implementations available in the open source library pyDVL v0.9.1 (Transfer Lab, 2022), on several high-cpu VMs of a cloud vendor. The text mentions 'high-cpu VMs of a cloud vendor' but does not specify exact CPU models, GPU models, or cloud instance types. |
| Software Dependencies | Yes | We ran all experiments with the method implementations available in the open source library pyDVL v0.9.1 (Transfer Lab, 2022)... Models used to compute values and changes made to the default parameters in scikit-learn 1.2.2. |
| Experiment Setup | Yes | Parameters for all methods were taken as suggested in Schoch et al. (2022) or the corresponding papers. ... Table 2: Methods evaluated. Convergence criteria as provided by pyDVL (Transfer Lab, 2022). TMCS: ε = 10⁻⁴; Beta Shapley: α = 16, β = 1; Data Banzhaf: K = 5000 samples; Least Core: K = 5000 constraints. ... Table 3: Models used to compute values and changes made to the default parameters in scikit-learn 1.2.2: Logistic regression (solver=liblinear); Gradient Boosting classifier (n_estimators=40, min_samples_leaf=6, max_depth=2); K-Nearest Neighbours (n_neighbors=5, weights=uniform); SVM (kernel=rbf). |
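Data Banzhaf, one of the two methods the report adds to the comparison, values each training point as the difference between the average utility of random subsets containing the point and those excluding it, with subsets drawn uniformly (each point included with probability 1/2). The sketch below is an illustrative Monte Carlo version of this estimator on synthetic data, not the pyDVL implementation the report actually uses; the utility function, the toy dataset, and the reduced sample count `K = 200` (the report uses K = 5000) are assumptions for the sake of a quick-running example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for a real dataset: half for training, half for valuation.
X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
n = len(y_tr)


def utility(idx: np.ndarray) -> float:
    """Validation accuracy of a model trained on the subset `idx`."""
    if len(idx) == 0 or len(set(y_tr[idx])) < 2:
        return 0.0  # degenerate subsets get zero utility
    model = LogisticRegression(solver="liblinear").fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)


K = 200  # Monte Carlo samples (the report uses K = 5000)
sums_in = np.zeros(n)
counts_in = np.zeros(n)
sums_out = np.zeros(n)
counts_out = np.zeros(n)

for _ in range(K):
    mask = rng.random(n) < 0.5  # each point included w.p. 1/2
    u = utility(np.flatnonzero(mask))
    # Every sampled subset updates the estimate for all n points at once.
    sums_in[mask] += u
    counts_in[mask] += 1
    sums_out[~mask] += u
    counts_out[~mask] += 1

# Banzhaf value: E[U(S) | i in S] - E[U(S) | i not in S].
values = sums_in / np.maximum(counts_in, 1) - sums_out / np.maximum(counts_out, 1)
```

Every sampled subset contributes to the running estimate of all n points simultaneously, which is what makes this estimator sample-efficient compared to permutation-based Shapley sampling.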
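The split sizes in Table 1 (e.g. 500 training, 500 valuation, 2000 test points) combined with stratified sampling can be reproduced with two chained `train_test_split` calls. The sketch below uses a synthetic dataset as a stand-in for the OpenML downloads; the split sizes follow Table 1, but the exact splitting code is an assumption, not taken from the report's repository.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for an OpenML dataset.
X, y = make_classification(
    n_samples=3000, n_classes=2, weights=[0.7, 0.3], random_state=0
)

# First carve out the 2000-point test set, then split the remaining
# 1000 points evenly into training and valuation sets. `stratify`
# preserves the class proportions in every split.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=2000, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=500, stratify=y_rest, random_state=0
)
```

Stratifying both stages matters for the imbalanced datasets in the benchmark: without it, a 500-point training sample could easily under-represent the minority class.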
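The model configurations quoted from Table 3 translate directly into scikit-learn constructor calls. The sketch below instantiates them with exactly the non-default parameters listed; the dictionary keys are arbitrary labels, and anything not listed in Table 3 is left at its scikit-learn default.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Models from Table 3, with only the listed deviations from
# scikit-learn defaults (reference version: scikit-learn 1.2.2).
models = {
    "logistic_regression": LogisticRegression(solver="liblinear"),
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=40, min_samples_leaf=6, max_depth=2
    ),
    "knn": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "svm": SVC(kernel="rbf"),
}
```

Note that `n_neighbors=5` and `weights="uniform"` are already the scikit-learn defaults for `KNeighborsClassifier`; Table 3 appears to state them for completeness.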