reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

No $D_{train}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning

Authors: Xiangyu Sun, Raquel Aoki, Kevin H. Wilson

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the performance of NTD-CFE against four baselines on several datasets and find that, despite not having access to a training dataset, NTD-CFE finds CFEs that make significantly fewer and significantly smaller changes to the input time-series. These properties make CFEs more actionable, as the magnitude of change required to alter an outcome is vastly reduced. The code is available in the supplementary material. ... In this section, we provide qualitative examples and quantitative experiment results to demonstrate the effectiveness of NTD-CFE for multivariate data-series data.
Researcher Affiliation	Industry	Xiangyu Sun (RBC Borealis), Raquel Aoki (RBC Borealis), Kevin H. Wilson (RBC Borealis)
Pseudocode	Yes	Algorithm 1 NTD-CFE. Best viewed in color. Typical RL code is colored in gray.
Open Source Code	Yes	The code is available in the supplementary material.
Open Datasets	Yes	Nine real-world multivariate time-series datasets are used for evaluation (Appendix A for details). ... Life Expectancy: https://www.kaggle.com/datasets/vrec99/life-expectancy-2000-2015 ... NATOPS: http://www.timeseriesclassification.com/description.php?Dataset=NATOPS ... PEMS-SF: https://www.timeseriesclassification.com/description.php?Dataset=PEMS-SF ... Heartbeat: http://www.timeseriesclassification.com/description.php?Dataset=Heartbeat ... ERing: https://www.timeseriesclassification.com/description.php?Dataset=ERing ... Racket Sports: https://www.timeseriesclassification.com/description.php?Dataset=RacketSports ... Basic Motions: https://www.timeseriesclassification.com/description.php?Dataset=BasicMotions ... Japanese Vowels: https://www.timeseriesclassification.com/description.php?Dataset=JapaneseVowels ... Libras: https://www.timeseriesclassification.com/description.php?Dataset=Libras
Dataset Splits	No	The paper does not explicitly provide specific training/test/validation split percentages or sample counts for the datasets used in their experiments. While it discusses 'invalid samples' which are testing samples, and mentions baselines requiring training datasets, it does not detail how the data was partitioned for training and evaluating the predictive models.
Hardware Specification	No	All the experiments are conducted on CPU and with 32GB of RAM. No specific CPU models or other detailed hardware specifications are provided.
Software Dependencies	No	The paper mentions 'Adam (Kingma & Ba, 2014) is used as the optimizer' and 'LSTM' as a model type, but it does not specify version numbers for any programming languages, libraries, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	We use a unique set of hyperparameter values for NTD-CFE throughout the paper, unless otherwise stated, without fine-tuning them: proximity weight $\lambda_{pxmt} = 0.001$; maximum number of interventions per episode $M_T = 100$; maximum number of episodes $M_E = 100$; discount factor $\gamma = 0.99$; learning rate $\alpha = 0.0001$; regularization weight $\lambda_{WD} = 0.0$. The RL policy network contains two hidden linear layers with 1000 and 100 neurons, respectively. Adam (Kingma & Ba, 2014) is used as the optimizer.
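
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when trying to reproduce the setup. The following is a minimal sketch in plain Python and is not the authors' code: the class name `NTDCFEConfig`, the field names, and the two helper functions are hypothetical. It also assumes the policy network has a final output layer after the two hidden layers, with input and output sizes that the report does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NTDCFEConfig:
    """Hyperparameter values as quoted in the report (field names are hypothetical)."""
    proximity_weight: float = 0.001    # lambda_pxmt
    max_interventions: int = 100       # M_T, per episode
    max_episodes: int = 100            # M_E
    discount: float = 0.99             # gamma
    learning_rate: float = 0.0001      # alpha, used with Adam
    weight_decay: float = 0.0          # lambda_WD
    hidden_sizes: tuple = (1000, 100)  # two hidden linear layers


def policy_param_count(input_dim, output_dim, hidden_sizes):
    """Count weights and biases of a fully connected network.

    input_dim and output_dim are assumptions; the report gives only hidden sizes.
    """
    sizes = [input_dim, *hidden_sizes, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def discounted_return(rewards, gamma):
    """Standard discounted sum of rewards: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


cfg = NTDCFEConfig()
# Hypothetical dimensions chosen only to illustrate the size of the network.
n_params = policy_param_count(input_dim=10, output_dim=4, hidden_sizes=cfg.hidden_sizes)
```

Because the hidden layers are fixed at 1000 and 100 units, most parameters sit in the 1000-by-100 block between them, so the total depends only weakly on the assumed input and output sizes.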