Causal Explanations for Sequential Decision Making

Authors: Samer B. Nashed, Saaduddin Mahmud, Claudia V. Goldman, Shlomo Zilberstein

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we performed a series of experiments to evaluate the practicality and effectiveness of the proposed system, focusing on real-world computational demands and on the validity and reliability of metrics for comparing approximate and exact causal methods. Finally, we present two user studies that reveal user preferences for certain types of explanations and demonstrate a strong preference for explanations generated by our framework over those from other state-of-the-art systems.
Researcher Affiliation | Academia | Samer B. Nashed, University of Massachusetts Amherst, USA; Saaduddin Mahmud, University of Massachusetts Amherst, USA; Claudia V. Goldman, Hebrew University, Israel; Shlomo Zilberstein, University of Massachusetts Amherst, USA
Pseudocode | Yes | Algorithm 1 (Determine Weak Causes) ... Algorithm 8 (Mean RESP)
Open Source Code | No | The paper does not provide an explicit statement or link to the authors' own source code for the described methodology.
Open Datasets | Yes | In our experiments, 60 states are sampled from the Lunar Lander MDP, from OpenAI Gym [16]... in four environments: Lunar Lander, Taxi, Blackjack, and a version of Highway Env (highway-fast-v0; Kinematics observation).
Dataset Splits | No | The paper describes sampling states for evaluation within environments ("60 states were sampled with replacement from each domain") rather than providing explicit training/validation/test splits for a machine learning dataset.
Hardware Specification | Yes | All of our experiments were conducted on a Dell XPS 13 9310 laptop with an 11th Gen Intel Core i7-1185G7 3.00 GHz processor and 16 GB of 4267 MHz LPDDR4x RAM.
Software Dependencies | No | The paper mentions using Stable Baselines3 for deep Q-learning but does not provide a version number for it or for any other key software library.
Experiment Setup | No | The paper states that policies were learned via deep Q-learning or value iteration and used a multi-layer perceptron, but it does not specify concrete hyperparameters such as learning rate, batch size, or number of training epochs.
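The sampling procedure quoted above ("60 states were sampled with replacement from each domain") can be sketched as follows. This is a minimal illustration, not the paper's code: the `sample_states` helper, the domain names, and the integer stand-ins for states are all assumptions made for the example.

```python
import random

def sample_states(reachable_states, n=60, seed=None):
    """Sample n evaluation states, with replacement, from one domain's
    collection of states (hypothetical helper; mirrors the per-domain
    sampling described in the report)."""
    rng = random.Random(seed)
    return [rng.choice(reachable_states) for _ in range(n)]

# Illustrative stand-ins for the four domains' state collections;
# real states would be Gym observations, not integers.
domains = {
    "LunarLander": list(range(1000)),
    "Taxi": list(range(500)),
    "Blackjack": list(range(290)),
    "HighwayFast": list(range(800)),
}

# One fixed-seed evaluation set of 60 states per domain.
eval_sets = {name: sample_states(states, n=60, seed=0)
             for name, states in domains.items()}
```

Because sampling is with replacement, an evaluation set may contain duplicate states, which is consistent with the quoted description and distinct from a train/validation/test split.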