Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making

Authors: Stelios Triantafyllou, Aleksa Sukovic, Yasaman Zolfimoselo, Goran Radanovic

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator. We experimentally validate the interpretability of our approach using two multi-agent environments: a grid-world environment, where two RL actors are instructed by an LLM planner to complete a sequence of tasks, and the sepsis management simulator from Fig. 1.
Researcher Affiliation | Academia | Max Planck Institute for Software Systems, Germany. Correspondence to: Stelios Triantafyllou <EMAIL>.
Pseudocode | Yes | Appendix G includes an algorithm for the approximation of the expected conditional variance of Y_{I,a_{i,t}}. Algorithm 1: Estimates E[Var(Y_{I,a_{i,t}} | τ, U^{<S_k}) | τ]_M.
Open Source Code | Yes | Code to reproduce our experiments is available at https://github.com/stelios30/cf-effect-decomposition.git.
Open Datasets | No | We experimentally validate the interpretability of our approach using two multi-agent environments: a grid-world environment, where two RL actors are instructed by an LLM planner to complete a sequence of tasks, and the sepsis management simulator from Fig. 1. Our experimental setup and implementation closely follow that of (Triantafyllou et al., 2024).
Dataset Splits | No | Throughout both experiments, we use 100 posterior samples for estimating counterfactual effects and 20 additional ones for the conditional variance. We generate 600 trajectories with unsuccessful outcomes.
Hardware Specification | Yes | All experiments were run on a 64-bit Debian-based machine having 2x12 CPU cores clocked at 3 GHz with access to 1 TB of DDR3 1600 MHz RAM and an NVIDIA A40 GPU.
Software Dependencies | Yes | The software stack relied on Python 3.9.13, with installed standard scientific packages for numeric calculations and visualization (we provide a full list of dependencies and their exact versions as part of our code).
Experiment Setup | Yes | We provide a full list of hyperparameters in Table 3. Table 3: Hyperparameters used for the Gridworld actors' policies: Discount = 0.99; Target Update Freq. = 1000; Batch size = 512; Learning Rate = 1e-4.
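
The Pseudocode and Dataset Splits rows refer to approximating an expected conditional variance from posterior samples. The snippet below is a minimal sketch of a generic nested Monte Carlo estimator for a quantity of the form E[Var(Y | U)]. It is not the paper's Algorithm 1. The function name `expected_conditional_variance`, the two sampler callbacks, and the toy Gaussian model are all hypothetical and introduced only for illustration. The default sample counts reuse the numbers 100 and 20 quoted above, but mapping them to outer and inner loops is an assumption.

```python
import random
import statistics
from typing import Callable, List


def expected_conditional_variance(
    sample_outer: Callable[[random.Random], float],
    sample_outcome: Callable[[float, random.Random], float],
    n_outer: int = 100,   # assumed role: draws of the conditioning variable
    n_inner: int = 20,    # assumed role: draws of the outcome per outer draw
    seed: int = 0,
) -> float:
    """Nested Monte Carlo estimate of E[Var(Y | U)] (illustrative only)."""
    rng = random.Random(seed)
    variances: List[float] = []
    for _ in range(n_outer):
        u = sample_outer(rng)  # fix the conditioning variable
        outcomes = [sample_outcome(u, rng) for _ in range(n_inner)]
        # Unbiased sample variance of Y given this particular u.
        variances.append(statistics.variance(outcomes))
    # Average the conditional variances over the outer draws.
    return statistics.fmean(variances)


# Hypothetical toy model: U ~ N(0, 1), Y = U + eps with eps ~ N(0, 0.5^2),
# so the true value of E[Var(Y | U)] is 0.25.
def toy_outer(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def toy_outcome(u: float, rng: random.Random) -> float:
    return u + rng.gauss(0.0, 0.5)


estimate = expected_conditional_variance(toy_outer, toy_outcome)
print(f"estimated E[Var(Y | U)]: {estimate:.3f}")
```

In the toy model the estimate should land close to 0.25; the inner sample size controls the noise of each conditional variance, while the outer sample size controls how well those variances are averaged.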