Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making
Authors: Stelios Triantafyllou, Aleksa Sukovic, Yasaman Zolfimoselo, Goran Radanovic
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator. We experimentally validate the interpretability of our approach using two multi-agent environments: a grid-world environment, where two RL actors are instructed by an LLM planner to complete a sequence of tasks, and the sepsis management simulator from Fig. 1. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Software Systems, Germany. Correspondence to: Stelios Triantafyllou <EMAIL>. |
| Pseudocode | Yes | Appendix G includes an algorithm for the approximation of the expected conditional variance of $Y_{I,a_{i,t}}$. Algorithm 1: Estimates $\mathbb{E}[\mathrm{Var}(Y_{I,a_{i,t}} \mid \tau, U^{<S_k}) \mid \tau]_M$ |
| Open Source Code | Yes | Code to reproduce our experiments is available at https://github.com/stelios30/cf-effect-decomposition.git. |
| Open Datasets | No | We experimentally validate the interpretability of our approach using two multi-agent environments: a grid-world environment, where two RL actors are instructed by an LLM planner to complete a sequence of tasks, and the sepsis management simulator from Fig. 1. Our experimental setup and implementation closely follow that of (Triantafyllou et al., 2024). |
| Dataset Splits | No | Throughout both experiments, we use 100 posterior samples for estimating counterfactual effects and 20 additional ones for the conditional variance. We generate 600 trajectories with unsuccessful outcomes. |
| Hardware Specification | Yes | All experiments were run on a 64-bit Debian-based machine having 2x12 CPU cores clocked at 3 GHz with access to 1 TB of DDR3 1600 MHz RAM and an NVIDIA A40 GPU. |
| Software Dependencies | Yes | The software stack relied on Python 3.9.13, with installed standard scientific packages for numeric calculations and visualization (we provide a full list of dependencies and their exact versions as part of our code). |
| Experiment Setup | Yes | We provide a full list of hyperparameters in Table 3. Table 3: Hyperparameters used for the Gridworld actors' policies: Discount = 0.99; Target Update Freq. = 1000; Batch size = 512; Learning Rate = 1e-4. |
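
The Pseudocode and Dataset Splits rows refer to estimating an expected conditional variance using 100 posterior samples plus 20 additional ones. As a rough illustration of what such a nested Monte Carlo estimate can look like, the sketch below averages inner sample variances over outer draws. It is not the paper's Algorithm 1: the function `expected_conditional_variance`, the toy samplers, and the Gaussian toy model are hypothetical, and the toy draws from a fixed distribution rather than from a posterior conditioned on a trajectory. Only the default sample counts (100 and 20) are taken from the table above.

```python
import random
import statistics
from typing import Callable


def expected_conditional_variance(
    sample_outer: Callable[[random.Random], float],
    sample_outcome: Callable[[float, random.Random], float],
    n_outer: int = 100,  # outer draws (count quoted in the table)
    n_inner: int = 20,   # extra draws per outer sample (count quoted in the table)
    seed: int = 0,
) -> float:
    """Nested Monte Carlo estimate of E[Var(Y | U_outer)] (hypothetical helper)."""
    rng = random.Random(seed)
    variances = []
    for _ in range(n_outer):
        u = sample_outer(rng)  # one draw of the conditioning noise
        ys = [sample_outcome(u, rng) for _ in range(n_inner)]
        variances.append(statistics.variance(ys))  # unbiased sample variance
    return statistics.fmean(variances)


# Toy model (an assumption for illustration): Y = u + eps with eps ~ N(0, 2^2),
# so the true conditional variance Var(Y | u) equals 4 for every u.
def toy_outer(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)


def toy_outcome(u: float, rng: random.Random) -> float:
    return u + rng.gauss(0.0, 2.0)


estimate = expected_conditional_variance(toy_outer, toy_outcome)
print(f"estimated E[Var(Y | U)] = {estimate:.3f}")
```

In this toy setting the estimate should land near 4, since each inner variance is an unbiased estimate of the same quantity. With only 20 inner draws, individual inner variances are noisy, which is why averaging over the outer samples matters.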