Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping

Authors: Guangyi Liu, Suzan Iloglu, Michael Caldara, Joseph W. Durham, Michael M. Zavlanos

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80% in the simulation scenario.
Researcher Affiliation | Collaboration | (1) Amazon Robotics, North Reading, MA, USA; (2) Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA.
Pseudocode | Yes | Algorithm 1: CB-based Worst-Case Reward Estimator; Algorithm 2: DRMARL with CB-based Worst-Case Reward Estimator.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository for the methodology described.
Open Datasets | No | "Due to common industry confidentiality practices, we cannot disclose the specific data source and report only relative performance improvements. The data represents realistic package flow patterns typical of Amazon robotic sortation facilities."
Dataset Splits | No | "We train the DRMARL policy over 300 episodes using training data generated from 9 distinct induction distribution groups. Similarly, the regular MARL policies are trained for 300 episodes, each on one of the same groups. For testing, we evaluate both policies on newly generated induction data from 21 distinct distribution groups, conducting five experiments per group." The paper describes training and testing on different induction distribution groups but gives no percentages or counts for a conventional train/validation/test split, so the data partitioning cannot be reproduced exactly.
Hardware Specification | Yes | Approximately 924 hours on a cloud instance with 64 vCPUs (4th-generation Intel Xeon Scalable) and 128 GB RAM.
Software Dependencies | No | The paper cites commercial solvers such as Google OR-Tools (Perron & Furnon, 2024) and Xpress (FICO, 2023) but does not provide version numbers for these or for any other software libraries or environments used in the implementation.
Experiment Setup | Yes | In the simplified simulation environment, ... one training or testing episode consists of 5 hours, with each time step being 30 minutes long; the DRMARL policy is trained over 300 episodes. In the large-scale simulation environment, ... one training/testing episode lasts 11 hours, with each time step lasting five minutes. Algorithm 1 specifies "Learning rate l_CB, initial parameters ψ_0, induction distribution groups G, MARL policy with Q_MARL, exploration rate ε_CB". Algorithm 2 specifies "Learning rate l_r, initial parameters θ_0, induction distribution groups G, pre-trained CB-based estimator Q_CB, exploration rate ε".
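The pseudocode row names a "CB-based Worst-Case Reward Estimator". The paper's algorithm is not reproduced here, but the name suggests a contextual-bandit model that predicts the reward of a context under each induction distribution group and returns the minimum. A minimal sketch under that assumption, with a simple linear model per group (all class and method names are hypothetical):

```python
import numpy as np

class WorstCaseRewardEstimator:
    """Illustrative contextual-bandit worst-case reward estimator:
    one linear reward model per induction distribution group; the
    robust estimate is the minimum prediction over groups."""

    def __init__(self, n_groups, context_dim, lr=0.01):
        self.lr = lr
        # One weight vector per distribution group (linear CB model).
        self.weights = np.zeros((n_groups, context_dim))

    def predict(self, context):
        # Estimated reward under every group for this context.
        return self.weights @ context

    def worst_case(self, context):
        # Pessimistic (distributionally robust) estimate.
        return float(np.min(self.predict(context)))

    def update(self, group, context, reward):
        # SGD step on squared error for the observed group's model.
        err = self.weights[group] @ context - reward
        self.weights[group] -= self.lr * err * context
```

The worst-case value can then stand in for the plain reward estimate when training a robust policy, which is the general idea the two algorithm titles point at.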
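The quoted split protocol (train on 9 induction distribution groups, test on 21 newly generated groups with five experiments per group) can be sketched as an evaluation loop; everything below is illustrative, with a placeholder `run_episode` callable standing in for one simulated episode:

```python
# Counts quoted in the Dataset Splits response.
N_TRAIN_GROUPS = 9      # distinct induction distribution groups for training
N_TEST_GROUPS = 21      # newly generated groups for testing
RUNS_PER_GROUP = 5      # experiments per test group
EPISODES = 300          # training episodes per policy

def evaluate(policy, test_groups, runs_per_group, run_episode):
    """Average score of `policy` across all test groups and repeats.
    `run_episode(policy, group)` is a placeholder that runs one
    episode on the given group and returns its score."""
    scores = [run_episode(policy, g)
              for g in test_groups
              for _ in range(runs_per_group)]
    return sum(scores) / len(scores)
```

With 21 groups and 5 runs each, one evaluation sweep comprises 105 episodes per policy.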