Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping

Authors: Guangyi Liu, Suzan Iloglu, Michael Caldara, Joseph W. Durham, Michael M. Zavlanos

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80% in the simulation scenario.
Researcher Affiliation | Collaboration | (1) Amazon Robotics, North Reading, MA, USA; (2) Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA.
Pseudocode | Yes | Algorithm 1: CB-based Worst-Case Reward Estimator; Algorithm 2: DRMARL with CB-based Worst-Case Reward Estimator.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include a link to a code repository for the methodology described.
Open Datasets | No | "Due to common industry confidentiality practices, we cannot disclose the specific data source and report only relative performance improvements. The data represents realistic package flow patterns typical of Amazon robotic sortation facilities."
Dataset Splits | No | "We train the DRMARL policy over 300 episodes using training data generated from 9 distinct induction distribution groups. Similarly, the regular MARL policies are trained for 300 episodes, each on one of the same groups. For testing, we evaluate both policies on newly generated induction data from 21 distinct distribution groups, conducting five experiments per group." The paper describes training and testing on different induction distribution groups but gives no percentages or counts for a conventional train/validation/test split, so the data partitioning cannot be reproduced exactly.
Hardware Specification | Yes | Approximately 924 hours on a cloud instance with 64 vCPUs (4th-generation Intel Xeon Scalable) and 128 GB RAM.
Software Dependencies | No | The paper cites commercial solvers such as Google OR-Tools (Perron & Furnon, 2024) and Xpress (FICO, 2023) but does not provide version numbers for these or for any other software libraries or environments used in the implementation.
Experiment Setup | Yes | In the simplified simulation environment, ... one training or testing episode consists of 5 hours, with each time step being 30 minutes long; the DRMARL policy is trained over 300 episodes. In the large-scale simulation environment, ... one training/testing episode lasts 11 hours, with each time step lasting five minutes. Algorithm 1 specifies "Learning rate l_CB, initial parameters ψ_0, induction distribution groups G, MARL policy with Q_MARL, exploration rate ε_CB". Algorithm 2 specifies "Learning rate l_r, initial parameters θ_0, induction distribution groups G, pre-trained CB-based estimator Q_CB, exploration rate ε".
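The pseudocode row names a "CB-based Worst-Case Reward Estimator". The paper's algorithm is not reproduced here, but the name suggests a contextual-bandit model that predicts the reward of a context under each induction distribution group and returns the minimum. A minimal sketch under that assumption, with a simple linear model per group (all class and method names are hypothetical):

```python
import numpy as np

class WorstCaseRewardEstimator:
    """Illustrative contextual-bandit worst-case reward estimator:
    one linear reward model per induction distribution group; the
    robust estimate is the minimum prediction over groups."""

    def __init__(self, n_groups, context_dim, lr=0.01):
        self.lr = lr
        # One weight vector per distribution group (linear CB model).
        self.weights = np.zeros((n_groups, context_dim))

    def predict(self, context):
        # Estimated reward under every group for this context.
        return self.weights @ context

    def worst_case(self, context):
        # Pessimistic (distributionally robust) estimate.
        return float(np.min(self.predict(context)))

    def update(self, group, context, reward):
        # SGD step on squared error for the observed group's model.
        err = self.weights[group] @ context - reward
        self.weights[group] -= self.lr * err * context
```

The worst-case value can then stand in for the plain reward estimate when training a robust policy, which is the general idea the two algorithm titles point at.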
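The quoted split protocol (train on 9 induction distribution groups, test on 21 newly generated groups with five experiments per group) can be sketched as an evaluation loop; everything below is illustrative, with a placeholder `run_episode` callable standing in for one simulated episode:

```python
# Counts quoted in the Dataset Splits response.
N_TRAIN_GROUPS = 9      # distinct induction distribution groups for training
N_TEST_GROUPS = 21      # newly generated groups for testing
RUNS_PER_GROUP = 5      # experiments per test group
EPISODES = 300          # training episodes per policy

def evaluate(policy, test_groups, runs_per_group, run_episode):
    """Average score of `policy` across all test groups and repeats.
    `run_episode(policy, group)` is a placeholder that runs one
    episode on the given group and returns its score."""
    scores = [run_episode(policy, g)
              for g in test_groups
              for _ in range(runs_per_group)]
    return sum(scores) / len(scores)
```

With 21 groups and 5 runs each, one evaluation sweep comprises 105 episodes per policy.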