reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning

Authors: Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Zhongxia Yan, Cathy Wu

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using these traffic scenarios, we benchmark popular multi-agent RL and human-like driving algorithms and demonstrate that the popular multi-agent RL algorithms struggle to generalize in CRL settings.
Researcher Affiliation	Academia	¹MIT, EMAIL; ²ETH Zurich, EMAIL
Pseudocode	No	The paper describes the methodology and components of IntersectionZoo through textual descriptions and mathematical equations, such as Equation 2 for the optimization objective and Equation 3 for the reward definition, but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code and documentation are available at https://github.com/mit-wu-lab/IntersectionZoo.
Open Datasets	Yes	IntersectionZoo is built on data-informed simulations of 16,334 signalized intersections derived from 10 major US cities, modeled in an open-source, industry-grade microscopic traffic simulator. By modeling factors that affect vehicular exhaust emissions (e.g., temperature, road conditions, travel demand), IntersectionZoo provides one million data-driven traffic scenarios. We use OpenStreetMap (OSM) (Haklay & Weber, 2008) data and follow the guidelines provided by Qu et al. (2022). Intersection lane lengths, lane counts, turn-lane configurations, and speed limits are extracted from OSM. Road grades are taken from the US Geological Survey (USGS). To model the vehicle arrival process, we use Annual Average Daily Traffic (AADT) data (Huntsinger, 2022) released by the Departments of Transportation of each state/city. We source vehicle age, fuel type, and vehicle type distributions from the openly available MOVES databases (EPA), and data from the US National Centers for Environmental Information (NCEI) are used to model atmospheric conditions, including changes in temperature and humidity. … with real-world arterial driving data from CitySim (Zheng et al., 2022).
Dataset Splits	Yes	By default, IntersectionZoo provides interfaces for train/test split evaluations to measure generalization, a protocol often used with zero-shot policy transfer (Harrison et al., 2019; Higgins et al., 2017; Kirk et al., 2021). That is, policies are trained on one subset of context MDPs and tested on another. Both IID and OOD evaluation protocols are supported: OOD evaluation can be performed by training in one city (train CMDP) and testing in another (test CMDP), while IID evaluation can be performed with a train/test split of context MDPs within a single city.
Hardware Specification	Yes	Experiments were carried out on a computing cluster with 20 CPUs and an NVIDIA Volta V100 GPU with 32 GB of memory.
Software Dependencies	No	The paper mentions using RLLib and SUMO as software dependencies, but it does not provide specific version numbers for these tools. For example, 'All experiments are carried out using RLLib (Liang et al., 2018) with the default hyperparameter configuration.' and 'All traffic scenarios are configured for use in the open-source agent-based traffic simulator SUMO (Lopez et al., 2018).'
Experiment Setup	Yes	All experiments are carried out using RLLib (Liang et al., 2018) with the default hyperparameter configuration. We use 10 workers to train the multi-task learning policies. Each benchmarking run took roughly 24 hours in RLLib and comprised 5000 episodes, each with a horizon of 1000 steps and 50 warm-up steps. For the results reported in Section 6, we train each algorithm with four random seeds, for 500 training iterations each, to ensure the policies have converged.
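The Open Datasets row states that varying emission-relevant factors (temperature, humidity, travel demand, and so on) over the 16,334 intersections yields about one million traffic scenarios. The sketch below shows how a grid of factor values multiplies into many context MDPs. The names `Context`, `enumerate_contexts`, and `demand_scale` are hypothetical and are not part of IntersectionZoo's API. The paper's actual sampling of factor values may also differ; for example, it need not be a full Cartesian product.

```python
from dataclasses import dataclass
from itertools import product


# Hypothetical context record loosely mirroring factors named in the
# Open Datasets row; field names are illustrative, NOT IntersectionZoo's API.
@dataclass(frozen=True)
class Context:
    intersection_id: str   # one OSM-derived intersection
    temperature_c: float   # atmospheric condition (temperature)
    humidity_pct: float    # atmospheric condition (humidity)
    demand_scale: float    # multiplier on AADT-derived arrival rates


def enumerate_contexts(intersections, temperatures, humidities, demands):
    """Return one Context per combination of the given factor values."""
    return [
        Context(i, t, h, d)
        for i, t, h, d in product(intersections, temperatures, humidities, demands)
    ]


if __name__ == "__main__":
    contexts = enumerate_contexts(
        intersections=["city_a/0001", "city_a/0002"],
        temperatures=[0.0, 20.0],
        humidities=[50.0],
        demands=[0.8, 1.0, 1.2],
    )
    print(len(contexts))  # 2 * 2 * 1 * 3 = 12
```

Because the record is a frozen dataclass, contexts are hashable and can be deduplicated or used as dictionary keys when assigning them to train/test sets.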
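The Dataset Splits row distinguishes two protocols. OOD evaluation trains on one city's context MDPs and tests on another city's. IID evaluation splits the context MDPs of a single city at random. The minimal sketch below assumes each context MDP is identified by a hypothetical `(city, context_id)` pair. It is not IntersectionZoo's split interface.

```python
import random


# Hypothetical representation: each context MDP is a (city, context_id) tuple.
# Illustrative only; this is not IntersectionZoo's split interface.

def ood_split(contexts, train_city, test_city):
    """OOD protocol: train on one city's contexts, test on another city's."""
    if train_city == test_city:
        raise ValueError("OOD evaluation needs two different cities")
    train = [c for c in contexts if c[0] == train_city]
    test = [c for c in contexts if c[0] == test_city]
    return train, test


def iid_split(contexts, city, test_fraction=0.2, seed=0):
    """IID protocol: random, disjoint train/test split within a single city."""
    pool = [c for c in contexts if c[0] == city]
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(pool)
    n_test = int(len(pool) * test_fraction)
    return pool[n_test:], pool[:n_test]


if __name__ == "__main__":
    contexts = [("city_a", i) for i in range(10)] + [("city_b", i) for i in range(5)]
    train, test = ood_split(contexts, "city_a", "city_b")
    print(len(train), len(test))  # 10 5
    train, test = iid_split(contexts, "city_a", test_fraction=0.2)
    print(len(train), len(test))  # 8 2
```

Seeding the shuffle keeps the IID split fixed across runs, which matters when several algorithms are compared on the same held-out contexts.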
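The Software Dependencies row is marked No because no version numbers are given for RLLib or SUMO. A reproduction would therefore need to record the versions it actually used. The minimal sketch below uses the standard-library `importlib.metadata`. The listed distribution names are assumptions about how the tools were installed: RLlib ships inside the `ray` package, and SUMO's Python wheels are published as `eclipse-sumo`, `traci`, and `sumolib`. A SUMO built from source would not show up here.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; adjust to match the actual environment.
PACKAGES = ["ray", "eclipse-sumo", "traci", "sumolib"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Emit pip-style pins for installed packages and a comment for missing ones.
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"# {name} not installed")
```

The printed lines can be saved alongside experiment logs as a lightweight requirements pin.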
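The Experiment Setup row reports several run settings. The sketch below collects them in one place and derives the implied totals. `BenchmarkRun` is a hypothetical container, not an RLLib configuration object. The seed values are placeholders, since the paper reports only that four seeds were used. Warm-up steps are left out of the step count because the text does not say whether they fall inside the 1000-step horizon.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkRun:
    """Hypothetical summary of the settings reported in the Experiment Setup row."""
    num_workers: int = 10
    episodes: int = 5000
    horizon: int = 1000                    # steps per episode
    warmup_steps: int = 50                 # warm-up steps per episode
    training_iterations: int = 500
    seeds: Tuple[int, ...] = (0, 1, 2, 3)  # placeholder values; only the count (4) is reported

    @property
    def env_steps(self) -> int:
        # Warm-up steps are excluded: their relation to the horizon is unspecified.
        return self.episodes * self.horizon

    @property
    def episodes_per_iteration(self) -> float:
        return self.episodes / self.training_iterations


if __name__ == "__main__":
    run = BenchmarkRun()
    print(run.env_steps)               # 5000000
    print(run.episodes_per_iteration)  # 10.0
```

The derived 10 episodes per iteration matches the 10 workers. That is consistent with, but not confirmed as, one episode per worker per iteration.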