SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy
Authors: Yong Liang Goh, Zhiguang Cao, Yining Ma, Jianan Zhou, Mohammed Haroon Dupty, Wee Sun Lee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate the superiority of our approach over existing methods on 9 real-world maps with 16 VRP variants each. |
| Researcher Affiliation | Collaboration | 1School of Computing, National University of Singapore, Singapore 2Institute of Data Science, National University of Singapore, Singapore 3Grabtaxi Holdings Pte Ltd, Singapore 4Grab-NUS AI Lab, Singapore 5School of Computing and Information Systems, Singapore Management University, Singapore 6Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge MA, United States 7College of Computing and Data Science, Nanyang Technological University, Singapore. |
| Pseudocode | Yes | The overall process can be viewed in Algorithm 1 in Appendix D. |
| Open Source Code | No | The paper does not contain an explicit statement or a direct link to the source code for the methodology described. |
| Open Datasets | Yes | We utilize nine country maps1: USA13509, JA9847, BM33708, KZ9976, SW24978, VM22775, EG7146, FI10639, GR9882. Dataset details are in Appendix E. 1https://www.math.uwaterloo.ca/tsp/world/countries.html |
| Dataset Splits | Yes | (3) in-distribution refers to the three distributions that the models observe during training: USA13509, JA9847, BM33708; (4) out-distribution refers to the six distributions that the models do not observe during training: KZ9976, SW24978, VM22775, EG7146, FI10639, GR9882. We sample 1,000 test examples per problem for each country map and solve them using traditional solvers. |
| Hardware Specification | Yes | All experiments run on a single A100-80GB GPU. |
| Software Dependencies | No | The paper mentions using "HGS (Vidal, 2022)" and "Google's OR-tools routing solver (Furnon & Perron)". While these tools are identified and cited, specific version numbers for the software themselves (e.g., HGS v1.0 or OR-tools v9.x) are not explicitly provided in the text. |
| Experiment Setup | Yes | We use the Adam optimizer to train all neural solvers from scratch on 20,000 instances per epoch for 1,000 epochs. All models plateau at this epoch, and the relative rankings do not change with further training. At each training epoch, we sample a country from the in-distribution set, followed by a subset of points from the distribution and a problem from the in-task set, as shown in Figure 1. For SHIELD, we use 3 MoD layers in the decoder and only allow 10% of tokens per layer. The number of clusters is set to Nc = 5, with B = 5 iterations of soft clustering. The encoder consists of 6 MoE layers. We provide full details of the hyperparameters in Appendix I. |
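The per-epoch sampling the setup describes (sample an in-distribution country, then a subset of points, then a problem variant) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the task list and the `coords_by_country` structure are assumptions for the example.

```python
import random

# In-distribution training maps, as reported in the Dataset Splits row.
IN_DISTRIBUTION = ["USA13509", "JA9847", "BM33708"]
# Illustrative subset of VRP variants (the paper trains on an in-task set
# of variants; these names are placeholders, not the paper's exact list).
IN_TASK = ["CVRP", "OVRP", "VRPB", "VRPL", "VRPTW"]

def sample_training_instance(coords_by_country, n_points=100, seed=None):
    """Sample one training instance: a country, a point subset, a problem."""
    rng = random.Random(seed)
    country = rng.choice(IN_DISTRIBUTION)
    points = rng.sample(coords_by_country[country], n_points)
    problem = rng.choice(IN_TASK)
    return country, points, problem
```

Repeating this draw 20,000 times per epoch reproduces the instance budget quoted in the table.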
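The hierarchy component's "B = 5 iterations of soft clustering" with "Nc = 5" clusters can be illustrated with a generic soft k-means sketch over node coordinates. This is an assumption-laden stand-in, not the paper's exact formulation (the temperature `tau` and the Gaussian-kernel responsibilities are choices made for this example).

```python
import numpy as np

def soft_cluster(points, n_clusters=5, n_iters=5, tau=1.0, seed=0):
    """Run n_iters rounds of soft k-means on 2-D points.

    Returns soft assignment weights (N, n_clusters), rows summing to 1,
    and the final cluster centers (n_clusters, 2).
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen points.
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(n_iters):
        # Squared distances from every point to every center: (N, K).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        # Soft responsibilities via a Gaussian kernel, normalized per point.
        w = np.exp(-d2 / tau)
        w /= w.sum(axis=1, keepdims=True)
        # Update centers as responsibility-weighted means.
        centers = (w[:, :, None] * points[:, None, :]).sum(0) / w.sum(0)[:, None]
    return w, centers
```

With coordinates scaled to the unit square, five iterations are typically enough for the assignments to stabilize, which matches the small fixed B in the setup.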