Neural Deconstruction Search for Vehicle Routing Problems

Authors: André Hottung, Paula Wong-Chung, Kevin Tierney

TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate NDS on several challenging problems, including the CVRP, the vehicle routing problem with time windows (VRPTW), and the prize-collecting vehicle routing problem (PCVRP). NDS demonstrates substantial performance gains compared to existing learned construction methods and matches or surpasses state-of-the-art OR methods across various routing problems of different sizes. To the best of our knowledge, NDS is the first learning-based approach that achieves this milestone.
Researcher Affiliation Academia André Hottung EMAIL Bielefeld University, Germany Paula Wong-Chung EMAIL University of British Columbia, Canada Kevin Tierney EMAIL Bielefeld University, Germany
Pseudocode Yes Algorithm 1 NDS Training 1: procedure Train(Iterations per instance I, rollouts per solution K, improvement steps J) 2: Initialize policy network πθ 3: while Termination criteria not reached do 4: l Generate Instance() 5: s Generate Start Solution(l) 6: for j = 1, . . . , J do 7: s Improvement Step(s, πθ) Improve solution using procedure shown in Figure 2 8: end for 9: for i = 1, . . . , I do 10: {τ1, τ2, . . . , τK} Rollout Policy(πθ, l, s, K) Sample K rollouts 11: sk Remove Customers(s, τk) k {1, . . . , K} 12: s k Greedy Insertion( sk, τk) k {1, . . . , K} 13: rk max(Obj(s) Obj(s k), 0) k {1, . . . , K} Calculate reward 14: b 1 K PK k=1 rk Calculate baseline 15: k = arg maxk {1,...,K} rk 16: gi (rk b) θ log πθ(τk |l, s, vk ) Calculate gradients 17: s s k Update s with best found solution 18: end for 19: θ θ + αPI i=1 gi Optimizer step with accumulated gradients 20: end while 21: end procedure
Open Source Code Yes Our implementation of NDS is available at https://github.com/ahottung/NDS.
Open Datasets Yes CVRP We use the instance generator from Kool et al. (2019) to create scenarios with uniformly distributed customer locations, and the generator from Queiroga et al. (2022) for generating more realistic instances with clustered customer locations to better simulate real-world conditions. ... For the CVRP, we use the test instances from Kool et al. (2019) for N=100 (10,000 instances), Drakulic et al. (2023) for N=500 (128 instances), and Ye et al. (2024b) for N=1000 and N=2000 (100 instances each).
Dataset Splits Yes For the CVRP, we use the test instances from Kool et al. (2019) for N=100 (10,000 instances), Drakulic et al. (2023) for N=500 (128 instances), and Ye et al. (2024b) for N=1000 and N=2000 (100 instances each). For the VRPTW and PCVRP, we generate new test sets consisting of 10,000 instances for N=100 and 250 instances for settings with more than 100 customers. To allow for a fair comparison, we ensure that all reinforcement learning based approaches are trained on instances from the same distribution that is also used for sampling the test instances.
Hardware Specification Yes All experiments are conducted on a research cluster utilizing a single Nvidia A100 GPU per run.
Software Dependencies Yes Additionally, we include Py VRP (Wouda et al., 2024) (version 0.9.0), which is an open-source extension of HGS for other VRP variants.
Experiment Setup Yes NDS Training For each problem and problem size, we perform a separate training run. Training consists of 2000 epochs for settings with 1000 or fewer customers. For the 2000 customer setting, we resume training from the 1000 customer model checkpoint at 1500 epochs and train for an additional 500 epochs. In each epoch, we process 1500 instances, with each instance undergoing 100 iterations, 128 rollouts, and 10 initial improvement steps. The learning rate is set to 10 4 and 15 customers are selected per deconstruction step across all problem sizes. ... NDS Test Configuration For NDS, the starting temperature λstart is set to 0.1 and decays exponentially to 0.001 throughout the search. The threshold factor δ is fixed at 15. During the improvement step, 200 rollouts are performed per instance, and each deconstructed solution is reconstructed 5 times (1 based on the selected order of the DNN and 4 using a random customer order). The number of augmentations is set to 8 for the CVRP and VRPTW, and 128 for the PCVRP.