Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation

Authors: Federico Julian Camerota Verdù, Lorenzo Castelli, Luca Bortolussi

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we report experimental results on the search capabilities of LRBS and its effect on the generalization of pre-trained DRL agents to large TSP instances and two pickup and delivery variants. We use checkpoints of models from de O. da Costa et al. (2020), pre-trained on Euclidean TSP instances with 100 nodes, and from (Ma et al. 2022), pre-trained on PDTSP and PDTSPL instances with 100 nodes.
Researcher Affiliation Academia 1Dipartimento di Matematica, Informatica e Geoscienze, Università degli Studi di Trieste, Italy 2Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste, Italy EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Limited Rollout Beam Search
Open Source Code Yes Code https://github.com/federico-camerota/LRBS
Open Datasets No The TSP instances in our experiments are generated as in Kool, van Hoof, and Welling (2019) where the coordinates of nodes are sampled uniformly at random in the unit square. We consider problems with N = {100, 150, 200, 500, 1000} nodes. To ensure a fair comparison with the pre-trained policies, for N = 100 we use the same 10,000 test instances of de O. da Costa et al. (2020). For the other problems, we generate datasets with 1,000 random instances for N = {125, 200} and with 128 instances for N = 500, 1000. For PDTSP and PDTSPL experiments we generate sets of 128 random instances with 200 and 500 nodes.
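The instance-generation procedure quoted above (node coordinates sampled uniformly at random in the unit square, following Kool, van Hoof, and Welling 2019) can be sketched as follows. The function name, the `seed` parameter, and the NumPy array-shape convention are our own illustrative choices, not the paper's code.

```python
import numpy as np

def generate_tsp_instances(num_instances, num_nodes, seed=0):
    """Sample Euclidean TSP instances with node coordinates drawn
    uniformly at random in the unit square.

    Returns an array of shape (num_instances, num_nodes, 2)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(num_instances, num_nodes, 2))

# For example, an evaluation set of 128 instances with N = 500 nodes:
instances = generate_tsp_instances(128, 500)
```

Fixing the seed makes the evaluation sets reproducible across runs, which is what allows the paper to reuse the same test instances when comparing against pre-trained policies.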
Dataset Splits Yes For the other problems, we generate datasets with 1,000 random instances for N = {125, 200} and with 128 instances for N = 500, 1000. For PDTSP and PDTSPL experiments we generate sets of 128 random instances with 200 and 500 nodes. In the second part of Table 2, we show results on the generalization of LRBS after fine-tuning the DRL policy on a small set of randomly generated instances with the same number of nodes as the test set (FT) and when adapting the policy parameters online (OA). The LRBS configurations are the same used for the non-adaptive experiments, with the only exception of LRBS + FT on TSP1000, where we use (β = 10, α = 6). For all the considered problems, the FT dataset of randomly generated instances is of size equal to 10% of the test set size; each instance is solved only once by the LRBS algorithm, and after each policy rollout the new parameters ϕ are updated according to the gradient in Equation 1.
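The online-adaptation step quoted above updates the policy parameters ϕ after each rollout using the gradient in Equation 1. That equation is not reproduced in this table, so the sketch below substitutes a generic REINFORCE-style update for a small softmax policy; the function name, the two-action feature setup, and the reward-to-go return are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def reinforce_update(theta, states, actions, rewards, lr=0.1):
    """Generic REINFORCE-style update for a softmax policy (a stand-in
    for the paper's Equation 1, whose exact form is not quoted here).

    theta:   (num_features, 2) weight matrix for two candidate moves
    states:  (T, num_features) per-step feature vectors of one rollout
    actions: length-T list of chosen move indices
    rewards: (T,) per-step rewards"""
    logits = states @ theta                                  # (T, 2)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    returns = np.cumsum(rewards[::-1])[::-1]                 # reward-to-go
    grad = np.zeros_like(theta)
    for t, a in enumerate(actions):
        # Gradient of log pi(a|s) for a softmax policy:
        # (one_hot(a) - probs) outer-multiplied with the state features.
        grad += np.outer(states[t], np.eye(2)[a] - probs[t]) * returns[t]
    return theta + lr * grad                                 # gradient ascent
```

Applying one such update after every policy rollout, while solving each instance only once, mirrors the online-adaptation (OA) loop described in the evidence above.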
Hardware Specification Yes We run all our experiments using a single NVIDIA Ampere GPU with 64GB of HBM2 memory.
Software Dependencies No The paper does not explicitly mention any specific software dependencies with version numbers.
Experiment Setup Yes In all our experiments on TSP, for LRBS, we set a search budget such that αβ = 60 and fix the other parameters to ns = 20 and Tmax = 5000, similarly to previous works and balancing solution performance and runtime. On PDTSP and PDTSPL we reduce the budget to 40, and when doing adaptation we use ns = 10 to lower memory consumption. The best values of α and β for each dataset were determined by testing the method on a set of 10 randomly generated instances of the same size as those in the test set.
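To make the roles of the budget parameters concrete (β as a beam width, α as a number of rollout rounds, ns as the samples drawn per candidate), here is a minimal beam-search-with-limited-rollouts sketch. This is a generic stand-in, not the paper's Algorithm 1: `propose_move` and `evaluate` are placeholder callables, and the toy usage minimizes a 1-D objective rather than a tour length.

```python
import random

def lrbs_sketch(init_solution, propose_move, evaluate,
                beam_width, rollout_depth, num_samples, seed=0):
    """Generic limited-rollout beam search: keep the `beam_width` best
    candidates, expand each with `num_samples` sampled moves, repeat
    for `rollout_depth` rounds, and track the best solution seen."""
    rng = random.Random(seed)
    beam = [init_solution]
    best = init_solution
    for _ in range(rollout_depth):
        # Expand every beam entry with num_samples sampled moves.
        candidates = [propose_move(sol, rng)
                      for sol in beam for _ in range(num_samples)]
        # Keep only the beam_width best candidates.
        beam = sorted(candidates, key=evaluate)[:beam_width]
        if evaluate(beam[0]) < evaluate(best):
            best = beam[0]
    return best

# Toy usage: walk toward x = 7 with unit steps (beta=4, alpha=10, ns=5).
best = lrbs_sketch(0.0, lambda x, r: x + r.choice([-1.0, 1.0]),
                   lambda x: abs(x - 7.0),
                   beam_width=4, rollout_depth=10, num_samples=5)
```

Under this reading, a fixed budget αβ = 60 trades beam width against rollout depth: for example (β = 10, α = 6), the configuration the paper reports for LRBS + FT on TSP1000, spends the budget on a wider beam with fewer rounds.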