Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation
Authors: Federico Julian Camerota Verdù, Lorenzo Castelli, Luca Bortolussi
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report experimental results on the search capabilities of LRBS and its effect on the generalization of pre-trained DRL agents to large TSP instances and two pickup and delivery variants. We use checkpoints of models from de O. da Costa et al. (2020), pre-trained on Euclidean TSP instances with 100 nodes, and from (Ma et al. 2022), pre-trained on PDTSP and PDTSPL instances with 100 nodes. |
| Researcher Affiliation | Academia | 1Dipartimento di Matematica, Informatica e Geoscienze, Università degli Studi di Trieste, Italy; 2Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste, Italy. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Limited Rollout Beam Search |
| Open Source Code | Yes | Code https://github.com/federico-camerota/LRBS |
| Open Datasets | No | The TSP instances in our experiments are generated as in Kool, van Hoof, and Welling (2019), where the coordinates of nodes are sampled uniformly at random in the unit square. We consider problems with N = {100, 150, 200, 500, 1000} nodes. To ensure a fair comparison with the pre-trained policies, for N = 100 we use the same 10,000 test instances of de O. da Costa et al. (2020). For the other problems, we generate datasets with 1,000 random instances for N = {125, 200} and with 128 instances for N = 500, 1000. For PDTSP and PDTSPL experiments we generate sets of 128 random instances with 200 and 500 nodes. |
| Dataset Splits | Yes | For the other problems, we generate datasets with 1,000 random instances for N = {125, 200} and with 128 instances for N = 500, 1000. For PDTSP and PDTSPL experiments we generate sets of 128 random instances with 200 and 500 nodes. In the second part of Table 2, we show results on the generalization of LRBS after fine-tuning the DRL policy on a small set of randomly generated instances with the same number of nodes as the test set (FT), and when adapting the policy parameters online (OA). The LRBS configurations are the same as those used for the non-adaptive experiments, with the only exception of LRBS + FT on TSP1000, where we use (β = 10, α = 6). For all the considered problems, the FT dataset of randomly generated instances has size equal to 10% of the test set size; each instance is solved only once by the LRBS algorithm, and after each policy rollout the new parameters ϕ are updated according to the gradient in Equation 1. |
| Hardware Specification | Yes | We run all our experiments using a single NVIDIA Ampere GPU with 64GB of HBM2 memory. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | In all our experiments on TSP, for LRBS, we set a search budget such that α · β = 60 and fix the other parameters to ns = 20 and Tmax = 5000, similarly to previous works and balancing solution performance and runtime. On PDTSP and PDTSPL we reduce the budget to 40, and when doing adaptation we use ns = 10 to lower memory consumption. The best values of α and β for each dataset were determined by testing the method on a set of 10 randomly generated instances of the same size as those in the test set. |
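The checklist above references Algorithm 1 (Limited Rollout Beam Search) and a budget of the form α · β, with β the beam width and α the number of rollout steps (consistent with the quoted configuration β = 10, α = 6). The paper's actual algorithm uses a learned DRL improvement policy; the sketch below is only a minimal illustration of the beam-search skeleton, substituting a plain 2-opt neighborhood for the neural policy. The function names and the exact move-scoring rule are assumptions, not the authors' implementation.

```python
def tour_length(tour, dist):
    """Total length of a closed tour under a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_neighbors(tour):
    """Yield all 2-opt neighbors: reverse the segment tour[i:j]."""
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            yield tour[:i] + tour[i:j][::-1] + tour[j:]

def limited_rollout_beam_search(tour, dist, beta=10, alpha=6):
    """Beam-search skeleton: keep beta candidate tours; for alpha steps,
    expand each candidate with its beta best moves (here scored by tour
    length, in place of the learned policy) and retain the beta shortest."""
    beam = [tour]
    best = tour
    for _ in range(alpha):
        candidates = []
        for t in beam:
            moves = sorted(two_opt_neighbors(t), key=lambda s: tour_length(s, dist))
            candidates.extend(moves[:beta])  # top-beta expansions per beam entry
        beam = sorted(candidates, key=lambda t: tour_length(t, dist))[:beta]
        best = min([best] + beam, key=lambda t: tour_length(t, dist))
    return best
```

With this structure, the total number of expansions per improvement step is proportional to α · β, which matches the budget accounting quoted in the Experiment Setup row.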
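The Dataset Splits row states that, in the online adaptation (OA) setting, the policy parameters ϕ are updated after each rollout according to the gradient in Equation 1. Equation 1 is not reproduced in the excerpt, so the snippet below only sketches the generic REINFORCE-style shape of such an update on a toy softmax policy; the learning rate, reward function, and parameterization are illustrative assumptions, not the paper's.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def online_adapt(phi, reward_fn, lr=0.1, steps=50, seed=0):
    """Toy REINFORCE-style online adaptation (assumed form, not Equation 1):
    after each rollout, move phi along grad log pi(a | phi) scaled by the
    observed reward."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        probs = softmax(phi)
        a = rng.choice(len(phi), p=probs)   # sample a rollout action
        r = reward_fn(a)                    # e.g. improvement in tour cost
        grad_log = -probs
        grad_log[a] += 1.0                  # d/dphi of log softmax(phi)[a]
        phi = phi + lr * r * grad_log       # gradient-ascent step
    return phi
```

In the paper's setting the "action" is a full policy rollout on a test instance and the update runs once per rollout; here the bandit-style toy merely shows parameters drifting toward higher-reward behavior.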