GOAL: A Generalist Combinatorial Optimization Agent Learner
Authors: Darko Drakulić, Sofia Michel, Jean-Marc Andreoli
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train GOAL on a set of routing, scheduling and classic graph problems and show that it is only slightly inferior to the specialized baselines while being the first multi-task model that solves a wide range of COPs. Finally we showcase the strong transfer learning capacity of GOAL by fine-tuning it on several new problems. Our code is available at this url. [...] 4 EXPERIMENTS Our experiments aim to address the following questions: (i) does the GOAL approach enable learning a single backbone to effectively solve a variety of CO problems (Sec 4.1)? (ii) can it be used as a pretrained model and efficiently fine-tuned to solve new CO problems, unseen at training (Sec 4.2)? (iii) to what extent do our mixed-attention blocks and multi-type architecture contribute to the results (Sec 4.3)? |
| Researcher Affiliation | Industry | Darko Drakulić, Sofia Michel & Jean-Marc Andreoli, NAVER LABS Europe, France |
| Pseudocode | No | The paper describes the architecture and methods in prose, with mathematical equations for attention (e.g., equations 1a, 1b, and 2). However, it does not include a dedicated section or figure explicitly labeled "Pseudocode" or "Algorithm", nor are there any structured, code-like blocks describing a procedure or method. |
| Open Source Code | Yes | Our code is available at this url. [...] REPRODUCIBILITY STATEMENT For full reproducibility, the implementation of the backbone, the adapters and all the experiments described in this paper are publicly available at https://github.com/naver/goal-co. |
| Open Datasets | No | For each problem, we generate a training dataset of 1 million random instances and solve them using traditional solvers. [...] The description of the problems and the associated data generation processes is provided in Appendix F. [...] Data generation. Node locations are randomly sampled from the unit square, while node demands are randomly generated integers from the interval [1, 10]. |
| Dataset Splits | Yes | Training tasks. We train GOAL on eight classic and varied CO problems. For each problem, we generate a training dataset of 1 million random instances and solve them using traditional solvers. [...] The best model was selected based on the average performance on a validation set of 128 instances per problem. [...] In Table 1 we compare the performance of GOAL to the relevant neural baselines for each problem as well as its single-task version (i.e. GOAL trained separately for each task). First, we observe that the best performance is achieved by the single-task GOAL in 7 out of the 8 tasks, with a computation time comparable to that of the baselines. |
| Hardware Specification | Yes | We trained the model on 8 Nvidia V100 GPU servers, with batch size of 256 for 7 days, completing ca. 400 epochs. |
| Software Dependencies | No | The paper mentions using the "AdamW optimizer" and references traditional solvers like "LKH (Helsgaun, 2017)", "HGS (Vidal, 2022)", "ORTools (Perron & Furnon, 2022)", "FastWVC (Cai, 2023)", and "HiGHS (Huangfu & Hall, 2018)" for generating expert solutions. However, it does not provide specific version numbers for the software libraries or frameworks used in the implementation of GOAL itself. |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov & Hutter, 2018), with an initial learning rate of 0.0005 and a decay rate of 0.97 every 10th epoch. We trained the model on 8 Nvidia V100 GPU servers, with batch size of 256 for 7 days, completing ca. 400 epochs. The best model was selected based on the average performance on a validation set of 128 instances per problem. The model hyperparameters are described in Appendix A. [...] The GOAL backbone consists of L=9 layers with an embedding size of D=128 (the same for both nodes and edges), a feed-forward dimension of 512, and ReZero normalization (Bachlechner et al., 2021). The mixed-attention blocks have 8 heads. The task-specific input adapters produce representations of low dimension ℓ=8 for nodes and ℓ=4 for edges, which are mapped to embeddings by a shared codebook of dimension ℓ×D for nodes and ℓ×D for edges, respectively. |
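The learning-rate schedule quoted above (initial rate 0.0005, multiplied by 0.97 every 10th epoch) can be sketched as a simple step decay. This is a minimal illustration, assuming the standard interpretation of "decay rate of 0.97 every 10th epoch" as `lr * 0.97^(epoch // 10)`; the function name is hypothetical, not from the paper's code.

```python
def lr_at_epoch(epoch: int, base_lr: float = 5e-4,
                gamma: float = 0.97, step: int = 10) -> float:
    """Step-decay schedule as described in the paper's setup:
    initial learning rate 0.0005, decayed by 0.97 every 10th epoch."""
    return base_lr * gamma ** (epoch // step)

print(lr_at_epoch(0))    # 0.0005 at the start of training
print(lr_at_epoch(400))  # 5e-4 * 0.97**40 after the ca. 400 reported epochs
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.97)` wrapped around an `AdamW` optimizer with `lr=5e-4`.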