Efficient Training of Multi-task Neural Solver for Combinatorial Optimization
Authors: Chenguang Wang, Zhang-Hua Fu, Pinyan Lu, Tianshu Yu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted for 12 tasks: four types of COPs, namely the Travelling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), the Orienteering Problem (OP), and the Knapsack Problem (KP), each at three problem scales. The approach is compared with single-task training (STL), extensive MTL baselines (Mao et al., 2021; Yu et al., 2020; Navon et al., 2022; Kendall et al., 2018; Liu et al., 2021a;b), and the SOTA task grouping method TAG (Fifty et al., 2021), under both the same training budget and the same number of training epochs. |
| Researcher Affiliation | Academia | Chenguang Wang (School of Data Science, The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data); Zhang-Hua Fu (School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; Shenzhen Institute of Artificial Intelligence and Robotics for Society); Pinyan Lu (Shanghai University of Finance and Economics); Tianshu Yu (School of Data Science, The Chinese University of Hong Kong, Shenzhen) |
| Pseudocode | No | The paper describes algorithmic concepts and a pipeline (Figure 1) but does not contain a formally labeled 'Pseudocode' or 'Algorithm' block with structured, step-by-step instructions. |
| Open Source Code | Yes | Our code is open-sourced and available at https://github.com/LOGO-CUHKSZ/MTL-COP. |
| Open Datasets | Yes | On the real-world datasets TSPLib and CVRPLib, our method also achieves the best results compared to single-task learning and multi-task learning approaches. Additionally, the influence matrix provides empirical evidence supporting common practices in the field of learning to optimize, further substantiating the effectiveness of our approach. |
| Dataset Splits | No | The paper states that experiments are averaged over "10,000 instances for each task" and describes training for "1000 epochs," but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for these instances. |
| Hardware Specification | Yes | The model is trained using 8 Nvidia Tesla A100 GPUs in parallel, and the evaluations are done on a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2015) and an open-source bandit algorithm repository (Besson, 2018), but it does not provide specific version numbers for these or other major software dependencies like Python or deep learning frameworks, which are necessary for reproducible setup. |
| Experiment Setup | Yes | In each epoch, we process a total of 100 * 1000 instances with a batch size of 512. The POMO size is equal to the problem scale, except for KP-200, where it is 100. We optimize the model using Adam (Kingma & Ba, 2015) with a learning rate of 1e-4 and weight decay of 1e-6. The model is trained for 1000 epochs in the standard setting, with the learning rate multiplied by 1e-1 at the 900th epoch. The bandit algorithm explores during the first epoch; thereafter we update it with gradient information collected every 12 batches. |
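The reported training schedule can be sketched in a few lines. This is a minimal illustration only, assuming a step learning-rate schedule; the helper names (`learning_rate`, `batches_per_epoch`) are hypothetical and not taken from the paper's released code, which remains the authoritative reference.

```python
def learning_rate(epoch, base_lr=1e-4, decay_epoch=900, decay_factor=0.1):
    """Step schedule matching the reported setup: a constant base
    learning rate of 1e-4, multiplied by 0.1 from the 900th epoch on."""
    return base_lr * decay_factor if epoch >= decay_epoch else base_lr

# Reported setup: 1000 epochs, 100 * 1000 instances per epoch, batch size 512.
EPOCHS = 1000
INSTANCES_PER_EPOCH = 100 * 1000
BATCH_SIZE = 512
BANDIT_UPDATE_EVERY = 12  # bandit updated with every 12 batches of data

# Ceiling division: number of batches needed to cover one epoch.
batches_per_epoch = -(-INSTANCES_PER_EPOCH // BATCH_SIZE)
```

Under this reading, an epoch spans 196 batches, so the bandit would be updated roughly 16 times per epoch after its initial exploration phase.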