Preference Optimization for Combinatorial Optimization Problems
Authors: Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on various benchmarks, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP) and the Flexible Flow Shop Problem (FFSP), demonstrate that our method significantly outperforms existing RL algorithms, achieving superior convergence efficiency and solution quality. |
| Researcher Affiliation | Collaboration | 1China Mobile 2Peking University 3Tsinghua University 4Jiaying University 5Sun Yat-Sen University 6Central South University |
| Pseudocode | Yes | Algorithm 1 Preference Optimization for COPs. |
| Open Source Code | No | The paper does not provide a link to a code repository, state that code will be released, or indicate that code for the described methodology is included in supplementary materials. |
| Open Datasets | Yes | The fundamental COPs, including TSP, CVRP and FFSP, serve as our testbed. In routing problems, the reward r(x, τ) is defined as the Euclidean length of the trajectory τ. ... To evaluate generalization capabilities, we conduct zero-shot testing on problems with diverse distributions as specified in (Bi et al., 2022) and on standard benchmark datasets: TSPLib (Reinelt, 1991) and CVRPLib (Uchoa et al., 2017). |
| Dataset Splits | No | The paper mentions testing on "10k instances" and "1k instances" for different problem types and discusses "zero-shot testing on problems with diverse distributions." However, it does not explicitly specify training/validation/test splits for its own experimental setup (e.g., percentages, exact counts, or a citation to a predefined split); it only states that training instances are sampled and that evaluation uses benchmark sets. |
| Hardware Specification | Yes | Most experiments were conducted on a server with NVIDIA 24GB-RTX 4090 GPUs and an Intel Xeon Gold 6133 CPU. ... We implement PO upon these baselines and train them from scratch for 100k steps on a single 80GB-A800 GPU. |
| Software Dependencies | No | The paper provides a Python code snippet demonstrating the core logic and imports `torch.nn.functional`, indicating the use of Python and PyTorch. However, it does not specify version numbers for Python, PyTorch, or any other software libraries or tools used in the experiments. |
| Experiment Setup | Yes | The hyperparameter configurations for these solvers primarily follow their original implementations, with detailed specifications provided in Appendix E.3. ... In our experimental setup, we set the tanh clip to 50.0 for VRPs, which has been shown to facilitate the training process (Jin et al., 2023). The following table presents the parameter settings for the four training frameworks: POMO (Kwon et al., 2020), Pointerformer (Jin et al., 2023), AM (Kool et al., 2019), and Sym-NCO (Kim et al., 2023). ... Table 4: Hyperparameter setting for POMO. (includes Alpha, Epochs, Epoch Size, Encoder Layer Number, Batch Size, Embedding Dimension, Attention Head Number, Feed Forward Dimension, Tanh Clip, Learning Rate) |
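The Open Datasets row quotes the paper's reward definition for routing problems: the Euclidean length of the trajectory τ. A minimal PyTorch sketch of that quantity is below; the function name, tensor layout, and batching convention are our own assumptions, not the paper's implementation.

```python
import torch


def tour_length(coords: torch.Tensor, tour: torch.Tensor) -> torch.Tensor:
    """Euclidean length of a closed tour (the routing reward in the paper).

    coords: (batch, n, 2) node coordinates
    tour:   (batch, n) permutation of node indices per instance
    """
    # Reorder coordinates along the visiting sequence.
    ordered = coords.gather(1, tour.unsqueeze(-1).expand(-1, -1, 2))
    # Pair each node with its successor, wrapping back to the start node.
    succ = torch.roll(ordered, shifts=-1, dims=1)
    # Sum the consecutive Euclidean edge lengths.
    return (ordered - succ).norm(dim=-1).sum(dim=1)
```

For example, visiting the corners of the unit square in order gives a length of 4.0. RL solvers for TSP typically maximize the negative of this value, since the objective is minimization.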
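The Experiment Setup row notes a "tanh clip" of 50.0 for VRPs, following Jin et al. (2023). In attention-based NCO decoders this usually means squashing the decoder's compatibility logits into [-C, C] with C·tanh(·) before the softmax. A hedged sketch of that step is below; the function name and scaled-dot-product form are assumptions, not the paper's exact decoder.

```python
import torch


def clipped_logits(query: torch.Tensor, key: torch.Tensor,
                   tanh_clip: float = 50.0) -> torch.Tensor:
    """Scaled dot-product compatibility scores with tanh clipping.

    query: (batch, m, d), key: (batch, n, d) -> (batch, m, n) logits
    bounded in [-tanh_clip, tanh_clip], which stabilizes training.
    """
    d = query.size(-1)
    logits = query @ key.transpose(-2, -1) / d ** 0.5
    return tanh_clip * torch.tanh(logits)
```

The clip bounds the logit range and hence the sharpness of the resulting softmax policy, which is why a larger value (50.0 here vs. the 10.0 common in earlier attention models) changes how peaked the learned policy can become.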