Preference Optimization for Combinatorial Optimization Problems
Authors: Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on various benchmarks, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP) and the Flexible Flow Shop Problem (FFSP), demonstrate that our method significantly outperforms existing RL algorithms, achieving superior convergence efficiency and solution quality. |
| Researcher Affiliation | Collaboration | 1China Mobile 2Peking University 3Tsinghua University 4Jiaying University 5Sun Yat-Sen University 6Central South University |
| Pseudocode | Yes | Algorithm 1 Preference Optimization for COPs. |
| Open Source Code | No | The paper does not provide a link to a code repository, state that code will be released, or indicate that code for the described methodology is included in supplementary materials. |
| Open Datasets | Yes | The fundamental COPs, including TSP, CVRP and FFSP, serve as our testbed. In routing problems, the reward r(x, τ) is defined as the Euclidean length of the trajectory τ. ... To evaluate generalization capabilities, we conduct zero-shot testing on problems with diverse distributions as specified in (Bi et al., 2022) and on standard benchmark datasets: TSPLib (Reinelt, 1991) and CVRPLib (Uchoa et al., 2017). |
| Dataset Splits | No | The paper mentions testing on "10k instances" and "1k instances" for different problem types and discusses "zero-shot testing on problems with diverse distributions." However, it does not explicitly specify training/validation/test splits for its own experimental setup (e.g., percentages, exact counts, or a citation to a predefined split); it only states that training instances are sampled and that evaluation uses benchmark sets. |
| Hardware Specification | Yes | Most experiments were conducted on a server with NVIDIA 24GB-RTX 4090 GPUs and an Intel Xeon Gold 6133 CPU. ... We implement PO upon these baselines and train them from scratch for 100k steps on a single 80GB-A800 GPU. |
| Software Dependencies | No | The paper provides a Python code snippet demonstrating the core logic and imports `torch.nn.functional`, indicating the use of Python and PyTorch. However, it does not specify version numbers for Python, PyTorch, or any other software libraries or tools used in the experiments. |
| Experiment Setup | Yes | The hyperparameter configurations for these solvers primarily follow their original implementations, with detailed specifications provided in Appendix E.3. ... In our experimental setup, we set the tanh clip to 50.0 for VRPs, which has been shown to facilitate the training process (Jin et al., 2023). The following table presents the parameter settings for the four training frameworks: POMO (Kwon et al., 2020), Pointerformer (Jin et al., 2023), AM (Kool et al., 2019), and Sym-NCO (Kim et al., 2023). ... Table 4: Hyperparameter setting for POMO. (includes Alpha, Epochs, Epoch Size, Encoder Layer Number, Batch Size, Embedding Dimension, Attention Head Number, Feed Forward Dimension, Tanh Clip, Learning Rate) |
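The Open Datasets row quotes the paper's reward definition for routing problems: the Euclidean length of the trajectory τ. A minimal PyTorch sketch of that quantity is below; the function name, tensor layout, and batching convention are our own assumptions, not the paper's implementation.

```python
import torch


def tour_length(coords: torch.Tensor, tour: torch.Tensor) -> torch.Tensor:
    """Euclidean length of a closed tour (the routing reward in the paper).

    coords: (batch, n, 2) node coordinates
    tour:   (batch, n) permutation of node indices per instance
    """
    # Reorder coordinates along the visiting sequence.
    ordered = coords.gather(1, tour.unsqueeze(-1).expand(-1, -1, 2))
    # Pair each node with its successor, wrapping back to the start node.
    succ = torch.roll(ordered, shifts=-1, dims=1)
    # Sum the consecutive Euclidean edge lengths.
    return (ordered - succ).norm(dim=-1).sum(dim=1)
```

For example, visiting the corners of the unit square in order gives a length of 4.0. RL solvers for TSP typically maximize the negative of this value, since the objective is minimization.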
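The Experiment Setup row notes a "tanh clip" of 50.0 for VRPs, following Jin et al. (2023). In attention-based NCO decoders this usually means squashing the decoder's compatibility logits into [-C, C] with C·tanh(·) before the softmax. A hedged sketch of that step is below; the function name and scaled-dot-product form are assumptions, not the paper's exact decoder.

```python
import torch


def clipped_logits(query: torch.Tensor, key: torch.Tensor,
                   tanh_clip: float = 50.0) -> torch.Tensor:
    """Scaled dot-product compatibility scores with tanh clipping.

    query: (batch, m, d), key: (batch, n, d) -> (batch, m, n) logits
    bounded in [-tanh_clip, tanh_clip], which stabilizes training.
    """
    d = query.size(-1)
    logits = query @ key.transpose(-2, -1) / d ** 0.5
    return tanh_clip * torch.tanh(logits)
```

The clip bounds the logit range and hence the sharpness of the resulting softmax policy, which is why a larger value (50.0 here vs. the 10.0 common in earlier attention models) changes how peaked the learned policy can become.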