Reinforcement Learning for Node Selection in Branch-and-Bound

Authors: Alexander Julian Mattick, Christopher Mutschler

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform all our training and the experiments on a Ryzen 7 5800X3D CPU, where training completes after approx. 6h. For our experiments we consider the instances of TSPLIB (Reinelt, 1991) and MIPLIB (Gleixner et al., 2021), which are among the most used datasets for benchmarking MIP frameworks and thus form a strong baseline to test against. We further test against the UFLP instance generator by Kochetov & Ivanenko (2005), which specifically produces instances that are hard to solve for branch-and-bound, and against MINLPLIB (Bussieck et al., 2003), which contains mixed-integer nonlinear programs, to show generalization to vastly different problems.
Researcher Affiliation | Academia | Fraunhofer Institute for Integrated Circuits IIS, Nordostpark 84, 90411 Nürnberg, Germany
Pseudocode | Yes | Algorithm 1: Pseudocode for node selection using our policy. Algorithm 2: Pseudocode for training our policy.
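The paper's Algorithm 1 (node selection using the learned policy) is not reproduced here. As a rough illustration of the idea, the following minimal sketch samples one open branch-and-bound node from a softmax over per-node policy scores; `select_node` and `score_fn` are assumed names for this sketch, not the authors' API, and `score_fn` stands in for the trained policy network.

```python
import math
import random

def select_node(open_nodes, score_fn, temperature=1.0, rng=random):
    """Sample one open node from a softmax over per-node policy scores.

    `score_fn` maps a node to a scalar score (here a stand-in for the
    learned policy); higher scores make a node more likely to be picked.
    """
    scores = [score_fn(n) / temperature for n in open_nodes]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    # Sample proportionally to the softmax weights.
    r = rng.random() * sum(weights)
    acc = 0.0
    for node, w in zip(open_nodes, weights):
        acc += w
        if r < acc:
            return node
    return open_nodes[-1]  # guard against floating-point round-off
```

A deterministic greedy variant (always taking the arg-max node) would correspond to evaluating the policy without exploration at solve time.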
Open Source Code | Yes | Our source code is publicly available and can be used to reproduce the experiments (source code: https://github.com/MattAlexMiracle/BranchAndBoundBisim). We use the default hyperparameters as proposed by CleanRL (Huang et al., 2021).
Open Datasets | Yes | For our experiments we consider the instances of TSPLIB (Reinelt, 1991) and MIPLIB (Gleixner et al., 2021), which are among the most used datasets for benchmarking MIP frameworks and thus form a strong baseline to test against. We further test against the UFLP instance generator by Kochetov & Ivanenko (2005), which specifically produces instances that are hard to solve for branch-and-bound, and against MINLPLIB (Bussieck et al., 2003), which contains mixed-integer nonlinear programs.
Dataset Splits | Yes | To generate these intermediary-difficult problems, we adopt a multi-step approach: ... This method is used to produce 200 intermediary-difficult training instances. ... We train our node selection policy on problem instances according to Sec. 4.4 and apply it on problems from different benchmarks.
Hardware Specification | Yes | We perform all our training and the experiments on a Ryzen 7 5800X3D CPU, where training completes after approx. 6h.
Software Dependencies | Yes | Our source code is publicly available and can be used to reproduce the experiments. We use the default hyperparameters as proposed by CleanRL (Huang et al., 2021). ... We test against the strong SCIP 8.0.4 baseline.
Experiment Setup | Yes | For this work we choose d_model = 512, but we did not find significant differences between different model sizes past a width of 256. For training we use AdamW (Loshchilov & Hutter, 2017) with a standard learning rate of 3 × 10^-4 and default PPO parameters. ... For training, we set N_iter = 16, K = 128, and ε = 0.1, and the weight of the entropy regularization λ_ent = 0.01.
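The hyperparameters quoted above can be collected into a minimal configuration sketch. The dataclass fields and the `ppo_clip` helper below are illustrative names, not taken from the paper's code; only the numeric values (d_model = 512, learning rate 3 × 10^-4, N_iter = 16, K = 128, ε = 0.1, λ_ent = 0.01) come from the quoted setup.

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    """Hyperparameters quoted in the experiment setup (field names assumed)."""
    d_model: int = 512      # model width; no significant gains past 256
    lr: float = 3e-4        # AdamW learning rate
    n_iter: int = 16        # N_iter
    k: int = 128            # K
    clip_eps: float = 0.1   # ε, PPO clipping range
    ent_coef: float = 0.01  # λ_ent, entropy-regularization weight

def ppo_clip(ratio: float, advantage: float, eps: float) -> float:
    """Clipped PPO surrogate objective for a single sample.

    Standard PPO: min(r * A, clip(r, 1 - ε, 1 + ε) * A).
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With ε = 0.1 the probability ratio is clipped to [0.9, 1.1], so e.g. `ppo_clip(1.5, 1.0, 0.1)` evaluates the clipped branch at ratio 1.1.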