Reinforcement Learning for Node Selection in Branch-and-Bound
Authors: Alexander Julian Mattick, Christopher Mutschler
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform all our training and the experiments on a Ryzen 7 5800X3D CPU, where training completes after approx. 6 h. For our experiments we consider the instances of TSPLIB (Reinelt, 1991) and MIPLIB (Gleixner et al., 2021), which are among the most widely used datasets for benchmarking MIP frameworks and thus form a strong baseline to test against. We further test against the UFLP instance generator of Kochetov & Ivanenko (2005), which specifically produces instances that are hard to solve for branch-and-bound, and against MINLPLIB (Bussieck et al., 2003), which contains mixed-integer nonlinear programs, to show generalization to vastly different problems. |
| Researcher Affiliation | Academia | Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS Nordostpark 84, 90411 Nürnberg, Germany |
| Pseudocode | Yes | Algorithm 1 Pseudocode for node selection using our policy Algorithm 2 Pseudocode for training our policy |
| Open Source Code | Yes | Our source code is publicly available and can be used to reproduce the experiments (source code: https://github.com/MattAlexMiracle/BranchAndBoundBisim). We use the default hyperparameters as proposed by CleanRL (Huang et al., 2021). |
| Open Datasets | Yes | For our experiments we consider the instances of TSPLIB (Reinelt, 1991) and MIPLIB (Gleixner et al., 2021), which are among the most widely used datasets for benchmarking MIP frameworks and thus form a strong baseline to test against. We further test against the UFLP instance generator of Kochetov & Ivanenko (2005), which specifically produces instances that are hard to solve for branch-and-bound, and against MINLPLIB (Bussieck et al., 2003), which contains mixed-integer nonlinear programs. |
| Dataset Splits | Yes | To generate these intermediary-difficult problems, we adopt a multi-step approach: ... This method is used to produce 200 intermediary-difficult training instances. ... We train our node selection policy on problem instances according to Sec. 4.4 and apply it on problems from different benchmarks. |
| Hardware Specification | Yes | We perform all our training and the experiments on a Ryzen 7 5800X3D CPU, where training completes after approx. 6 h. |
| Software Dependencies | Yes | Our source code is publicly available and can be used to reproduce the experiments. We use the default hyperparameters as proposed by CleanRL (Huang et al., 2021). ... We test against the strong SCIP 8.0.4 baseline. |
| Experiment Setup | Yes | For this work we choose dmodel = 512, but we did not find significant differences between different model sizes past a width of 256. For training we use AdamW (Loshchilov & Hutter, 2017) with a standard learning rate of 3×10⁻⁴ and default PPO parameters. ... For training, we set Niter = 16, K = 128, and ε = 0.1, and the weight of the entropy regularization λent = 0.01. |
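The PPO constants quoted in the Experiment Setup row (ε = 0.1, λent = 0.01) enter the standard clipped surrogate objective that CleanRL-style trainers minimize. A minimal scalar sketch of that objective, assuming the defaults above; the function name and scalar (per-sample) form are illustrative, not taken from the authors' code:

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, entropy,
                     clip_eps=0.1, ent_coef=0.01):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = math.exp(logp_new - logp_old)
    # Clip the ratio to [1 - eps, 1 + eps] as in PPO (eps = 0.1 per the paper).
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Surrogate objective: pessimistic minimum of unclipped and clipped terms.
    surrogate = min(ratio * advantage, clipped * advantage)
    # Negate for gradient descent; entropy bonus weighted by lambda_ent = 0.01.
    return -(surrogate + ent_coef * entropy)
```

With `logp_new == logp_old` the ratio is 1 and the loss reduces to `-(advantage + 0.01 * entropy)`; when the new policy moves far from the old one, the clipping caps the incentive at `(1 ± 0.1) * advantage`.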