SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch

Authors: Shengyu Feng, Yiming Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate its advanced performance in both branching quality and training efficiency over previous methods for various MILPs. Our empirical results demonstrate the clear advantage of SORREL in branching quality and generalization performance (trained on small instances and tested on larger ones). Tables 1 and 2 present the comparative results evaluated on the standard testing instances and transfer testing instances, respectively. In our ablation study, we first examine the efficacy of our offline RL algorithm by evaluating SORREL with offline RL only, denoted as SORREL-offline, in standard testing.
Researcher Affiliation | Academia | Shengyu Feng, Yiming Yang, Carnegie Mellon University
Pseudocode | No | The paper describes the methods using textual explanations and mathematical formulations but does not contain any structured pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for the methodology described.
Open Datasets | Yes | We evaluate all methods on five commonly used MILP benchmarks for NP-hard problems, including Set Covering (SC), Maximal Independent Set (MIS), Combinatorial Auction (CA), Capacitated Facility Location (CFL) and Multiple Knapsack (MK). We follow the instance generation process in Gasse et al. (2019) to generate 10,000 MILP instances for training, 2,000 instances for validation, and 100 instances for testing, with the same distribution.
Dataset Splits | Yes | We follow the instance generation process in Gasse et al. (2019) to generate 10,000 MILP instances for training, 2,000 instances for validation, and 100 instances for testing, with the same distribution. For consistency, 20 validation instances and 100 training instances are held out for validation and online finetuning of SORREL.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU or CPU models.
Software Dependencies | Yes | All our results are measured using SCIP 7.0.1 as the backend solver, employing a 1-hour time limit.
Experiment Setup | Yes | All our results are measured using SCIP 7.0.1 as the backend solver, employing a 1-hour time limit. Five different random seeds are used during the evaluation, which leads to 100 × 5 = 500 runs on standard testing instances and 20 × 5 = 100 runs on transfer testing instances. We report the 1-shifted geometric mean of solving time and the 10-shifted geometric mean of the number of nodes for all runs, following the conventions in Achterberg and Wunderling (2013). We collect the suboptimal demonstrations for training by running VHB on randomly sampled training instances to generate the B&B search tree until 100,000 transitions are collected. The same process is repeated to create the validation set with 20,000 transitions for GCNN. Here, λ controls the relative weight of the regularization term, and its value is chosen as λ = α · (1/N) Σ_{(sᵢ, aᵢ) ∈ D₁} |Q(sᵢ, aᵢ)|, where α is a hyperparameter and the term (1/N) Σ_{(sᵢ, aᵢ) ∈ D₁} |Q(sᵢ, aᵢ)| is estimated from the mini-batch. ε ∈ (0, 1) is a hyperparameter giving the clipping threshold. In this work, we let p_ch(sᵢ^ch | sᵢ) = κ for the child state sᵢ^ch with the lower LDB among the child states, where κ is a hyperparameter.
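The shifted geometric mean used in the evaluation above is a standard MILP benchmarking convention (Achterberg and Wunderling 2013): add a shift s to every value, take the geometric mean, then subtract s again, which damps the influence of very small measurements. A minimal sketch (the sample values are illustrative, not from the paper):

```python
import math

def shifted_geometric_mean(values, shift):
    """Compute exp(mean(log(v + shift))) - shift over the given values.

    The paper reports the 1-shifted geometric mean of solving times and
    the 10-shifted geometric mean of branch-and-bound node counts.
    """
    logs = [math.log(v + shift) for v in values]
    return math.exp(sum(logs) / len(logs)) - shift

# Illustrative solving times (seconds) over several runs; shift = 1 for time.
times = [0.5, 2.0, 8.0, 30.0]
sgm_time = shifted_geometric_mean(times, shift=1)

# Illustrative node counts; shift = 10 for the number of B&B nodes.
nodes = [12, 450, 3900]
sgm_nodes = shifted_geometric_mean(nodes, shift=10)
```

The shift matters most when some runs finish almost instantly: an unshifted geometric mean would let a near-zero time dominate the aggregate, while the 1-shift keeps each run's contribution bounded.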