SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch

Authors: Shengyu Feng, Yiming Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate its advanced performance in both branching quality and training efficiency over previous methods for various MILPs. Our empirical results demonstrate the clear advantage of SORREL in branching quality and generalization performance (trained on small instances and tested on larger ones). Tables 1 and 2 present the comparative results evaluated on the standard testing instances and transfer testing instances, respectively. In our ablation study, we first examine the efficacy of our offline RL algorithm by evaluating SORREL with offline RL only, denoted as SORREL-offline, in standard testing.
Researcher Affiliation | Academia | Shengyu Feng, Yiming Yang, Carnegie Mellon University
Pseudocode | No | The paper describes the methods using textual explanations and mathematical formulations but does not contain any structured pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for the methodology described.
Open Datasets | Yes | We evaluate all methods on five commonly used MILP benchmarks for NP-hard problems, including Set Covering (SC), Maximal Independent Set (MIS), Combinatorial Auction (CA), Capacitated Facility Location (CFL) and Multiple Knapsack (MK). We follow the instance generation process in Gasse et al. (2019) to generate 10,000 MILP instances for training, 2,000 instances for validation, and 100 instances for testing, with the same distribution.
Dataset Splits | Yes | We follow the instance generation process in Gasse et al. (2019) to generate 10,000 MILP instances for training, 2,000 instances for validation, and 100 instances for testing, with the same distribution. For consistency, 20 validation instances and 100 training instances are held out for validation and online finetuning of SORREL.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU or CPU models.
Software Dependencies | Yes | All our results are measured using SCIP 7.0.1 as the backend solver, employing a 1-hour time limit.
Experiment Setup | Yes | All our results are measured using SCIP 7.0.1 as the backend solver, employing a 1-hour time limit. Five different random seeds are used during the evaluation, which leads to 100 × 5 = 500 runs on standard testing instances and 20 × 5 = 100 runs on transfer testing instances. We report the 1-shifted geometric mean of solving time and the 10-shifted geometric mean of the number of nodes for all runs, following the conventions in Achterberg and Wunderling (2013). We collect the suboptimal demonstrations for training by running VHB on randomly sampled training instances to generate the B&B search tree until 100,000 transitions are collected. The same process is repeated to create the validation set with 20,000 transitions for GCNN. Here, λ controls the relative weight of the regularization term, and its value is chosen as λ = α · (1/N) Σ_{(sᵢ, aᵢ) ∈ D₁} |Q(sᵢ, aᵢ)|, where α is a hyperparameter and the term (1/N) Σ_{(sᵢ, aᵢ) ∈ D₁} |Q(sᵢ, aᵢ)| is estimated from the mini-batch. ε ∈ (0, 1) is a hyperparameter giving the clipping threshold. In this work, we let p_ch(sᵢ^ch | sᵢ) = κ for the child state sᵢ^ch with the lower LDB among the child states, where κ is a hyperparameter.
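The shifted geometric mean used in the evaluation above is a standard MILP benchmarking convention (Achterberg and Wunderling 2013): add a shift s to every value, take the geometric mean, then subtract s again, which damps the influence of very small measurements. A minimal sketch (the sample values are illustrative, not from the paper):

```python
import math

def shifted_geometric_mean(values, shift):
    """Compute exp(mean(log(v + shift))) - shift over the given values.

    The paper reports the 1-shifted geometric mean of solving times and
    the 10-shifted geometric mean of branch-and-bound node counts.
    """
    logs = [math.log(v + shift) for v in values]
    return math.exp(sum(logs) / len(logs)) - shift

# Illustrative solving times (seconds) over several runs; shift = 1 for time.
times = [0.5, 2.0, 8.0, 30.0]
sgm_time = shifted_geometric_mean(times, shift=1)

# Illustrative node counts; shift = 10 for the number of B&B nodes.
nodes = [12, 450, 3900]
sgm_nodes = shifted_geometric_mean(nodes, shift=10)
```

The shift matters most when some runs finish almost instantly: an unshifted geometric mean would let a near-zero time dominate the aggregate, while the 1-shift keeps each run's contribution bounded.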