Decision-Focused Surrogate Modeling for Mixed-Integer Linear Optimization
Authors: Shivi Dixit, Rishabh Gupta, Qi Zhang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results from two computational case studies indicate that this decision-focused surrogate modeling approach is highly data-efficient and provides very accurate predictions of the optimal solutions. In these examples, the resulting surrogate LPs outperform state-of-the-art neural-network-based optimization proxies. |
| Researcher Affiliation | Collaboration | Shivi Dixit EMAIL Department of Chemical Engineering and Materials Science, University of Minnesota Twin Cities, MN, USA. Rishabh Gupta EMAIL Exxon Mobil Technology and Engineering Company, Spring, TX, USA. Qi Zhang EMAIL Department of Chemical Engineering and Materials Science, University of Minnesota Twin Cities, MN, USA. |
| Pseudocode | Yes | Algorithm 1 The penalty-based BCD algorithm. |
| Open Source Code | Yes | The codes developed for these case studies are available at https://github.com/ddolab/DFSM-for-MILPs. |
| Open Datasets | No | To generate different problem instances, we choose η ∼ N(2.5, 5.5), α_t ∼ N(6, 16), β_t ∼ N(0.5, 2), and E^init ∼ N(90, 97), and set τ = 5, P^max = 1, and E^max = 100. Moreover, the value of S is also varied between {1, 2, 3} across these instances. This way we generate 10 different instances, each representing a hybrid vehicle with different attributes. For each instance, the training dataset is generated by varying the power demand profile D, which is the model input, and solving the original MILP for that demand profile to obtain the corresponding optimal solution (E*, P^eng*, z*). |
| Dataset Splits | Yes | For each instance, the training dataset is generated by varying the power demand profile D, which is the model input, and solving the original MILP for that demand profile to obtain the corresponding optimal solution (E*, P^eng*, z*). Repeating this process provides a set of data points, each consisting of an input-solution pair. The decision prediction errors and optimality gaps for each instance are determined using 100 unseen test data points, each given by a different power demand profile. Note that while the maximum number of training data points we use to construct each DFSOM is 100, we use up to 500 data points to train each of the three NN-based optimization proxies. |
| Hardware Specification | Yes | All DFSM instances were solved utilizing 24 cores and 60 GB memory on the Mesabi cluster of the Minnesota Supercomputing Institute (MSI) equipped with Intel Haswell E5-2680v3 processors. |
| Software Dependencies | Yes | All optimization problems were implemented in Julia v1.7.2 using the modeling environment JuMP v0.22.3 (Dunning et al., 2017). We applied Gurobi v10.3 to solve all LPs and MILPs, and all NLPs were solved using IPOPT v0.9.1. |
| Experiment Setup | Yes | To generate different problem instances, we choose η ∼ N(2.5, 5.5), α_t ∼ N(6, 16), β_t ∼ N(0.5, 2), and E^init ∼ N(90, 97), and set τ = 5, P^max = 1, and E^max = 100. Moreover, the value of S is also varied between {1, 2, 3} across these instances. The additional linear inequalities in the postulated surrogate LP take the form of A(D, θ)[E; P^eng; z] ≤ b(D, θ), where the number of constraints \|V\| is a hyperparameter that can be adjusted. Each element of A and b is a cubic polynomial in D with θ being the coefficients. For the feedforward NN, the numbers of hidden layers and neurons in each layer are determined through a trial-and-error approach aiming to minimize the training loss over a fixed number of epochs. ... The design of the specific NN model used in the hybrid vehicle control case study is as follows: the input layer contains T nodes, the same as the dimension of the input vector u. The two hidden layers have 3T and 6T nodes, and finally, the output layer consists of 3T + 1 nodes aligning with the dimensionality of the decision space. We also account for the effect of different learning rates (namely 0.1, 0.01, and 0.001) on the training loss. Based on our analysis, we determine 0.01 to be the most suitable learning rate for our purpose. |
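The quoted setup parameterizes the surrogate LP's constraint data A(D, θ) and b(D, θ) as cubic polynomials in the input D. The paper's exact feature map is not reproduced in the excerpt, so the sketch below is only one plausible reading: element-wise powers of D up to degree 3 (no cross terms), with each entry of A and b a linear combination of those features under coefficients θ. Function names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cubic_features(D):
    # Feature map φ(D): a constant term plus element-wise powers of the
    # demand profile up to degree 3. This is an assumed reading of "each
    # element of A and b is a cubic polynomial in D"; the paper may also
    # include cross terms between time steps.
    return np.concatenate(([1.0], D, D**2, D**3))

def surrogate_lp_data(D, theta_A, theta_b, n_constraints, n_vars):
    # Assemble the surrogate-LP constraint data A(D, θ) and b(D, θ).
    # theta_A: (n_constraints * n_vars, n_features) coefficient matrix,
    # theta_b: (n_constraints, n_features); each output entry is a linear
    # combination of the cubic features, i.e. a cubic polynomial in D.
    phi = cubic_features(D)
    A = (theta_A @ phi).reshape(n_constraints, n_vars)
    b = theta_b @ phi
    return A, b

# Illustrative dimensions: T time steps, decision vector [E; P^eng; z],
# and |V| = 5 postulated constraints (a hyperparameter in the paper).
T = 4
n_vars = 3 * T          # stacked [E; P^eng; z]
n_constraints = 5       # |V|
n_features = 1 + 3 * T  # len(cubic_features(D))

rng = np.random.default_rng(0)
D = rng.uniform(0.1, 1.0, size=T)
theta_A = rng.standard_normal((n_constraints * n_vars, n_features))
theta_b = rng.standard_normal((n_constraints, n_features))

A, b = surrogate_lp_data(D, theta_A, theta_b, n_constraints, n_vars)
```

A surrogate LP for a given demand profile D would then minimize the original objective subject to A x ≤ b (plus the retained original constraints), with θ fit by the penalty-based BCD algorithm referenced in the Pseudocode row.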