Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting
Authors: Daniel McKenzie, Howard Heaton, Samy Wu Fung
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on two representative ILPs (the shortest path problem and the knapsack problem) demonstrate that this combination (DYS on the forward pass, JFB on the backward pass) yields a scheme which scales more effectively to high-dimensional problems than existing schemes. All code associated with this paper is available at https://github.com/mines-opt-ml/fpo-dys. We demonstrate the effectiveness of DYS-Net through four experiments. |
| Researcher Affiliation | Collaboration | Daniel McKenzie (EMAIL), Department of Applied Mathematics and Statistics, Colorado School of Mines; Samy Wu Fung (EMAIL), Department of Applied Mathematics and Statistics and Department of Computer Science, Colorado School of Mines; Howard Heaton (EMAIL), Typal Academy |
| Pseudocode | Yes | Algorithm 1 DYS-Net. 1: Inputs: A and b defining C; hyperparameters α, γ, K. 2: Initialize: Attach neural network w_Θ(·). Compute SVD of A for the P_{C2} formula. 3: Forward Pass: Initialize z^0. Given w_Θ(d), compute z^1, …, z^K using equation 15. Return x^K = P_{C1}(z^K) =: x_Θ. 4: Backward Pass: Compute p_Θ(x^K) = p_Θ(x_Θ) using equation 19 and return. |
| Open Source Code | Yes | All code associated with this paper is available at https://github.com/mines-opt-ml/fpo-dys. |
| Open Datasets | Yes | Finally, as an illustrative example, we consider the Warcraft terrains dataset first studied in Vlastelica et al. (2019). |
| Dataset Splits | No | We use a validation set for model selection, as we observe that, for all methods, the best loss is seldom achieved at the final iteration. Evaluation: Given test data {(d_i, w(d_i), x*(d_i))}_{i=1}^N and a fully trained model w_Θ(d), we define the regret: ... We train each network for 50 epochs, except for the baseline. |
| Hardware Specification | Yes | All networks were trained using an AMD Threadripper Pro 3955WX CPU (16 cores, 3.90 GHz, 64 MB cache, PCIe 4.0) and an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | DYS-net is implemented as an abstract PyTorch (Paszke et al., 2019) layer in the provided code. To instantiate it, a user need only specify A and b (see equation 3) and choose an architecture for w_Θ. Our shortest path prediction experiment described in Section 5.1.1 is bottlenecked by the fact that the academic license of Gurobi does not allow for shortest path prediction problems on grids larger than 30 × 30. |
| Experiment Setup | Yes | In all our experiments with DYS-net we used α = 0.05 and K = 1,000, with an early stopping condition of ‖z^{k+1} − z^k‖_2 ≤ tol with tol = 0.01. We train for a maximum of 100 epochs or 30 minutes, whichever comes first. Exact hyperparameters are discussed in Appendix C. For all models we use an initial learning rate of 10^{-3} and a scheduler that reduces the learning rate whenever the validation loss plateaus. We also used weight decay with a parameter of 5 × 10^{-4}. After experimenting with different values, we select γ = 5 × 10^{-4} for DYS-net and γ = 5 × 10^{-1} for CVX-net. |
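The forward/backward scheme summarized in the Pseudocode row (Algorithm 1) can be sketched in code. Below is a minimal, hypothetical PyTorch rendition, not the authors' implementation: the function name `dys_forward`, the projection formulas, and the toy three-variable instance are our assumptions. The forward pass runs one standard ordering of the Davis-Yin update with projections onto the box C1 = [0, 1]^n and the affine set C2 = {x : Ax = b}, and the JFB-style backward pass differentiates through only one final iteration.

```python
# Hypothetical sketch of DYS-Net (Algorithm 1); not the authors' released code.
# Solves  min_x  w.x + gamma * ||x||^2  over  C = {x : Ax = b, 0 <= x <= 1}.
import torch

def dys_forward(w, A, b, alpha=0.05, gamma=5e-4, K=1000, tol=1e-6):
    A_pinv = torch.linalg.pinv(A)              # paper precomputes this via an SVD of A
    P_C1 = lambda z: z.clamp(0.0, 1.0)         # projection onto the box [0, 1]^n
    P_C2 = lambda u: u - A_pinv @ (A @ u - b)  # projection onto {x : Ax = b}
    z = torch.full_like(w, 0.5)                # arbitrary initialization z^0
    with torch.no_grad():                      # JFB: build no graph through the K iterations
        for _ in range(K):
            x = P_C1(z)
            # w + 2*gamma*x is the gradient of the regularized cost w.x + gamma*||x||^2
            y = P_C2(2 * x - z - alpha * (w + 2 * gamma * x))
            z_next = z + y - x
            if torch.norm(z_next - z) <= tol:  # early stopping on ||z^{k+1} - z^k||_2
                z = z_next
                break
            z = z_next
    # One differentiable application: gradients w.r.t. w flow through this step only.
    x = P_C1(z)
    y = P_C2(2 * x - z - alpha * (w + 2 * gamma * x))
    return P_C1(z + y - x)

# Toy instance (ours, not from the paper): pick one of three items,
# x0 + x1 + x2 = 1, minimizing the cost w.x.
w = torch.tensor([1.0, 0.2, 3.0])
A = torch.tensor([[1.0, 1.0, 1.0]])
b = torch.tensor([1.0])
x = dys_forward(w, A, b)  # close to the cheapest vertex, [0, 1, 0]
```

In training, w would be the output of the network w_Θ(d) and the returned x_Θ would feed the loss; because the loop runs under `no_grad`, backpropagation touches only the last update, which is the Jacobian-free backprop idea the paper pairs with DYS.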