Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting

Authors: Daniel McKenzie, Howard Heaton, Samy Wu Fung

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments on two representative ILPs, the shortest path problem and the knapsack problem, demonstrate that this combination (DYS on the forward pass, JFB on the backward pass) yields a scheme which scales more effectively to high-dimensional problems than existing schemes." "All code associated with this paper is available at https://github.com/mines-opt-ml/fpo-dys." Section 5, Numerical Experiments: "We demonstrate the effectiveness of DYS-Net through four experiments."
Researcher Affiliation | Collaboration | Daniel McKenzie (EMAIL), Department of Applied Mathematics and Statistics, Colorado School of Mines; Samy Wu Fung (EMAIL), Department of Applied Mathematics and Statistics and Department of Computer Science, Colorado School of Mines; Howard Heaton (EMAIL), Typal Academy
Pseudocode | Yes | Algorithm 1 (DYS-Net). 1: Inputs: A and b defining C; hyperparameters α, γ, K. 2: Initialize: attach neural network w_Θ(·); compute the SVD of A for the P_{C2} formula. 3: Forward pass: initialize z^0; given w_Θ(d), compute z^1, ..., z^K using equation 15; return x^K = P_{C1}(z^K) =: x_Θ. 4: Backward pass: compute p_Θ(x^K) = p_Θ(x_Θ) using equation 19 and return.
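The forward pass of Algorithm 1 is a three-operator Davis-Yin splitting over C1 = {x : x ≥ 0} and C2 = {x : Ax = b}, with smooth regularized cost h(x) = w·x + γ‖x‖². A minimal NumPy sketch under the standard DYS update (the paper's exact equation 15 may differ in relaxation and step-size details; `dys_forward` and its defaults are illustrative, not the repository's implementation):

```python
import numpy as np

def dys_forward(A, b, w, alpha=0.05, gamma=5e-4, K=1000, tol=1e-2):
    """Illustrative Davis-Yin forward pass for
    min_x  w.x + gamma * ||x||^2   s.t.  Ax = b, x >= 0."""
    # Projection onto C1 = {x : x >= 0}.
    P_C1 = lambda z: np.maximum(z, 0.0)
    # Projection onto C2 = {x : Ax = b} via the pseudoinverse
    # (the paper precomputes an SVD of A for this step).
    A_pinv = np.linalg.pinv(A)
    P_C2 = lambda v: v - A_pinv @ (A @ v - b)
    # Gradient of the smooth term h(x) = w.x + gamma * ||x||^2.
    grad_h = lambda x: w + 2.0 * gamma * x

    z = np.zeros(A.shape[1])
    for _ in range(K):
        x = P_C1(z)                                # first projection
        u = P_C2(2.0 * x - z - alpha * grad_h(x))  # reflected point + gradient step
        z_new = z + u - x                          # Davis-Yin update
        if np.linalg.norm(z_new - z) <= tol:       # early stopping, as in the paper
            z = z_new
            break
        z = z_new
    return P_C1(z)
```

On a toy LP such as min x1 + 2*x2 subject to x1 + x2 = 1, x ≥ 0, the iteration drives the solution toward the vertex (1, 0); in DYS-Net, `w` would come from the network w_Θ(d) and the loop would run inside a PyTorch layer.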
Open Source Code | Yes | "All code associated with this paper is available at https://github.com/mines-opt-ml/fpo-dys."
Open Datasets | Yes | "Finally, as an illustrative example, we consider the Warcraft terrains dataset first studied in Vlastelica et al. (2019)."
Dataset Splits | No | "We use a validation set for model selection, as we observe that, for all methods, the best loss is seldom achieved at the final iteration." Evaluation: "Given test data {(d_i, w(d_i), x*(d_i))}_{i=1}^N and a fully trained model w_Θ(d), we define the regret: ..." "We train each network for 50 epochs, except for the baseline."
Hardware Specification | Yes | "All networks were trained using an AMD Threadripper Pro 3955WX CPU (16 cores, 3.90 GHz, 64 MB cache, PCIe 4.0) and an NVIDIA RTX A6000 GPU."
Software Dependencies | No | "DYS-net is implemented as an abstract PyTorch (Paszke et al., 2019) layer in the provided code. To instantiate it, a user need only specify A and b (see equation 3) and choose an architecture for w_Θ." "Our shortest path prediction experiment described in Section 5.1.1 is bottlenecked by the fact that the academic license of Gurobi does not allow for shortest path prediction problems on grids larger than 30 × 30."
Experiment Setup | Yes | "In all our experiments with DYS-net we used α = 0.05 and K = 1,000, with an early stopping condition of ||z_{k+1} - z_k||_2 ≤ tol with tol = 0.01. We train for a maximum of 100 epochs or 30 minutes, whichever comes first. Exact hyperparameters are discussed in Appendix C. For all models we use an initial learning rate of 10^-3 and a scheduler that reduces the learning rate whenever the validation loss plateaus. We also used weight decay with a parameter of 5 × 10^-4. After experimenting with different values, we select γ = 5 × 10^-4 for DYS-net and γ = 5 × 10^-1 for CVX-net."
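The "reduce on plateau" scheduler mentioned in the quoted setup can be mimicked with a simple patience rule. A minimal plain-Python sketch, assuming a patience-based criterion (the function name `reduce_on_plateau` and the `patience`/`factor` values are illustrative assumptions; in PyTorch one would typically use `torch.optim.lr_scheduler.ReduceLROnPlateau` instead):

```python
def reduce_on_plateau(lr, val_losses, patience=10, factor=0.1):
    """Return a reduced learning rate when the validation loss has not
    improved over the last `patience` epochs; otherwise return lr unchanged.
    Hyperparameter values here are illustrative, not the paper's settings."""
    if len(val_losses) <= patience:
        return lr  # not enough history yet to declare a plateau
    best_earlier = min(val_losses[:-patience])   # best loss before the window
    recent_best = min(val_losses[-patience:])    # best loss inside the window
    if recent_best >= best_earlier:              # no improvement: plateau
        return lr * factor
    return lr
```

Starting from the quoted initial rate of 10^-3, a flat validation-loss history would trigger a cut to 10^-4, while a steadily improving history leaves the rate untouched.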