Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning
Authors: Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted several experiments to examine whether we can improve the heuristics of the logic solver CADET through our deep reinforcement learning approach. |
| Researcher Affiliation | Collaboration | Gil Lederman, Electrical Engineering and Computer Sciences, University of California at Berkeley; Markus N. Rabe, Google Research; Edward A. Lee, Electrical Engineering and Computer Sciences, University of California at Berkeley; Sanjit A. Seshia, Electrical Engineering and Computer Sciences, University of California at Berkeley |
| Pseudocode | No | Not found. The paper describes algorithms and network architectures but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code and data of our experiments at https://github.com/lederg/learningqbf. |
| Open Datasets | Yes | We consider a set of formulas representing the search for reductions between collections of first-order formulas generated by Jordan & Kaiser (2013), which we call Reductions in the following. We additionally consider the 2QBF evaluation set of the annual competition of QBF solvers, QBFEVAL (Pulina, 2016). |
| Dataset Splits | Yes | We filtered out 2573 formulas that are solved without any heuristic decisions. In order to enable us to answer question 2 (see above), we further set aside a test set of 200 formulas, leaving us with a training set of 1835 formulas. (A hedged reconstruction of this split appears below the table.) |
| Hardware Specification | Yes | We trained a model on the reduction problems training set for 10M steps on an AWS server of type C5. |
| Software Dependencies | No | Not found. The paper mentions the CADET solver and uses terms like GNN, but does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters that resulted in the best model are δL = 16, δC = 64, and τ = 1, leading to a model with merely 8353 parameters. Literal embedding dimension δL = 16; clause embedding dimension δC = 64; learning rate 0.0006 for the first 2M steps, then 0.0001; discount factor γ = 0.99; gradient clipping 2; number of iterations (size of graph convolution) τ = 1; minimal number of timesteps per batch 1200. (A hedged configuration sketch appears below the table.) |
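
To make the reported dataset split concrete, here is a minimal Python sketch of the filtering and hold-out procedure described in the Dataset Splits row. It is a reconstruction, not the authors' code: the `needs_decisions` predicate (whether CADET makes at least one heuristic decision on a formula) and the shuffle seed are assumptions.

```python
import random

def split_dataset(formulas, needs_decisions, test_size=200, seed=0):
    """Drop formulas solved without heuristic decisions, then hold out a test set.

    `needs_decisions` is a hypothetical predicate: True if the solver makes
    at least one heuristic decision on the formula. For the Reductions set,
    the paper reports that filtering removes 2573 formulas and the split
    yields 1835 training / 200 test formulas.
    """
    remaining = [f for f in formulas if needs_decisions(f)]
    random.Random(seed).shuffle(remaining)
    return remaining[test_size:], remaining[:test_size]  # (train, test)
```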
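
The Experiment Setup row lists the hyperparameters of the GNN-based policy. Below is a minimal PyTorch sketch of one round (τ = 1) of literal-clause message passing with the reported embedding dimensions, alongside the remaining reported training settings collected as constants. The layer names, activation choices, and update rule are assumptions for illustration; this is not the authors' architecture, and its parameter count will not match the reported 8353.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper (constant names are mine).
D_LIT, D_CLAUSE, TAU = 16, 64, 1              # δL, δC, τ
LR_SCHEDULE = [(0, 6e-4), (2_000_000, 1e-4)]  # 0.0006 for the first 2M steps, then 0.0001
GAMMA, GRAD_CLIP, MIN_BATCH_STEPS = 0.99, 2.0, 1200

class LiteralClauseGNN(nn.Module):
    """Hypothetical message passing on the bipartite literal-clause graph."""

    def __init__(self, d_lit=D_LIT, d_clause=D_CLAUSE):
        super().__init__()
        self.lit_to_clause = nn.Linear(d_lit, d_clause)  # aggregate literals into clauses
        self.clause_to_lit = nn.Linear(d_clause, d_lit)  # send clause messages back
        self.lit_update = nn.Linear(2 * d_lit, d_lit)    # combine old state with messages

    def forward(self, lit_emb, incidence):
        # incidence: (num_clauses, num_literals) 0/1 matrix; row c marks literals in clause c
        for _ in range(TAU):
            clause_emb = torch.relu(self.lit_to_clause(incidence @ lit_emb))
            messages = torch.relu(self.clause_to_lit(incidence.t() @ clause_emb))
            lit_emb = torch.relu(self.lit_update(torch.cat([lit_emb, messages], dim=-1)))
        return lit_emb  # per-literal embeddings, e.g. scored by a policy head

# Toy usage: 5 clauses over 8 literals.
incidence = (torch.rand(5, 8) > 0.6).float()
literals = torch.randn(8, D_LIT)
print(LiteralClauseGNN()(literals, incidence).shape)  # torch.Size([8, 16])
```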