G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks

Authors: Zhaoyu Li, Jinpei Guo, Xujie Si

TMLR 2024

Reproducibility checklist (Variable / Result / supporting LLM response excerpt):
Research Type: Experimental. "Our empirical results provide valuable insights into the performance of GNN-based SAT solvers and further suggest that existing GNN models can effectively learn a solving strategy akin to greedy local search but struggle to learn backtracking search in the latent space. Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Researcher Affiliation: Academia. "Zhaoyu Li (EMAIL), University of Toronto; Jinpei Guo (EMAIL), Shanghai Jiao Tong University; Xujie Si (EMAIL), University of Toronto"
Pseudocode: Yes. "Table 9: Supported GNN models in G4SATBench." The quoted NeuroSAT-style message-passing rule on the literal-clause graph:

h_c^(k), s_c^(k) = LayerNorm(LSTM_1(Σ_{l∈N(c)} MLP_l(h_l^(k−1)), (h_c^(k−1), s_c^(k−1))))
h_l^(k), s_l^(k) = LayerNorm(LSTM_2([Σ_{c∈N(l)} MLP_c(h_c^(k−1)), h_{¬l}^(k−1)], (h_l^(k−1), s_l^(k−1))))

where s_c and s_l are the hidden states, which are initialized to zero vectors.
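The update rule above can be sketched in PyTorch. This is a minimal illustration, not G4SATBench's actual implementation: the class name `LCGLayer`, the dense 0/1 incidence matrix `adj`, and the literal ordering (literal 2i paired with its negation 2i+1) are all assumptions made for the sketch.

```python
# Sketch of one NeuroSAT-style message-passing iteration on a
# literal-clause graph. LCGLayer and its conventions are illustrative.
import torch
import torch.nn as nn

class LCGLayer(nn.Module):
    def __init__(self, d=128):  # d = 128 is the paper's feature dimension
        super().__init__()
        self.mlp_l = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp_c = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.lstm_c = nn.LSTMCell(d, d)      # clause update (LSTM_1)
        self.lstm_l = nn.LSTMCell(2 * d, d)  # literal update (LSTM_2)
        self.norm = nn.LayerNorm(d)

    def forward(self, h_l, s_l, h_c, s_c, adj):
        # adj: (n_clauses, n_lits) 0/1 incidence matrix of the formula.
        # Clause update: sum messages from the literals in each clause.
        msg_c = adj @ self.mlp_l(h_l)
        h_c, s_c = self.lstm_c(msg_c, (h_c, s_c))
        h_c = self.norm(h_c)
        # Literal update: sum clause messages, concatenated with the hidden
        # state of the negated literal (rows 2i and 2i+1 are a pair here).
        flip = torch.arange(h_l.shape[0]) ^ 1  # swap each literal with its negation
        msg_l = torch.cat([adj.t() @ self.mlp_c(h_c), h_l[flip]], dim=1)
        h_l, s_l = self.lstm_l(msg_l, (h_l, s_l))
        return self.norm(h_l), s_l, h_c, s_c
```

Running T = 32 such iterations (the paper's setting) just means applying this layer 32 times with shared weights, starting from zero hidden states.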
Open Source Code: Yes. "Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Open Datasets: Yes. "To address this crucial gap, we present G4SATBench, the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers. In G4SATBench, we meticulously curate a large and diverse set of SAT datasets comprising 7 problems with 3 difficulty levels and benchmark a broad range of GNN models across various prediction tasks, training objectives, and inference algorithms. Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Dataset Splits: Yes. "For each easy and medium dataset, we generate 80k pairs of satisfiable and unsatisfiable instances for training, 10k pairs for validation, and 10k pairs for testing. For each hard dataset, we produce 10k testing pairs."
Hardware Specification: Yes. "Each experiment is performed on a single RTX8000 GPU and 16 AMD EPYC 7502 CPU cores, and the total time cost is approximately 8,000 GPU hours."
Software Dependencies: No. "All neural networks in our study are implemented using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). For all GNN models, we set the feature dimension d to 128 and the number of message passing iterations T to 32. The MLPs in the models consist of two hidden layers with the ReLU (Nair & Hinton, 2010) activation function. To select the optimal hyperparameters for each GNN baseline, we conduct a grid search over several settings. Specifically, we explore different learning rates from {10^−3, 5×10^−4, 10^−4, 5×10^−5, 10^−5}, training epochs from {50, 100, 200}, weight decay values from {10^−6, 10^−7, 10^−8, 10^−9, 10^−10}, and gradient clipping norms from {0.1, 0.5, 1}. We employ Adam (Kingma & Ba, 2015) as the optimizer and set the batch size to 128, 64, or 32 to fit within the maximum GPU memory (48G)."
Experiment Setup: Yes. "For all GNN models, we set the feature dimension d to 128 and the number of message passing iterations T to 32. The MLPs in the models consist of two hidden layers with the ReLU (Nair & Hinton, 2010) activation function. To select the optimal hyperparameters for each GNN baseline, we conduct a grid search over several settings. Specifically, we explore different learning rates from {10^−3, 5×10^−4, 10^−4, 5×10^−5, 10^−5}, training epochs from {50, 100, 200}, weight decay values from {10^−6, 10^−7, 10^−8, 10^−9, 10^−10}, and gradient clipping norms from {0.1, 0.5, 1}. We employ Adam (Kingma & Ba, 2015) as the optimizer and set the batch size to 128, 64, or 32 to fit within the maximum GPU memory (48G). For the parameters τ and κ of the unsupervised loss in Equation 4 and Equation 5, we try the default settings (τ = t^−0.4 and κ = 10, where t is the global step during training) as in the original paper (Amizadeh et al., 2019a) as well as other values (τ ∈ {0.05, 0.1, 0.2, 0.5}, κ ∈ {1, 2, 5}) and empirically find that τ = 0.1, κ = 1 yield the best results."
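The grid search described in this excerpt can be sketched as an exhaustive sweep over the listed value sets. This is only a schematic under stated assumptions: `train_and_eval` is a hypothetical placeholder for the paper's actual training-and-validation loop, and the selection criterion (maximizing a validation score) is assumed.

```python
# Sketch of the hyperparameter grid search quoted above.
# train_and_eval is a hypothetical callback: config dict -> validation score.
import itertools

GRID = {
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "epochs": [50, 100, 200],
    "weight_decay": [1e-6, 1e-7, 1e-8, 1e-9, 1e-10],
    "clip_norm": [0.1, 0.5, 1.0],
}

def best_config(train_and_eval, grid=GRID):
    """Return the configuration with the highest validation score."""
    best, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(keys, values))
        score = train_and_eval(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best
```

Note the sweep covers 5 × 3 × 5 × 3 = 225 configurations per baseline, which is consistent with the large total compute budget (about 8,000 GPU hours) reported above.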