G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks

Authors: Zhaoyu Li, Jinpei Guo, Xujie Si

TMLR 2024

Reproducibility checklist (Variable / Result / supporting LLM response excerpt):
Research Type: Experimental. "Our empirical results provide valuable insights into the performance of GNN-based SAT solvers and further suggest that existing GNN models can effectively learn a solving strategy akin to greedy local search but struggle to learn backtracking search in the latent space. Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Researcher Affiliation: Academia. "Zhaoyu Li (EMAIL), University of Toronto; Jinpei Guo (EMAIL), Shanghai Jiao Tong University; Xujie Si (EMAIL), University of Toronto"
Pseudocode: Yes. "Table 9: Supported GNN models in G4SATBench." The quoted NeuroSAT-style message-passing rule on the literal-clause graph:

h_c^(k), s_c^(k) = LayerNorm(LSTM_1(Σ_{l∈N(c)} MLP_l(h_l^(k−1)), (h_c^(k−1), s_c^(k−1))))
h_l^(k), s_l^(k) = LayerNorm(LSTM_2([Σ_{c∈N(l)} MLP_c(h_c^(k−1)), h_{¬l}^(k−1)], (h_l^(k−1), s_l^(k−1))))

where s_c and s_l are the hidden states, which are initialized to zero vectors.
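The update rule above can be sketched in PyTorch. This is a minimal illustration, not G4SATBench's actual implementation: the class name `LCGLayer`, the dense 0/1 incidence matrix `adj`, and the literal ordering (literal 2i paired with its negation 2i+1) are all assumptions made for the sketch.

```python
# Sketch of one NeuroSAT-style message-passing iteration on a
# literal-clause graph. LCGLayer and its conventions are illustrative.
import torch
import torch.nn as nn

class LCGLayer(nn.Module):
    def __init__(self, d=128):  # d = 128 is the paper's feature dimension
        super().__init__()
        self.mlp_l = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp_c = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.lstm_c = nn.LSTMCell(d, d)      # clause update (LSTM_1)
        self.lstm_l = nn.LSTMCell(2 * d, d)  # literal update (LSTM_2)
        self.norm = nn.LayerNorm(d)

    def forward(self, h_l, s_l, h_c, s_c, adj):
        # adj: (n_clauses, n_lits) 0/1 incidence matrix of the formula.
        # Clause update: sum messages from the literals in each clause.
        msg_c = adj @ self.mlp_l(h_l)
        h_c, s_c = self.lstm_c(msg_c, (h_c, s_c))
        h_c = self.norm(h_c)
        # Literal update: sum clause messages, concatenated with the hidden
        # state of the negated literal (rows 2i and 2i+1 are a pair here).
        flip = torch.arange(h_l.shape[0]) ^ 1  # swap each literal with its negation
        msg_l = torch.cat([adj.t() @ self.mlp_c(h_c), h_l[flip]], dim=1)
        h_l, s_l = self.lstm_l(msg_l, (h_l, s_l))
        return self.norm(h_l), s_l, h_c, s_c
```

Running T = 32 such iterations (the paper's setting) just means applying this layer 32 times with shared weights, starting from zero hidden states.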
Open Source Code: Yes. "Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Open Datasets: Yes. "To address this crucial gap, we present G4SATBench, the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers. In G4SATBench, we meticulously curate a large and diverse set of SAT datasets comprising 7 problems with 3 difficulty levels and benchmark a broad range of GNN models across various prediction tasks, training objectives, and inference algorithms. Our codebase is available at https://github.com/zhaoyu-li/G4SATBench."
Dataset Splits: Yes. "For each easy and medium dataset, we generate 80k pairs of satisfiable and unsatisfiable instances for training, 10k pairs for validation, and 10k pairs for testing. For each hard dataset, we produce 10k testing pairs."
Hardware Specification: Yes. "Each experiment is performed on a single RTX8000 GPU and 16 AMD EPYC 7502 CPU cores, and the total time cost is approximately 8,000 GPU hours."
Software Dependencies: No. "All neural networks in our study are implemented using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). For all GNN models, we set the feature dimension d to 128 and the number of message passing iterations T to 32. The MLPs in the models consist of two hidden layers with the ReLU (Nair & Hinton, 2010) activation function. To select the optimal hyperparameters for each GNN baseline, we conduct a grid search over several settings. Specifically, we explore different learning rates from {10^−3, 5×10^−4, 10^−4, 5×10^−5, 10^−5}, training epochs from {50, 100, 200}, weight decay values from {10^−6, 10^−7, 10^−8, 10^−9, 10^−10}, and gradient clipping norms from {0.1, 0.5, 1}. We employ Adam (Kingma & Ba, 2015) as the optimizer and set the batch size to 128, 64, or 32 to fit within the maximum GPU memory (48G)."
Experiment Setup: Yes. "For all GNN models, we set the feature dimension d to 128 and the number of message passing iterations T to 32. The MLPs in the models consist of two hidden layers with the ReLU (Nair & Hinton, 2010) activation function. To select the optimal hyperparameters for each GNN baseline, we conduct a grid search over several settings. Specifically, we explore different learning rates from {10^−3, 5×10^−4, 10^−4, 5×10^−5, 10^−5}, training epochs from {50, 100, 200}, weight decay values from {10^−6, 10^−7, 10^−8, 10^−9, 10^−10}, and gradient clipping norms from {0.1, 0.5, 1}. We employ Adam (Kingma & Ba, 2015) as the optimizer and set the batch size to 128, 64, or 32 to fit within the maximum GPU memory (48G). For the parameters τ and κ of the unsupervised loss in Equation 4 and Equation 5, we try the default settings (τ = t^−0.4 and κ = 10, where t is the global step during training) as in the original paper (Amizadeh et al., 2019a) as well as other values (τ ∈ {0.05, 0.1, 0.2, 0.5}, κ ∈ {1, 2, 5}) and empirically find that τ = 0.1, κ = 1 yield the best results."
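The grid search described in this excerpt can be sketched as an exhaustive sweep over the listed value sets. This is only a schematic under stated assumptions: `train_and_eval` is a hypothetical placeholder for the paper's actual training-and-validation loop, and the selection criterion (maximizing a validation score) is assumed.

```python
# Sketch of the hyperparameter grid search quoted above.
# train_and_eval is a hypothetical callback: config dict -> validation score.
import itertools

GRID = {
    "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "epochs": [50, 100, 200],
    "weight_decay": [1e-6, 1e-7, 1e-8, 1e-9, 1e-10],
    "clip_norm": [0.1, 0.5, 1.0],
}

def best_config(train_and_eval, grid=GRID):
    """Return the configuration with the highest validation score."""
    best, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(keys, values))
        score = train_and_eval(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best
```

Note the sweep covers 5 × 3 × 5 × 3 = 225 configurations per baseline, which is consistent with the large total compute budget (about 8,000 GPU hours) reported above.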