MallobSat: Scalable SAT Solving by Clause Sharing
Authors: Dominik Schreiber, Peter Sanders
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive evaluations, our approach named MallobSat significantly outperforms an updated HordeSat, doubling its mean speedup. Our clause sharing results in effective parallelization even if all threads execute identical solver programs that only differ based on which clauses they import at which times. We thus argue that MallobSat is not a portfolio solver with the added bonus of clause sharing but rather a clause-sharing solver where adding some explicit diversification is useful but not essential. We also discuss the last four iterations of the International SAT Competition (2020–2023), where our system ranked very favorably, and identify several previously unsolved competition problems that MallobSat solved successfully. ... In comprehensive empirical analyses on HPC systems, we investigate the benefits of our techniques. |
| Researcher Affiliation | Academia | Dominik Schreiber EMAIL Peter Sanders EMAIL Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany |
| Pseudocode | No | The paper describes algorithms and methods verbally and with figures (e.g., Fig. 2 "Information flow in the clause sharing..."), but it does not contain any structured pseudocode or algorithm blocks with numbered steps or specific algorithm labels. |
| Open Source Code | Yes | Our software and data are available online: https://zenodo.org/doi/10.5281/zenodo.10184679 |
| Open Datasets | Yes | For the evaluation of the configurations of our system we use the 349 benchmark instances from ISC 2022 (Balyo et al., 2022a) that some solver was able to solve (across all tracks). ... To subsequently evaluate the behavior and speedups of our system (Section 8.4 and later), we use a different benchmark set to reduce overfitting effects, namely the ISC 2021 benchmark set (Balyo et al., 2021) consisting of 400 instances. |
| Dataset Splits | No | The paper uses standard benchmark sets (ISC 2021 and ISC 2022) but does not specify any particular training/test/validation splits within these datasets for its experiments. It evaluates performance on the entire benchmark set as provided by the competition. |
| Hardware Specification | Yes | We performed our experiments on SuperMUC-NG. This system at the Leibniz-Rechenzentrum features a total of 311 040 compute cores with a main memory of 719 TB and a peak performance of 26.9 PetaFlop/s. In particular, SuperMUC-NG features 6 336 thin compute nodes, each with a two-socket Intel Skylake Xeon Platinum 8174 processor clocked at 2.7 GHz with 48 physical cores (96 hardware threads) and 96 GB of main memory. Nodes are interconnected via Omni-Path (Birrittella et al., 2015) and run SUSE Linux Enterprise Server (SLES). ... We used bwUniCluster 2.0, an HPC system that features both an InfiniBand interconnect ... as well as less expensive, user-accessible Gigabit Ethernet. ... Each HPC compute node of bwUniCluster 2.0 features 96 GB of RAM and two Intel Xeon Gold 6230 sockets, each with 20 cores (40 hardware threads). ... In this cloud environment, solvers are deployed in Docker containers. In the parallel track, one machine of type m4.16xlarge is used. This instance type features Intel Broadwell Xeon E5-2686 v4 processors and is advertised to feature 64 logical cores and 256 GB of RAM. ... In the cloud track, 100 machines of type m4.4xlarge are used in parallel. This type is advertised to feature 16 virtual CPUs and 64 GB of RAM. |
| Software Dependencies | Yes | We implemented MallobSat in C++17 as an application engine within Mallob, a decentralized job scheduling platform (Sanders & Schreiber, 2022). Using the Message Passing Interface (MPI; Gropp et al., 1999), Mallob can be deployed on a single machine or on many interconnected machines at once. ... We compiled our software with GCC 11.2 and Intel MPI 2019.12. We updated HordeSat to also use the latest Lingeling + YalSAT (Biere, 2018) backend. ... apart from the above initial setup and the use of Open MPI instead of Intel MPI |
| Experiment Setup | Yes | We test the parallel solvers with a comparably low wallclock time limit of 300 s since we aim to solve SAT instances as rapidly as possible in a costly large-scale distributed environment, where we invest substantially more resources than with sequential solving. ... This configuration features our latest techniques and data structures; two sharings per second with compensation for unused sharing volume; distributed filtering with a resharing period of 30 epochs (15 s); incrementing each LBD score before import; a clause buffer limit of β = 100 000; and a process setup with one MPI process for each socket of 24 cores. ... For a given threshold ŝ, if a given serialized formula description has size s > ŝ, then only t′ = max{1, ⌊t·ŝ/s⌋} threads will be spawned for each process. The choice of ŝ depends on the amount of available main memory per process. The system we used only features 2 GB of RAM per solver if all physical cores are used. Based on monitoring the memory usage for different large formulas in our system, we use ŝ := 50·10⁶. ... we adjusted MallobSat's sharing buffer limit to β = 250 000 literals for the following experiments. |
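
The thread-throttling rule quoted in the experiment setup can be sketched as a small helper. This is a minimal illustration, not MallobSat's actual code: the function and parameter names are invented here, and the floor rounding is an assumption consistent with spawning an integer number of threads under the quoted formula t′ = max{1, t·ŝ/s} with threshold ŝ = 50·10⁶.

```python
import math

# Threshold on serialized formula size (the paper's excerpt uses s_hat = 50e6).
S_HAT = 50_000_000


def threads_to_spawn(t: int, s: int, s_hat: int = S_HAT) -> int:
    """Sketch of the per-process thread throttling rule.

    t: configured number of solver threads per process
    s: size of the serialized formula description
    s_hat: size threshold above which threads are reduced

    For s <= s_hat the full t threads run; for larger formulas the
    thread count shrinks proportionally, but never below one.
    """
    if s <= s_hat:
        return t
    return max(1, math.floor(t * s_hat / s))
```

For example, with t = 24 threads per process, a formula twice the threshold size would run with 12 threads, and a very large formula would still get one thread rather than zero.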