KLay: Accelerating Arithmetic Circuits for Neurosymbolic AI

Authors: Jaron Maene, Vincent Derkinderen, Pedro Zuidberg Dos Martires

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that KLAY achieves speedups of multiple orders of magnitude over the state of the art, thereby paving the way towards scaling neurosymbolic AI to larger real-world applications. ... We implement KLAY as a Python library supporting two popular tensor libraries: PyTorch and JAX. We evaluate the runtime performance of KLAY on several synthetic benchmarks and neurosymbolic experiments. All experiments were conducted on the same machine, which has an NVIDIA GeForce RTX 4090 as GPU and an Intel i9-13900K as CPU.
Researcher Affiliation | Academia | Jaron Maene & Vincent Derkinderen, Department of Computer Science, KU Leuven, Leuven, Belgium, EMAIL; Pedro Zuidberg Dos Martires, Centre for Applied Autonomous Sensor Systems, Örebro University, Örebro, Sweden, EMAIL
Pseudocode | Yes | Algorithm 1 contains pseudo-code for the layerwise circuit evaluations, where we use the common scatter function to segment and aggregate the E_l vectors. ... Algorithms 2 and 3 contain pseudo-code of the previously discussed layerization and tensorization procedures. Algorithm 4 contains the pseudo-code for the evaluation of KLAY in the logarithmic semiring instead of the real semiring.
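The scatter-based layerwise evaluation described for Algorithm 1, and its log-semiring counterpart from Algorithm 4, can be sketched in pure Python as follows. This is a minimal illustration under assumed data layouts: the `(op, gather, scatter, size)` layer encoding and all function names here are hypothetical, not KLAY's actual API, which performs these steps as vectorized tensor scatters in PyTorch or JAX.

```python
import math

def scatter_agg(values, index, size, op):
    """Aggregate `values` into `size` output slots by index; a scalar
    stand-in for the tensor scatter primitive used in Algorithm 1."""
    out = [0.0 if op == "sum" else 1.0 for _ in range(size)]
    for v, i in zip(values, index):
        out[i] = out[i] + v if op == "sum" else out[i] * v
    return out

def evaluate_layerwise(leaves, layers):
    """Evaluate a layered arithmetic circuit bottom-up. Each layer is
    (op, gather_index, scatter_index, size): gather child values from
    the previous layer, then scatter-aggregate them into the nodes of
    the current layer."""
    values = leaves
    for op, gather, scatter, size in layers:
        values = scatter_agg([values[g] for g in gather], scatter, size, op)
    return values

def evaluate_layerwise_log(log_leaves, layers):
    """Log-semiring variant (cf. Algorithm 4): products become sums and
    sums become log-sum-exp, avoiding underflow on deep circuits."""
    values = log_leaves
    for op, gather, scatter, size in layers:
        buckets = [[] for _ in range(size)]
        for g, s in zip(gather, scatter):
            buckets[s].append(values[g])
        if op == "prod":
            values = [sum(b) for b in buckets]
        else:
            values = [max(b) + math.log(sum(math.exp(x - max(b)) for x in b))
                      for b in buckets]
    return values
```

For example, the circuit (a*b) + (c*d) over leaves [a, b, c, d] can be encoded as a product layer followed by a sum layer: `layers = [("prod", [0, 1, 2, 3], [0, 0, 1, 1], 2), ("sum", [0, 1], [0, 0], 1)]`, and both evaluators agree up to the logarithm.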
Open Source Code | Yes | The functionality of KLAY has been described in pseudo-code (Algorithms 1, 2, and 3) and has been implemented as a Python library to easily replicate all experiments in the paper, available at https://github.com/ML-KULeuven/klay.
Open Datasets | Yes | The Sudoku benchmark is a classification problem, determining whether a 4×4 grid of images forms a valid Sudoku (Augustine et al., 2022). The Grid (Xu et al., 2018) and Warcraft (Pogančić et al., 2020) instances require the prediction of a valid low-cost path. Finally, hierarchical multi-level classification (HMLC) concerns the consistent classification of labels in a hierarchy (Giunchiglia & Lukasiewicz, 2020). ... MNIST-addition is a common neurosymbolic task where the input is two numbers represented as MNIST images and the model needs to predict their sum. For more details, we refer to Manhaeve et al. (2018).
Dataset Splits | No | The paper mentions generating synthetic data by varying the numbers of variables and clauses of 3-CNF formulas and using a batch size of 128 for the neurosymbolic tasks, but it does not specify explicit training/validation/test splits, percentages, or absolute sample counts for any of the datasets used.
Hardware Specification | Yes | All experiments were conducted on the same machine, which has an NVIDIA GeForce RTX 4090 as GPU and an Intel i9-13900K as CPU.
Software Dependencies | No | The paper mentions supporting "PyTorch and JAX" and using the "PySDD library" and "D4 knowledge compiler", but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Timings are averaged over 10 runs per SDD. ... We measure the training time of the MNIST-addition task in Table 2. MNIST-addition is a common neurosymbolic task where the input is two numbers represented as MNIST images and the model needs to predict their sum. For more details, we refer to Manhaeve et al. (2018). As a reference, we also include Scallop, which aims to improve the scalability of DeepProbLog by approximating using top-k provenance (Li et al., 2023). While DeepProbLog and Scallop cannot perform batched inference, KLAY can by using multi-rooted circuits. This is reflected in speed-ups of two orders of magnitude over the existing DeepProbLog implementation, even on rather small circuits. Unlike Scallop, KLAY remains exact, yet it still demonstrates large speedups here. Table 2: MNIST-addition training time in seconds for one epoch. We report the average and standard deviation over 10 epochs, using a batch size of 128.
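The batched inference via multi-rooted circuits mentioned in this excerpt can be sketched roughly as follows. The `(op, gather, scatter, size)` layer encoding and the explicit Python-level batch loop are illustrative assumptions only; in KLAY proper, the batch dimension is carried through a single vectorized scatter over a 2-D tensor on the GPU rather than looped over in Python.

```python
def evaluate_batched(leaf_batch, layers):
    """Evaluate the same layered circuit for a whole batch of inputs.

    `leaf_batch` holds one leaf-value vector per example; `layers` is a
    list of (op, gather_index, scatter_index, size) tuples, each one
    describing a gather followed by a scatter-aggregate step.
    """
    outputs = []
    for leaves in leaf_batch:  # in a tensor library: one batched scatter
        values = leaves
        for op, gather, scatter, size in layers:
            out = [0.0 if op == "sum" else 1.0] * size
            for g, s in zip(gather, scatter):
                if op == "sum":
                    out[s] += values[g]
                else:
                    out[s] *= values[g]
            values = out
        outputs.append(values)
    return outputs
```

Because every example shares the same circuit structure, the per-example work differs only in the leaf values, which is exactly what makes a single batched scatter (and hence GPU-friendly evaluation) possible.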