Scalable Neural-Probabilistic Answer Set Programming

Authors: Arseny Skryagin, Daniel Ochs, Devendra Singh Dhami, Kristian Kersting

JAIR 2023

Reproducibility assessment (variable — result, with the supporting LLM response):
Research Type — Experimental. Evidence: "We evaluate SLASH on various tasks, including the benchmark task of MNIST addition and Visual Question Answering (VQA). ... We refer to App. A for each experiment's SLASH program, including queries, and App. C for a detailed description of hyperparameters and further experimental details."
Researcher Affiliation — Academia. Evidence: "Arseny Skryagin, Daniel Ochs, Devendra Singh Dhami — Computer Science Department, TU Darmstadt, Darmstadt, Germany. Kristian Kersting — Computer Science Department, TU Darmstadt, and German Research Center for Artificial Intelligence (DFKI), Darmstadt, Germany."
Pseudocode — Yes. Evidence: "Algorithm 1: Gradient computation. Algorithm 2: Potential Solutions with SAME."
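The paper's Algorithm 1 concerns gradient computation for probabilistic queries. As a simplified, hedged illustration (not the authors' implementation), the probability of an MNIST-Addition query can be written as a sum of products over the digit pairs consistent with the observed sum, and its gradient with respect to one network's class probabilities then has a closed form:

```python
def query_prob(p1, p2, s):
    """P(d1 + d2 = s) = sum over consistent digit pairs of p1[d1] * p2[d2].

    p1, p2: length-10 class-probability vectors for the two digit images.
    """
    return sum(p1[d1] * p2[s - d1] for d1 in range(10) if 0 <= s - d1 <= 9)

def grad_p1(p1, p2, s):
    """dP/dp1[k] = p2[s - k] when the pair (k, s - k) is consistent, else 0."""
    return [p2[s - k] if 0 <= s - k <= 9 else 0.0 for k in range(10)]
```

With uniform predictions `p1 = p2 = [0.1] * 10` and query `s = 0`, only the pair (0, 0) is consistent, so `query_prob` returns 0.01 and the gradient is 0.1 at index 0 and zero elsewhere.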
Open Source Code — Yes. Evidence: "Code is available at https://github.com/ml-research/SLASH"
Open Datasets — Yes. Evidence: "We evaluate SLASH on various tasks, including the benchmark task of MNIST addition and Visual Question Answering (VQA). ... the benchmark task of MNIST-Addition (Manhaeve et al., 2018), and Sudoku (Yang et al., 2020) present the advantages coming with SAME. ... We refer to Manmadhan and Kovoor (2020) and Kodali and Berleant (2022) for a detailed review. Recently, more neuro-symbolic approaches to VQA have been proposed. Yi et al. (2018) proposed a model which creates a structural scene representation of the image, parses a natural language question into a program, and then executes the program to obtain an answer. A few works utilize logic programming: Scallop's (Huang et al., 2021) top-k approach allows for answering complex reasoning questions on real-world images. Eiter et al. (2022) showed how ASP could be used on top of the outputs of a pretrained YOLO network to answer CLEVR questions (Johnson et al., 2017)."
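In the MNIST-Addition benchmark, digit images are paired and only the sum of the two digits is kept as supervision. A minimal sketch of one common way such examples are formed (pairing consecutive images; the exact pairing scheme is an assumption, not taken from the paper):

```python
def make_addition_pairs(labels):
    """Build MNIST-Addition examples from a list of digit labels.

    `labels` stands in for the ground-truth digits of the images; each
    example is (index of image 1, index of image 2, sum label), and only
    the sum would be visible to the learner at training time.
    """
    return [(i, i + 1, labels[i] + labels[i + 1])
            for i in range(0, len(labels) - 1, 2)]
```

For example, `make_addition_pairs([3, 5, 2, 9])` yields `[(0, 1, 8), (2, 3, 11)]`: the individual digits 3, 5, 2, 9 are hidden, and the learner only sees the sums 8 and 11.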
Dataset Splits — No. The paper mentions training on "10k samples on C2" or "10, 100, 1k and 10k training samples on C2" for VQAR, but it does not provide train/validation/test split sizes (as percentages or absolute counts) for all datasets, nor explain how the splits were obtained or used, which limits full reproducibility.
Hardware Specification — No. The paper does not name any specific hardware (e.g., GPU models, CPU types, or cloud instances) used for the experiments. It only states that "DPPLs as of now utilize a GPU for neural computations, while solving and computing gradients happens on the CPU", without model numbers or configurations.
Software Dependencies — Yes. Evidence: "For all experiments, the ADAM optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, ε = 1e-8 and no weight decay was used. ... We have used Einsum Networks (EiNets) for implementing the PC. ... SLASH uses CLINGO as the underlying solver to produce potential solutions."
Experiment Setup — Yes. Evidence: "For all experiments, the ADAM optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, ε = 1e-8 and no weight decay was used. ... The learning rate and batch size for SLASH and the baselines are shown in Tab. 7. ... The slot encoder used 4 slots and 3 attention iterations over all experiments. ... The learning rate and batch size for the baseline slot encoder were 0.0004 and 512. The learning rate and batch size for SLASH Attention were 0.01 and 512 for ShapeWorld4 and CLEVR for the PCs, and 0.0004 for the slot encoder."
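With the reported hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8, no weight decay), a single ADAM update for one scalar parameter looks like the following framework-agnostic sketch; the paper's actual training presumably uses a deep-learning library's built-in optimizer rather than this hand-rolled version:

```python
def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar parameter (Kingma & Ba, 2015).

    t is the 1-based step count; m and v are the running moment estimates.
    Returns the updated (theta, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v
```

On the first step (t = 1), the bias correction makes the effective update magnitude approximately `lr` regardless of the gradient's scale, which is why ADAM's early steps are well behaved even with fresh moment estimates.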