NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization

Authors: Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks and demonstrates robust performance with novel concepts in the CLEVR-SYN benchmark. (Section 4, Experiments) We evaluate our method across three key aspects: compositional generalization, vision-language reasoning, and handling linguistic variety. We present our experiments on ReaSCAN (Wu et al. 2021) and CLEVR-CoGenT (Johnson et al. 2017a) for compositional generalization.
Researcher Affiliation Academia Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi Michigan State University EMAIL
Pseudocode No The paper describes its methodology in Section 3, including figures (Figure 1, Figure 2, Figure 3) and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code: https://github.com/HLR/NeSyCoCo
Open Datasets Yes We evaluate our method across three key aspects: compositional generalization, vision-language reasoning, and handling linguistic variety. We present our experiments on ReaSCAN (Wu et al. 2021) and CLEVR-CoGenT (Johnson et al. 2017a) for compositional generalization. In the context of visual reasoning, we discuss our experiments and findings using the CLEVR dataset and its extensions. Finally, to assess how our neuro-symbolic methods handle linguistic variety, we introduce a new benchmark called CLEVR-SYN.
Dataset Splits Yes CLEVR-CoGenT Benchmark... In test split A, cubes are restricted to gray, blue, brown, or yellow, while cylinders are limited to red, green, purple, or cyan. Split B swaps these color sets between cubes and cylinders... ReaSCAN includes seven compositional test splits with specific held-out combinations compared to the training data: A1: yellow square referred to by color and shape. A2: red square referred to anywhere in the command. A3: small cylinder referred to by size and shape. B1: Co-occurrences of a small red circle and a large blue square. B2: Co-occurrences of "same size as" and "inside of" relationships. C1: Three-relative-clause commands. C2: Two relative clauses using "that is" instead of "and". ...CLEVR-SYN... This benchmark consists of three test splits using all of the samples in the CLEVR validation split. In the easy test, only one concept has changed in the program. In the medium test, a maximum of three concepts have been replaced. In the hard test, all concepts in the list are replaced.
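The compositional splits above hold out specific concept combinations (e.g., ReaSCAN A1 holds out "yellow square" referred to by color and shape) that appear only at test time. A minimal sketch of how such a held-out split can be carved out of a command pool; the sample format and helper names are hypothetical, not the benchmark's actual tooling:

```python
# Held-out color/shape pairs, in the spirit of ReaSCAN split A1.
HELD_OUT = {("yellow", "square")}

def mentions_held_out(command_tokens):
    """True if any held-out color/shape pair co-occurs adjacently in the command."""
    pairs = zip(command_tokens, command_tokens[1:])
    return any(p in HELD_OUT for p in pairs)

samples = [
    "push the yellow square",
    "push the red square",
    "walk to the yellow circle",
]
# Commands mentioning a held-out combination go to the test split only.
train_set = [s for s in samples if not mentions_held_out(s.split())]
held_out_test = [s for s in samples if mentions_held_out(s.split())]
```

A model trained only on `train_set` must compose "yellow" and "square", each seen separately, to succeed on `held_out_test`.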
Hardware Specification Yes All experiments were conducted on Ubuntu OS with an AMD EPYC 7413 24-core CPU and an NVIDIA A6000 GPU, featuring 48GB of memory and 700GB of RAM.
Software Dependencies No Our implementation is based on the PyTorch deep learning library (Paszke et al. 2019), with the SpaCy toolkit (Honnibal and Montani 2017) used for extracting dependency parses of natural language queries. We employed the LLaMA3.1 70B model (Dubey et al. 2024) with 4-bit quantization as our primary language model, selected for its open-source availability and performance parity with GPT-3.5, allowing us to maintain transparency and adaptability in our experiments. Vector embeddings for predicates and other linguistic components were derived from the GloVe (Pennington, Socher, and Manning 2014) language encoder. While LLaMA3.1 70B is a specific model identifier, explicit version numbers for PyTorch, SpaCy, and GloVe are not provided in the text, only their respective citations.
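The paper derives predicate embeddings from GloVe vectors, which lets lexically different but related predicates (e.g., synonyms in CLEVR-SYN) land near each other in embedding space. A toy illustration of that idea with made-up 3-d vectors; real GloVe-6B-300D vectors are 300-dimensional and loaded from the pretrained release, so everything below is an assumption for illustration:

```python
import math

# Made-up stand-ins for GloVe word vectors (real ones are 300-d).
toy_glove = {
    "red":     [0.9, 0.1, 0.0],
    "crimson": [0.8, 0.2, 0.1],
    "square":  [0.0, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# A near-synonym ("crimson") sits closer to "red" than an unrelated predicate.
assert cosine(toy_glove["red"], toy_glove["crimson"]) > cosine(toy_glove["red"], toy_glove["square"])
```

This is the property that lets a GloVe-based predicate encoder generalize to novel surface forms without retraining the symbolic program.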
Experiment Setup Yes The hyperparameters used in the experiments are detailed in Table 9. Table 9: Hyperparameters of NeSyCoCo for ReaSCAN, CLEVR, and CLEVR-CoGenT (Shared FNN: [1024, 512, 256, 128, 1]; Visual Repr. Projection: 512; Predicate Repr. Projection: 512; Learning Rate: {10^-3, 10^-4, 10^-5}; Batch Size: 32; Number of Parameters: 14.2M/14.3M; Epochs: 100; Curriculum Learning: Yes; Language Encoder: GloVe-6B-300D; Embedding Size: 300).
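The Table 9 entries suggest a shared feed-forward scorer with layer sizes [1024, 512, 256, 128, 1], fed by a 512-d visual projection concatenated with a 512-d predicate projection. A minimal NumPy sketch of such a network; the function names, initialization, and wiring are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Layer widths from Table 9: input 1024 (= 512 visual + 512 predicate),
# three hidden layers, one scalar output.
LAYER_SIZES = [1024, 512, 256, 128, 1]

def init_ffn(sizes, rng):
    """Randomly initialize (weight, bias) pairs for a small MLP."""
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def score(visual_proj, predicate_proj, params):
    """Score how well an object's visual features match a predicate embedding."""
    x = np.concatenate([visual_proj, predicate_proj])  # 512 + 512 = 1024
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x[0]))    # sigmoid -> soft truth value in [0, 1]

rng = np.random.default_rng(0)
params = init_ffn(LAYER_SIZES, rng)
s = score(rng.standard_normal(512), rng.standard_normal(512), params)
```

Squashing the output through a sigmoid yields a soft truth value, which is how neuro-symbolic predicates are typically composed by downstream logical operators.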