NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization

Authors: Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks and demonstrates robust performance with novel concepts in the CLEVR-SYN benchmark. (Section 4, Experiments) We evaluate our method across three key aspects: compositional generalization, vision-language reasoning, and handling linguistic variety. We present our experiments on ReaSCAN (Wu et al. 2021) and CLEVR-CoGenT (Johnson et al. 2017a) for compositional generalization.
Researcher Affiliation Academia Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi Michigan State University EMAIL
Pseudocode No The paper describes its methodology in Section 3, including figures (Figure 1, Figure 2, Figure 3) and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code: https://github.com/HLR/NeSyCoCo
Open Datasets Yes We evaluate our method across three key aspects: compositional generalization, vision-language reasoning, and handling linguistic variety. We present our experiments on ReaSCAN (Wu et al. 2021) and CLEVR-CoGenT (Johnson et al. 2017a) for compositional generalization. In the context of visual reasoning, we discuss our experiments and findings using the CLEVR dataset and its extensions. Finally, to assess how our neuro-symbolic methods handle linguistic variety, we introduce a new benchmark called CLEVR-SYN.
Dataset Splits Yes CLEVR-CoGenT Benchmark... In test split A, cubes are restricted to gray, blue, brown, or yellow, while cylinders are limited to red, green, purple, or cyan. Split B swaps these color sets between cubes and cylinders... ReaSCAN includes seven compositional test splits with specific held-out combinations compared to the training data: A1: yellow square referred to by color and shape. A2: red square referred to anywhere in the command. A3: small cylinder referred to by size and shape. B1: Co-occurrences of a small red circle and a large blue square. B2: Co-occurrences of "same size as" and "inside of" relationships. C1: Three-relative-clause commands. C2: Two relative clauses using "that is" instead of "and". ...CLEVR-SYN... This benchmark consists of three test splits using all of the samples in the CLEVR validation split. In the easy test, only one concept has changed in the program. In the medium test, a maximum of three concepts have been replaced. In the hard test, all concepts in the list are replaced.
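The compositional splits above hold out specific concept combinations (e.g., ReaSCAN A1 holds out "yellow square" referred to by color and shape) that appear only at test time. A minimal sketch of how such a held-out split can be carved out of a command pool; the sample format and helper names are hypothetical, not the benchmark's actual tooling:

```python
# Held-out color/shape pairs, in the spirit of ReaSCAN split A1.
HELD_OUT = {("yellow", "square")}

def mentions_held_out(command_tokens):
    """True if any held-out color/shape pair co-occurs adjacently in the command."""
    pairs = zip(command_tokens, command_tokens[1:])
    return any(p in HELD_OUT for p in pairs)

samples = [
    "push the yellow square",
    "push the red square",
    "walk to the yellow circle",
]
# Commands mentioning a held-out combination go to the test split only.
train_set = [s for s in samples if not mentions_held_out(s.split())]
held_out_test = [s for s in samples if mentions_held_out(s.split())]
```

A model trained only on `train_set` must compose "yellow" and "square", each seen separately, to succeed on `held_out_test`.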
Hardware Specification Yes All experiments were conducted on Ubuntu OS with an AMD EPYC 7413 24-core CPU and an NVIDIA A6000 GPU, featuring 48GB of memory and 700GB of RAM.
Software Dependencies No Our implementation is based on the PyTorch deep learning library (Paszke et al. 2019), with the SpaCy toolkit (Honnibal and Montani 2017) used for extracting dependency parses of natural language queries. We employed the LLaMA3.1 70B model (Dubey et al. 2024) with 4-bit quantization as our primary language model, selected for its open-source availability and performance parity with GPT-3.5, allowing us to maintain transparency and adaptability in our experiments. Vector embeddings for predicates and other linguistic components were derived from the GloVe (Pennington, Socher, and Manning 2014) language encoder. While LLaMA3.1 70B is a specific model identifier, explicit version numbers for PyTorch, SpaCy, and GloVe are not provided in the text, only their respective citations.
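The paper derives predicate embeddings from GloVe vectors, which lets lexically different but related predicates (e.g., synonyms in CLEVR-SYN) land near each other in embedding space. A toy illustration of that idea with made-up 3-d vectors; real GloVe-6B-300D vectors are 300-dimensional and loaded from the pretrained release, so everything below is an assumption for illustration:

```python
import math

# Made-up stand-ins for GloVe word vectors (real ones are 300-d).
toy_glove = {
    "red":     [0.9, 0.1, 0.0],
    "crimson": [0.8, 0.2, 0.1],
    "square":  [0.0, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# A near-synonym ("crimson") sits closer to "red" than an unrelated predicate.
assert cosine(toy_glove["red"], toy_glove["crimson"]) > cosine(toy_glove["red"], toy_glove["square"])
```

This is the property that lets a GloVe-based predicate encoder generalize to novel surface forms without retraining the symbolic program.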
Experiment Setup Yes The hyperparameters used in the experiments are detailed in Table 9. Table 9: Hyperparameters of NeSyCoCo for ReaSCAN, CLEVR, and CLEVR-CoGenT (Shared FNN: [1024, 512, 256, 128, 1]; Visual Repr. Projection: 512; Predicate Repr. Projection: 512; Learning Rate: {10^-3, 10^-4, 10^-5}; Batch Size: 32; Number of Parameters: 14.2M/14.3M; Epochs: 100; Curriculum Learning: Yes; Language Encoder: GloVe-6B-300D; Embedding Size: 300).
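The Table 9 entries suggest a shared feed-forward scorer with layer sizes [1024, 512, 256, 128, 1], fed by a 512-d visual projection concatenated with a 512-d predicate projection. A minimal NumPy sketch of such a network; the function names, initialization, and wiring are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Layer widths from Table 9: input 1024 (= 512 visual + 512 predicate),
# three hidden layers, one scalar output.
LAYER_SIZES = [1024, 512, 256, 128, 1]

def init_ffn(sizes, rng):
    """Randomly initialize (weight, bias) pairs for a small MLP."""
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def score(visual_proj, predicate_proj, params):
    """Score how well an object's visual features match a predicate embedding."""
    x = np.concatenate([visual_proj, predicate_proj])  # 512 + 512 = 1024
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x[0]))    # sigmoid -> soft truth value in [0, 1]

rng = np.random.default_rng(0)
params = init_ffn(LAYER_SIZES, rng)
s = score(rng.standard_normal(512), rng.standard_normal(512), params)
```

Squashing the output through a sigmoid yields a soft truth value, which is how neuro-symbolic predicates are typically composed by downstream logical operators.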