Consistency of Compositional Generalization Across Multiple Levels

Authors: Chuanhao Li, Zhen Li, Chenchen Jing, Xiaomeng Fan, Wenbo Ye, Yuwei Wu, Yunde Jia

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We incorporate various types of methods for two tasks, VQA and temporal video grounding (TVG), into our framework, and conduct experiments on our GQA-CCG dataset, the GQA dataset, and the Charades-CG dataset (Li et al. 2022) to validate the effectiveness and generalizability of our framework. Experimental results show that our framework effectively enhances the consistency of compositional generalization across multiple levels and improves the accuracy of compositional generalization at different levels, while maintaining comparable independent and identically distributed (IID) generalization capability.
Researcher Affiliation | Academia | 1. Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; 2. Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; 3. School of Computer Science, Zhejiang University
Pseudocode | No | The paper describes the proposed framework and optimization process using mathematical formulations and descriptive text, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Repository: https://github.com/NeverMoreLCH/CCG
Open Datasets | Yes | To enable the quantitative evaluation for the consistency of compositional generalization across multiple levels, we build a new dataset in the context of visual question answering (VQA), i.e., GQA-CCG, based on the GQA dataset (Hudson and Manning 2019), a large-scale dataset organized for compositional VQA. ... For VQA, we evaluate the framework on our GQA-CCG dataset and the GQA dataset (Hudson and Manning 2019). ... For TVG, we use the recently released Charades-CG dataset (Li et al. 2022) that contains compositional referring expressions about real-world videos...
Dataset Splits | Yes | For a training set D_t, we first divide D_t into multiple validation sets {D_v^i} (i = 1, ..., K) based on the compositional complexity of samples. ... We use the train_balanced split and the val_all split of GQA in the process of constructing GQA-CCG, and here we denote them as D_t and D_v, respectively.
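The split procedure quoted above can be sketched as a simple bucketing step. This is a minimal illustration only: the `complexity` scoring function and the clamping to levels 1..K are assumptions for demonstration, not the paper's actual criterion.

```python
from collections import defaultdict


def split_by_complexity(train_set, complexity, K):
    """Partition a training set D_t into K validation sets {D_v^i},
    grouping samples by a per-sample compositional-complexity score.

    `complexity` is assumed to map a sample to an integer level;
    levels are clamped into [1, K] for robustness.
    """
    buckets = defaultdict(list)
    for sample in train_set:
        level = min(max(complexity(sample), 1), K)  # clamp to [1, K]
        buckets[level].append(sample)
    # Return the K validation sets in order of increasing complexity.
    return [buckets[i] for i in range(1, K + 1)]
```

For example, scoring ten toy samples with `lambda s: s % 3 + 1` and `K = 3` yields three disjoint buckets that together cover the whole set.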
Hardware Specification | No | The paper discusses models and their parameters, and mentions 'GPU memory' in a theoretical context, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts, or specific computing platforms) used for running the experiments.
Software Dependencies | No | The paper mentions using tools like the benepar toolkit (Kitaev, Cao, and Klein 2019), GPT-3.5, and BERT embeddings (Devlin et al. 2019), but does not provide specific version numbers for these or any other ancillary software dependencies used in the experimental setup.
Experiment Setup | Yes | Specifically, for the initial parameter θ^(0), we train for T_p iterations to update θ by performing gradient descent. At each iteration (i = 1, ..., T_p), we update the parameter as follows: θ^(i+1) = θ^(i) − β_θ^(i) ∇θ^(i), where ∇θ^(i) = dL_t/dθ^(i), and β_θ^(i) is the learning rate of θ at iteration i. ... At each iteration (j = 1, ..., T_m), we perform a meta update operation on ω_i as follows: ω_i^(j+1) = ω_i^(j) − β_{ω_i}^(j) ∇ω_i^(j), where ∇ω_i^(j) = dL_v/dω_i^(j), and β_{ω_i}^(j) is the learning rate of ω_i at iteration j.
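The two quoted update rules amount to plain gradient-descent loops with per-iteration learning rates. The sketch below is illustrative only: the gradient callables, learning-rate schedules, and scalar parameters stand in for the paper's losses L_t and L_v and its actual θ and ω_i, whose forms are not specified here.

```python
def train_theta(theta, grad_Lt, betas_theta):
    """Inner loop: update θ for T_p iterations on the training loss L_t.

    Implements θ^(i+1) = θ^(i) - β_θ^(i) * dL_t/dθ^(i),
    with one entry of `betas_theta` per iteration (so T_p = len(betas_theta)).
    """
    for beta in betas_theta:
        theta = theta - beta * grad_Lt(theta)
    return theta


def meta_update_omega(omega_i, grad_Lv, betas_omega):
    """Meta loop: update the weight ω_i for T_m iterations on the validation loss L_v.

    Implements ω_i^(j+1) = ω_i^(j) - β_{ω_i}^(j) * dL_v/dω_i^(j).
    """
    for beta in betas_omega:
        omega_i = omega_i - beta * grad_Lv(omega_i)
    return omega_i
```

With a toy quadratic loss L(x) = x² (gradient 2x) and a constant learning rate 0.1, each step multiplies the parameter by 0.8, so ten steps from θ = 1.0 give θ = 0.8¹⁰.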