Consistency of Compositional Generalization Across Multiple Levels
Authors: Chuanhao Li, Zhen Li, Chenchen Jing, Xiaomeng Fan, Wenbo Ye, Yuwei Wu, Yunde Jia
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We incorporate various types of methods of two tasks including VQA and temporal video grounding (TVG) into our framework, and conduct experiments on our GQA-CCG dataset, the GQA dataset and the Charades-CG dataset (Li et al. 2022), for validating the effectiveness and generalizability of our framework. Experimental results show that our framework effectively enhances the consistency of compositional generalization across multiple levels and improves the accuracy of compositional generalization at different levels, while maintaining comparable independent and identically distributed (IID) generalization capability. |
| Researcher Affiliation | Academia | ¹Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; ²Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; ³School of Computer Science, Zhejiang University |
| Pseudocode | No | The paper describes the proposed framework and optimization process using mathematical formulations and descriptive text, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Repository: https://github.com/NeverMoreLCH/CCG |
| Open Datasets | Yes | To enable the quantitative evaluation for the consistency of compositional generalization across multiple levels, we build a new dataset in the context of visual question answering (VQA), i.e., GQA-CCG, based on the GQA dataset (Hudson and Manning 2019), a large-scale dataset organized for compositional VQA. ... For VQA, we evaluate the framework on our GQA-CCG dataset and the GQA dataset (Hudson and Manning 2019). ... For TVG, we use the recently released Charades-CG dataset (Li et al. 2022) that contains compositional referring expressions about real-world videos... |
| Dataset Splits | Yes | For a training set D_t, we first divide D_t into multiple validation sets {D_v^i}_{i=1}^K based on the compositional complexity of samples. ... We use the train balanced split and the val all split of GQA in the process of constructing GQA-CCG, and here we denote them as D_t and D_v, respectively. |
| Hardware Specification | No | The paper discusses models and their parameters, and mentions 'GPU memory' in a theoretical context, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts, or specific computing platforms) used for running the experiments. |
| Software Dependencies | No | The paper mentions using tools like the 'benepar toolkit (Kitaev, Cao, and Klein 2019)' and 'GPT-3.5', and 'BERT embeddings (Devlin et al. 2019)', but does not provide specific version numbers for these or any other ancillary software dependencies used in the experimental setup. |
| Experiment Setup | Yes | Specifically, for the initial parameter θ^(0), we train for T_p iterations to update θ by performing gradient descent. At each iteration (i = 1, ..., T_p), we update the parameter as follows: θ^(i+1) = θ^(i) − β_θ^(i) ∇θ^(i), where ∇θ^(i) = dL_t/dθ^(i), and β_θ^(i) is the learning rate of θ at iteration i. ... At each iteration (j = 1, ..., T_m), we perform a meta update operation on ω_i as follows: ω_i^(j+1) = ω_i^(j) − β_{ω_i}^(j) ∇ω_i^(j), where ∇ω_i^(j) = dL_v/dω_i^(j), and β_{ω_i}^(j) is the learning rate of ω_i at iteration j. |
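The two-stage optimization quoted in the Experiment Setup row can be sketched as ordinary gradient descent on the model parameter θ followed by per-level meta updates on the weights ω_i. This is a minimal illustrative sketch only: the scalar losses `grad_Lt` and `grad_Lv` below are hypothetical quadratic placeholders standing in for the paper's actual training loss L_t and validation loss L_v over VQA/TVG samples, and the function and variable names are my own, not the authors'.

```python
# Hypothetical stand-ins for the paper's losses:
#   L_t(theta) = (theta - 1)^2  ->  dL_t/dtheta = 2 * (theta - 1)
#   L_v(omega) = (omega - 0.5)^2 -> dL_v/domega = 2 * (omega - 0.5)
def grad_Lt(theta):
    return 2.0 * (theta - 1.0)

def grad_Lv(omega):
    return 2.0 * (omega - 0.5)

def optimize(theta0, omegas0, Tp=100, Tm=100, beta_theta=0.1, beta_omega=0.1):
    # Stage 1: train theta for T_p iterations,
    #   theta^(i+1) = theta^(i) - beta_theta^(i) * dL_t/dtheta^(i)
    theta = theta0
    for _ in range(Tp):
        theta = theta - beta_theta * grad_Lt(theta)

    # Stage 2: meta-update each level weight omega_i for T_m iterations,
    #   omega_i^(j+1) = omega_i^(j) - beta_omega^(j) * dL_v/domega_i^(j)
    omegas = list(omegas0)
    for k in range(len(omegas)):
        for _ in range(Tm):
            omegas[k] = omegas[k] - beta_omega * grad_Lv(omegas[k])
    return theta, omegas

theta, omegas = optimize(theta0=0.0, omegas0=[0.0, 1.0, 2.0])
```

With these toy losses, θ converges toward 1.0 and each ω_i toward 0.5; in the paper, the gradients would instead come from backpropagation through the task model at each compositional-complexity level.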