Consistency of Compositional Generalization Across Multiple Levels
Authors: Chuanhao Li, Zhen Li, Chenchen Jing, Xiaomeng Fan, Wenbo Ye, Yuwei Wu, Yunde Jia
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We incorporate various types of methods of two tasks including VQA and temporal video grounding (TVG) into our framework, and conduct experiments on our GQA-CCG dataset, the GQA dataset and the Charades-CG dataset (Li et al. 2022), for validating the effectiveness and generalizability of our framework. Experimental results show that our framework effectively enhances the consistency of compositional generalization across multiple levels and improves the accuracy of compositional generalization at different levels, while maintaining comparable independent and identically distributed (IID) generalization capability. |
| Researcher Affiliation | Academia | ¹Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; ²Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; ³School of Computer Science, Zhejiang University |
| Pseudocode | No | The paper describes the proposed framework and optimization process using mathematical formulations and descriptive text, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Repository: https://github.com/NeverMoreLCH/CCG |
| Open Datasets | Yes | To enable the quantitative evaluation for the consistency of compositional generalization across multiple levels, we build a new dataset in the context of visual question answering (VQA), i.e., GQA-CCG, based on the GQA dataset (Hudson and Manning 2019), a large-scale dataset organized for compositional VQA. ... For VQA, we evaluate the framework on our GQA-CCG dataset and the GQA dataset (Hudson and Manning 2019). ... For TVG, we use the recently released Charades-CG dataset (Li et al. 2022) that contains compositional referring expressions about real-world videos... |
| Dataset Splits | Yes | For a training set D_t, we first divide D_t into multiple validation sets {D_v^i}_{i=1}^K based on the compositional complexity of samples. ... We use the train balanced split and the val all split of GQA in the process of constructing GQA-CCG, and here we denote them as D_t and D_v, respectively. |
| Hardware Specification | No | The paper discusses models and their parameters, and mentions 'GPU memory' in a theoretical context, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts, or specific computing platforms) used for running the experiments. |
| Software Dependencies | No | The paper mentions using tools like the 'benepar toolkit (Kitaev, Cao, and Klein 2019)' and 'GPT-3.5', and 'BERT embeddings (Devlin et al. 2019)', but does not provide specific version numbers for these or any other ancillary software dependencies used in the experimental setup. |
| Experiment Setup | Yes | Specifically, for the initial parameter θ^(0), we train for T_p iterations to update θ by performing gradient descent. At each iteration (i = 1, ..., T_p), we update the parameter as follows: θ^(i+1) = θ^(i) − β_θ^(i) ∇θ^(i), where ∇θ^(i) = dL_t/dθ^(i), and β_θ^(i) is the learning rate of θ at iteration i. ... At each iteration (j = 1, ..., T_m), we perform a meta update operation on ω_i as follows: ω_i^(j+1) = ω_i^(j) − β_{ω_i}^(j) ∇ω_i^(j), where ∇ω_i^(j) = dL_v/dω_i^(j), and β_{ω_i}^(j) is the learning rate of ω_i at iteration j. |
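The two-stage optimization quoted in the Experiment Setup row can be sketched as ordinary gradient descent on the model parameter θ followed by per-level meta updates on the weights ω_i. This is a minimal illustrative sketch only: the scalar losses `grad_Lt` and `grad_Lv` below are hypothetical quadratic placeholders standing in for the paper's actual training loss L_t and validation loss L_v over VQA/TVG samples, and the function and variable names are my own, not the authors'.

```python
# Hypothetical stand-ins for the paper's losses:
#   L_t(theta) = (theta - 1)^2  ->  dL_t/dtheta = 2 * (theta - 1)
#   L_v(omega) = (omega - 0.5)^2 -> dL_v/domega = 2 * (omega - 0.5)
def grad_Lt(theta):
    return 2.0 * (theta - 1.0)

def grad_Lv(omega):
    return 2.0 * (omega - 0.5)

def optimize(theta0, omegas0, Tp=100, Tm=100, beta_theta=0.1, beta_omega=0.1):
    # Stage 1: train theta for T_p iterations,
    #   theta^(i+1) = theta^(i) - beta_theta^(i) * dL_t/dtheta^(i)
    theta = theta0
    for _ in range(Tp):
        theta = theta - beta_theta * grad_Lt(theta)

    # Stage 2: meta-update each level weight omega_i for T_m iterations,
    #   omega_i^(j+1) = omega_i^(j) - beta_omega^(j) * dL_v/domega_i^(j)
    omegas = list(omegas0)
    for k in range(len(omegas)):
        for _ in range(Tm):
            omegas[k] = omegas[k] - beta_omega * grad_Lv(omegas[k])
    return theta, omegas

theta, omegas = optimize(theta0=0.0, omegas0=[0.0, 1.0, 2.0])
```

With these toy losses, θ converges toward 1.0 and each ω_i toward 0.5; in the paper, the gradients would instead come from backpropagation through the task model at each compositional-complexity level.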