Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion

Authors: Qingguo Hu, Ante Wang, Jia Song, Delai Qiu, Qingsong Liu, Jinsong Su

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate substantial gains across four challenging specialized tasks and four widely-used comprehensive benchmarks. On the specialized tasks in particular, our method achieves average improvements of 5.4% and 4.0% over the corresponding baselines when using LLaVA-1.5-7B and LLaVA-1.5-13B, respectively.
Researcher Affiliation | Collaboration | 1. School of Informatics, Xiamen University, China; 2. Xiamen Unisound Intelligence Technology Co., Ltd; 3. Shanghai Artificial Intelligence Laboratory, China; 4. Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China
Pseudocode | No | The paper describes methods and processes through text and figures (e.g., Figure 2 and Figure 3 illustrate the data construction and self-improvement overview), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and the supplementary file are available at https://github.com/XMUDeepLIT/CVC.
Open Datasets | Yes | The CVC instances are constructed based on the COCO dataset [Lin et al., 2014]. We conduct in-depth analyses on a range of challenging specialized tasks and widely-used comprehensive benchmarks, aiming to test the effectiveness of our method on the deep visual perception and general capabilities of LVLMs, respectively. Challenging specialized tasks: MMVP [Tong et al., 2024], Winoground [Thrush et al., 2022], V*Bench [Wu and Xie, 2024], and VSR [Liu et al., 2023b]; comprehensive benchmarks: MME [Fu et al., 2023], MMBench [Liu et al., 2023c], SEED-Bench [Li et al., 2023], and MM-Vet [Yu et al., 2023].
Dataset Splits | Yes | By default, we use 90K of our data for training across all experiments unless otherwise noted. During training, we combine our data with the 665K instruction data from LLaVA-1.5 for multimodal instruction tuning. We follow [Liu et al., 2024a] to use the same testing scripts and evaluation metrics for fair comparison.
Hardware Specification | Yes | All experiments are conducted on 8 A100 80G GPUs.
Software Dependencies | No | The paper mentions several models and frameworks used (e.g., LLaMA2-7B, RoBERTa, GLIP, SAM, LLaVA-1.5), but does not provide specific version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We set γ, N, and α to 0.3, 16, and 0.75, respectively. During training, we combine our data with the 665K instruction data from LLaVA-1.5 for multimodal instruction tuning. To ensure a fair comparison, our training starts from the pretrained (i.e., not yet instruction-tuned) weights of LLaVA-1.5, following the same training hyperparameters.
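The reported setup can be collected into a single configuration sketch. This is a hypothetical summary for reproduction notes only: the key names (`gamma`, `N`, `alpha`, etc.) and the dictionary layout are assumptions, and the roles of γ, N, and α are as defined in the paper; only the values and dataset sizes come from the quoted text.

```python
# Hypothetical reproduction config; key names are illustrative assumptions,
# values are taken from the paper's reported setup.
cvc_config = {
    "gamma": 0.3,    # hyperparameter γ from the paper
    "N": 16,         # hyperparameter N from the paper
    "alpha": 0.75,   # hyperparameter α from the paper
    "cvc_instances": 90_000,            # default CVC training data ("90K")
    "llava_instruction_data": 665_000,  # LLaVA-1.5 instruction data ("665K")
    "base_models": ["LLaVA-1.5-7B", "LLaVA-1.5-13B"],
    "hardware": "8x A100 80G",
}

# Size of the combined multimodal instruction-tuning mixture implied above.
total_training_examples = (
    cvc_config["cvc_instances"] + cvc_config["llava_instruction_data"]
)
print(total_training_examples)  # 755000
```

The mixture size (755K) follows directly from combining the 90K CVC instances with the 665K LLaVA-1.5 instruction data, as stated in the Dataset Splits row.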