Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
Authors: Quang-Hung Le, Long Hoang Dang, Ngan Hoang Le, Truyen Tran, Thao Minh Le
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our PromViL framework significantly outperforms baselines on various visual grounding and compositional question answering tasks. |
| Researcher Affiliation | Academia | Quang-Hung Le1, Long Hoang Dang2, Ngan Hoang Le3, Truyen Tran1, Thao Minh Le1; 1Applied Artificial Intelligence Institute (A2I2), Deakin University, Australia; 2Posts and Telecommunications Institute of Technology, Vietnam; 3University of Arkansas, USA; EMAIL, EMAIL, thile@uark.edu, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Progressive Multi-granularity Decoding |
| Open Source Code | No | We will release our code and datasets to support further research in this field. |
| Open Datasets | Yes | Our experiments are reproducible in academia, using only public data and models. We introduce a dataset construction pipeline to create a new dataset of nested compositional V-L pairs curated from Visual Genome, enabling training on multiple complexity levels. |
| Dataset Splits | No | The paper mentions using CompoVL for training and CompoVL-hard for evaluation, and refers to 'val' and 'test' splits for standard benchmarks such as GQA and RefCOCOg, implying standard splits. However, specific percentages or sample counts for training/validation splits within their main CompoVL dataset are not explicitly provided. |
| Hardware Specification | Yes | Fine-tuning takes around 7 hours on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions 'LoRA (Hu et al. 2021) tuning', 'spaCy (Honnibal et al. 2020)', and 'Berkeley Neural Parser (Kitaev, Cao, and Klein 2019)' but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | We perform LoRA (Hu et al. 2021) tuning with r=64, learning rate 1e-4, warm-up ratio 0.1, and batch size 4. |
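The fine-tuning hyperparameters quoted in the Experiment Setup row can be gathered into a minimal configuration sketch. This is a hypothetical illustration, not the authors' released code: the class and field names are assumptions modeled on common LoRA fine-tuning setups, and only the four numeric values come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraFinetuneConfig:
    """Hyperparameters reported in the paper; structure is hypothetical."""
    lora_r: int = 64            # LoRA rank (r=64)
    learning_rate: float = 1e-4
    warmup_ratio: float = 0.1   # fraction of total steps used for LR warm-up
    batch_size: int = 4

    def warmup_steps(self, total_steps: int) -> int:
        """Number of warm-up steps implied by the warm-up ratio."""
        return int(total_steps * self.warmup_ratio)


cfg = LoraFinetuneConfig()
print(cfg.warmup_steps(10_000))  # 1000
```

In a typical Hugging Face setup these values would map onto `LoraConfig(r=...)` from `peft` and `TrainingArguments(learning_rate=..., warmup_ratio=..., per_device_train_batch_size=...)` from `transformers`, though the paper does not state which training stack was used.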