Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG

Authors: Wenbin Wang, Yongcheng Jing, Liang Ding, Yingjie Wang, Li Shen, Yong Luo, Bo Du, Dacheng Tao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on HR benchmarks demonstrate the significant effectiveness of RAP, with LLaVA-v1.5-13B achieving a 43% improvement on V* Bench and 19% on HR-Bench.
Researcher Affiliation | Academia | 1 School of Computer Science, National Engineering Research Center for Multimedia Software and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, China; 2 Nanyang Technological University, Singapore 639798; 3 The University of Sydney, Australia; 4 Shenzhen Campus of Sun Yat-sen University, China.
Pseudocode | Yes | Algorithm 1: Spatial-Awareness Layout; Algorithm 2: Retrieval-Augmented Perception.
Open Source Code | Yes | Code is available at https://github.com/DreamMr/RAP.
Open Datasets | Yes | We evaluate our RAP on two HR benchmarks: V* Bench and HR-Bench. V* Bench, derived from SA-1B (Kirillov et al., 2023), averages a resolution of 2246×1582. More details about HR-Bench can be found in Sect. 3.1. HR-Bench 8K, with 8K-resolution images from DIV8K (Gu et al., 2019) and the Internet, includes Fine-grained Single-instance Perception (FSP) and Fine-grained Cross-instance Perception (FCP) tasks.
Dataset Splits | No | The paper describes dataset characteristics and task types (FSP, FCP) for HR-Bench and V* Bench, but does not explicitly state how these datasets are split into training, validation, or test sets, either as percentages or as sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or computing-cluster specifications used for the experiments.
Software Dependencies | No | The paper mentions various MLLMs (e.g., LLaVA-v1.5, LLaVA-v1.6) and components such as VisRAG and SigLIP, but it does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We set τ = 0.6 throughout the paper; b is a bias value, set here to 0.2, and d denotes the depth of the image tree.