POQD: Performance-Oriented Query Decomposer for Multi-Vector Retrieval

Authors: Yaoyang Liu, Junlin Li, Yinjun Wu, Zhen Chen

ICML 2025

Reproducibility assessment (Variable | Result | LLM Response):
Research Type | Experimental | "Extensive empirical studies on representative RAG-based QA tasks show that POQD outperforms existing query decomposition strategies in both retrieval performance and end-to-end QA accuracy. POQD is available at https://github.com/PKU-SDS-lab/POQD-ICML25." "We perform end-to-end RAG training on the QA datasets introduced in Section 5.1. For this experiment, we not only report the end-to-end QA accuracy in Table 2 but also compare the ground-truth relevant documents or images against the retrieved ones by POQD and baseline methods in Table 1."
Researcher Affiliation | Academia | "1School of Information, Renmin University 2School of Computer Science, Peking University 3Fundamental Industry Training Center, Tsinghua University. Correspondence to: Yinjun Wu <EMAIL>, Zhen Chen <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Optimize query decomposition"; "Algorithm 2: Training POQD"
Open Source Code | Yes | "POQD is available at https://github.com/PKU-SDS-lab/POQD-ICML25."
Open Datasets | Yes | "We employ Web Questions (Web QA) (Berant et al., 2013; Chang et al., 2021), Multi Modal QA (Talmor et al.), Many Modal QA (Hannan et al., 2020) and Strategy QA (Geva et al., 2021a) dataset for experiments. Among these datasets, the former three include questions requiring retrieval from multi-modal data."
Dataset Splits | No | The paper uses well-known benchmark datasets but does not explicitly describe the train/validation/test splits used for its experiments, nor does it refer to predefined splits with specific citations or file names. It mentions selecting questions from datasets but not how the datasets themselves were partitioned for training, validation, and testing.
Hardware Specification | No | The paper mentions "GPU Cluster support with AIBD platform from Fundamental Industry Training Center of Tsinghua University" but does not provide specific details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions various models such as "Sentence-Bert model (Reimers, 2019)", "CLIP model (Radford et al., 2021)", "Llama3.1-8B (Dubey et al., 2024)", "Llava-v1.5-7B (Liu et al., 2024)", "GPT-4 (Achiam et al., 2023)", and "RoBERTa model (Liu et al., 2019)". However, it does not specify versions for general software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | "Throughout the experiments, the default values of α, τ and κ are configured as 0.02, 3 and 5, respectively. Regarding the configuration for the retrieval process, we retrieve the Top-1 most relevant images and the Top-2 most relevant documents in the image QA and text QA tasks, respectively."
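To make the Top-k retrieval setting above concrete, the following is a minimal sketch of multi-vector retrieval: a query is decomposed into sub-queries, each document is represented by multiple vectors, and relevance is aggregated with a MaxSim-style sum (standard in multi-vector retrievers such as ColBERT). This is an illustration only, not POQD's actual scoring or decomposition code; the function names, the cosine-similarity choice, and the MaxSim aggregation are assumptions.

```python
import numpy as np

def maxsim_score(sub_query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Score a (decomposed) query against one document.

    sub_query_vecs: (n_sub_queries, dim) embeddings of the sub-queries.
    doc_vecs:       (n_doc_vecs, dim) multi-vector document representation.
    Each sub-query contributes its best cosine match within the document.
    """
    q = sub_query_vecs / np.linalg.norm(sub_query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                      # (n_sub_queries, n_doc_vecs)
    return float(sims.max(axis=1).sum())  # MaxSim aggregation

def retrieve_top_k(sub_query_vecs: np.ndarray, corpus: list, k: int = 2) -> list:
    """Return indices of the Top-k highest-scoring documents in the corpus."""
    scores = [maxsim_score(sub_query_vecs, doc) for doc in corpus]
    return list(np.argsort(scores)[::-1][:k])
```

Under this sketch, the paper's configuration corresponds to calling `retrieve_top_k(..., k=1)` for image retrieval and `retrieve_top_k(..., k=2)` for document retrieval.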