C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Authors: Guoxin Chen, Minpeng Liao, Peiying Yu, Dingmin Wang, Zile Qiao, Chao Yang, Xin Zhao, Kai Fan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in both in-domain and out-of-distribution scenarios demonstrate that C-3PO significantly enhances RAG performance while maintaining plug-and-play flexibility and superior generalization capabilities. Code is available at https://github.com/Chen-GX/C-3PO. |
| Researcher Affiliation | Collaboration | (1) Gaoling School of Artificial Intelligence, Renmin University of China, (2) Tongyi Lab, (3) Soochow University, (4) University of Oxford, (5) Tsinghua University. Correspondence to: Guoxin Chen <EMAIL>, Minpeng Liao <EMAIL>, Wayne Xin Zhao <EMAIL>, Kai Fan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Inference Process of our C-3PO. Input: question q, the retrieval server (Retriever), the LLM server (LLM), the proxy model π in our C-3PO, and instructions for the different agents (Reasoning Router, Information Filter, and Decision Maker). Output: the answer. 1: a1 ← π(q, instruction1) {Reasoning Router agent} |
| Open Source Code | Yes | Code is available at https://github.com/Chen-GX/C-3PO. |
| Open Datasets | Yes | To comprehensively evaluate our C-3PO, we experiment on both single-hop datasets including Natural Questions (NQ) (Kwiatkowski et al., 2019), PopQA (Mallen et al., 2023), and TriviaQA (TQA) (Joshi et al., 2017), as well as multi-hop datasets including 2WikiMultiHopQA (2Wiki) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and HotpotQA (HQA) (Yang et al., 2018). |
| Dataset Splits | Yes | For each dataset, we only use 6000 randomly sampled questions instead of the full training set. ... For the in-domain test sets, we randomly sampled 1,000 instances as the test set. |
| Hardware Specification | No | Our implementation supports two high-performance inference engines: SGLang and vLLM, allowing users to optimize for different deployment scenarios and hardware configurations. ... We utilize Qwen2-72B-Instruct (Yang et al., 2024a) as the fixed LLM server, while Qwen2-0.5B or Qwen2-1.5B is trained as the candidate lightweight proxy for efficient edge deployment. The paper does not specify concrete hardware details such as GPU or CPU models. |
| Software Dependencies | No | We utilize Llama-Factory (Zheng et al., 2024) as our training framework for the initial supervised fine-tuning phase. ... For the RL training phase, we adopt OpenRLHF (Hu et al., 2024) as our primary training framework, coupled with the vLLM (Kwon et al., 2023) inference engine. ... We integrate SGLang as our LLM server, which provides compatibility with various state-of-the-art language models, including Qwen2-72B-Instruct (Yang et al., 2024a) and Llama3.3-70B-Instruct (Dubey et al., 2024). ... We employ contriever-msmarco (Izacard et al., 2022) as our dense retriever. While specific software components are named, no version numbers for these components or any other underlying software (e.g., Python, PyTorch) are provided. |
| Experiment Setup | Yes | Table 6 (key hyperparameters in the supervised warm-up phase): Learning Rate 4e-5; Batch size 512; #Epochs 3; Optimizer AdamW (Loshchilov & Hutter, 2019). ... Table 7 (key hyperparameters in the RL phase): Learning Rate of Policy model 5e-7; Learning Rate of Value model 5e-6; Batch size 1024; KL Coefficient 0.005; Optimizer Adam. ... |
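The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched in Python. This is a minimal, hedged reading of the three-agent loop (Reasoning Router, Information Filter, Decision Maker), not the authors' implementation: `call_proxy`, `retrieve`, and `call_llm` are hypothetical stand-ins for the proxy model π, the retrieval server, and the fixed LLM server, and the routing labels are invented for illustration.

```python
# Hedged sketch of C-3PO's Algorithm 1 (inference). Agent roles come from
# the paper; function names, prompts, and routing labels are hypothetical.

def call_proxy(prompt: str) -> str:
    """Stand-in for the lightweight proxy model pi (e.g., Qwen2-0.5B)."""
    # A real system would query the proxy LLM; this stub routes trivially.
    return "NO_RETRIEVAL"

def retrieve(query: str) -> list:
    """Stand-in for the retrieval server (e.g., contriever-msmarco)."""
    return [f"passage about {query}"]

def call_llm(prompt: str) -> str:
    """Stand-in for the fixed LLM server (e.g., Qwen2-72B-Instruct)."""
    return f"answer to: {prompt}"

def c3po_inference(question: str) -> str:
    # 1) Reasoning Router: the proxy decides whether retrieval is needed.
    route = call_proxy(f"[route] {question}")
    if route == "NO_RETRIEVAL":
        return call_llm(question)
    # 2) Information Filter: the proxy keeps only useful passages.
    passages = retrieve(question)
    kept = [p for p in passages
            if call_proxy(f"[filter] {question} {p}") == "KEEP"]
    # 3) Decision Maker: the proxy assembles the final prompt for the LLM.
    prompt = call_proxy(f"[decide] {question} " + " ".join(kept))
    return call_llm(prompt)
```

With the trivial router stub above, a question is answered directly by the LLM server; swapping in a real proxy model would exercise the filter and decision branches.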
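The Dataset Splits row (6,000 randomly sampled training questions per dataset, 1,000 randomly sampled in-domain test instances) can be sketched as follows. The fixed seed and the disjointness of the two samples are assumptions for reproducibility, not details stated in the paper.

```python
import random

def make_splits(questions, n_train=6000, n_test=1000, seed=0):
    """Sketch of the paper's per-dataset sampling: 6,000 training
    questions and 1,000 in-domain test instances, drawn at random.
    The seed and the non-overlap of the splits are assumptions."""
    rng = random.Random(seed)
    pool = list(questions)
    rng.shuffle(pool)
    train = pool[:n_train]
    test = pool[n_train:n_train + n_test]
    return train, test
```

Shuffling once and slicing guarantees the two samples never overlap, which a pair of independent `random.sample` calls would not.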