MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models
Authors: Yujing Wang, Hainan Zhang, Liang Pang, Binghui Guo, Hongwei Zheng, Zhiming Zheng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines. Further analysis shows that multi-aspect dense rewards yield a more stable training process and better generation results than a single reward, validating the stability and transferability of MaFeRw. |
| Researcher Affiliation | Academia | Yujing Wang¹², Hainan Zhang¹²*, Liang Pang⁴, Binghui Guo², Hongwei Zheng³, Zhiming Zheng¹² — ¹Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; ²School of Artificial Intelligence, Beihang University, China; ³Beijing Academy of Blockchain and Edge Computing, China; ⁴Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes the methodology using prose and mathematical formulations but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Code: https://github.com/TAP-LLM/MaFeRw |
| Open Datasets | Yes | We conduct main experiments on two multi-turn dialogue RAG datasets, QReCC (Anantha et al. 2020) and TopiOCQA (Adlakha et al. 2022), and conduct the transferability experiment on the WSDM@24 Multi-Doc QA dataset (https://sites.google.com/view/wsdm24-docqa). |
| Dataset Splits | No | The paper mentions using 'test sets' for reward model accuracy but does not provide specific details on how the datasets (QReCC, TopiOCQA, WSDM@24 Multi-Doc QA) were split into training, validation, or test sets with percentages or sample counts. |
| Hardware Specification | No | The paper mentions the use of specific models like T5-base and Llama-2-13b-chat, but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'pre-trained T5-base model', 'FAISS (Johnson, Douze, and Jegou 2021)', 'msmarco-roberta-base-ance-firstp (Reimers and Gurevych 2019)', and 'Llama-2-13b-chat model (Touvron et al. 2023)'. However, it does not provide specific version numbers for these software libraries, models, or any underlying programming languages or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | No | The paper states only that details on hyperparameter determination are provided in the Appendix; the main text does not specify hyperparameter values or training configurations. |