EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness

Authors: Yunxiao Zhang, Guanming Xiong, Haochen Li, Wen Zhao

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "Extensive experiments validate the performance of our method. Our method achieves competitive results on the Hotpot QA and Web Shop datasets, requiring 75% and 50% less data, respectively, while outperforming existing methods."
Researcher Affiliation: Collaboration — Yunxiao Zhang¹, Guanming Xiong¹, Haochen Li², and Wen Zhao¹; ¹Peking University, ²01.AI (email addresses redacted).
Pseudocode: No — The paper describes its steps in regular paragraph text, without structured formatting such as pseudocode or algorithm blocks.
Open Source Code: No — The paper contains no explicit statement about releasing source code and no link to a code repository for the described methodology.
Open Datasets: Yes — "Hotpot QA [Yang et al., 2018] is a multi-hop question-answering benchmark... Web Shop [Yao et al., 2022] is a simulated online shopping environment..."
Dataset Splits: Yes — "For Hotpot QA, we use the first 10,000 training questions as the data pool and randomly select 500 dev questions. For Web Shop, we use 8,500 instructions as the data pool and another 500 instructions for evaluation. For each dataset, we selected 30 samples with the lowest GE score for guideline updating, and then annotated 800 samples for fine-tuning."
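The selection step quoted above (take the 30 pool samples with the lowest Guideline Effectiveness score) can be sketched in a few lines. This is an illustrative assumption, not the authors' code: the `ge_score` field name and list-of-dicts layout are ours.

```python
# Hypothetical sketch of EDGE's data-selection step: keep the k samples
# with the LOWEST Guideline Effectiveness (GE) score for guideline updating.
# Field names ("id", "ge_score") are illustrative assumptions.

def select_lowest_ge(samples, k=30):
    """Return the k samples with the lowest GE score, ascending."""
    return sorted(samples, key=lambda s: s["ge_score"])[:k]

# Toy data pool standing in for the paper's 10,000/8,500-sample pools.
pool = [{"id": i, "ge_score": score}
        for i, score in enumerate([0.9, 0.1, 0.5, 0.3])]

lowest = select_lowest_ge(pool, k=2)  # the two lowest-GE samples
```

In the paper's setting, `k=30` and the pool is the full training set for each benchmark.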
Hardware Specification: Yes — "For fine-tuning, we choose LLAMA-3.1-8B-Instruct (L-8B) and Mistral-7B-Instruct-v0.3 (M-7B), training for 4 epochs with a learning rate of 5e-6 using 8 NVIDIA 80GB A100 GPUs."
Software Dependencies: No — The paper mentions the OpenAI GPT-4o API (gpt-4o-2024-08-06) and specific pre-trained models (LLAMA-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3) but does not give versions for general software dependencies such as the programming language (e.g., Python), deep-learning frameworks (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup: Yes — "For all inference, we set temperature=0.7, top_p=0.95, max length=512. For fine-tuning, we choose LLAMA-3.1-8B-Instruct (L-8B) and Mistral-7B-Instruct-v0.3 (M-7B), training for 4 epochs with a learning rate of 5e-6 using 8 NVIDIA 80GB A100 GPUs."
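The reported inference and fine-tuning hyperparameters can be collected into a small configuration sketch. The variable names and dict structure below are our own illustration; only the numeric values come from the paper's quoted setup.

```python
# Hyperparameters as reported in the paper's experiment setup.
# The config names and layout are illustrative assumptions, not the
# authors' actual training code.

INFERENCE_CONFIG = {
    "temperature": 0.7,   # sampling temperature, used for all inference
    "top_p": 0.95,        # nucleus-sampling cutoff
    "max_length": 512,    # maximum generation length
}

FINETUNE_CONFIG = {
    "models": ["LLAMA-3.1-8B-Instruct",      # L-8B
               "Mistral-7B-Instruct-v0.3"],  # M-7B
    "epochs": 4,
    "learning_rate": 5e-6,
    "hardware": "8x NVIDIA A100 80GB",
}
```

A reproduction attempt would still need to fill in the unreported pieces (framework, optimizer, batch size), which is exactly the gap the "Software Dependencies: No" row flags.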