EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
Authors: Yunxiao Zhang, Guanming Xiong, Haochen Li, Wen Zhao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the performance of our method. Our method achieves competitive results on the Hotpot QA and Web Shop datasets, requiring 75% and 50% less data, respectively, while outperforming existing methods. |
| Researcher Affiliation | Collaboration | Yunxiao Zhang¹, Guanming Xiong¹, Haochen Li², and Wen Zhao¹; ¹Peking University, ²01.AI. Author email addresses redacted (EMAIL). |
| Pseudocode | No | The paper describes steps in regular paragraph text without structured formatting like pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Hotpot QA [Yang et al., 2018] is a multi-hop question-answering benchmark... Web Shop [Yao et al., 2022] is a simulated online shopping environment... |
| Dataset Splits | Yes | For Hotpot QA, we use the first 10,000 training questions as the data pool and randomly select 500 dev questions. For Web Shop, we use 8,500 instructions as the data pool and another 500 instructions for evaluation. For each dataset, we selected 30 samples with the lowest GE score for guideline updating, and then annotated 800 samples for fine-tuning. |
| Hardware Specification | Yes | For fine-tuning, we choose LLAMA-3.1-8B-Instruct (L-8B) and Mistral-7B-Instruct-v0.3 (M-7B), training for 4 epochs with a learning rate of 5e-6 using 8 NVIDIA 80GB A100 GPUs. |
| Software Dependencies | No | The paper mentions using the OpenAI GPT-4o API (gpt-4o-2024-08-06) and specific pre-trained models (LLAMA-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3) but does not provide versions for general software dependencies such as the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | For all inference, we set temperature=0.7, top p=0.95, max length=512. For fine-tuning, we choose LLAMA-3.1-8B-Instruct (L-8B) and Mistral-7B-Instruct-v0.3 (M-7B), training for 4 epochs with a learning rate of 5e-6 using 8 NVIDIA 80GB A100 GPUs. |
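The settings quoted in the Experiment Setup row can be collected into a minimal configuration sketch. This is illustrative only: the key names follow common HuggingFace-style generation arguments and are not taken from the paper, which does not release code.

```python
# Decoding and fine-tuning settings as reported in the paper's
# Experiment Setup section. Key names are an assumption (HuggingFace-style
# conventions), not the authors' actual configuration.

inference_config = {
    "temperature": 0.7,
    "top_p": 0.95,
    "max_length": 512,
}

finetune_config = {
    "base_models": ["LLAMA-3.1-8B-Instruct", "Mistral-7B-Instruct-v0.3"],
    "epochs": 4,
    "learning_rate": 5e-6,
    "hardware": "8x NVIDIA A100 80GB",
}

def decoding_kwargs(config: dict) -> dict:
    """Build sampling kwargs from the inference config.

    do_sample=True is required for temperature/top_p to take effect;
    under greedy decoding those parameters are ignored.
    """
    return {
        "do_sample": True,
        "temperature": config["temperature"],
        "top_p": config["top_p"],
        "max_length": config["max_length"],
    }
```

A reproduction attempt would pass `decoding_kwargs(inference_config)` to the model's generate call for all inference runs on both HotpotQA and WebShop.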