STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning
Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen, Tianlin Li, Weipeng Jiang, Yang Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate STAFF on three LLMs and three downstream tasks and show that STAFF improves the performance of SOTA methods by up to 54.3% and reduces selection overhead by up to 70.5% at different pruning rates. |
| Researcher Affiliation | Academia | 1 Xi'an Jiaotong University, 2 University of Massachusetts Amherst, 3 Nanyang Technological University. {EMAIL,chaoshen@xjtu,EMAIL}.edu.cn, EMAIL, {EMAIL,yangliu@ntu}.edu.sg |
| Pseudocode | Yes | Algorithm 1 STAFF for Coreset Selection |
| Open Source Code | Yes | Our code is publicly available at https://github.com/shiningrain/STAFF. To follow the Open Science Policy and support reproducibility, we have released code about our implementations and evaluations. |
| Open Datasets | Yes | We evaluate STAFF on three datasets on different downstream tasks, namely, the BioInstruct dataset (Tran et al., 2024) (biology question-answering), the DialogSum dataset (Chen et al., 2021) (dialogue summarization), and the Kazakh-English subset of the WMT-19 dataset (Barrault et al., 2019) (translation of minority languages). |
| Dataset Splits | Yes | In the experiment, we divided each dataset into the training set and the test set according to a ratio of 9:1. |
| Hardware Specification | Yes | All fine-tuning experiments are conducted on one NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | While the paper mentions software like LoRA for fine-tuning and a fine-tuning framework, it does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We set the fine-tuning budget T in selection to 3 and K to 50. The number of samples used in verification for each bin (bv) is 10. For fine-tuning pre-trained models on three datasets of downstream tasks, we perform a grid search over the learning rate {1e-5, 2e-5, 1e-4, 2e-4} and the batch size {2, 4, 8}. We opt for a fixed number of epochs (e.g., 4 epochs) in all experiments. Table 5 provides specific learning rates for each model on different datasets. |
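The reported split and experiment setup can be sketched as follows. This is a minimal illustration, not the STAFF codebase: the function names (`split_dataset`, `grid_search`) and the `evaluate` callback are hypothetical; only the 9:1 ratio, the learning-rate/batch-size grid, and the fixed 4 epochs come from the paper.

```python
import random
from itertools import product

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle and split samples into train/test at the given ratio (paper: 9:1)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hyperparameter grid reported in the paper's experiment setup.
LEARNING_RATES = [1e-5, 2e-5, 1e-4, 2e-4]
BATCH_SIZES = [2, 4, 8]
EPOCHS = 4  # fixed across all experiments

def grid_search(train_set, evaluate):
    """Return (score, lr, batch_size) maximizing `evaluate`, a user-supplied
    function that fine-tunes a model on train_set and returns a score."""
    best = None
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        score = evaluate(train_set, lr=lr, batch_size=bs, epochs=EPOCHS)
        if best is None or score > best[0]:
            best = (score, lr, bs)
    return best
```

In practice `evaluate` would wrap a LoRA fine-tuning run and a validation metric; it is left abstract here because the paper does not specify the framework or its versions.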