Large Language Models are Demonstration Pre-Selectors for Themselves
Authors: Jiarui Jin, Yuwei Wu, Haoxuan Li, Xiaoting He, Weinan Zhang, Yiming Yang, Yong Yu, Jun Wang, Mengyue Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with LLMs ranging from 300M to 8B parameters show that FEEDER can reduce training data size by over 20% while maintaining performance and seamlessly integrating with various downstream demonstration selection strategies in ICL. Our empirical evaluations encompass six LLM bases, ranging from 335M to 7B parameters, and include six demonstration selectors in the demonstration selection stage, applied to text classification, reasoning, and semantic parsing tasks. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Xiaohongshu Inc., 3 Carnegie Mellon University, 4 Peking University, 5 No Affiliation, 6 University College London, 7 University of Bristol. |
| Pseudocode | Yes | Algorithm 1 (Bi-level Optimization). Input: training dataset D_TRAIN, LLM Ψ_LLM; Output: approximated subset D̃_FEEDER, tuned LLM Ψ_LLM. Algorithm 2 (Approximation Algorithm for FEEDER). Input: training dataset D_TRAIN; Output: an approximated FEEDER set D̃_FEEDER. Algorithm 3 (Exact Algorithm for FEEDER). Input: training dataset D_TRAIN; Output: an exact FEEDER set D̃_FEEDER. Algorithm 4 (Alternative Exact Algorithm for FEEDER). Input: training dataset D_TRAIN; Output: exact FEEDER set D̃_FEEDER. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Our evaluations are mainly conducted on 6 text classification datasets: SST-2 (Socher et al., 2013), SST-5 (Socher et al., 2013), COLA (Warstadt et al., 2018), TREC (Voorhees & Tice, 2000), SUBJ (Pang & Lee, 2004), and FPB (Malo et al., 2014). We also further assess FEEDER on the reasoning dataset GSM8K (Cobbe et al., 2021), the semantic-parsing dataset SMCALFlow (Andreas et al., 2020), and the scientific question-answering dataset GPQA (Rein et al., 2024). |
| Dataset Splits | Yes | For each dataset, we directly follow the official splits to obtain DTRAIN and DTEST. We report both the mean and variance of accuracy using 8 different seeds and 5 different permutations of n-shots. |
| Hardware Specification | Yes | All our experiments are conducted with NVIDIA A100s. |
| Software Dependencies | No | The paper mentions "Sentence Transformers library2 from Hugging Face" but does not specify a version number for this library or any other software component used in the experiments. |
| Experiment Setup | Yes | The batch size is set to 32, the number of warmup steps to 100, the learning rate to 5 × 10−5, and the weight decay to 0.01. |
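The pseudocode row above describes FEEDER's approximation algorithm as pruning the training set down to a subset that preserves in-context-learning performance. A minimal sketch of that idea, under loud assumptions: the `covers` function below is a stand-in for the paper's LLM-based sufficiency check (whether the LLM, conditioned on the running subset, still handles a candidate example), replaced here with a trivial label-match test so the sketch is self-contained and runnable.

```python
# Hypothetical sketch in the spirit of FEEDER's approximation algorithm
# (Algorithm 2): greedily keep only training examples that the running
# subset does not already "cover". NOT the paper's implementation; the
# real check queries the LLM rather than comparing labels.

def covers(selected, example):
    # Stand-in for the LLM sufficiency check: here an example counts as
    # covered if any already-selected demonstration shares its label.
    return any(demo["label"] == example["label"] for demo in selected)

def preselect(train_set):
    """Return a pruned demonstration pool: skip examples already covered."""
    feeder = []
    for example in train_set:
        if not covers(feeder, example):
            feeder.append(example)
    return feeder

train = [
    {"text": "great movie", "label": "pos"},
    {"text": "loved it", "label": "pos"},
    {"text": "terrible", "label": "neg"},
]
subset = preselect(train)
# One representative per label survives; redundant examples are pruned.
```

The pruned `subset` would then feed any downstream demonstration selector, matching the paper's claim that FEEDER composes with existing selection strategies.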
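For reproduction, the reported optimizer settings map naturally onto Hugging Face `TrainingArguments`. This is a sketch only: the paper does not state which training framework it uses, so the field names and the output path below are assumptions, not the authors' configuration.

```python
from transformers import TrainingArguments

# Sketch: the four values come from the paper's experiment setup;
# everything else (framework choice, output_dir) is assumed.
args = TrainingArguments(
    output_dir="feeder-run",          # hypothetical output path
    per_device_train_batch_size=32,   # batch size 32
    warmup_steps=100,                 # 100 warmup steps
    learning_rate=5e-5,               # 5 × 10^-5
    weight_decay=0.01,
)
```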