Rapid Selection and Ordering of In-Context Demonstrations via Prompt Embedding Clustering
Authors: Kha Pham, Hung Le, Man Ngo, Truyen Tran
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate the prompt embedding space... We provide extensive analyses to confirm the clustering property. In particular, we visualize prompt embeddings in 2D spaces using UMAP, run K-Means clustering on high-dimensional embedding spaces, and quantify the importance of input tokens by their partial derivative norms. Experimental results consistently support the existence of clusters... We apply Cluster-based Search in two selection scenarios... In both cases, our proposed method achieves competitive accuracies compared to exhaustive search while being significantly faster, saving 92% to nearly 100% of execution time. |
| Researcher Affiliation | Academia | ¹ Applied Artificial Intelligence Institute, Deakin University; ² Faculty of Data Science in Business, Ho Chi Minh University of Banking, Vietnam |
| Pseudocode | Yes | Algorithm 1 Entropy-Based Selecting Criterion. Input: set of prompt candidates P. Initialize c_Best = −∞, p_Best = None. For each p in P: compute logits ℓ₁; compute confidence score c(ℓ₁); if c(ℓ₁) > c_Best then c_Best = c(ℓ₁) and p_Best = p. Output: p_Best |
| Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the authors have released the source code for their methodology. |
| Open Datasets | Yes | For text classification, we consider tasks of sentiment classification and language identification. We use data from the SST-2 dataset (Socher et al., 2013) for sentiment classification... The dataset for language identification is taken from Hugging Face (Hugging Face, 2021)... For the common-sense reasoning task, we leverage question-answer pairs from the CommonsenseQA dataset (Talmor et al., 2019)... For the mathematical arithmetic task, we use questions and answers from the AddSub dataset (Hosseini et al., 2014)... we train decoder-only Transformers from scratch on the WikiText-2 dataset (Merity et al., 2017). |
| Dataset Splits | No | The paper mentions generating prompts with k demonstrations from a k_total pool and using '1,000 tuples of (E, q)' or '100 randomized prompts' for experiments. For Transformers trained from scratch, it states training on the 'WikiText-2 dataset with SGD optimizer with learning rate 5e-1 in 100 epochs.' However, it does not explicitly specify traditional training/validation/test splits for any of the datasets used for either the pre-trained LLMs or the custom-trained Transformers. |
| Hardware Specification | No | The paper mentions various LLMs used (GPT-2, GPT-Neo, Llama-v1/v2, MPT, Phi-2, Qwen-2.5) and describes their architecture (e.g., 12 self-attention layers, token embedding size 768 for custom-trained Transformers), but it does not specify the particular GPU models, CPU types, or other hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using UMAP, K-Means clustering, t-SNE for visualizations and analysis, and SGD optimizer for training. However, it does not provide specific version numbers for any of these software components, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Specifically, we train from scratch four Transformers: one with full components, one without positional encoding, one without causal attention mask, and one without both. Other Transformer components are always included. Each Transformer has 12 self-attention layers, each with 12 attention heads; the token embedding size is 768, and the hidden size in MLP layers is 2048. We train the Transformers with different types of positional encodings, namely sinusoidal, rotary, and trainable positional encoding, on the WikiText-2 dataset using the SGD optimizer with learning rate 5e-1 for 100 epochs. For fair comparisons, the training task for all Transformers is next-token prediction. |
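The entropy-based selecting criterion quoted in the Pseudocode row can be sketched in plain Python. This is a minimal sketch, not the authors' released code: `get_first_token_logits` is a hypothetical stand-in for a model call returning the logits ℓ₁ for a candidate prompt, and negative Shannon entropy is used as one plausible confidence score c(ℓ₁) (lower entropy ⇒ higher confidence).

```python
import math

def entropy(logits):
    """Shannon entropy of the softmax distribution over the given logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # shift by max for stability
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_prompt(candidates, get_first_token_logits):
    """Algorithm 1 sketch: return the candidate prompt whose first-token
    prediction the model is most confident about (lowest entropy)."""
    best_score, best_prompt = -math.inf, None
    for p in candidates:
        logits = get_first_token_logits(p)  # logits l_1 for prompt p
        confidence = -entropy(logits)       # low entropy => high confidence
        if confidence > best_score:
            best_score, best_prompt = confidence, p
    return best_prompt

# Usage with toy logits: a peaked distribution beats a uniform one.
logits_by_prompt = {"a": [10.0, 0.0, 0.0], "b": [1.0, 1.0, 1.0]}
best = select_prompt(["a", "b"], lambda p: logits_by_prompt[p])
```

Note the initialization to −∞: since the criterion maximizes confidence, starting the running best at +∞ would prevent any candidate from ever being selected.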
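The clustering analysis cited under Research Type (K-Means on high-dimensional prompt embeddings) can be illustrated with a minimal Lloyd's-iteration k-means. This is a sketch under stated assumptions, not the paper's pipeline: the synthetic `embeddings` array stands in for real prompt embeddings, `k=2` is an illustrative choice, and the deterministic farthest-point initialization is our simplification, not necessarily what the authors used.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic greedy init: start at X[0], then repeatedly add the
    point farthest from the current centroid set."""
    idx = [0]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def kmeans(X, k, iters=100):
    """Minimal Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(X, k)
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        new = np.vstack([X[labels == j].mean(axis=0) if (labels == j).any()
                         else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Synthetic stand-in for high-dimensional prompt embeddings:
# two well-separated Gaussian blobs in 8 dimensions.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(0.0, 0.1, size=(20, 8)),
                        rng.normal(5.0, 0.1, size=(20, 8))])
centroids, labels = kmeans(embeddings, k=2)
```

On data with a genuine cluster structure like this, the assignment recovers the two blobs exactly; the paper's analysis runs the same kind of procedure on actual prompt embeddings to test for such structure.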