PersonalLLM: Tailoring LLMs to Individual Preferences
Authors: Thomas P. Zollo, Andrew Wei Tung Siah, Naimeng Ye, Ang Li, Hongseok Namkoong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore basic in-context learning and meta-learning baselines to illustrate the utility of PersonalLLM and highlight the need for future methodological development. |
| Researcher Affiliation | Academia | Thomas P. Zollo, Andrew Wei Tung Siah, Naimeng Ye, Ang Li, Hongseok Namkoong (all Columbia University) |
| Pseudocode | Yes | C.1 PSEUDOCODE: Below is the pseudocode for the baselines in Section 4. ... Algorithm 1: Meta-Learn K-Shot ICL |
| Open Source Code | Yes | Our data and code are publicly available, and full documentation for our dataset is available in Appendix A. ... https://github.com/namkoong-lab/PersonalLLM |
| Open Datasets | Yes | Our dataset is available at https://huggingface.co/datasets/namkoong-lab/PersonalLLM. |
| Dataset Splits | Yes | We split the resulting dataset into 9,402 training examples and 1,000 test examples. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (GPU/CPU models, memory, etc.) used to run its experiments. It mentions LLMs used for response generation but not the experimental compute infrastructure. |
| Software Dependencies | No | Semantic features were captured using pre-trained classifiers, while syntactic features were engineered using nltk (Bird and Loper, 2004). ... Our linear regression models are built using sklearn (Pedregosa et al., 2011), with default parameter settings. ... The paper mentions the use of 'nltk' and 'sklearn' but does not provide specific version numbers for these software dependencies, only citations to their foundational papers. |
| Experiment Setup | Yes | Inference is performed using 1, 3, and 5 such examples (see Appendix C.1 for exact templates), and evaluated by scoring with each user's (weighted-ensembled) preference model. We also compare to a zero-shot baseline, with no personalization. ... C.2 PROMPT TEMPLATE: Below is a prompt template we used in our experiments for winning and losing responses appended during inference. |
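
The k-shot in-context-learning baseline quoted above (1/3/5 preference examples prepended before the query) can be sketched as follows. This is a minimal illustration, not the paper's actual template: the function name and the `prompt`/`winning`/`losing` field names are assumptions, and the exact wording of the paper's template lives in its Appendix C.2.

```python
def build_kshot_prompt(history, query, k):
    """Assemble a k-shot personalization prompt from a user's preference
    history. Each history item is assumed (hypothetically) to carry a
    'prompt', a preferred ('winning') and a dispreferred ('losing') response.
    """
    lines = []
    for example in history[:k]:  # use only the first k preference examples
        lines.append(f"Prompt: {example['prompt']}")
        lines.append(f"Preferred response: {example['winning']}")
        lines.append(f"Dispreferred response: {example['losing']}")
    # The new query is appended last, leaving the preferred response
    # for the LLM to generate.
    lines.append(f"Prompt: {query}")
    lines.append("Preferred response:")
    return "\n".join(lines)
```

With `k=0` this degenerates to the zero-shot baseline the paper compares against, since no personalization examples are included.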
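
The Software Dependencies row notes that the paper fits linear regression models with sklearn at default settings over engineered response features. A minimal sketch of that setup, using invented synthetic features in place of the paper's semantic/syntactic ones:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the paper's response features:
# 100 responses, 5 engineered features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_weights = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
y = X @ true_weights + rng.normal(scale=0.01, size=100)  # noisy scores

# Default-parameter LinearRegression, as the paper reports using.
model = LinearRegression()
model.fit(X, y)
r2 = model.score(X, y)  # in-sample R^2 of the fitted preference model
```

Because no parameters are overridden, reproducing this component only requires knowing the sklearn version, which the paper does not report.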