PersonalLLM: Tailoring LLMs to Individual Preferences

Authors: Thomas P. Zollo, Andrew Wei Tung Siah, Naimeng Ye, Ang Li, Hongseok Namkoong

ICLR 2025

Reproducibility checklist (Variable: Result, followed by the supporting quote or explanation):
Research Type: Experimental
    "We explore basic in-context learning and meta-learning baselines to illustrate the utility of PersonalLLM and highlight the need for future methodological development."
Researcher Affiliation: Academia
    Thomas P. Zollo (Columbia University, EMAIL); Andrew Wei Tung Siah (Columbia University, EMAIL); Naimeng Ye (Columbia University, EMAIL); Ang Li (Columbia University, EMAIL); Hongseok Namkoong (Columbia University, EMAIL)
Pseudocode: Yes
    "C.1 PSEUDOCODE. Below is the pseudocode for the baselines in Section 4. Actual code is available at ... Algorithm 1: MetaLearnKShotICL"
Open Source Code: Yes
    "Our data [1] and code [2] are publicly available, and full documentation for our dataset is available in Appendix A." ... [2] https://github.com/namkoong-lab/PersonalLLM
Open Datasets: Yes
    "Our dataset is available at https://huggingface.co/datasets/namkoong-lab/PersonalLLM."
Dataset Splits: Yes
    "We split the resulting dataset into 9,402 training examples and 1,000 test examples."
Hardware Specification: No
    The paper does not describe the specific hardware (GPU/CPU models, memory, etc.) used to run its experiments; it names the LLMs used for response generation but not the compute infrastructure.
Software Dependencies: No
    "Semantic features were captured using pre-trained classifiers, while syntactic features were engineered using nltk (Bird and Loper, 2004). ... Our linear regression models are built using sklearn (Pedregosa et al., 2011), with default parameter settings." The paper cites nltk and sklearn but does not give version numbers for either dependency.
Experiment Setup: Yes
    "Inference is performed using 1, 3, and 5 such examples (see Appendix C.1 for exact templates), and evaluated by scoring with each user's (weighted-ensembled) preference model. We also compare to a zero-shot baseline, with no personalization. ... C.2 PROMPT TEMPLATE. Below is a prompt template we used in our experiments, with winning and losing responses appended during inference."
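The k-shot in-context personalization baseline described above can be sketched as a small prompt-building step: condition the model on a user's past winning and losing responses before the new query. This is a minimal sketch under assumptions; the template wording, field names, and `build_kshot_prompt` helper are illustrative, not the paper's exact Appendix C.2 template.

```python
# Hedged sketch of the k-shot ICL personalization baseline: build a prompt
# from a user's past pairwise preferences (winning vs. losing responses).
# Field names and template wording are assumptions for illustration only.

def build_kshot_prompt(history, new_prompt, k=3):
    """Build a few-shot prompt from a user's preference history.

    history: list of dicts with keys 'prompt', 'winning', 'losing',
             drawn from the user's past pairwise choices.
    new_prompt: the query to personalize a response for.
    k: number of in-context examples (the paper evaluates k = 1, 3, 5).
    """
    parts = []
    for ex in history[:k]:
        parts.append(
            f"Prompt: {ex['prompt']}\n"
            f"Response the user preferred: {ex['winning']}\n"
            f"Response the user rejected: {ex['losing']}\n"
        )
    # Append the new query; the LLM completes the final "Response:".
    parts.append(f"Prompt: {new_prompt}\nResponse:")
    return "\n".join(parts)

# Toy usage with a single in-context example.
history = [
    {"prompt": "Explain gravity.",
     "winning": "Short, plain answer.",
     "losing": "Long, jargon-heavy answer."},
]
prompt = build_kshot_prompt(history, "Explain magnetism.", k=1)
```

The resulting string would then be sent to the LLM being personalized; the generated response is scored by the user's preference model, as in the evaluation described above.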
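The evaluation step quoted above scores each generated response with a user's weighted-ensembled preference model. A minimal sketch of that scoring, assuming each ensemble member is simply a callable returning a scalar reward (the toy reward models and the `ensemble_score` helper below are stand-ins, not the paper's actual learned scorers):

```python
# Hedged sketch of weighted-ensemble preference scoring: a user's score for
# a response is a weighted sum over reward models. The reward models and
# weights here are illustrative stand-ins.

def ensemble_score(response, reward_models, weights):
    """Weighted-ensemble preference score for one response.

    reward_models: callables mapping a response string to a float reward.
    weights: per-user mixture weights (same length; assumed to sum to 1).
    """
    return sum(w * rm(response) for rm, w in zip(reward_models, weights))

# Toy stand-in reward models (real ones would be learned scorers).
length_rm = lambda r: min(len(r) / 100, 1.0)          # favors longer text
brevity_rm = lambda r: 1.0 - min(len(r) / 100, 1.0)   # favors shorter text

# A user who mostly prefers brevity (weights 0.3 / 0.7).
score = ensemble_score("A concise answer.", [length_rm, brevity_rm], [0.3, 0.7])
```

Varying the weight vector per user is what makes each simulated user's preferences distinct while reusing the same underlying reward models.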