Personality Alignment of Large Language Models

Authors: Minjun Zhu, Yixuan Weng, Linyi Yang, Yue Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our work paves the way for future AI systems to make decisions and reason personally, enhancing the relevance and meaning of AI interactions for each user and advancing human-centered artificial intelligence. The dataset and code are released at https://github.com/zhu-minjun/PAlign. ... Experiments have shown its high alignment efficiency compared to DPO and PPO (Schulman et al., 2017; Rafailov et al., 2023), where it requires only 1/6 of the time to achieve superior performance. It even outperforms the GPT-4o model using PAS based on the Llama-3-8B model. ... 5 EXPERIMENTS
Researcher Affiliation | Academia | Minjun Zhu (1,2), Yixuan Weng (2), Linyi Yang (3), Yue Zhang (2); (1) Zhejiang University; (2) School of Engineering, Westlake University; (3) University College London; zhuminjun,EMAIL, EMAIL EMAIL
Pseudocode | Yes | THE REWARD FUNCTION OF PPO

    def calculate_score(text, correct_option):
        """Calculate the reward score for a given response under the DPO
        (Direct Preference Optimization) framework by comparing the
        response's description to the correct option.

        Parameters:
            text (str): The response text to be evaluated.
            correct_option (str): The correct answer description.

        Returns:
            int: The calculated reward score, ranging from -5 to 0. If the
            response does not match any known description, returns -6.
        """
        # Mapping of scores to their corresponding descriptive labels
        SCORES_BACK = {
            5: "Very Accurate",
            4: "Moderately Accurate",
            3: "Neither Accurate Nor Inaccurate",
            2: "Moderately Inaccurate",
            1: "Very Inaccurate",
            0: "Unknown",
        }
        # Iterate over the scores and their descriptions
        for score, description in SCORES_BACK.items():
            if description in text:
                # Find the score corresponding to the correct option
                correct_score = next(key for key, value in SCORES_BACK.items()
                                     if value == correct_option)
                # Return the negative absolute difference between the scores
                return -abs(score - correct_score)
        # Return -6 if the text does not match any known description
        return -6
Open Source Code | Yes | The dataset and code are released at https://github.com/zhu-minjun/PAlign.
Open Datasets | Yes | Inspired by psychometrics, we created the Personality Alignment with Personality Inventories (PAPI) dataset... The dataset and code are released at https://github.com/zhu-minjun/PAlign. ... We have collected 307,313 samples based on the IPIP-NEO-120 and IPIP-NEO-300 from the International Personality Item Pool (IPIP) [1]. Additionally, we collected 18,192 Dark Triad samples [2]... [1] https://ipip.ori.org/index.htm [2] https://openpsychometrics.org
Dataset Splits | Yes | We apply K-Means clustering separately to the IPIP and Dark Triad datasets. From the IPIP dataset (307,313 samples), we select 300 representative clusters as the first part of our Test-Set. Similarly, we cluster the Dark Triad dataset (18,192 samples) and select 300 representative samples, forming the second part of our Test-Set. The remaining data constitutes our Dev-Set.
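The cluster-then-select split described above can be sketched at toy scale: a few 2-D "profiles" stand in for the 307,313 IPIP responses, 3 clusters stand in for 300, and the sample nearest each centroid is taken as that cluster's representative. The function names and the deterministic farthest-point initialisation are illustrative, not the authors' implementation:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Next centroid: the sample farthest from all chosen centroids.
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as its cluster mean.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def split_by_representatives(X, k):
    """Test-Set = index of the sample nearest each centroid; Dev-Set = the rest."""
    centroids, labels = kmeans(X, k)
    test_idx = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx):
            nearest = np.linalg.norm(X[idx] - centroids[j], axis=1).argmin()
            test_idx.append(int(idx[nearest]))
    dev_idx = [i for i in range(len(X)) if i not in test_idx]
    return sorted(test_idx), dev_idx

# Toy data: three well-separated blobs standing in for personality profiles.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in ([0, 0], [5, 5], [0, 5])])
test_idx, dev_idx = split_by_representatives(X, k=3)
```

The representative nearest each centroid joins the Test-Set; all remaining indices form the Dev-Set, mirroring the paper's 300-representative selection at full scale.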
Hardware Specification | Yes | For the Llama-3 models, we employ the Hugging Face (Wolf et al., 2019) and PyTorch (Paszke et al., 2019) frameworks to set up local inference on NVIDIA A100 GPUs. Furthermore, for the Llama-3-70B model, we use bf4 (Dettmers et al., 2022) for inference.
Software Dependencies | No | For the Llama-3 models, we employ the Hugging Face (Wolf et al., 2019) and PyTorch (Paszke et al., 2019) frameworks to set up local inference on NVIDIA A100 GPUs. Furthermore, for the Llama-3-70B model, we use bf4 (Dettmers et al., 2022) for inference.
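A minimal sketch of the quoted 70B inference setup, assuming "bf4" refers to 4-bit quantization (Dettmers et al., 2022) with bfloat16 compute, via the Hugging Face `BitsAndBytesConfig` API; the checkpoint name is an assumption, and actually loading it requires the gated Llama-3 weights:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumption: "bf4" = NF4 4-bit weights with bfloat16 compute.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Would be passed to the loader, e.g. (checkpoint name is an assumption):
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Meta-Llama-3-70B-Instruct",
#     quantization_config=quant_config,
#     device_map="auto",
# )
```

This is a configuration fragment only; the paper does not specify its exact quantization settings beyond "bf4".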
Experiment Setup | Yes | DPO (Rafailov et al., 2023): ...We maintain the original settings with a learning rate of 5e-4, warmup steps of 100, and a weight decay of 0.05. We use the AdamW optimizer (Kingma & Ba, 2014), a batch size of 16, and a LoRA alpha of 100. We train each model for 250 steps to ensure sufficient training. ... PPO (Schulman et al., 2017): ...following the original implementation with a learning rate of 1.41e-5, a batch size of 16, and a LoRA alpha of 100. We train each model for 250 steps.
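The quoted baseline hyperparameters can be collected into explicit configs (values are verbatim from the text; the field names are illustrative, not the authors' code):

```python
# DPO baseline settings as quoted (field names are illustrative).
DPO_CONFIG = {
    "learning_rate": 5e-4,
    "warmup_steps": 100,
    "weight_decay": 0.05,
    "optimizer": "AdamW",   # AdamW variant of Adam (Kingma & Ba, 2014)
    "batch_size": 16,
    "lora_alpha": 100,
    "train_steps": 250,
}

# PPO baseline settings as quoted; the text gives no optimizer or
# warmup/decay values for PPO, so none are listed here.
PPO_CONFIG = {
    "learning_rate": 1.41e-5,
    "batch_size": 16,
    "lora_alpha": 100,
    "train_steps": 250,
}

# The two baselines share everything except the learning rate and the
# DPO-specific warmup/weight-decay/optimizer settings.
shared = {k: v for k, v in DPO_CONFIG.items() if PPO_CONFIG.get(k) == v}
```

Keeping the shared fields (batch size, LoRA alpha, step budget) identical across baselines is what makes the reported wall-clock comparison against PAS meaningful.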