Personality Alignment of Large Language Models
Authors: Minjun Zhu, Yixuan Weng, Linyi Yang, Yue Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work paves the way for future AI systems to make decisions and reason personally, enhancing the relevance and meaning of AI interactions for each user and advancing human-centered artificial intelligence. The dataset and code are released at https://github.com/zhu-minjun/PAlign. ... Experiments have shown its high alignment efficiency compared to DPO and PPO (Schulman et al., 2017; Rafailov et al., 2023) where it requires only 1/6 of the time to achieve superior performance. It even outperforms the GPT-4o model using PAS based on the Llama-3-8B model. ... 5 EXPERIMENTS |
| Researcher Affiliation | Academia | Minjun Zhu1,2, Yixuan Weng2, Linyi Yang3, Yue Zhang2 1Zhejiang University 2School of Engineering, Westlake University 3University College London zhuminjun,EMAIL, EMAIL EMAIL |
| Pseudocode | Yes | THE REWARD FUNCTION OF PPO def calculate_score(text, correct_option): """ Calculates the PPO reward score for a given response. It compares the response description to the correct option and returns a score reflecting accuracy. Parameters: text (str): The response text to be evaluated. correct_option (str): The correct answer description. Returns: int: The calculated reward score, ranging from -5 to 0. If the response does not match any known descriptions, returns -6. """ # Mapping of scores to their corresponding descriptive labels SCORES_BACK = { 5: "Very Accurate", 4: "Moderately Accurate", 3: "Neither Accurate Nor Inaccurate", 2: "Moderately Inaccurate", 1: "Very Inaccurate", 0: "Unknown" } # Iterate over the scores and their descriptions for score, description in SCORES_BACK.items(): if description in text: # Find the score corresponding to the correct option correct_score = next(key for key, value in SCORES_BACK.items() if value == correct_option) # Return the negative absolute difference between the scores return -abs(score - correct_score) # Return -6 if the text does not match any known descriptions return -6 |
| Open Source Code | Yes | The dataset and code are released at https://github.com/zhu-minjun/PAlign. |
| Open Datasets | Yes | Inspired by psychometrics, we created the Personality Alignment with Personality Inventories (PAPI) dataset... The dataset and code are released at https://github.com/zhu-minjun/PAlign. ... We have collected 307,313 samples based on the IPIP-NEO-120 and IPIP-NEO-300 from the International Personality Item Pool (IPIP)1. Additionally, we collected 18,192 Dark Triad samples2... 1https://ipip.ori.org/index.htm 2https://openpsychometrics.org |
| Dataset Splits | Yes | We apply K-Means clustering separately to the IPIP and Dark Triad datasets. From the IPIP dataset (307,313 samples), we select 300 representative clusters as the first part of our Test-Set. Similarly, we cluster the Dark Triad dataset (18,192 samples) and select 300 representative samples, forming the second part of our Test-Set. The remaining data constitutes our Dev-Set. |
| Hardware Specification | Yes | For the Llama-3 models, we employ the Hugging Face (Wolf et al., 2019) and PyTorch (Paszke et al., 2019) frameworks to set up local inference on the NVIDIA A100 GPUs. Furthermore, for the Llama-3-70B model, we use bf4 (Dettmers et al., 2022) for inference. |
| Software Dependencies | No | For the Llama-3 models, we employ the Hugging Face (Wolf et al., 2019) and PyTorch (Paszke et al., 2019) frameworks to set up local inference on the NVIDIA A100 GPUs. Furthermore, for the Llama-3-70B model, we use bf4 (Dettmers et al., 2022) for inference. |
| Experiment Setup | Yes | DPO (Rafailov et al., 2023): ...We maintain the original settings with a learning rate of 5e-4, warmup steps of 100, and a weight decay of 0.05. We use the AdamW optimizer (Kingma & Ba, 2014), a batch size of 16, and a LoRA alpha of 100. We train each model for 250 steps to ensure sufficient training. ... PPO (Schulman et al., 2017): ...following the original implementation with a learning rate of 1.41e-5, a batch size of 16, and a LoRA alpha of 100. We train each model for 250 steps. |
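The Likert-distance reward quoted in the Pseudocode row can be exercised directly. A minimal runnable sketch, reproducing the logic of the paper's `calculate_score` (the `SCORES_BACK` labels come from the quoted pseudocode; the example inputs are invented for illustration):

```python
def calculate_score(text, correct_option):
    """PPO reward: negative Likert-scale distance between the model's
    self-report answer and the target answer (0 is best, -6 = unparseable)."""
    SCORES_BACK = {
        5: "Very Accurate",
        4: "Moderately Accurate",
        3: "Neither Accurate Nor Inaccurate",
        2: "Moderately Inaccurate",
        1: "Very Inaccurate",
        0: "Unknown",
    }
    # First matching label wins; dict order runs from 5 down to 0.
    for score, description in SCORES_BACK.items():
        if description in text:
            correct_score = next(k for k, v in SCORES_BACK.items()
                                 if v == correct_option)
            return -abs(score - correct_score)
    return -6  # response contained no recognizable label

print(calculate_score("I find this Very Accurate.", "Very Accurate"))  # 0
print(calculate_score("Moderately Inaccurate", "Very Accurate"))       # -3
print(calculate_score("no recognizable label", "Very Accurate"))       # -6
```

Note the reward is dense over the 5-point scale rather than binary, so a near-miss answer ("Moderately Accurate" when "Very Accurate" was expected) is penalized less than an opposite one.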
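The Dataset Splits row describes selecting 300 cluster representatives per source as the Test-Set, with the remainder forming the Dev-Set. A toy sketch of that selection, assuming each respondent is encoded as a vector of item scores; the `kmeans` and `representatives` helpers here are illustrative stand-ins (pure-Python, tiny synthetic data), not the paper's released code:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(xs) / len(cluster) for xs in zip(*cluster)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        centroids = [mean(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids

def representatives(points, k):
    """Test-Set = sample nearest each centroid; the rest is the Dev-Set."""
    cents = kmeans(points, k)
    test = [min(points, key=lambda p: dist2(p, c)) for c in cents]
    dev = [p for p in points if p not in test]
    return test, dev

rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(50)]
test, dev = representatives(points, 3)
print(len(test), len(dev))
```

In the paper's setting, `points` would be the 307,313 IPIP (or 18,192 Dark Triad) response vectors and `k = 300`; picking the sample nearest each centroid yields a Test-Set that spans the behavioral-preference space rather than a random slice of it.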