Linear Representations of Political Perspective Emerge in Large Language Models

Authors: Junsol Kim, James Evans, Aaron Schein

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints, among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes trained only to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance.
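The probing setup summarized above can be sketched as a ridge regression on per-head activations. Everything below is an illustrative stand-in, not the authors' code: the activations are synthetic, and the shapes (552 lawmakers, a 128-dimensional head) are assumptions chosen for the sketch.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Illustrative shapes only: one linear probe per attention head.
# X: (n_lawmakers, head_dim) activations of one head collected while the
#    model writes from each lawmaker's perspective (here: synthetic data).
# y: (n_lawmakers,) first-dimension DW-NOMINATE scores, roughly in [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(552, 128))
w_true = rng.normal(size=128)
y = np.tanh(X @ w_true / 10.0)

probe = Ridge(alpha=1.0)  # regularization strength lambda = 1, as in the paper
r2 = cross_val_score(probe, X, y, cv=2, scoring="r2")
print(r2)  # per-fold R^2; heads scoring highly would be the "predictive" heads
```

In the paper this is repeated for every attention head in every layer, which is how the concentration of predictive heads in the middle layers is identified.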
Researcher Affiliation Collaboration Junsol Kim (University of Chicago), James Evans (University of Chicago; Google), Aaron Schein (University of Chicago)
Pseudocode No The paper describes steps in regular paragraph text and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes The data and code for reproducing our results are available on GitHub: https://github.com/JunsolKim/RepresentationPoliticalLLM
Open Datasets Yes We use the first dimension of DW-NOMINATE scores for all lawmakers associated with the 116th United States Congress (N=552)... We use data from Ad Fontes Media, which scores U.S. news outlets on a 5-point scale from Left to Right... This analysis utilizes the Manifesto Project dataset, which provides ideological labels y for 411 political parties worldwide on a left-to-right continuum (from -50 = left to 50 = right) (Gemenis, 2013).
Dataset Splits Yes To evaluate the fit of each linear probe, we performed 2-fold cross-validation, using a random partition of lawmakers into two folds of equal size.
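The random two-fold partition described in the quote could look like the following sketch. The lawmaker count and fold scheme come from the paper; the random seed and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

# Randomly split the 552 lawmakers into two equal folds; each fold serves
# once as the training set and once as the held-out evaluation set.
lawmaker_ids = np.arange(552)
kf = KFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(lawmaker_ids):
    print(len(train_idx), len(test_idx))  # 276 276
```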
Hardware Specification No The paper mentions the large language models used (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b, Gemma-2-2b) but does not provide specific details about the hardware (e.g., GPU type, CPU type, memory) on which these models were run or trained for the experiments.
Software Dependencies No The paper mentions the use of various large language models (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b, GPT-4o) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used for the experiments.
Experiment Setup Yes We set the regularization strength λ = 1 (see Equation (5)) after performing 2-fold cross-validation over the values λ ∈ {0, 0.001, 0.01, 0.1, 1, 100, 1000} (see Table A1)... In total, we generated 1,134 essays across three models, nine policy issues, and combinations of six values of K ∈ {16, 32, 48, 64, 80, 96} and seven values of α ∈ {−30, −20, −10, 0, 10, 20, 30}.
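The linear intervention over the K selected heads and steering strength α can be sketched as shifting each head's activation along its probe's direction. The function below is a hypothetical illustration of that idea, not the authors' implementation; head ids, dimensions, and the toy inputs are all assumptions.

```python
import numpy as np

def steer_heads(head_acts, probe_dirs, alpha):
    """Shift each selected head's activation along its probe direction.

    head_acts:  dict mapping head id -> (head_dim,) activation vector
                (assumed already restricted to the top-K predictive heads).
    probe_dirs: dict mapping head id -> (head_dim,) probe weight vector.
    alpha:      signed steering strength; negative and positive values push
                the generation toward opposite ideological poles.
    """
    out = {}
    for head, act in head_acts.items():
        unit = probe_dirs[head] / np.linalg.norm(probe_dirs[head])
        out[head] = act + alpha * unit
    return out

# Toy usage with a single 128-dim head and alpha = -30 (one grid endpoint).
acts = {(12, 3): np.zeros(128)}
dirs = {(12, 3): np.ones(128)}
steered = steer_heads(acts, dirs, alpha=-30.0)
print(np.linalg.norm(steered[(12, 3)]))  # ~30.0
```

Sweeping α over the grid above while varying K is what produces the 1,134 steered essays described in the setup.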