Linear Representations of Political Perspective Emerge in Large Language Models

Authors: Junsol Kim, James Evans, Aaron Schein

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints, among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes trained only to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance.
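The probing setup summarized above can be sketched as a ridge regression on per-head activations. Everything below is an illustrative stand-in, not the authors' code: the activations are synthetic, and the shapes (552 lawmakers, a 128-dimensional head) are assumptions chosen for the sketch.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Illustrative shapes only: one linear probe per attention head.
# X: (n_lawmakers, head_dim) activations of one head collected while the
#    model writes from each lawmaker's perspective (here: synthetic data).
# y: (n_lawmakers,) first-dimension DW-NOMINATE scores, roughly in [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(552, 128))
w_true = rng.normal(size=128)
y = np.tanh(X @ w_true / 10.0)

probe = Ridge(alpha=1.0)  # regularization strength lambda = 1, as in the paper
r2 = cross_val_score(probe, X, y, cv=2, scoring="r2")
print(r2)  # per-fold R^2; heads scoring highly would be the "predictive" heads
```

In the paper this is repeated for every attention head in every layer, which is how the concentration of predictive heads in the middle layers is identified.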
Researcher Affiliation Collaboration Junsol Kim (University of Chicago), James Evans (University of Chicago; Google), Aaron Schein (University of Chicago)
Pseudocode No The paper describes steps in regular paragraph text and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes The data and code for reproducing our results are available on GitHub: https://github.com/JunsolKim/RepresentationPoliticalLLM
Open Datasets Yes We use the first dimension of DW-NOMINATE scores for all lawmakers associated with the 116th United States Congress (N=552)... We use data from Ad Fontes Media, which scores U.S. news outlets on a 5-point scale from Left to Right... This analysis utilizes the Manifesto Project dataset, which provides ideological labels y for 411 political parties worldwide on a left-to-right continuum (from -50 = left to 50 = right) (Gemenis, 2013).
Dataset Splits Yes To evaluate the fit of each linear probe, we performed 2-fold cross-validation, using a random partition of lawmakers into two folds of equal size.
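The random two-fold partition described in the quote could look like the following sketch. The lawmaker count and fold scheme come from the paper; the random seed and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

# Randomly split the 552 lawmakers into two equal folds; each fold serves
# once as the training set and once as the held-out evaluation set.
lawmaker_ids = np.arange(552)
kf = KFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(lawmaker_ids):
    print(len(train_idx), len(test_idx))  # 276 276
```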
Hardware Specification No The paper mentions the large language models used (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b, Gemma-2-2b) but does not provide specific details about the hardware (e.g., GPU type, CPU type, memory) on which these models were run or trained for the experiments.
Software Dependencies No The paper mentions the use of various large language models (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b, GPT-4o) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used for the experiments.
Experiment Setup Yes We set the regularization strength λ = 1 (see Equation (5)) after performing 2-fold cross-validation over the values λ ∈ {0, 0.001, 0.01, 0.1, 1, 100, 1000} (see Table A1)... In total, we generated 1,134 essays across three models, nine policy issues, and combinations of six values of K ∈ {16, 32, 48, 64, 80, 96} and seven values of α ∈ {−30, −20, −10, 0, 10, 20, 30}.
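The linear intervention over the K selected heads and steering strength α can be sketched as shifting each head's activation along its probe's direction. The function below is a hypothetical illustration of that idea, not the authors' implementation; head ids, dimensions, and the toy inputs are all assumptions.

```python
import numpy as np

def steer_heads(head_acts, probe_dirs, alpha):
    """Shift each selected head's activation along its probe direction.

    head_acts:  dict mapping head id -> (head_dim,) activation vector
                (assumed already restricted to the top-K predictive heads).
    probe_dirs: dict mapping head id -> (head_dim,) probe weight vector.
    alpha:      signed steering strength; negative and positive values push
                the generation toward opposite ideological poles.
    """
    out = {}
    for head, act in head_acts.items():
        unit = probe_dirs[head] / np.linalg.norm(probe_dirs[head])
        out[head] = act + alpha * unit
    return out

# Toy usage with a single 128-dim head and alpha = -30 (one grid endpoint).
acts = {(12, 3): np.zeros(128)}
dirs = {(12, 3): np.ones(128)}
steered = steer_heads(acts, dirs, alpha=-30.0)
print(np.linalg.norm(steered[(12, 3)]))  # ~30.0
```

Sweeping α over the grid above while varying K is what produces the 1,134 steered essays described in the setup.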