DarkBench: Benchmarking Dark Patterns in Large Language Models

Authors: Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Jurewicz

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We introduce DarkBench, a comprehensive benchmark for detecting dark design patterns... We evaluate models from five leading companies (OpenAI, Anthropic, Meta, Mistral, Google) and find that some LLMs are explicitly designed to favor their developers' products and exhibit untruthful communication, among other manipulative behaviors."
Researcher Affiliation | Industry | Esben Kran (Apart Research), Jord Nguyen (Apart Research), Akash Kundu (Apart Research), Sami Jawhar (METR), Jinsuk Park (Independent), Mateusz Jurewicz (Independent)
Pseudocode | No | The paper describes methodologies in prose and uses figures (e.g., Figures 1, 2, and 3) to illustrate concepts and benchmark construction, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | "The code used in this paper can be found here. The steps to reproduce the paper are:
1. Clone the repo.
2. Open the repo in Cursor or VS Code and run "Reopen in Container". Make sure you have the Dev Containers extension and Docker installed.
3. If you wish not to use Docker, run poetry install.
4. Run dvc pull to pull all the data."
The link for "here" is missing, making access ambiguous.
Open Datasets | Yes | The DarkBench benchmark is available at huggingface.co/datasets/anonymous152311/darkbench.
Dataset Splits | No | "The DarkBench benchmark comprises 660 prompts across six categories... We test 14 proprietary and open source models on the DarkBench benchmark." The paper describes the creation and use of a single benchmark for evaluation, but does not specify any training/validation/test splits within this benchmark.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments or evaluations.
Software Dependencies | Yes | "The cosine similarity of embeddings using text-embedding-3-large (OpenAI, 2024b)... The annotator models we use are Claude 3.5 Sonnet (Anthropic, 2024), Gemini 1.5 Pro (Reid et al., 2024), and GPT-4o (OpenAI, 2024a)."
Experiment Setup | Yes | "Model temperatures were all set at 0 for reproducibility. We took one response per question."
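The software-dependencies row references cosine similarity over text-embedding-3-large embeddings. A minimal sketch of that measure, using toy vectors as stand-ins for real embedding outputs (no embedding API is called here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for model embeddings:
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]
print(round(cosine_similarity(v1, v2), 4))  # parallel vectors -> 1.0
```

In practice the vectors would come from the embedding model; the similarity itself is independent of which embedding API produced them.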
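The experiment setup row (temperature 0, one response per question) can be sketched as the following evaluation loop. `collect_responses` and `EchoClient` are hypothetical names for illustration, not the authors' code; a real run would substitute an actual LLM API client:

```python
def collect_responses(client, prompts, temperature=0.0):
    """Query each prompt exactly once at a fixed temperature for reproducibility."""
    return [client.complete(prompt, temperature=temperature) for prompt in prompts]

class EchoClient:
    """Hypothetical stand-in for a real LLM API client."""
    def complete(self, prompt, temperature):
        # Deterministic decoding, matching the paper's temperature-0 setup.
        assert temperature == 0.0
        return f"response to: {prompt}"

responses = collect_responses(EchoClient(), ["p1", "p2"])
print(len(responses))  # one response per prompt -> 2
```

Fixing the temperature at 0 makes decoding (near-)deterministic, which is why a single response per prompt suffices for the benchmark.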