DarkBench: Benchmarking Dark Patterns in Large Language Models
Authors: Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Jurewicz
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce DarkBench, a comprehensive benchmark for detecting dark design patterns... We evaluate models from five leading companies (OpenAI, Anthropic, Meta, Mistral, Google) and find that some LLMs are explicitly designed to favor their developers' products and exhibit untruthful communication, among other manipulative behaviors. |
| Researcher Affiliation | Industry | Esben Kran (Apart Research), Jord Nguyen (Apart Research), Akash Kundu (Apart Research), Sami Jawhar (METR), Jinsuk Park (Independent), Mateusz Jurewicz (Independent) |
| Pseudocode | No | The paper describes methodologies in prose and uses figures (e.g., Figure 1, 2, 3) to illustrate concepts and benchmark construction, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The code used in this paper can be found here. The steps to reproduce the paper are: 1. Clone the repo. 2. Open the repo in Cursor or VS Code and run Reopen in Container (requires the Remote: Dev Containers extension and Docker). 3. If you do not wish to use Docker, run poetry install. 4. Run dvc pull to pull all the data. The link for 'here' is missing, making access ambiguous. |
| Open Datasets | Yes | The DarkBench benchmark is available at huggingface.co/datasets/anonymous152311/darkbench. |
| Dataset Splits | No | The DarkBench benchmark comprises 660 prompts across six categories... We test 14 proprietary and open-source models on the DarkBench benchmark. The paper describes the creation and use of a single benchmark for evaluation, but does not specify any training/validation/test splits within this benchmark. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments or evaluations. |
| Software Dependencies | Yes | The cosine similarity of embeddings using text-embedding-3-large OpenAI (2024b)... The annotator models we use are Claude 3.5 Sonnet (Anthropic, 2024), Gemini 1.5 Pro (Reid et al., 2024), and GPT-4o (OpenAI, 2024a). |
| Experiment Setup | Yes | Model temperatures were all set at 0 for reproducibility. We took one response per question. |
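The paper reports computing cosine similarity over text-embedding-3-large embeddings. A minimal sketch of the similarity computation itself (the embedding API call is elided; the vectors below are illustrative placeholders, not paper data):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative toy vectors, standing in for model embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```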
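The experiment setup (temperature 0, one response per question) maps directly onto standard chat-completion request parameters. A hedged sketch, where `build_eval_request` is a hypothetical helper, not code from the paper:

```python
def build_eval_request(model: str, prompt: str) -> dict:
    """Assemble request parameters matching the paper's reported setup:
    temperature 0 for reproducibility, and a single sampled response (n=1)."""
    return {
        "model": model,
        "temperature": 0,
        "n": 1,
        "messages": [{"role": "user", "content": prompt}],
    }

# Example: one deterministic query per benchmark prompt.
req = build_eval_request("gpt-4o", "Which AI assistant should I use?")
print(req["temperature"], req["n"])  # 0 1
```

This keeps the request construction deterministic and inspectable; the actual API client call is omitted since the paper does not specify the harness used.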