Controlling Language and Diffusion Models by Transporting Activations
Authors: Pau Rodriguez, Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, Xavier Suau
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that ACT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how ACT enables fine-grained style control and concept negation. |
| Researcher Affiliation | Industry | All authors are affiliated with Apple. |
| Pseudocode | No | The paper describes methods using mathematical notation and prose, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Code available at https://github.com/apple/ml-act |
| Open Datasets | Yes | We prompt each LLM with 1000 randomly chosen prompts from RealToxicityPrompts (RTP) (Gehman et al., 2020) [...] We evaluate all methods on the TruthfulQA multiple-choice part that has been used in prior work (Lin et al., 2021; Li et al., 2024) [...] We mine the OneSec dataset (Scarlini et al., 2019) [...] We sample 2048 prompts from the COCO Captions (Chen et al., 2015) training set |
| Dataset Splits | Yes | We prompt each LLM with 1000 randomly chosen prompts from RealToxicityPrompts (RTP) (Gehman et al., 2020) [...] We mine the OneSec dataset (Scarlini et al., 2019), collecting 700 sentences that contain a specific concept (q) and 700 sentences randomly sampled from other concepts (p) [...] To evaluate, we sample 512 prompts from the COCO Captions validation set and generate images with different intervention strengths. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., a RoBERTa-based classifier, Llama3-8B-instruct, Mistral-7B, Stable Diffusion XL), but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | The degree of intervention can be controlled by a strength parameter λ between 0 (no transport) and 1 (full transport) [...] We intervene upon different layer types (layer column) and show the best layer per method [...] We use a distilled version of SDXL, which only requires 4 diffusion steps |
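The strength parameter λ described in the experiment-setup quote can be illustrated with a small sketch. This is not the authors' implementation (the released code is at the repository above): here the transport map is a hypothetical per-unit affine map fitted by simple mean/std matching between "source" and "target" activation statistics, and λ linearly interpolates between no transport (λ = 0) and full transport (λ = 1).

```python
import numpy as np

def fit_affine_transport(src, tgt):
    """Fit a per-unit affine map a -> m * a + b that matches the mean and
    std of source activations to those of target activations.
    (Illustrative stand-in for the transport map estimated in the paper.)"""
    m = tgt.std(axis=0) / (src.std(axis=0) + 1e-8)
    b = tgt.mean(axis=0) - m * src.mean(axis=0)
    return m, b

def transport(a, m, b, lam):
    """Apply the map with strength lam in [0, 1]:
    lam = 0 leaves activations unchanged, lam = 1 applies the full map."""
    return (1.0 - lam) * a + lam * (m * a + b)

# Toy activations: 1000 samples of a 4-unit hidden layer.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(1000, 4))  # e.g. activations on neutral prompts
tgt = rng.normal(2.0, 0.5, size=(1000, 4))  # e.g. activations on concept prompts

m, b = fit_affine_transport(src, tgt)
mapped = transport(src, m, b, lam=1.0)
print("target mean:", tgt.mean(axis=0))
print("mapped mean:", mapped.mean(axis=0))
```

In practice such a map would be applied inside a forward hook on the chosen layer type, with λ swept to trade off conditioning strength against generation quality, mirroring the "different intervention strengths" evaluated in the paper.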