Controlling Language and Diffusion Models by Transporting Activations

Authors: Pau Rodriguez, Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, Xavier Suau

ICLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
  "We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how AcT enables fine-grained style control and concept negation."

Researcher Affiliation: Industry
  "EMAIL Apple"

Pseudocode: No
  The paper describes its methods using mathematical notation and prose, but does not include any clearly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
  "Code available at https://github.com/apple/ml-act"

Open Datasets: Yes
  "We prompt each LLM with 1000 randomly chosen prompts from Real Toxicity Prompts (RTP) (Gehman et al., 2020) [...] We evaluate all methods on the TruthfulQA multiple-choice part that has been used in prior work (Lin et al., 2021; Li et al., 2024) [...] We mine the OneSec dataset (Scarlini et al., 2019) [...] We sample 2048 prompts from the COCO Captions (Chen et al., 2015) training set"

Dataset Splits: Yes
  "We prompt each LLM with 1000 randomly chosen prompts from Real Toxicity Prompts (RTP) (Gehman et al., 2020) [...] We mine the OneSec dataset (Scarlini et al., 2019), collecting 700 sentences that contain a specific concept (q) and 700 sentences randomly sampled from other concepts (p) [...] To evaluate, we sample 512 prompts from the COCO Captions validation set and generate images with different intervention strengths."

Hardware Specification: No
  The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.

Software Dependencies: No
  The paper mentions several software components and models (e.g., a RoBERTa-based classifier, Llama3-8B-Instruct, Mistral-7B, Stable Diffusion XL), but does not provide specific version numbers for these or any other ancillary software components.

Experiment Setup: Yes
  "The degree of intervention can be controlled by a strength parameter λ between 0 (no transport) and 1 (full transport) [...] We intervene upon different layer types (layer column) and show the best layer per method [...] We use a distilled version of SDXL, which only requires 4 diffusion steps"
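The data construction quoted in the Open Datasets and Dataset Splits entries (1000 random RTP prompts, 700 concept vs. 700 contrast sentences from OneSec, 2048 COCO training captions) can be sketched as below. The corpus lists here are hypothetical stand-ins, not loaders for the real datasets:

```python
import random

random.seed(0)  # fixed seed so the sampled subsets are reproducible

# Hypothetical stand-ins for the real corpora (RealToxicityPrompts, OneSec,
# COCO Captions); an actual pipeline would load these from disk or a hub.
rtp_prompts = [f"rtp prompt {i}" for i in range(100_000)]
onesec_concept = [f"concept sentence {i}" for i in range(5_000)]
onesec_other = [f"other-concept sentence {i}" for i in range(50_000)]
coco_train = [f"coco caption {i}" for i in range(10_000)]

eval_prompts = random.sample(rtp_prompts, 1000)   # toxicity evaluation prompts
q = random.sample(onesec_concept, 700)            # sentences containing the concept (q)
p = random.sample(onesec_other, 700)              # random sentences from other concepts (p)
style_prompts = random.sample(coco_train, 2048)   # T2I style-control prompts

print(len(eval_prompts), len(q), len(p), len(style_prompts))  # → 1000 700 700 2048
```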
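The strength parameter quoted in the Experiment Setup entry interpolates between the original activations (λ = 0) and fully transported ones (λ = 1). A minimal sketch of that interpolation, using a per-unit affine map fitted by mean/std matching as a stand-in for the paper's learned transport map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations" from a source distribution (e.g. toxic prompts) and a
# target distribution (e.g. non-toxic prompts): 1000 samples, 8 units each.
source = rng.normal(loc=1.0, scale=2.0, size=(1000, 8))
target = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))

# Per-unit affine map that transports source statistics onto target statistics.
# This is an illustrative stand-in, not the paper's exact transport estimator.
scale = target.std(axis=0) / source.std(axis=0)
shift = target.mean(axis=0) - scale * source.mean(axis=0)

def transport(a, lam):
    """lam=0 returns activations unchanged; lam=1 applies the full map."""
    return (1.0 - lam) * a + lam * (scale * a + shift)

moved = transport(source, 1.0)  # full transport: matches target mean and std
```

At inference time such a map would typically be applied inside the network via a forward hook on the chosen layer, with λ exposed as the user-facing control knob.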