Controlling Language and Diffusion Models by Transporting Activations

Authors: Pau Rodriguez, Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, Xavier Suau

ICLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
  "We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how AcT enables fine-grained style control and concept negation."

Researcher Affiliation: Industry
  "EMAIL Apple"

Pseudocode: No
  The paper describes its methods using mathematical notation and prose, but does not include any clearly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
  "Code available at https://github.com/apple/ml-act"

Open Datasets: Yes
  "We prompt each LLM with 1000 randomly chosen prompts from Real Toxicity Prompts (RTP) (Gehman et al., 2020) [...] We evaluate all methods on the TruthfulQA multiple-choice part that has been used in prior work (Lin et al., 2021; Li et al., 2024) [...] We mine the OneSec dataset (Scarlini et al., 2019) [...] We sample 2048 prompts from the COCO Captions (Chen et al., 2015) training set"

Dataset Splits: Yes
  "We prompt each LLM with 1000 randomly chosen prompts from Real Toxicity Prompts (RTP) (Gehman et al., 2020) [...] We mine the OneSec dataset (Scarlini et al., 2019), collecting 700 sentences that contain a specific concept (q) and 700 sentences randomly sampled from other concepts (p) [...] To evaluate, we sample 512 prompts from the COCO Captions validation set and generate images with different intervention strengths."

Hardware Specification: No
  The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.

Software Dependencies: No
  The paper mentions several software components and models (e.g., a RoBERTa-based classifier, Llama3-8B-Instruct, Mistral-7B, Stable Diffusion XL), but does not provide specific version numbers for these or any other ancillary software components.

Experiment Setup: Yes
  "The degree of intervention can be controlled by a strength parameter λ between 0 (no transport) and 1 (full transport) [...] We intervene upon different layer types (layer column) and show the best layer per method [...] We use a distilled version of SDXL, which only requires 4 diffusion steps"
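The data construction quoted in the Open Datasets and Dataset Splits entries (1000 random RTP prompts, 700 concept vs. 700 contrast sentences from OneSec, 2048 COCO training captions) can be sketched as below. The corpus lists here are hypothetical stand-ins, not loaders for the real datasets:

```python
import random

random.seed(0)  # fixed seed so the sampled subsets are reproducible

# Hypothetical stand-ins for the real corpora (RealToxicityPrompts, OneSec,
# COCO Captions); an actual pipeline would load these from disk or a hub.
rtp_prompts = [f"rtp prompt {i}" for i in range(100_000)]
onesec_concept = [f"concept sentence {i}" for i in range(5_000)]
onesec_other = [f"other-concept sentence {i}" for i in range(50_000)]
coco_train = [f"coco caption {i}" for i in range(10_000)]

eval_prompts = random.sample(rtp_prompts, 1000)   # toxicity evaluation prompts
q = random.sample(onesec_concept, 700)            # sentences containing the concept (q)
p = random.sample(onesec_other, 700)              # random sentences from other concepts (p)
style_prompts = random.sample(coco_train, 2048)   # T2I style-control prompts

print(len(eval_prompts), len(q), len(p), len(style_prompts))  # → 1000 700 700 2048
```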
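The strength parameter quoted in the Experiment Setup entry interpolates between the original activations (λ = 0) and fully transported ones (λ = 1). A minimal sketch of that interpolation, using a per-unit affine map fitted by mean/std matching as a stand-in for the paper's learned transport map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations" from a source distribution (e.g. toxic prompts) and a
# target distribution (e.g. non-toxic prompts): 1000 samples, 8 units each.
source = rng.normal(loc=1.0, scale=2.0, size=(1000, 8))
target = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))

# Per-unit affine map that transports source statistics onto target statistics.
# This is an illustrative stand-in, not the paper's exact transport estimator.
scale = target.std(axis=0) / source.std(axis=0)
shift = target.mean(axis=0) - scale * source.mean(axis=0)

def transport(a, lam):
    """lam=0 returns activations unchanged; lam=1 applies the full map."""
    return (1.0 - lam) * a + lam * (scale * a + shift)

moved = transport(source, 1.0)  # full transport: matches target mean and std
```

At inference time such a map would typically be applied inside the network via a forward hook on the chosen layer, with λ exposed as the user-facing control knob.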