Gumbel Counterfactual Generation From Language Models

Authors: Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that the approach produces meaningful counterfactuals while at the same time showing that commonly used intervention techniques have considerable undesired side effects.
Researcher Affiliation | Academia | 1. New York University, 2. ETH Zurich, 3. University of Copenhagen
Pseudocode | Yes | Algorithm 1: An algorithm that samples counterfactual strings given a factual string.
Open Source Code | Yes | Our code is available at https://github.com/shauli-ravfogel/lm-counterfactuals.
Open Datasets | Yes | We generate 500 sentences by using the first five words of randomly selected English Wikipedia sentences as prompts for the original model. We create the counterfactual model based on the Bios dataset (De-Arteaga et al., 2019), which consists of short, web-scraped biographies of individuals working in various professions.
Dataset Splits | Yes | For each original and counterfactual model pair, we generate 500 sentences by using the first five words of randomly selected English Wikipedia sentences as prompts for the original model. We use 15,000 pairs of male and female biographies from the training set to fit the MiMiC optimal linear transformation.
Hardware Specification | Yes | All models are run on 8 RTX-4096 GPUs and use 32-bit floating-point precision.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions models such as GPT2-XL and LLaMA3-8b but gives no other software details.
Experiment Setup | Yes | We apply MEMIT on the GPT2-XL model... we focus the intervention on layer 13 of the model... a KL factor of 0.0625, a weight decay of 0.5, and calculating the loss on layer 47. We fit the intervention on layer 16 of the residual stream of the model, chosen based on preliminary experiments, which showed promising results in changing the pronouns in text continuations from male to female.
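The paper's Algorithm 1 samples counterfactual strings given a factual string. A minimal sketch of the underlying Gumbel-max idea (the function and variable names here are illustrative, not from the paper): reusing the same Gumbel noise for the original and the intervened model couples the two samples, so they differ only where the intervention actually changed the next-token distribution. The paper's full algorithm additionally infers posterior Gumbel noise consistent with an observed factual string, which this sketch omits.

```python
import numpy as np

def sample_gumbel(shape, rng):
    # Standard Gumbel(0, 1) noise via inverse transform sampling.
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def paired_gumbel_sample(original_logits, counterfactual_logits, rng):
    """Sample one token from each model using the *same* Gumbel noise.

    Gumbel-max trick: argmax(logits + g) with g ~ Gumbel(0, 1) is an
    exact sample from softmax(logits). Sharing g across the two models
    yields a coupled (factual, counterfactual) token pair.
    """
    g = sample_gumbel(original_logits.shape, rng)
    factual = int(np.argmax(original_logits + g))
    counterfactual = int(np.argmax(counterfactual_logits + g))
    return factual, counterfactual
```

Note that when the intervention leaves the logits unchanged, the shared noise guarantees the factual and counterfactual tokens coincide, which is exactly the "minimal side effect" property the paper evaluates.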
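The MiMiC transformation mentioned under Dataset Splits is fit on paired male/female biography representations. As a hedged illustration only (toy arrays, hypothetical function names): the simplest such intervention is a translation that matches the source-class mean to the target-class mean in the residual stream; the full MiMiC method fits an optimal linear map that also matches covariances.

```python
import numpy as np

def fit_mean_shift(source_reprs, target_reprs):
    # Mean-matching variant: a constant vector that moves the source
    # class mean onto the target class mean. Inputs are (n, d) arrays
    # of hidden representations for each class.
    return target_reprs.mean(axis=0) - source_reprs.mean(axis=0)

def intervene(reprs, shift):
    # Apply the fitted shift to residual-stream representations.
    return reprs + shift
```

After this intervention the shifted source representations have exactly the target mean, while within-class variation is untouched, a design choice meant to alter the targeted concept (e.g. gendered pronouns) with minimal collateral change.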