Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

Authors: Christopher Ackerman, Nina Panickssery

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our first experiment tested whether Llama3-8b-Instruct could achieve above-chance accuracy at self recognition in the Paired paradigm across a range of datasets. As shown in Figure 1a, the model can successfully distinguish its own output from that of humans in all four datasets." |
| Researcher Affiliation | Collaboration | Christopher Ackerman EMAIL, Nina Panickssery EMAIL |
| Pseudocode | No | The paper describes methods textually, such as the contrastive pairs method, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code, or links to a code repository for the methodology described. |
| Open Datasets | Yes | "The Summarization paradigm employed three datasets: CNN-Dailymail (CNN; Hermann et al. (2015)), Extreme Summarization (XSUM; Narayan et al. (2018)), and Databricks Dolly (DOLLY; Conover et al. (2023)). The Situational Awareness Dataset (SAD; Laine et al. (2024)) utilized in the Continuation paradigm consists of a compilation of texts extracted from The EU AI Act, Reddit, and other sources. ... In addition to the test set derived from the datasets described above, we employ a novel test set based on a Quora dataset of question and answer pairs (QA; (Datasets, 2021))." |
| Dataset Splits | No | "In the results below, we use 1000 texts from each of the CNN, XSUM, and SAD datasets, and 1188 from the DOLLY dataset. ... To form the contrast vector, we identified 734 pairs of model and human-written texts from across the four datasets on which the model had given highly confident and correct self and other authorship judgments in the Individual presentation paradigm." |
| Hardware Specification | No | The paper mentions running experiments and accessing model activations and parameters, but does not specify any particular hardware such as GPU or CPU models, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions using specific models (Llama3-8b, GPT3.5, GPT4, and Claude 2) but does not provide details on the software environment or library versions used for its implementation or experiments. |
| Experiment Setup | No | The paper mentions "Steering with multipliers in the 3 to 6 range on layers 14-16 was most effective" and "A small amount of prompt engineering was used", but it does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for model training or fine-tuning. |
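The steering procedure the table quotes (a contrast vector formed from confident contrastive pairs, then added with a multiplier to mid-layer activations) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the `ToyBlock` module, hidden size, and random activations are stand-ins for Llama3-8b-Instruct's residual stream at layers 14-16, and the 734-pair count and the multiplier of 4 are taken from the figures quoted above.

```python
import torch

HIDDEN = 8  # toy hidden size; Llama3-8b uses 4096


def contrast_vector(self_acts: torch.Tensor, other_acts: torch.Tensor) -> torch.Tensor:
    """Mean-difference ("contrastive pairs") vector: average activation on
    model-written texts minus average on human-written texts."""
    return self_acts.mean(dim=0) - other_acts.mean(dim=0)


class ToyBlock(torch.nn.Module):
    """Stand-in for one transformer layer; steering adds a fixed vector
    to its output activations."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(HIDDEN, HIDDEN)
        self.steer: torch.Tensor | None = None  # set to multiplier * vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)
        if self.steer is not None:
            h = h + self.steer  # activation addition
        return h


torch.manual_seed(0)
# Simulated activations for the 734 confident pairs mentioned in the paper.
self_acts = torch.randn(734, HIDDEN) + 1.0
other_acts = torch.randn(734, HIDDEN) - 1.0
v = contrast_vector(self_acts, other_acts)

block = ToyBlock()
x = torch.randn(1, HIDDEN)
baseline = block(x)
block.steer = 4.0 * v  # multiplier in the reported 3-6 range
steered = block(x)
shift = steered - baseline  # equals 4.0 * v by construction
```

In a real reproduction the same addition would be applied via forward hooks on the chosen Llama layers, with activations collected under the paper's Individual presentation paradigm rather than sampled at random.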