A Checks-and-Balances Framework for Context-Aware Ethical AI Alignment

Author: Edward Y. Chang

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | The experimental section evaluates our framework through three complementary studies. First, we assess whether emotion-mediated classification provides more effective ethical guardrails than direct behavior classification. Next, we examine Dike's ability to independently evaluate and explain linguistic behaviors. Finally, we test how the adversarial Eris component enables cultural adaptability and prevents excessive censorship.
Researcher Affiliation | Academia | Computer Science, Stanford University. Correspondence to: Edward Y. Chang <EMAIL>.
Pseudocode | Yes | Table 1: Checks-and-balances, adversarial review algorithm
Open Source Code | Yes | The datasets and code are publicly available at (Chang, 2024b).
Open Datasets | Yes | We therefore selected the Love Letters Collection (Kaggle, 2023) (9,700 communications), which: (1) spans the full emotional intensity spectrum, (2) contains cultural variation, (3) includes longer-form texts, and (4) remains processable by commercial LLMs.
Dataset Splits | Yes | We tasked GPT-4 with generating training data by rewriting 54 extensive letters from Kaggle's Love Letters dataset, augmented with 12 celebrated love poems. We selected longer letters since most communications in the dataset were too brief for analysis, and set aside another 24 letters as testing data.
Hardware Specification | No | No specific hardware details (GPU models, CPU models, or memory specifications) are provided in the paper.
Software Dependencies | No | The paper repeatedly mentions the use of GPT-4 for various tasks (e.g., rewriting documents, emotion analysis) but does not specify any other software libraries, frameworks, or version numbers required for replication beyond this model reference.
Experiment Setup | No | The paper describes the methodology for using GPT-4 (e.g., rewriting documents, emotion analysis, zero-shot classification) and the steps of the Dike self-supervised learning pipeline, but it does not provide hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for any models trained or fine-tuned by the authors.