Aligning Language Models with Demonstrated Feedback

Authors: Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Hyundong Cho, Michael Bernstein, Diyi Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N = 16). Across our benchmarks and user study, we find that winrates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points.
Researcher Affiliation | Academia | Omar Shaikh (Stanford University), Michelle S. Lam (Stanford University), Joey Hejna (Stanford University), Yijia Shao (Stanford University), Hyundong Cho (USC), Michael S. Bernstein (Stanford University), Diyi Yang (Stanford University)
Pseudocode | Yes | Algorithm 1: DITTO
  Input: LM π_ref, demos D_E = {(x_i, y_i^E)}_{i=1}^N, sample size M, sample frequency K
  Init: π_0 ← SFT(π_ref, D_E), t ← 0
  while not converged do
      D_t ← ∪_{i=1}^N {(x_i, y_j ~ π_t(·|x_i))}_{j=1}^M
      for k = 1, 2, 3, ..., K do
          Sample batch B = {(x, y_w, y_l)} of comparisons from induced ranking: D_E ≻ D_t ≻ D_{t-1} ≻ ... ≻ D_0
          π_t ← DPO(π_t, B)  # update policy
      t ← t + 1
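The comparison-sampling step of Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: `induced_comparisons` and the toy prompt data below are hypothetical, and the key assumption shown is only the induced ranking (expert demos beat any policy sample; later iterations beat earlier ones).

```python
import random

def induced_comparisons(expert, policy_gens, batch_size, rng):
    # DITTO's induced ranking: D_E > D_t > D_{t-1} > ... > D_0.
    # `expert` and each dict in `policy_gens` map prompt -> completion list;
    # policy_gens[t] holds samples drawn from the policy at iteration t.
    tiers = [expert] + policy_gens[::-1]  # index 0 = highest-ranked tier
    prompts = list(expert)
    batch = []
    while len(batch) < batch_size:
        x = rng.choice(prompts)
        hi, lo = sorted(rng.sample(range(len(tiers)), 2))
        y_w = rng.choice(tiers[hi][x])  # winner from the higher-ranked tier
        y_l = rng.choice(tiers[lo][x])  # loser from the lower-ranked tier
        batch.append((x, y_w, y_l))
    return batch

# Toy example: one expert demo plus two rounds of policy samples.
expert = {"write an email": ["expert draft"]}
gens = [{"write an email": ["round-0 sample"]},
        {"write an email": ["round-1 sample"]}]
batch = induced_comparisons(expert, gens, batch_size=8, rng=random.Random(0))
```

Each sampled triple would then feed a DPO update as in the algorithm's inner loop, with the expert demonstrations always ranked above policy samples.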
Open Source Code | Yes | Code: https://github.com/SALT-NLP/demonstrated-feedback
Open Datasets | Yes | We collect data from 20 distinct authors from two sources: (1) emails and blog posts from the CMCC dataset (Goldstein et al., 2008) that contain only one author and (2) news articles from the CCAT dataset (Lewis et al., 2004).
Dataset Splits | Yes | We randomly select 10 authors from each dataset, use 7 samples to train, and split the remainder into test and validation. Table 4 in the Appendix describes the finalized train/val/test counts across each benchmark.
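The per-author split described above can be sketched as follows. `author_split` is a hypothetical helper, and splitting the remainder evenly between validation and test is an assumption for illustration (the paper's exact counts are in its Table 4).

```python
import random

def author_split(samples, n_train=7, rng=random):
    # Shuffle one author's samples, take n_train for training,
    # and divide the remainder between validation and test.
    samples = samples[:]
    rng.shuffle(samples)
    train, rest = samples[:n_train], samples[n_train:]
    val, test = rest[:len(rest) // 2], rest[len(rest) // 2:]
    return train, val, test

# Toy example: an author with 12 samples.
train, val, test = author_split(list(range(12)), rng=random.Random(0))
```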
Hardware Specification | Yes | All training was conducted on 1 A100 80GB GPU.
Software Dependencies | No | The paper mentions using Mistral Instruct v0.2 7B, LoRA, DPO, and AdamW, but does not provide specific version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | We run a random hyperparameter sweep over a single, randomly selected author from each corpus, using lr ∈ {1e-4, 3e-4, 1e-5, 3e-5, 1e-6, 3e-6}, epochs ∈ {10, 15, 20, 25, 30}, and β ∈ {0.01, 0.05, 0.1}. We additionally tune how frequently DITTO samples negatives (K ∈ {1, 5, 10}) and how many negatives DITTO samples (M ∈ {1, 5, 10}). Finally, we tune the replay / expert / intermodel fractions, selecting between 0.2 / 0.7 / 0.1, 0.25 / 0.5 / 0.25, and 0.1 / 0.7 / 0.2.
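A random sweep over these grids amounts to sampling each hyperparameter uniformly from its reported set. A minimal sketch, assuming only the grids quoted above (`SPACE` and `random_configs` are illustrative names, not the authors' code):

```python
import random

# Search space mirroring the reported grids.
SPACE = {
    "lr": [1e-4, 3e-4, 1e-5, 3e-5, 1e-6, 3e-6],
    "epochs": [10, 15, 20, 25, 30],
    "beta": [0.01, 0.05, 0.1],
    "K": [1, 5, 10],   # how frequently negatives are sampled
    "M": [1, 5, 10],   # how many negatives are sampled
    "fractions": [(0.2, 0.7, 0.1), (0.25, 0.5, 0.25), (0.1, 0.7, 0.2)],
}

def random_configs(n_trials, rng):
    # Draw each hyperparameter independently and uniformly from its grid.
    return [{k: rng.choice(v) for k, v in SPACE.items()}
            for _ in range(n_trials)]

configs = random_configs(20, random.Random(0))
```

Each sampled configuration would then be trained and scored on the held-out validation split before picking the best setting.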