Selective Generation for Controllable Language Models

Authors: Minjae Lee, Kyungmin Kim, Taesoo Kim, Sangdon Park

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we demonstrate the efficacy of the SGen family in achieving a desired FDR-E level with comparable selection efficiency to that of baselines on both open- and closed-source GLMs." |
| Researcher Affiliation | Academia | Minjae Lee (GSAI, POSTECH), Kyungmin Kim (GSAI, POSTECH), Taesoo Kim (SCS & SCP, Georgia Tech), Sangdon Park (GSAI & CSE, POSTECH) |
| Pseudocode | Yes | "Algorithm 1: Entailment Set Learning with a False Entailment Rate (FER) Guarantee" |
| Open Source Code | Yes | "Code and datasets are provided at https://github.com/ml-postech/selective-generation." |
| Open Datasets | Yes | "We use two GLMs, GPT-3.5-Turbo and Alpaca-7B, alongside the Natural Questions (NQ) dataset to annotate entailment labels for question-answer pairs. [...] we create a dataset on textual entailment using the Natural Questions (NQ) dataset [17] for each GLM." |
| Dataset Splits | Yes | "Approximately 7.3k (7,374) and 4.6k (4,595) samples are labeled for Alpaca-7B and GPT-3.5-Turbo, respectively, and both are split into calibration and test data at an 8:2 ratio." (A split sketch follows the table.) |
| Hardware Specification | Yes | "Our system environment consists of 4 NVIDIA A100 80GB GPUs with 128 CPUs." |
| Software Dependencies | No | The paper mentions models such as GPT-3.5-Turbo, Alpaca-7B, and deberta-v2-xxlarge-mnli, but does not list software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). (An entailment-scoring sketch follows the table.) |
| Experiment Setup | Yes | "To control the FDR-E, we use two user-specified parameters (ε, δ), where we use (0.25, 0.02) unless specified. For our methods (i.e., SGen-Semi, SGen-Semi-NoMS, and SGen-Semi-Sup-NoMS), we have four control parameters (ε_S, δ_S, δ_E, δ_W), which we map as follows: ε_S = ε, δ_S = (δ − δ_W)/2, δ_E = (δ − δ_W)/2, δ_W = 10⁻⁵." (A parameter-mapping sketch follows the table.) |
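The 8:2 calibration/test split in the Dataset Splits row is simple to reproduce. A minimal sketch, assuming a plain shuffled split; the seed, function name, and in-memory list format are illustrative assumptions, not taken from the paper or its repository:

```python
import random

def split_calibration_test(samples, cal_ratio=0.8, seed=0):
    """Shuffle labeled samples and split them into calibration and test sets (8:2)."""
    rng = random.Random(seed)  # seed is an assumption; the paper does not specify one
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_cal = int(len(samples) * cal_ratio)
    calibration = [samples[i] for i in idx[:n_cal]]
    test = [samples[i] for i in idx[n_cal:]]
    return calibration, test

# With the 7,374 Alpaca-7B-labeled samples, this yields
# 5,899 calibration and 1,475 test samples.
```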
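On the missing software dependencies: the entailment model the paper names, deberta-v2-xxlarge-mnli, is typically queried through Hugging Face transformers. A minimal sketch, assuming the microsoft/deberta-v2-xxlarge-mnli checkpoint and that the entailment index can be read from the model's label map; this is a generic NLI-scoring example, not the authors' labeling pipeline:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint; the paper names "deberta-v2-xxlarge-mnli" without a version.
MODEL_ID = "microsoft/deberta-v2-xxlarge-mnli"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Return P(entailment) for a (premise, hypothesis) pair under the NLI model."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Read the entailment index from the config rather than hardcoding it.
    label2id = {v.lower(): k for k, v in model.config.id2label.items()}
    return probs[label2id["entailment"]].item()

# e.g., checking whether a GLM answer entails the reference answer
print(entailment_prob("The capital of France is Paris.", "Paris is France's capital."))
```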
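The parameter mapping in the Experiment Setup row amounts to splitting the total failure probability δ across the method's components, since δ_S + δ_E + δ_W = δ. A minimal worked sketch of that arithmetic, assuming the defaults (ε, δ) = (0.25, 0.02); the function name and dict output are illustrative, not the authors' code:

```python
def map_control_parameters(eps=0.25, delta=0.02, delta_w=1e-5):
    """Map user-specified (eps, delta) to the four control parameters.

    eps_S   = eps
    delta_S = (delta - delta_w) / 2
    delta_E = (delta - delta_w) / 2
    delta_W = delta_w  (fixed at 1e-5 in the paper's experiments)
    """
    assert 0.0 < delta_w < delta, "delta_w must leave budget for delta_S and delta_E"
    half = (delta - delta_w) / 2.0
    return {"eps_S": eps, "delta_S": half, "delta_E": half, "delta_W": delta_w}

params = map_control_parameters()
print(params)  # {'eps_S': 0.25, 'delta_S': 0.009995, 'delta_E': 0.009995, 'delta_W': 1e-05}
# The three delta terms recover the user-specified total budget delta = 0.02.
assert abs(params["delta_S"] + params["delta_E"] + params["delta_W"] - 0.02) < 1e-12
```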