Selective Generation for Controllable Language Models

Authors: Minjae Lee, Kyungmin Kim, Taesoo Kim, Sangdon Park

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we demonstrate the efficacy of the SGen family in achieving a desired FDR-E level with comparable selection efficiency to that of baselines on both open- and closed-source GLMs." |
| Researcher Affiliation | Academia | Minjae Lee (GSAI, POSTECH), Kyungmin Kim (GSAI, POSTECH), Taesoo Kim (SCS & SCP, Georgia Tech), Sangdon Park (GSAI & CSE, POSTECH) |
| Pseudocode | Yes | "Algorithm 1: Entailment Set Learning with a False Entailment Rate (FER) Guarantee" |
| Open Source Code | Yes | "Code and datasets are provided at https://github.com/ml-postech/selective-generation." |
| Open Datasets | Yes | "We use two GLMs, GPT-3.5-Turbo and Alpaca-7B, alongside the Natural Questions (NQ) dataset to annotate entailment labels for question-answer pairs. [...] we create a dataset on textual entailment using the Natural Questions (NQ) dataset [17] for each GLM." |
| Dataset Splits | Yes | "Approximately 7.3k (7,374) and 4.6k (4,595) samples are labeled for Alpaca-7B and GPT-3.5-Turbo, respectively, and both are split into calibration and test data at an 8:2 ratio." (A split sketch follows the table.) |
| Hardware Specification | Yes | "Our system environment consists of 4 NVIDIA A100 80GB GPUs with 128 CPUs." |
| Software Dependencies | No | The paper mentions models such as GPT-3.5-Turbo, Alpaca-7B, and deberta-v2-xxlarge-mnli, but does not list software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). (An entailment-scoring sketch follows the table.) |
| Experiment Setup | Yes | "To control the FDR-E, we use two user-specified parameters (ε, δ), where we use (0.25, 0.02) unless specified. For our methods (i.e., SGen-Semi, SGen-Semi-NoMS, and SGen-Semi-Sup-NoMS), we have four control parameters (ε_S, δ_S, δ_E, δ_W), which we map as follows: ε_S = ε, δ_S = (δ − δ_W)/2, δ_E = (δ − δ_W)/2, δ_W = 10⁻⁵." (A parameter-mapping sketch follows the table.) |
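The 8:2 calibration/test split in the Dataset Splits row is simple to reproduce. A minimal sketch, assuming a plain shuffled split; the seed, function name, and in-memory list format are illustrative assumptions, not taken from the paper or its repository:

```python
import random

def split_calibration_test(samples, cal_ratio=0.8, seed=0):
    """Shuffle labeled samples and split them into calibration and test sets (8:2)."""
    rng = random.Random(seed)  # seed is an assumption; the paper does not specify one
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_cal = int(len(samples) * cal_ratio)
    calibration = [samples[i] for i in idx[:n_cal]]
    test = [samples[i] for i in idx[n_cal:]]
    return calibration, test

# With the 7,374 Alpaca-7B-labeled samples, this yields
# 5,899 calibration and 1,475 test samples.
```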
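On the missing software dependencies: the entailment model the paper names, deberta-v2-xxlarge-mnli, is typically queried through Hugging Face transformers. A minimal sketch, assuming the microsoft/deberta-v2-xxlarge-mnli checkpoint and that the entailment index can be read from the model's label map; this is a generic NLI-scoring example, not the authors' labeling pipeline:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint; the paper names "deberta-v2-xxlarge-mnli" without a version.
MODEL_ID = "microsoft/deberta-v2-xxlarge-mnli"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Return P(entailment) for a (premise, hypothesis) pair under the NLI model."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Read the entailment index from the config rather than hardcoding it.
    label2id = {v.lower(): k for k, v in model.config.id2label.items()}
    return probs[label2id["entailment"]].item()

# e.g., checking whether a GLM answer entails the reference answer
print(entailment_prob("The capital of France is Paris.", "Paris is France's capital."))
```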
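The parameter mapping in the Experiment Setup row amounts to splitting the total failure probability δ across the method's components, since δ_S + δ_E + δ_W = δ. A minimal worked sketch of that arithmetic, assuming the defaults (ε, δ) = (0.25, 0.02); the function name and dict output are illustrative, not the authors' code:

```python
def map_control_parameters(eps=0.25, delta=0.02, delta_w=1e-5):
    """Map user-specified (eps, delta) to the four control parameters.

    eps_S   = eps
    delta_S = (delta - delta_w) / 2
    delta_E = (delta - delta_w) / 2
    delta_W = delta_w  (fixed at 1e-5 in the paper's experiments)
    """
    assert 0.0 < delta_w < delta, "delta_w must leave budget for delta_S and delta_E"
    half = (delta - delta_w) / 2.0
    return {"eps_S": eps, "delta_S": half, "delta_E": half, "delta_W": delta_w}

params = map_control_parameters()
print(params)  # {'eps_S': 0.25, 'delta_S': 0.009995, 'delta_E': 0.009995, 'delta_W': 1e-05}
# The three delta terms recover the user-specified total budget delta = 0.02.
assert abs(params["delta_S"] + params["delta_E"] + params["delta_W"] - 0.02) < 1e-12
```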