SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Authors: Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test the empirical performance of our framework in text generation tasks. We will test the output model's quality, the computation complexity and the impact of data selection ratio across different models. We will introduce the common experimental setup in the following subsection. The results on different models are reported in Tables 1 and 2 and Figures 3 and 4. |
| Researcher Affiliation | Collaboration | Han Shen¹, Pin-Yu Chen², Payel Das², Tianyi Chen¹ — ¹Rensselaer Polytechnic Institute, ²IBM Research |
| Pseudocode | Yes | Algorithm 1 Bilevel Data Selector Training ... Algorithm 2 Bilevel Data Selector Training (memory-efficient) |
| Open Source Code | Yes | Our code is available on github https://github.com/hanshen95/SEAL. |
| Open Datasets | Yes | We introduce our choice of datasets: 1) ANTHROPIC HELPFUL AND HARMLESS (HH) (Bai et al., 2022); 2) ORCA (Longpre et al., 2023; Mukherjee et al., 2023); 3) HEX-PHI (Qi et al., 2023); 4) ALPACA-CLEANED (Wang et al., 2022). |
| Dataset Splits | Yes | For fair comparison, all data selection methods select 80% of the fine-tuning data. ... In this section, we use the REDORCA dataset as the fine-tuning dataset, and use a withheld subset (112k data points) of SLIMORCA dataset as the safe dataset in SEAL and the target dataset in DSIR. ... We use a subset of 49.9k samples from ALPACA-CLEANED as the safe dataset, and use a subset of 49.9k samples from OPENORCA as the benign fine-tuning dataset. ... and use a data selection percent of 90%. |
| Hardware Specification | Yes | Table 3: Wall-clock runtime and GPU memory usage on one NVIDIA A6000 in the group of four. |
| Software Dependencies | No | The paper mentions software components like "LoRA" and "Adam optimizer" but does not provide specific version numbers for any libraries, programming languages, or other software dependencies. |
| Experiment Setup | Yes | For LoRA training on LLAMA2-7B-CHAT-HF, LLAMA-3-8B-INSTRUCT and MERLINITE-7B, we use LoRA weights of rank 16, α = 16 without dropout on all the query and value projection matrices... We use the Adam optimizer in all tests. For SEAL's data selector training, we set σ(ω) as the softmax function. ... we train for 3 epochs using a batch size of 64, and a learning rate of 1×10⁻⁵ for the model parameters and 5×10⁻³ for the data selector. The penalty strength γ increases from 0 by 3×10⁻² each epoch. When fine-tuning the model, we use a batch size of 64 and a learning rate of 1×10⁻⁵. We fine-tune the model for 2 epochs. |
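The setup above describes a softmax-parameterized data selector σ(ω) that keeps a fixed fraction (80% in most experiments, 90% in one) of the fine-tuning data. The following is a minimal sketch of that selection step only, not the authors' released code: the function name `select_top_fraction` and the use of raw selector logits are assumptions for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the selector logits ω."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()

def select_top_fraction(selector_logits: np.ndarray, fraction: float = 0.8) -> np.ndarray:
    """Keep the `fraction` of data points with the largest softmax weights σ(ω).

    Returns the sorted indices of the selected data points.
    """
    probs = softmax(selector_logits)
    k = int(round(fraction * len(probs)))
    keep = np.argsort(probs)[::-1][:k]  # indices of the k largest weights
    return np.sort(keep)

# Example: 10 data points, keep 80% (i.e. 8 of them).
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
selected = select_top_fraction(logits, fraction=0.8)
print(len(selected))  # 8
```

In the paper the selector logits are trained via the bilevel objective (Algorithms 1 and 2) rather than fixed; this sketch only shows how a trained selector would be thresholded into an 80% subset.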