SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Authors: Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test the empirical performance of our framework in text generation tasks. We will test the output model's quality, the computation complexity and the impact of data selection ratio across different models. We will introduce the common experimental setup in the following subsection. The results on different models are reported in Tables 1 and 2 and Figures 3 and 4. |
| Researcher Affiliation | Collaboration | Han Shen¹, Pin-Yu Chen², Payel Das², Tianyi Chen¹ — ¹Rensselaer Polytechnic Institute, ²IBM Research |
| Pseudocode | Yes | Algorithm 1 Bilevel Data Selector Training ... Algorithm 2 Bilevel Data Selector Training (memory-efficient) |
| Open Source Code | Yes | Our code is available on github https://github.com/hanshen95/SEAL. |
| Open Datasets | Yes | We introduce our choice of datasets: 1) ANTHROPIC HELPFUL AND HARMLESS (HH) (Bai et al., 2022); 2) ORCA (Longpre et al., 2023; Mukherjee et al., 2023); 3) HEX-PHI (Qi et al., 2023); 4) ALPACA-CLEANED (Wang et al., 2022). |
| Dataset Splits | Yes | For fair comparison, all data selection methods select 80% of the fine-tuning data. ... In this section, we use the REDORCA dataset as the fine-tuning dataset, and use a withheld subset (112k data points) of SLIMORCA dataset as the safe dataset in SEAL and the target dataset in DSIR. ... We use a subset of 49.9k samples from ALPACA-CLEANED as the safe dataset, and use a subset of 49.9k samples from OPENORCA as the benign fine-tuning dataset. ... and use a data selection percent of 90%. |
| Hardware Specification | Yes | Table 3: Wall-clock runtime and GPU memory usage on one NVIDIA A6000 in the group of four. |
| Software Dependencies | No | The paper mentions software components like "LoRA" and "Adam optimizer" but does not provide specific version numbers for any libraries, programming languages, or other software dependencies. |
| Experiment Setup | Yes | For LoRA training on LLAMA2-7B-CHAT-HF, LLAMA-3-8B-INSTRUCT and MERLINITE-7B, we use LoRA weights of rank 16, α = 16 without dropout on all the query and value projection matrices... We use the Adam optimizer in all tests. For SEAL's data selector training, we set σ(ω) as the softmax function. ... we train for 3 epochs using a batch size of 64, and a learning rate of 1×10⁻⁵ for the model parameters and 5×10⁻³ for the data selector. The penalty strength γ increases from 0 by 3×10⁻² each epoch. When fine-tuning the model, we use a batch size of 64 and a learning rate of 1×10⁻⁵. We fine-tune the model for 2 epochs. |
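The setup above describes a softmax-parameterized data selector σ(ω) that keeps a fixed fraction (80% in most experiments, 90% in one) of the fine-tuning data. The following is a minimal sketch of that selection step only, not the authors' released code: the function name `select_top_fraction` and the use of raw selector logits are assumptions for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the selector logits ω."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()

def select_top_fraction(selector_logits: np.ndarray, fraction: float = 0.8) -> np.ndarray:
    """Keep the `fraction` of data points with the largest softmax weights σ(ω).

    Returns the sorted indices of the selected data points.
    """
    probs = softmax(selector_logits)
    k = int(round(fraction * len(probs)))
    keep = np.argsort(probs)[::-1][:k]  # indices of the k largest weights
    return np.sort(keep)

# Example: 10 data points, keep 80% (i.e. 8 of them).
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
selected = select_top_fraction(logits, fraction=0.8)
print(len(selected))  # 8
```

In the paper the selector logits are trained via the bilevel objective (Algorithms 1 and 2) rather than fixed; this sketch only shows how a trained selector would be thresholded into an 80% subset.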