Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively." |
| Researcher Affiliation | Collaboration | Tong Wu¹, Shujian Zhang², Kaiqiang Song², Silei Xu², Sanqiang Zhao², Ravi Agrawal², Sathish Indurthi², Chong Xiang¹, Prateek Mittal¹, Wenxuan Zhou²; ¹Princeton University, ²Zoom Video Communications |
| Pseudocode | Yes | "A DETAILS OF IMPLEMENTING INSTRUCTIONAL SEGMENT EMBEDDING: Here's an example of implementing Instructional Segment Embedding with a few lines of Python/PyTorch code. The additional code is highlighted in bold blue." |
| Open Source Code | Yes | "We release our code at https://github.com/tongwu2020/ISE." |
| Open Datasets | Yes | "Empirically, we conduct comprehensive experiments on two benchmarks: Structured Query (Chen et al., 2024) and Instruction Hierarchy (Wallace et al., 2024), which are constructed based on the Alpaca (Taori et al., 2023) and UltraChat (Ding et al., 2023) datasets, respectively." |
| Dataset Splits | Yes | "For the Adversarial Alpaca dataset, we incorporate instructions drawn from other samples (either directly or with a fabricated response) into the data and train the model to ignore such instructions. More details are available in Section B.1. For the UltraChat Baseline dataset, we use the UltraChat-200K dataset (Ding et al., 2023) and employ GPT-4o to decompose 10K prompts into three components: system instructions, user instructions, and data inputs." |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (GPU models, CPU types, or memory). |
| Software Dependencies | No | Appendix A provides a PyTorch code snippet, but the paper does not pin PyTorch or any other software dependency to a version number. |
| Experiment Setup | Yes | "We employ supervised fine-tuning to update all model parameters for all baseline and ISE methods with three epochs. A learning rate of 2e-5 and a cosine learning schedule are used. During inference, we use top-p sampling methods with the model's default settings." |
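The Pseudocode row refers to the paper's Appendix A snippet, which is not reproduced here. As a rough, hypothetical sketch of the core idea (the class, segment labels, and sizes below are my own, not the paper's): a second learned embedding table, indexed by each token's role in the instruction hierarchy, is added elementwise to the token embedding before the transformer layers.

```python
import torch
import torch.nn as nn

# Assumed segment labels, one per role in the instruction hierarchy
# (the paper distinguishes system instructions, user instructions, and data).
SYSTEM, USER, DATA = 0, 1, 2

class ISEEmbedding(nn.Module):
    """Token embedding augmented with a learned instructional segment
    embedding, summed elementwise before the transformer layers."""

    def __init__(self, vocab_size: int, hidden_size: int, num_segments: int = 3):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, hidden_size)
        # Extra lookup table: one vector per segment type.
        self.segment_embed = nn.Embedding(num_segments, hidden_size)

    def forward(self, input_ids: torch.Tensor, segment_ids: torch.Tensor) -> torch.Tensor:
        # Both inputs have shape (batch, seq_len); the embeddings are added.
        return self.token_embed(input_ids) + self.segment_embed(segment_ids)

# Example: a prompt whose tokens are system instructions, then user
# instructions, then data input.
embed = ISEEmbedding(vocab_size=32000, hidden_size=64)
input_ids = torch.randint(0, 32000, (1, 6))
segment_ids = torch.tensor([[SYSTEM, SYSTEM, USER, USER, DATA, DATA]])
out = embed(input_ids, segment_ids)
print(out.shape)  # torch.Size([1, 6, 64])
```

Because the segment embedding is part of the input representation, every transformer layer can condition on each token's privilege level, rather than inferring it from delimiter tokens alone.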
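The Experiment Setup row maps onto a standard PyTorch fine-tuning configuration; a minimal sketch under stated assumptions (the stand-in model, optimizer choice, and step counts are placeholders, only the learning rate of 2e-5, the cosine schedule, and the three epochs come from the paper):

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in; the paper fine-tunes all LLM parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr from the paper
epochs, steps_per_epoch = 3, 100  # three epochs per the paper; step count is a placeholder
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs * steps_per_epoch)  # cosine learning-rate schedule

for _ in range(epochs * steps_per_epoch):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 16)).pow(2).mean()  # dummy stand-in for the SFT loss
    loss.backward()
    optimizer.step()
    scheduler.step()

# The learning rate has decayed along the cosine curve toward zero.
print(scheduler.get_last_lr()[0])
```

With `T_max` set to the total number of training steps, the learning rate follows a single cosine decay from 2e-5 toward zero over the three epochs.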