AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

Authors: Mintong Kang, Chejian Xu, Bo Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on multiple advanced ALMs demonstrate that AdvWave outperforms baseline methods, achieving a 40% higher average jailbreak attack success rate. Both audio-stealthiness metrics and human evaluations confirm that adversarial audio generated by AdvWave is indistinguishable from natural sounds.
Researcher Affiliation | Academia | Mintong Kang & Chejian Xu & Bo Li, University of Illinois at Urbana-Champaign, EMAIL
Pseudocode | No | The paper describes its methods through text, mathematical formulations (e.g., Equations 1-7), and high-level diagrams (Figure 1), but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | As AdvBench (Zou et al., 2023) is widely used for jailbreak evaluations in the text domain (Liu et al., 2023a; Chao et al., 2023; Mehrotra et al., 2023), we adapted its text-based queries into audio format using OpenAI's TTS APIs, creating the AdvBench-Audio dataset. AdvBench-Audio contains 520 audio queries, each requesting instructions on unethical or illegal activities.
Dataset Splits | No | AdvBench-Audio contains 520 audio queries, each requesting instructions on unethical or illegal activities. The paper does not provide specific training/validation/test splits for this dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or processor types) used for running its experiments.
Software Dependencies | No | The paper mentions using GPT-4o and OpenAI's TTS APIs as tools, and the Qwen2-Audio model for classification, but does not provide specific software dependencies with version numbers (e.g., libraries, frameworks, or programming-language versions) for its own implementation.
Experiment Setup | Yes | We implement the adversarial loss L_adv as the cross-entropy loss between ALM output likelihoods and the adaptively searched adversarial targets. We fix the slack margin α as 1.0 in the alignment loss L_align. We use the Qwen2-Audio model to implement the audio classifier that imposes the classifier guidance L_stealth, following the prompts in Appendix A.3. For AdvWave optimization, we set a maximum of 3000 epochs, with an early-stopping criterion if the loss falls below 0.1. We optimize the adversarial noise toward the sound of a car horn by default, but we also evaluate diverse environmental noises in Section 4.4.
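The optimization settings quoted in the Experiment Setup row can be sketched as a simple loop. This is a hypothetical illustration only: the quadratic losses below are toy surrogates for the paper's cross-entropy adversarial loss and Qwen2-Audio classifier guidance, and the learning rate and vector sizes are assumptions; only the 3000-epoch cap, the 0.1 early-stopping threshold, and the slack margin α = 1.0 come from the paper.

```python
import numpy as np

# Sketch of an AdvWave-style optimization loop. The real method optimizes
# an adversarial audio waveform against an ALM; here a toy quadratic loss
# stands in for L_adv, and a hinge with slack margin ALPHA stands in for
# the alignment loss L_align (classifier guidance L_stealth is omitted).

MAX_EPOCHS = 3000      # maximum optimization epochs (from the paper)
EARLY_STOP_LOSS = 0.1  # early-stopping threshold on the loss (from the paper)
ALPHA = 1.0            # slack margin in the alignment loss (from the paper)
LR = 1.0               # learning rate (assumed; not stated in the paper)
DIM = 64               # perturbation length (assumed toy size)

rng = np.random.default_rng(0)
target = rng.normal(size=DIM)   # stand-in for the adversarial target
noise = np.zeros(DIM)           # adversarial perturbation being optimized

def total_loss(delta):
    # Toy surrogate for the cross-entropy adversarial loss L_adv.
    l_adv = 0.5 * np.mean((delta - target) ** 2)
    # Hinge-style alignment term with slack margin ALPHA; it is inactive
    # while the perturbation stays small, mimicking a satisfied constraint.
    l_align = max(0.0, np.linalg.norm(delta) / DIM - ALPHA)
    return l_adv + l_align

for epoch in range(MAX_EPOCHS):
    if total_loss(noise) < EARLY_STOP_LOSS:
        break                          # early stopping criterion
    grad = (noise - target) / DIM      # analytic gradient of the surrogate
    noise -= LR * grad                 # plain gradient-descent update
```

With these toy losses the loop converges and the early-stopping branch fires well before the 3000-epoch cap; the paper instead backpropagates through the target ALM to update the waveform.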