Position: AI Safety Must Embrace an Antifragile Perspective

Authors: Ming Jin, Hyunin Lee

ICML 2025

Reproducibility (Variable / Result / LLM Response)
Research Type: Theoretical. This position paper contends that modern AI research must adopt an antifragile perspective on safety, one in which the system's capacity to guarantee long-term AI safety, such as handling rare or out-of-distribution (OOD) events, expands over time. ... In this position paper, we first identify key limitations of static testing, including scenario diversity, reward hacking, and over-alignment. We then explore the potential of antifragile solutions to manage rare events. ... We adapt three theorems from (Lee et al., 2025), which show that in complex multi-step settings, a non-zero gap is unavoidable. Theorem 3.2 (Trivial Cases Without Black Swan). ... Theorem 3.3 (Multi-State, Multi-Step Gaps). ... Corollary 3.4 (Robustness Gap Lower Bound).
Researcher Affiliation: Academia. Ming Jin (Virginia Tech), Hyunin Lee (UC Berkeley). Correspondence to: Ming Jin <EMAIL>.
Pseudocode: No. The paper describes theoretical concepts and arguments. While it refers to algorithms and methods from other works (e.g., Algorithm of Thoughts, BSAFE), it does not present structured pseudocode or algorithm blocks of its own for the antifragile framework it proposes.
Open Source Code: No. The paper contains no explicit statement about releasing source code for its methodology, nor does it link to any code repositories.
Open Datasets: No. As a position paper, it conducts no experiments requiring datasets of its own. It mentions benchmarks and datasets from the broader field (e.g., Adversarial NLI, Dynabench) to illustrate its points, but these are neither generated nor released by this paper.
Dataset Splits: No. The paper is a theoretical position paper without experimental results on specific datasets, so training/validation/test splits are not applicable and not provided.
Hardware Specification: No. The paper presents theoretical arguments and guidelines rather than empirical experiments, so no hardware used for running experiments is mentioned.
Software Dependencies: No. The paper focuses on theoretical concepts and arguments and describes no experiments requiring software dependencies with version numbers.
Experiment Setup: No. The paper outlines a theoretical framework and ethical guidelines without experimental results, so hyperparameters, training configurations, and other setup details are not provided.