A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety

Authors: Hyunin Lee, Chanwoo Park, David Abel, Ming Jin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper argues that the standard view is incomplete and claims that high-risk, statistically rare events can also occur in unchanging environments due to human misperception of events' values and likelihoods, which we refer to as S-BLACK SWAN. We first carefully categorize black swan events, focusing on S-BLACK SWAN, and mathematically formalize the definition of black swan events. We hope these definitions can pave the way for the development of algorithms to prevent such events by rationally correcting limitations in perception. Our work begins with a case study on how S-BLACK SWANS emerge and cause suboptimality gaps in various MDP settings, such as bandit (Theorem 1), small state spaces (Theorem 2), and large state spaces (Theorem 3). ... Our main finding (Theorem 4) shows that while the HEMDP value function asymptotically converges to that of the HMDP over longer horizons, the gap between HMDP and GMDP has a lower bound... Finally, Theorem 5 examines S-BLACK SWAN hitting time...
Researcher Affiliation | Collaboration | Hyunin Lee (1), Chanwoo Park (2), David Abel (3), Ming Jin (4); (1) UC Berkeley, (2) MIT, (3) Google DeepMind, (4) Virginia Tech
Pseudocode | Yes | Algorithm 1 (Black Swan Classification: S-BLACK SWAN). For a given (possibly non-stationary) M, suppose (s, a, t_bs) is a black swan event. If (s, a, t) is a black swan event for all t ∈ [T], then we classify (s, a, t_bs) as a black swan that originates from the environment's stationarity (S-BLACK SWAN).
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper presents theoretical concepts and mathematical formalizations, and uses real-world events as illustrative examples, but does not conduct experiments on specific datasets nor provide access information for any datasets used in such experiments.
Dataset Splits | No | The paper is theoretical and does not present experiments requiring dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments or computations.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers used for its own work. It mentions other software and algorithms in the related works section, but not for its own implementation or evaluation.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or training configurations.
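
The classification rule quoted in the Pseudocode row above can be illustrated with a minimal Python sketch. It is not code from the paper, which releases none. It assumes that [T] denotes the time steps 1, ..., T and that the condition must hold at every one of them. The predicate `is_black_swan`, the function name `classify_black_swan`, and the fallback labels are hypothetical names introduced only for this example; the paper's formal definition of a black swan event is not reproduced here.

```python
from typing import Callable, Hashable

# Hypothetical predicate: returns True if (state, action, t) is a black swan
# event. The paper defines this formally; here it is supplied by the caller.
BlackSwanPredicate = Callable[[Hashable, Hashable, int], bool]


def classify_black_swan(
    state: Hashable,
    action: Hashable,
    t_bs: int,
    horizon: int,
    is_black_swan: BlackSwanPredicate,
) -> str:
    """Classify the event (state, action, t_bs) following the quoted rule.

    Assumption: [T] is read as the time steps 1, ..., horizon.
    """
    # The rule only applies to events that are black swans to begin with.
    if not is_black_swan(state, action, t_bs):
        return "not a black swan"
    # If the same (state, action) pair is a black swan at every time step,
    # the event does not depend on the environment changing over time.
    if all(is_black_swan(state, action, t) for t in range(1, horizon + 1)):
        return "S-BLACK SWAN"
    # Fallback label for this sketch only (not terminology from the paper).
    return "other black swan"


# Toy predicates, purely illustrative.
def always(s, a, t):
    return (s, a) == ("cliff", "step")


def only_late(s, a, t):
    return (s, a) == ("cliff", "step") and t >= 3


print(classify_black_swan("cliff", "step", 4, 5, always))
print(classify_black_swan("cliff", "step", 4, 5, only_late))
```

With the first toy predicate the event is a black swan at every time step, so the sketch labels it S-BLACK SWAN; with the second it is a black swan only from step 3 onward, so it falls into the fallback category.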