The Limits of Predicting Agents from Behaviour

Authors: Alexis Bellot, Jonathan Richens, Tom Everitt

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Theoretical Our contribution is the derivation of novel bounds on the agent's behaviour in new (unseen) deployment environments, which represent a theoretical limit for predicting intentional agents from behavioural data alone. We discuss the implications of these results for several research areas including fairness and safety. The main result of this paper is to offer a new perspective on this problem by showing that: with an assumption of competence and optimality, the behaviour of AI systems partially determines their actions in novel environments. We provide a precise answer to this question under the assumption that the agent's behaviour is guided by a world model. All proofs of statements are given in Appendix C, and the derivations of examples are given in Appendix A.
Researcher Affiliation Industry Google DeepMind. Correspondence to: Alexis Bellot <EMAIL>.
Pseudocode No The paper primarily presents theoretical derivations, theorems, and conceptual examples. It does not contain any explicitly labeled pseudocode or algorithm blocks, nor does it present procedural steps in a code-like format.
Open Source Code No The paper does not contain any explicit statements about releasing source code, providing links to code repositories, or including code in supplementary materials.
Open Datasets No The paper uses conceptual examples like 'The Uncertain Medical AI' and 'The Shifted Medical AI' to illustrate theoretical points. These examples do not refer to actual public datasets, and no concrete access information for any dataset is provided.
Dataset Splits No The paper is theoretical and does not describe experiments performed on datasets. Therefore, there is no mention of dataset splits such as training, validation, or test sets.
Hardware Specification No The paper is theoretical in nature, focusing on mathematical derivations and bounds. It does not describe any experiments that would require specific hardware, and thus no hardware specifications are provided.
Software Dependencies No The paper is theoretical and does not discuss the implementation of any algorithms or models. Consequently, no specific software dependencies or version numbers are mentioned.
Experiment Setup No The paper is theoretical and focuses on mathematical proofs and conceptual frameworks rather than empirical experiments. As such, there is no experimental setup described, nor are any hyperparameters or training settings mentioned.