Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Authors: Junyan Liu, Lillian J. Ratliff

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the repeated principal-agent bandit game, in which the principal indirectly explores an unknown environment by incentivizing an agent to play arms. We propose algorithms for both i.i.d. and linear reward settings with bandit feedback over a finite horizon T, achieving regret bounds of Õ(√T) and Õ(T^(2/3)), respectively.
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA; 2 Electrical & Computer Engineering, University of Washington, Seattle, WA, USA.
Pseudocode | Yes | Algorithm 1: Proposed algorithm for i.i.d. rewards.
Open Source Code | No | The paper does not contain any statement about code availability or links to code repositories.
Open Datasets | No | The paper develops a theoretical framework for principal-agent bandit games and does not use any dataset for empirical evaluation.
Dataset Splits | No | The paper focuses on theoretical algorithm design and regret bounds, with no empirical evaluation on datasets, so no dataset splits are reported.
Hardware Specification | No | The paper presents theoretical algorithms and regret analysis without empirical experiments, so no hardware specifications are required.
Software Dependencies | No | The paper provides mathematical analysis only and does not list software or library versions for implementation or simulation.
Experiment Setup | No | As a theoretical study of algorithm design and regret bounds, the paper describes no experimental setup or hyperparameters.
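To make the setting concrete, here is a minimal sketch of one round of the repeated principal-agent bandit interaction that the abstract describes. This is not the paper's Algorithm 1; the function names, the Gaussian noise model, and all numeric values are illustrative assumptions. It only shows the protocol: the principal posts per-arm incentives, the self-interested agent best-responds against its own utilities, and the principal receives bandit feedback net of the incentive paid.

```python
import random

def agent_best_response(agent_means, incentives):
    """Self-interested agent: argmax of (own mean reward + offered incentive)."""
    K = len(agent_means)
    return max(range(K), key=lambda a: agent_means[a] + incentives[a])

def play_round(rng, principal_means, agent_means, incentives):
    """One interaction round; returns the chosen arm and the principal's
    net utility (realized reward minus the incentive paid on that arm)."""
    arm = agent_best_response(agent_means, incentives)
    reward = principal_means[arm] + rng.gauss(0.0, 0.1)  # noisy i.i.d. reward
    return arm, reward - incentives[arm]

# Toy instance (assumed values): the agent's favorite arm is 1, but the
# principal prefers arm 0 and pays just enough to steer the agent there.
rng = random.Random(0)
principal_means = [0.9, 0.2, 0.5]
agent_means     = [0.1, 0.8, 0.3]
incentives      = [0.75, 0.0, 0.0]  # 0.1 + 0.75 > 0.8, so the agent plays arm 0
arm, net = play_round(rng, principal_means, agent_means, incentives)
```

The learning problem the paper analyzes sits on top of this loop: the principal does not know `agent_means` (or `principal_means`) and must choose incentive levels over T rounds, which is where the Õ(√T) and Õ(T^(2/3)) regret bounds apply.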