Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals

Authors: Junyan Liu, Arnab Maiti, Artin Tajdini, Kevin Jamieson, Lillian J. Ratliff

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We initiate the study of a repeated principal-agent problem over a finite horizon T, where a principal sequentially interacts with K ≥ 2 types of agents arriving in an adversarial order... Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, ...we propose an algorithm that achieves a regret bound of O(min{√(KT log N), K√T}) and provide a matching lower bound... In the second setting, ...we show that there is an algorithm with a regret bound of Õ((LN)^{1/3} T^{2/3}) and establish a matching lower bound... All our results are summarized in Table 1.
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA; 2 Department of Electrical and Computer Engineering, University of Washington, Seattle, USA. Correspondence to: Junyan Liu <EMAIL>, Arnab Maiti <EMAIL>, Artin Tajdini <EMAIL>.
Pseudocode | No | The paper describes algorithms such as the EXP3 algorithm for linear bandits, the Tsallis-INF algorithm, and ADVERSARIALZOOMING, but does not present them in structured pseudocode blocks or figures within the main text.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide links to any code repositories.
Open Datasets | No | The paper studies a theoretical repeated principal-agent model and does not mention the use of any publicly available or open datasets for empirical evaluation; no concrete dataset access information is provided.
Dataset Splits | No | The paper does not use any datasets for experimental evaluation, so there is no mention of training/validation/test splits.
Hardware Specification | No | The paper is theoretical in nature, focusing on algorithm design, regret bounds, and lower bounds. It does not describe any experimental setup or the hardware used to run experiments.
Software Dependencies | No | The paper references algorithms from prior work, such as the EXP3 algorithm for linear bandits and the Tsallis-INF algorithm, but it does not specify any software, libraries, or version numbers required to implement or reproduce the authors' own methodology.
Experiment Setup | No | The paper is a theoretical work focused on algorithm design and mathematical analysis, including regret bounds and lower bounds. It does not describe any empirical experiments, so no experimental setup details, hyperparameters, or training configurations are provided.