A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs

Authors: Kihyuk Hong, Ambuj Tewari

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we introduce a value iteration method with efficient clipping operation that only requires computing the minimum of value functions over the set of states visited by the algorithm. Our algorithm enjoys the same regret bound as the previous work while being computationally efficient, with computational complexity that is independent of the size of the state space. The paper includes numerous proofs, lemmas (e.g., Lemma 3.3, Lemma 3.4, Lemma 3.5), and a main theorem (Theorem 3.6), indicating a theoretical focus without empirical evaluation.
Researcher Affiliation Academia 1Department of Statistics, University of Michigan, Ann Arbor, US. Correspondence to: Ambuj Tewari <EMAIL>. Both authors are affiliated with the University of Michigan, an academic institution.
Pseudocode Yes The paper includes 'Algorithm 2 γ-DC-LSCVI-UCB' and 'Algorithm 3 γ-LSCVI-UCB+ without Deviation Control', which are clearly structured algorithm blocks.
Open Source Code No The paper does not contain any explicit statements or links regarding the availability of open-source code for the described methodology. The 'Impact Statement' section does not mention code release either.
Open Datasets No The paper focuses on theoretical algorithm design for linear MDPs and does not describe experiments using specific datasets, thus no information on open datasets is provided.
Dataset Splits No The paper describes theoretical work and does not involve experimental evaluation on datasets, hence no dataset split information is provided.
Hardware Specification No The paper describes theoretical algorithm design and analysis, and does not report on any experiments. Therefore, no hardware specifications are provided.
Software Dependencies No The paper presents theoretical research and does not include an experimental implementation or evaluation. Therefore, no specific software dependencies with version numbers are listed.
Experiment Setup No The paper is focused on theoretical contributions and does not include experimental results. Therefore, no experimental setup details or hyperparameters are specified.