Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
Authors: Jongmin Lee, Ernest Ryu
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we conduct refined non-asymptotic analyses of average-reward MDPs, obtaining a collection of convergence results that advance our understanding of the setup. Among our new results, most notable are the O(1/k)-rates of Anchored Value Iteration on the Bellman error under the multichain setup and the span-based complexity lower bound that matches the O(1/k) upper bound up to a constant factor of 8 in the weakly communicating and unichain setups. |
| Researcher Affiliation | Academia | Jongmin Lee Seoul National University Department of Mathematical Sciences EMAIL Ernest K. Ryu UCLA Department of Mathematics EMAIL |
| Pseudocode | No | The Relaxed Value Iteration (Rx-VI) is V k = λk V k 1 + (1 λk)TV k 1 (Rx-VI) for k = 1, 2, . . . , where T is the Bellman optimality operator, V 0 Rn is a starting point, and 0 λk < 1 for k = 0, 1, . . . . πk is a greedy policy satisfying T πk V k = TV k for k = 0, 1, . . . . The Anchored Value Iteration is V k = λk V 0 + (1 λk)TV k 1 (Anc-VI) for k = 1, 2, . . . , where T is the Bellman optimality operator, V 0 Rn is a starting point, and 0 λk < 1 for k = 0, 1, . . . . |
| Open Source Code | No | No explicit statement about code release, repository link, or code in supplementary materials for the methodology described in this paper was found. |
| Open Datasets | No | The paper focuses on theoretical analysis of average-reward MDPs and does not report on experiments using specific datasets, therefore no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets; therefore, no dataset splits are discussed. |
| Hardware Specification | No | The paper presents theoretical results and does not describe any experimental setup or specific hardware used for computations. |
| Software Dependencies | No | The paper is purely theoretical, focusing on mathematical analysis and proofs, and does not mention any software dependencies with version numbers for experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not include an experimental section or describe any specific experimental setup, hyperparameters, or training configurations. |