reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Policy-Regret Minimization in Markov Games with Function Approximation

Authors: Thanh Nguyen-Tang, Raman Arora

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose a general algorithmic framework that achieves the optimal O(√T) policy regret for a wide class of large-scale problems characterized by an Eluder-type condition... We show that BOVL attains a policy regret bound of V (m + H) d_E γT, where V is the scale of the value functions, m the memory of the adversary, H the episode length, d_E and γ the Eluder-type and covering-type complexities of the function classes, respectively, and T the total number of episodes. The full proof appears in Appendix A.
Researcher Affiliation	Academia	Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. Correspondence to: Raman Arora <EMAIL>.
Pseudocode	Yes	Algorithm 1 BOVL(F, Ψ, ΠA, T, K): Batching and Optimism based on Value and Likelihood fitting
Open Source Code	No	The paper does not contain any explicit statements or links indicating the availability of open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not conduct experiments on any specific dataset. It discusses problem settings like 'large state and action spaces' or 'linear Markov game' as abstract models rather than concrete datasets.
Dataset Splits	No	The paper is theoretical and does not involve empirical evaluation with datasets, thus no dataset splits are discussed.
Hardware Specification	No	The paper is theoretical and does not describe experimental results, therefore no hardware specifications are provided.
Software Dependencies	No	The paper is theoretical and does not specify any software dependencies or their version numbers for implementation or experimentation.
Experiment Setup	No	The paper is theoretical, presenting an algorithmic framework and its theoretical guarantees. It does not include an experimental section or details on hyperparameter settings or system-level configurations.
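
For readers who want to process the table above programmatically, the following is a minimal sketch, assuming each row is tab-separated in the order shown (variable, result, LLM response). The function names `parse_rows` and `tally_results` are illustrative and not part of reproducibilityindex.ai, and the sample rows use shortened response text.

```python
from collections import Counter


def parse_rows(text):
    """Split tab-separated lines into (variable, result, response) tuples.

    Assumes three columns per row; lines with fewer columns are skipped.
    """
    rows = []
    for line in text.strip().splitlines():
        # Split on the first two tabs only, so the response may contain tabs.
        parts = line.split("\t", 2)
        if len(parts) == 3:
            rows.append(tuple(p.strip() for p in parts))
    return rows


def tally_results(rows):
    """Count how often each value appears in the Result column."""
    return Counter(result for _, result, _ in rows)


# Hypothetical sample with abbreviated responses, for illustration only.
SAMPLE = (
    "Pseudocode\tYes\tAlgorithm 1 BOVL\n"
    "Open Source Code\tNo\tNo code link found.\n"
    "Open Datasets\tNo\tThe paper is theoretical.\n"
)

rows = parse_rows(SAMPLE)
counts = tally_results(rows)
print(counts["Yes"], counts["No"])
```

Splitting on at most two tabs keeps the free-text response column intact even if it happens to contain a tab character.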