Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Authors: Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type | Theoretical | We show that the regret of our proposed method is upper bounded by Õ(T^{2/3}) and further devise an Ω(T^{2/3}) lower bound, showing that our algorithm is efficient since its regret matches the lower bound, where T is the total number of rounds. Our work establishes a regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.
Researcher Affiliation | Academia | Shuang Qiu (The Hong Kong University of Science and Technology, Hong Kong, China); Boxiang Lyu (The University of Chicago, Chicago, IL, USA); Qinglin Meng (Purdue University, West Lafayette, IN, USA); Zhaoran Wang (Northwestern University, Evanston, IL, USA); Zhuoran Yang (Yale University, New Haven, CT, USA); Michael I. Jordan (University of California, Berkeley, CA, USA)
Pseudocode | Yes | Algorithm 1: VCG-Lin MDP; Algorithm 2: Exploration; Algorithm 3: Exploitation (Planning); Algorithm 4: Exploitation (Policy Eval); Algorithm 5: Est-Q, One-Step Optimistic/Pessimistic Estimation of Q-Function
Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret analysis for an unknown Markov Decision Process (MDP). It does not describe any experiments that would involve publicly available datasets or provide access information for any data.
Dataset Splits | No | The paper is theoretical and does not describe any experimental evaluation using datasets; therefore no dataset splits are provided.
Hardware Specification | No | The paper is theoretical and focuses on algorithms and regret bounds. It does not contain any description of hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and describes algorithms and their mathematical properties. It does not mention any specific software dependencies with version numbers required to replicate the work.
Experiment Setup | No | The paper describes algorithms and analyzes their theoretical properties (e.g., regret bounds). While it mentions hyperparameters such as ζ1, ζ2, ζ3, and K in the context of algorithm design, these are parameters of the proposed learning framework, not settings for empirical experiments. There are no details of a typical experimental setup (e.g., learning rates, batch sizes, optimizers, epochs) for empirical validation.
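The headline guarantee quoted in the Research Type row can be written out as follows. This is a sketch reconstructed from the abstract's wording, not the paper's exact theorem statement; the symbol Regret(T) is assumed notation.

```latex
% Upper bound: over T rounds, the proposed method satisfies
\mathrm{Regret}(T) \le \widetilde{O}\!\left(T^{2/3}\right),
% and a matching lower bound shows the rate cannot be improved in general:
\mathrm{Regret}(T) \ge \Omega\!\left(T^{2/3}\right).
```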
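The Exploration/Exploitation split listed under Pseudocode (Algorithms 2–4) follows an explore-then-exploit pattern, for which T^{2/3} is the textbook rate. A minimal numerical sketch of the generic balancing argument (this is a standard heuristic with an assumed 1/√N error model, not the paper's analysis):

```python
# Heuristic: explore for N rounds (each costing O(1) regret), then exploit a
# policy whose per-round suboptimality scales like 1/sqrt(N) due to
# estimation error.  Total regret ~ N + T/sqrt(N), minimized at N ~ T^(2/3).
def total_regret(T: float, N: float) -> float:
    exploration_cost = N              # constant regret per exploration round
    exploitation_cost = T / N ** 0.5  # T rounds, each off by ~N^(-1/2)
    return exploration_cost + exploitation_cost

T = 10_000
best_N = min(range(1, T), key=lambda N: total_regret(T, N))
# best_N and the resulting regret are both on the order of T^(2/3)
print(best_N, total_regret(T, best_N), T ** (2 / 3))
```

Balancing the two terms gives N ≈ T^{2/3}, so the overall regret is also of order T^{2/3}, matching the rate reported in the abstract.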