Multi-objective Linear Reinforcement Learning with Lexicographic Rewards

Authors: Bo Xue, Dake Bu, Ji Cheng, Yuanyu Wan, Qingfu Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "To bridge this gap, we examine MORL under lexicographic reward structures..." "We introduce the first MORL algorithm with provable regret guarantees..." "Our work provides a comprehensive theoretical analysis of regret bounds..." "This work establishes the first theoretical regret guarantee for MORL."
Researcher Affiliation | Academia | "(1) Department of Computer Science, City University of Hong Kong, Hong Kong, China; (2) School of Software Technology, Zhejiang University, Ningbo, China. Correspondence to: Qingfu Zhang <EMAIL>."
Pseudocode | Yes | Algorithm 1: Lexicographic Linear Reinforcement Learning; Algorithm 2: Lexicographic Action Elimination (see the sketch after this table).
Open Source Code | No | The paper contains no explicit statements or links indicating that open-source code for the described methodology is available.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and regret analysis; it conducts no empirical studies on datasets that would be made publicly available.
Dataset Splits | No | The paper is theoretical and involves no experiments on datasets, so no dataset splits are provided.
Hardware Specification | No | The paper presents a theoretical algorithm and proofs; it includes no experimental evaluation requiring specific hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and reports no experimental results, so it specifies no software dependencies or version numbers.
Experiment Setup | No | The paper focuses on algorithm development and regret analysis; it describes no experimental setup, hyperparameters, or training configurations.
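
The paper provides Algorithm 2 (Lexicographic Action Elimination) only as pseudocode, and no reference implementation is released. The snippet below is a minimal, hypothetical sketch of the general lexicographic-elimination idea, not the authors' algorithm: actions are filtered objective by objective in priority order, and at each stage only the actions whose estimated value is within a tolerance of the best surviving action are kept. All names here (lexicographic_eliminate, q_estimates, slacks) are assumptions introduced for illustration.

```python
import numpy as np

def lexicographic_eliminate(q_estimates, slacks):
    """Generic lexicographic action elimination (illustrative sketch only).

    q_estimates: array of shape (num_objectives, num_actions) holding value
        estimates, with objectives ordered from highest to lowest priority.
    slacks: per-objective tolerances (standing in for confidence widths); an
        action survives objective i if its estimate is within slacks[i] of the
        best estimate among actions that survived objectives 0..i-1.
    Returns the indices of actions that survive every objective.
    """
    q_estimates = np.asarray(q_estimates, dtype=float)
    surviving = np.arange(q_estimates.shape[1])
    for i, slack in enumerate(slacks):
        values = q_estimates[i, surviving]
        best = values.max()
        # Keep only actions that are near-optimal for the current objective.
        surviving = surviving[values >= best - slack]
    return surviving

if __name__ == "__main__":
    # Two objectives, four actions; objective 0 has priority over objective 1.
    q = np.array([[1.0, 0.98, 0.6, 1.01],   # objective 0 estimates
                  [0.2, 0.90, 0.9, 0.10]])  # objective 1 estimates
    print(lexicographic_eliminate(q, slacks=[0.05, 0.05]))  # -> [1]
```

In the paper, the elimination step is coupled with value estimates and confidence widths derived from linear function approximation; the fixed slacks above are a placeholder for those data-dependent quantities.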