Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning
Authors: Ruohong Liu, Yuxin Pan, Linjie Xu, Lei Song, Pengcheng You, Yize Chen, Jiang Bian
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, compared to recent advancements in MORL methods, our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments). Our code is available at https://github.com/RuohLiuq/C-MORL. |
| Researcher Affiliation | Collaboration | Ruohong Liu (University of Oxford, Oxford, UK); Yuxin Pan (The Hong Kong University of Science and Technology, Hong Kong, China); Linjie Xu (Queen Mary University of London, London, UK); Lei Song & Jiang Bian (Microsoft Research Asia, Beijing, China); Pengcheng You (Peking University, Beijing, China); Yize Chen (University of Alberta, Edmonton, Canada) |
| Pseudocode | Yes | Algorithm 1 Policy Selection Algorithm 2 C-MORL |
| Open Source Code | Yes | Our code is available at https://github.com/RuohLiuq/C-MORL. |
| Open Datasets | Yes | In this Section, we validate the design of our proposed algorithm using both popular discrete and continuous MORL benchmarks from MO-Gymnasium (Felten et al., 2023a) and SustainGym (Yeh et al., 2024). These benchmarks include five comprehensive domains: (i) Grid World includes Fruit Tree, a discrete benchmark with six objectives. (ii) Classic Control includes MO-Lunar-Lander, a discrete benchmark with four objectives. (iii) Miscellaneous includes Minecart, a discrete benchmark with four objectives. (iv) Robotics Control includes five MuJoCo tasks with continuous action space based on the MuJoCo simulator (Todorov et al., 2012; Xu et al., 2020). (v) Sustainable Energy Systems includes two building heating supply tasks. |
| Dataset Splits | No | The paper does not provide conventional training/validation/test splits, since it uses interactive RL environments rather than fixed datasets. Instead, it reports training duration per environment and the evaluation procedure over a sampled preference space. For example: "Each of the baselines are trained for 5 * 10^5 time steps for discrete benchmarks. Continuous benchmarks with two, three, and nine objectives are trained for 1.5 * 10^6, 2 * 10^6, and 2.5 * 10^6 steps, respectively." and "For metrics evaluation, we evenly generate an evaluation preference set in a systematic manner with specified intervals of 0.01, 0.1, and 0.5 for benchmarks with two objectives, three or four objectives, and six or nine objectives, respectively." These describe training interaction and evaluation strategy, not dataset partitioning. |
| Hardware Specification | Yes | We run all the experiments on a cloud server including CPU Intel Xeon Processor and GPU Tesla T4. |
| Software Dependencies | No | The paper mentions algorithms like PPO and uses libraries like MORL-baselines, but does not provide specific version numbers for software components (e.g., Python, PyTorch, or the MORL-baselines library itself). For example: "In the Pareto initialization stage, we use PPO algorithm implemented by Kostrikov (2018)." and "For Envelope (Yang et al., 2019), CAPQL (Lu et al., 2022), GPILS (Alegre et al., 2023), and MORL/D (Felten et al., 2024), we utilize the implementations available in the MORL-baselines library (Felten et al., 2023a), adapting them as necessary to align with our experimental setup." |
| Experiment Setup | Yes | The PPO parameters are reported in Table 7 and Table 8. For constrained optimization, we adopt C-MORL-IPO method. [...] The hyperparameters of C-MORL-IPO include: Number of initial policy M: the number of initial policies. [...] The parameters we used are provided in Table 9 and Table 10. |
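The evaluation protocol quoted in the Dataset Splits row enumerates preference vectors on the probability simplex at a fixed interval (e.g., every 0.01 for two objectives, every 0.1 for three or four). A minimal sketch of such a generator is below; the function name `preference_grid` and its exact signature are our illustration, not code from the paper's repository.

```python
def preference_grid(n_objectives, interval):
    """Enumerate preference vectors whose entries are non-negative
    multiples of `interval` and sum to 1 (points on the simplex).

    Works in integer steps internally to avoid floating-point drift
    when accumulating the interval."""
    steps = round(1.0 / interval)  # number of interval-sized increments in 1.0

    def compositions(remaining, dims):
        # Yield all ways to split `remaining` integer steps across `dims` slots.
        if dims == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for tail in compositions(remaining - k, dims - 1):
                yield (k,) + tail

    return [tuple(k * interval for k in c)
            for c in compositions(steps, n_objectives)]
```

For two objectives with interval 0.01 this yields 101 preference vectors ((0.00, 1.00) through (1.00, 0.00)); each resulting weight vector can then be used to scalarize the multi-objective return when computing metrics such as expected utility.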