Pareto Set Learning for Multi-Objective Reinforcement Learning

Authors: Erlong Liu, Yu-Chang Wu, Xiaobin Huang, Chengrui Gao, Ren-Jian Wang, Ke Xue, Chao Qian

AAAI 2025

Reproducibility assessment (variable: result — supporting LLM response):
Research Type: Experimental — "Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators."
Researcher Affiliation: Academia — National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
Pseudocode: Yes — Algorithm 1 (PSL-MORL).
  Input: preference distribution Λ; environment E; number E of episodes; number K of weights per episode; replay buffer D; batch size N.
  Output: hypernetwork parameters ϕ and primary policy network parameters θ1.
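The pseudocode above centers on a hypernetwork that maps a preference sampled from Λ to the parameters of a policy network. A minimal NumPy sketch of that parameter-generation step, under illustrative assumptions (the layer sizes, class name, and two-layer architecture are ours, not the paper's):

```python
import numpy as np

class PreferenceHypernetwork:
    """Maps a preference weight vector to the flat parameters of a
    small target policy network (illustrative sizes, not the paper's)."""

    def __init__(self, n_objectives, target_param_count, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        # phi: the hypernetwork's own parameters (two dense layers)
        self.w1 = rng.normal(0.0, 0.1, (n_objectives, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, target_param_count))
        self.b2 = np.zeros(target_param_count)

    def generate(self, preference):
        """preference: non-negative weights over objectives, summing to 1."""
        h = np.tanh(preference @ self.w1 + self.b1)
        return h @ self.w2 + self.b2  # flat theta for the policy network


# Schematic inner-loop step: sample a preference from the simplex,
# generate policy parameters, then act/update with the induced policy.
hyper = PreferenceHypernetwork(n_objectives=2, target_param_count=10)
pref = np.random.default_rng(1).dirichlet(np.ones(2))
theta = hyper.generate(pref)
assert theta.shape == (10,)
```

In this formulation the loop over K weights per episode would call `generate` once per sampled preference, so a single hypernetwork represents the whole preference-conditioned policy family.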
Open Source Code: No — The paper does not contain any explicit statement about code release, nor does it provide a link to a code repository.
Open Datasets: Yes — MO-MuJoCo is a popular MORL benchmark based on the MuJoCo physics simulator, consisting of several continuous control tasks. Experiments are conducted in five environments: MO-HalfCheetah-v2, MO-Hopper-v2, MO-Ant-v2, MO-Swimmer-v2, and MO-Walker-v2, all with two objectives. FTN is a discrete MORL benchmark with six objectives, whose goal is to navigate a tree to harvest fruit that optimizes six nutritional values under specific preferences.
Dataset Splits: No — The paper uses continuous simulation environments (MO-MuJoCo) and a discrete navigation task (FTN), in which agents interact with the environment rather than learning from static datasets with predefined train/validation/test splits. It mentions sampling preferences and using a replay buffer but specifies no dataset splits in the conventional sense.
Hardware Specification: No — The paper does not report hardware details such as GPU models, CPU types, or memory capacity; it describes only general experimental settings.
Software Dependencies: No — The paper names algorithms such as Double Deep Q-Network (DDQN) and Twin Delayed Deep Deterministic Policy Gradient (TD3), and techniques such as Hindsight Experience Replay (HER), but lists no software packages with version numbers (e.g., Python 3.x, PyTorch 1.x) that would be needed for replication.
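The DDQN/TD3-style updates mentioned here operate on a vector-valued reward scalarized by the sampled preference. A hedged sketch of that scalarization step, assuming the common linear-utility form (the paper's exact utility function is not given in this excerpt):

```python
import numpy as np

def scalarize(reward_vec, preference):
    """Linear scalarization of a multi-objective reward.

    reward_vec: per-objective rewards for one transition.
    preference: non-negative simplex weights over objectives.
    """
    reward_vec = np.asarray(reward_vec, dtype=float)
    preference = np.asarray(preference, dtype=float)
    assert np.isclose(preference.sum(), 1.0), "preference must sum to 1"
    return float(reward_vec @ preference)

# A two-objective step reward weighted toward the first objective:
scalarize([2.0, -1.0], [0.75, 0.25])  # → 1.25
```

With this utility, the standard single-objective Bellman targets of DDQN or TD3 apply unchanged to the scalarized reward, conditioned on the preference that produced it.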
Experiment Setup: No — The paper lists generic settings ("number E of episodes, number K of weights per episode, replay buffer D, batch size N" in Algorithm 1) and defers concrete hyperparameter values to appendices: "For the choice of α, we pick the value that owns the best performance in grid search experiments, as shown in Appendix A.6" and "The full details of the MLP model can be found in Appendix A in the full version." The main text states only that the reference point for hypervolume calculation is (0, 0) and that experiments are run with six different random seeds, which does not amount to a comprehensive setup description.
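The evaluation metrics referenced above (hypervolume with reference point (0, 0), and sparsity) can be computed as follows. This is a sketch under the common definitions used in the MORL literature; the paper's own metric implementations are deferred to its appendix:

```python
import numpy as np

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume of a 2-objective maximization front w.r.t. ref.

    points: (n, 2) array of mutually non-dominated objective vectors.
    Sweeps points by descending first objective, adding disjoint rectangles.
    """
    pts = np.asarray(points, dtype=float)
    pts = pts[np.argsort(-pts[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

def sparsity(points):
    """Mean squared gap between consecutive solutions per objective
    (the sparsity metric commonly used in MORL; lower means denser)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return 0.0
    total = 0.0
    for j in range(pts.shape[1]):
        sorted_vals = np.sort(pts[:, j])
        total += np.sum(np.diff(sorted_vals) ** 2)
    return total / (n - 1)

front = [[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]]
hypervolume_2d(front)  # → 6.0 (rectangles 3·1 + 2·1 + 1·1)
```

A denser Pareto front approximation raises hypervolume and lowers sparsity, which is exactly the direction of improvement the paper claims for PSL-MORL.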