Pareto Set Learning for Multi-Objective Reinforcement Learning
Authors: Erlong Liu, Yu-Chang Wu, Xiaobin Huang, Chengrui Gao, Ren-Jian Wang, Ke Xue, Chao Qian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Artificial Intelligence, Nanjing University, Nanjing 210023, China |
| Pseudocode | Yes | Algorithm 1: PSL-MORL. Input: preference distribution Λ; environment E; number E of episodes; number K of weights per episode; replay buffer D; batch size N. Output: hypernetwork parameters ϕ and primary policy network parameters θ1 |
| Open Source Code | No | The paper does not contain any explicit statements about code release, nor does it provide any links to a code repository. |
| Open Datasets | Yes | MO-MuJoCo is a popular MORL benchmark based on the MuJoCo physics simulation environment, consisting of several continuous control tasks. We conduct experiments under five different environments, including MO-HalfCheetah-v2, MO-Hopper-v2, MO-Ant-v2, MO-Swimmer-v2, and MO-Walker-v2. The number of objectives is two for all environments in MO-MuJoCo. FTN is a discrete MORL benchmark with six objectives, whose goal is to navigate the tree to harvest fruit to optimize six nutritional values on specific preferences. |
| Dataset Splits | No | The paper describes using continuous simulation environments (MO-MuJoCo) and a discrete navigation task (FTN) for Reinforcement Learning, where agents interact with the environment rather than using static datasets with predefined train/test/validation splits. It mentions sampling preferences and using a replay buffer but does not specify dataset splits in the conventional sense for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general experimental settings without hardware specifics. |
| Software Dependencies | No | The paper mentions using specific algorithms like Double Deep Q-Network (DDQN) and Twin Delayed Deep Deterministic Policy Gradient (TD3), and techniques like Hindsight Experience Replay (HER). However, it does not specify any software names with their version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions) that would be needed for replication. |
| Experiment Setup | No | The paper mentions general experimental settings such as "number E of episodes, number K of weights per episode, replay buffer D, batch size N" in Algorithm 1, and that "For the choice of α, we pick the value that owns the best performance in grid search experiments, as shown in Appendix A.6." and "The full details of the MLP model can be found in Appendix A in the full version." However, it defers concrete hyperparameter values or detailed configurations to an appendix not provided, or only describes them generically without specific values in the main text. The main text only states that "The reference points for hypervolume calculation are set to (0, 0)" and that experiments are run "with six different random seeds", which are not comprehensive setup details. |
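The table twice references the paper's two quality indicators, hypervolume and sparsity, and notes that the hypervolume reference point is set to (0, 0). For readers checking reported numbers, a minimal two-objective sketch of both indicators may help; this is not the authors' code, and the sparsity formula below follows one common definition in the MORL literature (sum of squared gaps between consecutive sorted objective values, averaged over n − 1 solutions):

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume of a 2-objective maximization front w.r.t. `ref`.

    Sweep: sort by the first objective descending; each point adds a
    rectangle of width (x - ref_x) for the slice of second-objective
    values it newly covers. Dominated points add nothing.
    """
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv


def sparsity(points):
    """Sparsity indicator (lower = denser Pareto front).

    One common MORL definition: for each objective, sort the solutions'
    values and sum the squared consecutive differences, then divide the
    total by (n - 1). Returns 0.0 for fronts with fewer than 2 points.
    """
    n = len(points)
    if n < 2:
        return 0.0
    total = 0.0
    for j in range(len(points[0])):
        vals = sorted(p[j] for p in points)
        total += sum((vals[i + 1] - vals[i]) ** 2 for i in range(n - 1))
    return total / (n - 1)


# Toy 3-point front (hypothetical values, not from the paper).
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front))  # -> 6.0
print(sparsity(front))        # -> 2.0
```

With the paper's reference point (0, 0), each point contributes its staircase area above the previously covered region; denser fronts lower the sparsity value, which matches the table's claim that PSL-MORL is evaluated on both indicators.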