Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework

Authors: Florian Felten, El-Ghazali Talbi, Grégoire Danoy

JAIR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate MORL/D instantiations achieve comparable performance to current state-of-the-art approaches on the studied problems. This section shows that the MORL/D framework is not limited to theoretical analysis but is also fit for application. MORL/D has been instantiated to tackle two contrasting multi-objective benchmarks (Alegre et al., 2022), showing how versatile the framework can be. Indeed, the two studied environments contain concave and convex PF, and involve continuous and discrete observations and actions. Each set of experiments has been run 10 times over various seeds for statistical robustness. Additionally, to give a reference, we compare MORL/D results against state-of-the-art methods."
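The decomposition idea behind MORL/D, splitting a multi-objective problem into scalarized subproblems via a set of weight vectors, can be sketched minimally. The function names and the weighted-sum choice below are illustrative assumptions; the framework also supports other scalarization functions.

```python
def weighted_sum(reward_vector, weights):
    """Weighted-sum scalarization of a vector reward (one common choice)."""
    return sum(r * w for r, w in zip(reward_vector, weights))

def uniform_weights(n_subproblems):
    """Evenly spread weight vectors for a bi-objective problem,
    one per subproblem in the decomposed population."""
    return [(i / (n_subproblems - 1), 1 - i / (n_subproblems - 1))
            for i in range(n_subproblems)]

# Hypothetical bi-objective reward, e.g. a (treasure, -time) trade-off:
weights = uniform_weights(5)          # 5 subproblems, weights[2] == (0.5, 0.5)
scalar = weighted_sum((8.2, -3.0), weights[2])
```

Each weight vector defines one subproblem, so a population of policies can be trained, one per scalarization, and their returns jointly approximate the Pareto front.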
Researcher Affiliation | Academia | Florian Felten (SnT, University of Luxembourg); El-Ghazali Talbi (CNRS/CRIStAL, University of Lille; FSTM/DCS, University of Luxembourg); Grégoire Danoy (FSTM/DCS, University of Luxembourg; SnT, University of Luxembourg)
Pseudocode | Yes | Algorithm 1: Reinforcement learning process. Algorithm 2: Population-based MOO/D, high-level framework. Algorithm 3: MORL/D high-level framework.
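The single-objective reinforcement learning loop of Algorithm 1 can be illustrated with a minimal tabular Q-learning sketch. The toy chain environment and all names here are hypothetical stand-ins, not the paper's benchmarks or its exact pseudocode.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=300,
               alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0
    moves left; reward 1.0 for reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(10_000):  # step cap bounds episode length
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if s == n_states - 1:
                break
    return q
```

After training, acting greedily with respect to `q` (always moving right) is the optimal policy for this toy chain; MORL/D wraps such a loop once per scalarized subproblem.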
Open Source Code | Yes | Our code can be found in the MORL-Baselines repository (Felten et al., 2023). The code used for experiments is available in MORL-Baselines: https://github.com/LucasAlegre/morl-baselines
Open Datasets | Yes | Environment implementations are available in MO-Gymnasium (Alegre et al., 2022). The first problem, mo-halfcheetah-v4 (Figure 11a), presents a multi-objective adaptation of the well-known Mujoco problems (Todorov, Erez, & Tassa, 2012). The second problem of interest is referred to as deep-sea-treasure-concave-v0 (Figure 11b), wherein the agent assumes control of a submarine navigating through a grid-world environment (Vamplew et al., 2011).
Dataset Splits | No | The environments are interactive simulation environments, not static datasets; traditional training/test/validation splits are therefore not applicable. The paper lists "Episodes for policy evaluation: 5" among its hyperparameters, indicating an evaluation phase for policies within the dynamic environment rather than a partitioning of a fixed dataset.
Hardware Specification | No | Our experiments have been run on the high-performance computer of the University of Luxembourg (Varrette et al., 2014). This names a computing resource but lacks specifics such as GPU models, CPU models, or memory.
Software Dependencies | No | Our code can be found in the MORL-Baselines repository (Felten et al., 2023) and the Pymoo project (Blank & Deb, 2020). Practically, the SAC implementation from Huang et al. (2022) was modified by including multi-objective critics and adding a scalarization function. While specific projects and implementations are mentioned, version numbers for these dependencies (e.g., PyTorch, or the exact versions of MORL-Baselines, Pymoo, or the SAC implementation) are not provided.
Experiment Setup | Yes | Appendix A (Reproducibility) presents the hyperparameter values used in the experiments. Table 1: Hyperparameters for MORL/D on mo-halfcheetah-v4. Table 2: Hyperparameters for MORL/D on deep-sea-treasure-concave-v0. These tables list specific values for parameters such as population size, total environment steps, buffer size, batch size, learning rates, network architecture (hidden neurons, activation function), and more.
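The hyperparameter categories listed above can be organized as a single experiment config. The field names below follow the categories named in Tables 1-2, but every value is an illustrative placeholder, not the paper's reported settings.

```python
# Hypothetical MORL/D experiment config; values are placeholders for
# illustration only, not the numbers reported in Tables 1-2.
morld_config = {
    "population_size": 6,           # number of scalarized subproblems / policies
    "total_env_steps": 1_000_000,   # total environment interactions
    "buffer_size": 100_000,         # replay buffer capacity
    "batch_size": 256,
    "learning_rate": 3e-4,
    "net_arch": {"hidden_neurons": [256, 256], "activation": "tanh"},
    "eval_episodes": 5,             # "Episodes for policy evaluation: 5" (quoted above)
}
```

Recording such a config alongside seeds and software versions is what makes the "Experiment Setup | Yes" assessment verifiable in practice.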