Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Authors: Dohyeong Kim, Mineui Hong, Jeongho Park, Songhwai Oh
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks. The method has been evaluated across diverse environments, including multi-objective tasks with and without constraints, and the experimental results confirm that avoiding gradient conflicts is effective in preventing convergence to local optima. |
| Researcher Affiliation | Academia | Dohyeong Kim, Mineui Hong, Jeongho Park, and Songhwai Oh, Dept. of Electrical and Computer Engineering and ASRI, Seoul National University. Corresponding author: EMAIL. |
| Pseudocode | Yes | Algorithm 1: Policy Update Using CoMOGA |
| Open Source Code | No | The paper references third-party libraries like 'qpsolvers' and 'Stable-Baselines3' but does not provide a statement or link for the source code specific to the methodology described in this paper. |
| Open Datasets | Yes | This section evaluates the proposed method and baselines across various tasks with and without constraints. First, we explain how methods are evaluated on the tasks and then present the CMORL baselines. We utilize single-agent and multi-agent goal tasks in the Safety Gymnasium (Ji et al., 2023). The legged robot locomotion tasks (Kim et al., 2023) require controlling a quadrupedal or bipedal robot to follow randomly given commands while satisfying three constraints. We conduct experiments in the Multi-Objective (MO) Gymnasium (Alegre et al., 2022), a well-known MORL environment, to examine whether the proposed method also performs well on unconstrained MORL tasks. |
| Dataset Splits | No | The paper describes using various simulation environments (Safety Gymnasium, Legged Robot Locomotion, MO Gymnasium) where agents interact with dynamic environments. It mentions elements being 'randomly spawned' or 'randomly sampled commands', indicating dynamic environment generation rather than fixed train/test/validation splits for a static dataset. No specific dataset split percentages, counts, or methodologies are provided. |
| Hardware Specification | Yes | In all experiments, we used a PC equipped with an Intel Xeon CPU E5-2680 and an NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions 'qpsolvers: Quadratic Programming Solvers in Python, 2024' and 'Stable-Baselines3: Reliable reinforcement learning implementations (Raffin et al., 2021)'. While these indicate software used, they do not provide specific version numbers for libraries or frameworks like 'Python 3.8' or 'PyTorch 1.9'. |
| Experiment Setup | Yes | C.3.4 Hyperparameter Settings: We report the hyperparameter settings for the CMORL tasks (Safety-Gymnasium, Locomotion) in Table 4 and the settings for the MORL tasks (MO-Gymnasium) in Table 5. |
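The "gradient conflict" issue quoted in the Research Type row refers to objective gradients with negative inner products pulling the policy in opposing directions. The sketch below illustrates that general idea with a PCGrad-style projection; it is not the paper's CoMOGA update, which instead solves a constrained quadratic program (hence the `qpsolvers` dependency noted above). The function name and the projection rule are illustrative assumptions.

```python
import numpy as np

def conflict_averse_aggregate(grads):
    """Illustrative conflict-averse aggregation (PCGrad-style sketch,
    NOT the paper's CoMOGA QP-based update): whenever two objective
    gradients conflict (negative dot product), project one onto the
    normal plane of the other before averaging."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            dot = g @ other
            if dot < 0:  # conflict: remove the opposing component
                g -= (dot / (other @ other)) * other
        projected.append(g)
    # average the de-conflicted gradients into a single update direction
    return np.mean(projected, axis=0)

# Example: [1, 0] and [-1, 1] conflict (dot product -1); after projection
# neither de-conflicted gradient opposes the other objective.
agg = conflict_averse_aggregate([np.array([1.0, 0.0]),
                                 np.array([-1.0, 1.0])])
```

In this toy example the projected gradients are [0.5, 0.5] and [0.0, 1.0], so the averaged update [0.25, 0.75] no longer has a negative component along either original objective.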