Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning

Authors: Dohyeong Kim, Mineui Hong, Jeongho Park, Songhwai Oh

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks. The proposed method has been evaluated across diverse environments, including multi-objective tasks with and without constraints. The experimental results confirmed that avoiding gradient conflicts is effective in preventing convergence to local optima.
Researcher Affiliation | Academia | Dohyeong Kim, Mineui Hong, Jeongho Park, and Songhwai Oh, Department of Electrical and Computer Engineering and ASRI, Seoul National University. Corresponding author: EMAIL.
Pseudocode | Yes | Algorithm 1: Policy Update Using CoMOGA
Open Source Code | No | The paper references third-party libraries such as 'qpsolvers' and 'Stable-Baselines3' but provides no statement or link for source code specific to the methodology described in this paper.
Open Datasets | Yes | This section evaluates the proposed method and baselines across various tasks with and without constraints. First, we explain how methods are evaluated on the tasks and then present the CMORL baselines. We utilize single-agent and multi-agent goal tasks in the Safety Gymnasium (Ji et al., 2023). The legged robot locomotion tasks (Kim et al., 2023) involve controlling a quadrupedal or bipedal robot to follow randomly given commands while satisfying three constraints. We conduct experiments in the Multi-Objective (MO) Gymnasium (Alegre et al., 2022), a well-known MORL environment, to examine whether the proposed method also performs well on unconstrained MORL tasks.
Dataset Splits | No | The paper describes using various simulation environments (Safety Gymnasium, Legged Robot Locomotion, MO Gymnasium) in which agents interact with dynamic environments. It mentions elements being 'randomly spawned' or 'randomly sampled commands', indicating dynamic environment generation rather than fixed train/validation/test splits of a static dataset. No specific dataset split percentages, counts, or methodologies are provided.
Hardware Specification | Yes | In all experiments, we used a PC equipped with an Intel Xeon CPU E5-2680 and an NVIDIA TITAN Xp GPU.
Software Dependencies | No | The paper mentions 'qpsolvers: Quadratic Programming Solvers in Python, 2024' and 'Stable-Baselines3: Reliable reinforcement learning implementations (Raffin et al., 2021)'. While these indicate the software used, they do not provide specific version numbers for libraries or frameworks (e.g., 'Python 3.8' or 'PyTorch 1.9').
Experiment Setup | Yes | C.3.4 HYPERPARAMETER SETTINGS: We report the hyperparameter settings for the CMORL tasks (Safety-Gymnasium, Locomotion) in Table 4 and the settings for the MORL tasks (MO-Gymnasium) in Table 5.
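The Software Dependencies row above flags that the paper names its libraries ('qpsolvers', 'Stable-Baselines3') but not their versions. As a minimal sketch of how authors could record exact versions for a reproducibility statement, the helper below queries installed package metadata at run time; the function name and the fallback behavior are illustrative assumptions, not part of the paper:

```python
import sys
from importlib.metadata import version, PackageNotFoundError


def report_versions(packages):
    """Return a {name: version} map for the given packages.

    Packages that are not installed are marked rather than
    raising, so the report can be generated on any machine.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Package names taken from the paper's references.
    for pkg, ver in report_versions(["qpsolvers", "stable-baselines3"]).items():
        print(f"{pkg}=={ver}")
```

Emitting the result in `pkg==version` form makes it directly reusable as a pinned requirements file, which is the level of detail the classification above found missing.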