Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning

Authors: Dohyeong Kim, Mineui Hong, Jeongho Park, Songhwai Oh

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through various experiments, we have confirmed that preventing gradient conflicts is critical, and the proposed method achieves constraint satisfaction across all tasks. The proposed method has been evaluated across diverse environments, including multi-objective tasks with and without constraints. The experimental results confirmed that avoiding gradient conflicts is effective in preventing convergence to local optima.
Researcher Affiliation | Academia | Dohyeong Kim, Mineui Hong, Jeongho Park, and Songhwai Oh, Department of Electrical and Computer Engineering and ASRI, Seoul National University. Corresponding author: EMAIL.
Pseudocode | Yes | Algorithm 1: Policy Update Using CoMOGA
Open Source Code | No | The paper references third-party libraries such as 'qpsolvers' and 'Stable-Baselines3' but provides no statement or link for source code specific to the methodology described in this paper.
Open Datasets | Yes | This section evaluates the proposed method and baselines across various tasks with and without constraints. First, we explain how methods are evaluated on the tasks and then present the CMORL baselines. We utilize single-agent and multi-agent goal tasks in the Safety Gymnasium (Ji et al., 2023). The legged robot locomotion tasks (Kim et al., 2023) involve controlling a quadrupedal or bipedal robot to follow randomly given commands while satisfying three constraints. We conduct experiments in the Multi-Objective (MO) Gymnasium (Alegre et al., 2022), a well-known MORL environment, to examine whether the proposed method also performs well on unconstrained MORL tasks.
Dataset Splits | No | The paper describes using various simulation environments (Safety Gymnasium, Legged Robot Locomotion, MO Gymnasium) in which agents interact with dynamic environments. It mentions elements being 'randomly spawned' or 'randomly sampled commands', indicating dynamic environment generation rather than fixed train/validation/test splits of a static dataset. No specific dataset split percentages, counts, or methodologies are provided.
Hardware Specification | Yes | In all experiments, we used a PC equipped with an Intel Xeon CPU E5-2680 and an NVIDIA TITAN Xp GPU.
Software Dependencies | No | The paper mentions 'qpsolvers: Quadratic Programming Solvers in Python, 2024' and 'Stable-Baselines3: Reliable reinforcement learning implementations (Raffin et al., 2021)'. While these indicate the software used, they do not provide specific version numbers for libraries or frameworks (e.g., 'Python 3.8' or 'PyTorch 1.9').
Experiment Setup | Yes | C.3.4 HYPERPARAMETER SETTINGS: We report the hyperparameter settings for the CMORL tasks (Safety-Gymnasium, Locomotion) in Table 4 and the settings for the MORL tasks (MO-Gymnasium) in Table 5.
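The Software Dependencies row above flags that the paper names its libraries ('qpsolvers', 'Stable-Baselines3') but not their versions. As a minimal sketch of how authors could record exact versions for a reproducibility statement, the helper below queries installed package metadata at run time; the function name and the fallback behavior are illustrative assumptions, not part of the paper:

```python
import sys
from importlib.metadata import version, PackageNotFoundError


def report_versions(packages):
    """Return a {name: version} map for the given packages.

    Packages that are not installed are marked rather than
    raising, so the report can be generated on any machine.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Package names taken from the paper's references.
    for pkg, ver in report_versions(["qpsolvers", "stable-baselines3"]).items():
        print(f"{pkg}=={ver}")
```

Emitting the result in `pkg==version` form makes it directly reusable as a pinned requirements file, which is the level of detail the classification above found missing.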