C2IQL: Constraint-Conditioned Implicit Q-learning for Safe Offline Reinforcement Learning
Authors: Zifan Liu, Xinran Li, Jun Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on DSRL benchmarks demonstrate the superiority of C2IQL over baseline methods, achieving higher rewards while satisfying safety constraints. "We evaluate C2IQL in Bullet Safety-Gym (Gronauer, 2022) and Safety Gymnasium (Ji et al., 2023) with DSRL datasets under different threshold conditions." |
| Researcher Affiliation | Academia | Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China. Correspondence to: Jun Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Cost Reconstruction Model) and Algorithm 2 (C2IQL) are presented in the paper, outlining structured steps for the proposed methods. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | Environments and Datasets. We evaluate C2IQL in Bullet Safety-Gym (Gronauer, 2022)... We use the DSRL (Liu et al., 2023a) dataset, which follows the D4RL (Fu et al., 2020) benchmark format. |
| Dataset Splits | No | The paper mentions using DSRL datasets but does not explicitly provide details about specific training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GeForce RTX 3080 GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9), which are necessary for replication. |
| Experiment Setup | Yes | For C2IQL, the structure and most hyperparameters follow IQL (Kostrikov et al., 2022). The discount factor of the reward is fixed at 0.99 and the number of discount factors for the cost is 3. ... For the cost reconstruction model, we use a 5-layer MLP with hidden dimensions of 512 for each layer. ... We pre-train the reconstruction model for 1e6 epochs for each environment. Table 4 hyperparameters: κ1 = 0.7; κ2 = 0.9; γ (reward) = 0.99; m = 3; batch size = 512; learning rate of V = 1e-3; learning rate of Q = 1e-3; learning rate of π = 3e-4; training steps = 4e5; testing frequency = 5e3. |
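Since the source code is not released, the hyperparameters reported in Table 4 can be transcribed into a configuration sketch as a starting point for reproduction attempts. The dictionary keys below are illustrative names chosen here, not identifiers from the authors' implementation:

```python
# Hyperparameters transcribed from Table 4 of the paper.
# Key names are illustrative; the authors' code is not publicly available.
C2IQL_HPARAMS = {
    "kappa_1": 0.7,
    "kappa_2": 0.9,
    "gamma_reward": 0.99,      # discount factor for the reward
    "num_cost_discounts": 3,   # m: number of discount factors for the cost
    "batch_size": 512,
    "lr_v": 1e-3,              # learning rate of the value network V
    "lr_q": 1e-3,              # learning rate of the Q network
    "lr_pi": 3e-4,             # learning rate of the policy π
    "training_steps": int(4e5),
    "testing_frequency": int(5e3),
}

# With 4e5 training steps and evaluation every 5e3 steps,
# a run produces 80 evaluation points.
num_evaluations = (
    C2IQL_HPARAMS["training_steps"] // C2IQL_HPARAMS["testing_frequency"]
)
print(num_evaluations)  # → 80
```

A reproduction would additionally need the 5-layer, 512-unit MLP for the cost reconstruction model described in the setup; its exact activation functions and optimizer settings are not specified in the excerpt.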