Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning

Authors: Zijian Guo, Weichao Zhou, Shengao Wang, Wenchao Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on the DSRL benchmarks show that CCAC significantly outperforms existing methods for learning adaptive, safe, and high-reward policies. The paper includes a dedicated section '5 EXPERIMENTS' and various performance tables and figures, such as 'Table 1: Evaluation results of the normalized reward and cost.', 'Figure 2: Evaluation results of reward and cost in Run and Circle tasks with different percentages of datasets being used for training.', and ablation studies in 'Figure 5: Ablation study: average performance of CCAC and its variants in Run and Circle tasks.' and 'Figure 6: Ablation study: Qc-values plots.'.
Researcher Affiliation | Academia | All authors are affiliated with Boston University, as indicated by '1Division of Systems Engineering, Boston University 2Department of Electrical and Computer Engineering, Boston University', and their email addresses use the '@bu.edu' domain, which is characteristic of an academic institution.
Pseudocode | Yes | The paper states, 'The overall method is summarized in Algorithm 1 in Appendix C.1.' Appendix C.1 contains 'Algorithm 1 Cost-Conditioned Actor-Critic (CCAC)', which provides a structured pseudocode block for the proposed method.
Open Source Code | Yes | The abstract explicitly states, 'The code is available at https://github.com/BU-DEPENDLab/CCAC.'
Open Datasets | Yes | The paper mentions using public benchmarks: 'Tasks. The Bullet-Safety-Gym (Gronauer, 2022) and Safety-Gymnasium (Ji et al., 2023) are public benchmarks... and DSRL (Liu et al., 2023a), a comprehensive benchmark specialized for offline safe RL, provides the offline datasets.' Additionally, the Reproducibility Statement confirms, 'The datasets used are provided from a publicly available benchmark that uses simulated dynamical control environments...'
Dataset Splits | Yes | The paper describes specific dataset manipulations for its experiments: 'To assess the effect of OOD states and actions, we use different percentages of data to train policies and then evaluate their performance.' Table 4 clarifies that 'p = 1.0/0.75/0.5/0.25 means 100%, 75%, 50%, and 25% of the offline data is used during training respectively.' The paper also applies a 'data density filter' and a 'partial data filter' to create modified datasets, as shown in Figure 8 and described in Appendix B.2.
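The p = 1.0/0.75/0.5/0.25 training splits described above amount to randomly retaining a fraction of the offline transitions. A minimal sketch of such subsampling, assuming the dataset is a dict of equal-length arrays (the actual DSRL data format and the authors' sampling procedure may differ):

```python
import numpy as np

def subsample_offline_dataset(transitions, p, seed=0):
    """Keep a random fraction p of offline transitions.

    `transitions` is assumed to be a dict mapping field names
    (e.g. 'observations', 'actions', 'rewards', 'costs') to
    arrays of equal length; the same indices are kept for every
    field so transitions stay aligned.
    """
    n = len(next(iter(transitions.values())))
    rng = np.random.default_rng(seed)
    keep = rng.choice(n, size=int(n * p), replace=False)
    return {k: np.asarray(v)[keep] for k, v in transitions.items()}
```

For example, `subsample_offline_dataset(data, 0.25)` would mimic the paper's p = 0.25 setting by training on a quarter of the offline data.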
Hardware Specification | No | No specific hardware details (such as CPU or GPU models, or memory specifications) used for running the experiments are mentioned in the paper. The Reproducibility Statement only mentions 'simulated dynamical control environments' and does not specify the computing hardware used.
Software Dependencies | No | The paper refers to using existing implementations and frameworks for baselines, e.g., 'we use the OSRL1 implementation' and 'We adopt the CQL-Saute from this CQL implementation2'. However, it does not provide specific version numbers for software components such as Python, PyTorch, or other libraries essential for replication.
Experiment Setup | Yes | Appendix C.2, titled 'HYPERPARAMETERS', provides a detailed Table 3 listing specific values for various parameters used in the experiments, including 'Actor hidden size [256, 256]', 'Critic hidden size [256, 256]', 'VAE/CVAE hidden size [512, 512, 64, 512, 512]', 'Episode length', 'Batch size', 'Training steps', 'γ 0.99', 'Actor learning rate 1e-4', 'Critic learning rate 1e-3', 'VAE/CVAE learning rate 1e-3', and 'Critic ensemble 4'. It also mentions PID parameters for certain baselines.
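The concrete values quoted from Table 3 can be collected into a single config, which is what a replication would start from. A sketch, using illustrative key names (not the authors' exact config keys) and only the values the review quotes above:

```python
# Hyperparameters as reported in the paper's Appendix C.2, Table 3.
# Key names are hypothetical; values marked "per task" in the paper
# (episode length, batch size, training steps) are omitted here.
CCAC_HPARAMS = {
    "actor_hidden_sizes": [256, 256],
    "critic_hidden_sizes": [256, 256],
    "vae_hidden_sizes": [512, 512, 64, 512, 512],
    "gamma": 0.99,              # discount factor γ
    "actor_lr": 1e-4,
    "critic_lr": 1e-3,
    "vae_lr": 1e-3,             # VAE/CVAE learning rate
    "critic_ensemble_size": 4,
}
```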