Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning

Authors: Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our approach has significantly fewer constraint violations while achieving similar or better quality in several control tasks than previous best methods." Section 4, Experimental Results: "We evaluate our approach on four MuJoCo (Todorov, Erez, and Tassa 2012) continuous control environments..." Reward comparisons: evaluation returns are computed by running five episodes per random seed every 5k training steps. Figure 3 shows that the proposed approach, SAC+CVFlow, achieves comparable results. Table 1 reports the percentage of constraint violations during RL training.
Researcher Affiliation | Academia | School of Computing and Information Systems, Singapore Management University. EMAIL, EMAIL
Pseudocode | Yes | "The pseudo-code of our proposed approach to training the CV-Flows is provided in Algorithm 1." (Algorithm 1: CV-Flows Pretraining Algorithm)
Open Source Code | Yes | Code: https://github.com/rlr-smu/cv-flow
Open Datasets | Yes | "We evaluate our approach on four MuJoCo (Todorov, Erez, and Tassa 2012) continuous control environments: Reacher (R), Hopper (H), Walker2D (W), and HalfCheetah (HC)." The paper also evaluates on four continuous control tasks with state-wise constraints: Ball1D, Ball3D, Space-Corridor, and Space-Arena, as proposed in previous work (Dalal et al. 2018).
Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, and test splits. It mentions "running five episodes per random seed every 5k training steps" for evaluation, which describes experiment execution rather than data partitioning.
Hardware Specification | No | Runtime: "...Results for timesteps per second on other tasks can be found in Figure 8 of the supplementary material, along with computing infrastructure details." The main text itself does not specify hardware details.
Software Dependencies | No | The paper mentions several software components (PyTorch, SAC, DDPG, and various environments) but does not specify version numbers for any of these dependencies in the main text.
Experiment Setup | No | "Each algorithm is trained with 10 random seeds, capped at 48 hours per run, using hyperparameters and architectures from (Kasaura et al. 2023) (details in supplementary material)." The specific hyperparameter values are deferred to an external paper and the supplementary material rather than detailed in the main text.
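The reported protocol (10 random seeds, a 48-hour wall-clock cap per run, and evaluation returns averaged over five episodes every 5k training steps) can be sketched as a minimal harness. This is an illustrative reconstruction, not the authors' code: `run_episode` is a hypothetical stand-in for a real environment rollout, and the training update is elided.

```python
import random
import time

EVAL_INTERVAL = 5_000          # evaluate every 5k training steps (as reported)
EPISODES_PER_EVAL = 5          # five evaluation episodes per random seed
TIME_CAP_SECONDS = 48 * 3600   # 48-hour wall-clock cap per run

def run_episode(rng):
    """Hypothetical stand-in for one evaluation rollout; returns a scalar return."""
    return rng.random()

def train_one_seed(seed, total_steps=20_000):
    """Run one seed, logging the mean evaluation return every EVAL_INTERVAL steps."""
    rng = random.Random(seed)
    start = time.monotonic()
    eval_log = {}
    for step in range(1, total_steps + 1):
        if time.monotonic() - start > TIME_CAP_SECONDS:
            break  # enforce the per-run wall-clock cap
        # ... one RL training update would go here ...
        if step % EVAL_INTERVAL == 0:
            returns = [run_episode(rng) for _ in range(EPISODES_PER_EVAL)]
            eval_log[step] = sum(returns) / len(returns)
    return eval_log

# Aggregate over 10 random seeds, matching the reported setup.
logs = {seed: train_one_seed(seed) for seed in range(10)}
```

Logging per-seed evaluation curves this way is what makes the mean-and-spread plots (as in the paper's Figure 3) reproducible across seeds.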