Select before Act: Spatially Decoupled Action Repetition for Continuous Control

Authors: Buqing Nie, Yangqing Fu, Yue Gao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted on various continuous control scenarios, demonstrating the effectiveness of the spatially decoupled repetition design proposed in this work. The training curves are illustrated in Fig. 3, the AUC scores are shown in Table 1, and the results for episode return, APR, and AFR are shown in Table 2.
Researcher Affiliation | Academia | Buqing Nie¹, Yangqing Fu¹, Yue Gao¹,² — ¹MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; ²Shanghai Innovation Institute, Shanghai, P.R. China
Pseudocode | Yes | Algorithm 1: Spatially Decoupled Action Repetition (SDAR) Algorithm
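The core idea named by the algorithm's title — each action dimension independently decides whether to repeat its previous value or take a freshly selected one — can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: `select_before_act`, `beta_policy`, and `pi_policy` are hypothetical stand-ins, assuming a per-dimension binary repetition policy β and an action policy π.

```python
import numpy as np

def select_before_act(beta_policy, pi_policy, obs, prev_action, rng):
    """Hedged sketch of spatially decoupled action repetition.

    Each action dimension first decides (via the binary policy beta)
    whether to repeat its previous value, then fresh values from pi
    are used only for the dimensions that chose to act.
    `beta_policy` and `pi_policy` are hypothetical callables, not
    the paper's actual interfaces.
    """
    repeat_prob = beta_policy(obs, prev_action)        # shape: (act_dim,)
    repeat_mask = rng.random(repeat_prob.shape) < repeat_prob
    new_action = pi_policy(obs)                        # shape: (act_dim,)
    # Per-dimension selection: repeat where the mask is True, act otherwise.
    return np.where(repeat_mask, prev_action, new_action)
```

The spatial decoupling is the key difference from temporal action-repetition schemes (e.g. TempoRL), which repeat the whole action vector for a chosen number of steps rather than deciding per dimension.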
Open Source Code | No | The paper mentions using the SAC implementation and hyper-parameter settings proposed in CleanRL (Huang et al., 2022), that TempoRL and UTE are implemented based on their official repositories, and that TAAC is implemented using its official implementation. However, no explicit statement or link is provided for open-source code of the SDAR method developed in this paper.
Open Datasets | Yes | Tasks: In this work, we conduct experiments on multiple continuous control tasks, which are categorized into the following three types of scenarios. More details are given in Appendix B.2. (a) Classic Control: several control tasks with small observation and action spaces, including MountainCarContinuous, LunarLanderContinuous, and BipedalWalker. (b) Locomotion: locomotion tasks based on the MuJoCo (Todorov et al., 2012) simulation environment: Walker2d, Hopper, HalfCheetah, Humanoid, and Ant. (c) Manipulation: tasks including Pusher, Reacher, and FetchReach. All tasks are constructed based on Gymnasium (Plappert et al., 2018). The FetchPickAndPlace and FetchReach tasks are implemented by Gymnasium-Robotics (Plappert et al., 2018).
Dataset Splits | No | This study trains each method on various tasks using multiple random seeds over a range of 100K to 3M steps, depending on the complexity of the task. More settings, including hyperparameter settings, are described in Appendix B.1. The paper does not explicitly mention training/test/validation dataset splits, which are typically less applicable to reinforcement learning environments where data is generated through interaction rather than pre-split static datasets.
Hardware Specification | Yes | In this work, we conduct all experiments utilizing an NVIDIA RTX 3090 GPU and PyTorch 2.1 with CUDA 12.2.
Software Dependencies | Yes | In this work, we conduct all experiments utilizing an NVIDIA RTX 3090 GPU and PyTorch 2.1 with CUDA 12.2.
Experiment Setup | Yes | Table 4: Hyper-parameter settings for the SDAR algorithm.
Parameter | Setting
Learning rate (π) | 3 × 10⁻⁴
Learning rate (β) | 3 × 10⁻⁴
Learning rate (Q) | 1 × 10⁻³
Learning rate (α) | 1 × 10⁻³
Optimizer | Adam
Discount factor γ | 0.99
Batch size | 256
Policy delay | 2
Soft update τ | 0.005
Sample number (b) | 10
In addition, we need to tune the target entropies Hβ and Hπ to improve the efficiency of the entropy-based exploration described in Eq. (10).
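For convenience, the Table 4 settings can be collected into a single configuration mapping. The values below are transcribed from the table; the key names themselves are hypothetical and not taken from the paper's code.

```python
# SDAR hyper-parameters as reported in Table 4 of the paper.
# Key names are illustrative; values are from the table.
sdar_hparams = {
    "lr_pi": 3e-4,        # learning rate for the action policy pi
    "lr_beta": 3e-4,      # learning rate for the repetition policy beta
    "lr_q": 1e-3,         # learning rate for the Q-function
    "lr_alpha": 1e-3,     # learning rate for the entropy coefficient alpha
    "optimizer": "Adam",
    "gamma": 0.99,        # discount factor
    "batch_size": 256,
    "policy_delay": 2,
    "tau": 0.005,         # soft (Polyak) target-network update coefficient
    "sample_number_b": 10,
}
```

Note that the target entropies for β and π are not fixed here, since the paper states they must be tuned per task for the entropy-based exploration of Eq. (10).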