Select before Act: Spatially Decoupled Action Repetition for Continuous Control

Authors: Buqing Nie, Yangqing Fu, Yue Gao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted on various continuous control scenarios, demonstrating the effectiveness of the spatially decoupled repetition design proposed in this work. The training curves are illustrated in Fig. 3, the AUC scores are shown in Table 1, and the results for episode return, APR, and AFR are shown in Table 2.
Researcher Affiliation | Academia | Buqing Nie¹, Yangqing Fu¹, Yue Gao¹,² — ¹MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; ²Shanghai Innovation Institute, Shanghai, P.R. China
Pseudocode | Yes | Algorithm 1: Spatially Decoupled Action Repetition (SDAR) Algorithm
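The core idea named by the algorithm's title — each action dimension independently decides whether to repeat its previous value or take a freshly selected one — can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: `select_before_act`, `beta_policy`, and `pi_policy` are hypothetical stand-ins, assuming a per-dimension binary repetition policy β and an action policy π.

```python
import numpy as np

def select_before_act(beta_policy, pi_policy, obs, prev_action, rng):
    """Hedged sketch of spatially decoupled action repetition.

    Each action dimension first decides (via the binary policy beta)
    whether to repeat its previous value, then fresh values from pi
    are used only for the dimensions that chose to act.
    `beta_policy` and `pi_policy` are hypothetical callables, not
    the paper's actual interfaces.
    """
    repeat_prob = beta_policy(obs, prev_action)        # shape: (act_dim,)
    repeat_mask = rng.random(repeat_prob.shape) < repeat_prob
    new_action = pi_policy(obs)                        # shape: (act_dim,)
    # Per-dimension selection: repeat where the mask is True, act otherwise.
    return np.where(repeat_mask, prev_action, new_action)
```

The spatial decoupling is the key difference from temporal action-repetition schemes (e.g. TempoRL), which repeat the whole action vector for a chosen number of steps rather than deciding per dimension.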
Open Source Code | No | The paper mentions using the SAC implementation and hyper-parameter settings proposed in CleanRL (Huang et al., 2022), that TempoRL and UTE are implemented based on their official repositories, and that TAAC is implemented using its official implementation. However, no explicit statement or link is provided for open-source code of the SDAR method developed in this paper.
Open Datasets | Yes | Tasks: In this work, we conduct experiments on multiple continuous control tasks, which are categorized into the following three types of scenarios. More details are given in Appendix B.2. (a) Classic Control: several control tasks with small observation and action spaces, including MountainCarContinuous, LunarLanderContinuous, and BipedalWalker. (b) Locomotion: locomotion tasks based on the MuJoCo (Todorov et al., 2012) simulation environment: Walker2d, Hopper, HalfCheetah, Humanoid, and Ant. (c) Manipulation: tasks including Pusher, Reacher, and FetchReach. All tasks are constructed based on Gymnasium (Plappert et al., 2018). The FetchPickAndPlace and FetchReach tasks are implemented by Gymnasium-Robotics (Plappert et al., 2018).
Dataset Splits | No | This study trains each method on various tasks using multiple random seeds over a range of 100K to 3M steps, depending on the complexity of the task. More settings, including hyperparameter settings, are described in Appendix B.1. The paper does not explicitly mention training/test/validation dataset splits, which are typically less applicable to reinforcement learning environments where data is generated through interaction rather than pre-split static datasets.
Hardware Specification | Yes | In this work, we conduct all experiments utilizing an NVIDIA RTX 3090 GPU and PyTorch 2.1 with CUDA 12.2.
Software Dependencies | Yes | In this work, we conduct all experiments utilizing an NVIDIA RTX 3090 GPU and PyTorch 2.1 with CUDA 12.2.
Experiment Setup | Yes | Table 4: Hyper-parameter settings for the SDAR algorithm.
Parameter | Setting
Learning rate (π) | 3 × 10⁻⁴
Learning rate (β) | 3 × 10⁻⁴
Learning rate (Q) | 1 × 10⁻³
Learning rate (α) | 1 × 10⁻³
Optimizer | Adam
Discount factor γ | 0.99
Batch size | 256
Policy delay | 2
Soft update τ | 0.005
Sample number (b) | 10
In addition, we need to tune the target entropies Hβ and Hπ to improve the efficiency of the entropy-based exploration described in Eq. (10).
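For convenience, the Table 4 settings can be collected into a single configuration mapping. The values below are transcribed from the table; the key names themselves are hypothetical and not taken from the paper's code.

```python
# SDAR hyper-parameters as reported in Table 4 of the paper.
# Key names are illustrative; values are from the table.
sdar_hparams = {
    "lr_pi": 3e-4,        # learning rate for the action policy pi
    "lr_beta": 3e-4,      # learning rate for the repetition policy beta
    "lr_q": 1e-3,         # learning rate for the Q-function
    "lr_alpha": 1e-3,     # learning rate for the entropy coefficient alpha
    "optimizer": "Adam",
    "gamma": 0.99,        # discount factor
    "batch_size": 256,
    "policy_delay": 2,
    "tau": 0.005,         # soft (Polyak) target-network update coefficient
    "sample_number_b": 10,
}
```

Note that the target entropies for β and π are not fixed here, since the paper states they must be tuned per task for the entropy-based exploration of Eq. (10).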