Rollout Total Correlation for Deep Reinforcement Learning
Authors: Bang You, Huaping Liu, Jan Peters, Oleg Arenz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on a set of challenging image-based simulated control tasks show that our method achieves better sample efficiency, and robustness to both white noise and natural video backgrounds compared to leading baselines. |
| Researcher Affiliation | Academia | Bang You (EMAIL), School of Information Engineering, Wuhan University of Technology; Huaping Liu (EMAIL), Department of Computer Science and Technology, Tsinghua University; Jan Peters (EMAIL), Intelligent Autonomous Systems, Technische Universität Darmstadt; German Research Center for AI (DFKI); Hessian Centre for Artificial Intelligence (Hessian.AI); Centre for Cognitive Science (Cog Sci); Oleg Arenz (EMAIL), Intelligent Autonomous Systems, Technische Universität Darmstadt |
| Pseudocode | Yes | B.10 Algorithm The training procedure of MTC is presented in Algorithm 1. Algorithm 1: Training Algorithm for ROTOC |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the ROTOC methodology described. It mentions using a "publicly released standard Pytorch implementation (Yarats et al., 2021b) of SAC" for baselines, but not for their own work. |
| Open Datasets | Yes | We evaluate the ROTOC on a set of challenging standard Mujoco tasks from the Deepmind control suite (Tassa et al., 2018)...the background of the Mujoco tasks is replaced by natural videos (Zhang et al., 2020) sampled from the Kinetics dataset (Kay et al., 2017). |
| Dataset Splits | No | The paper describes using tasks from the Deepmind control suite in standard, noisy, and natural video settings for training and evaluation. It does not specify explicit train/test/validation splits for a static dataset in the traditional sense, nor does it detail how the Kinetics dataset videos are split for background usage in experiments beyond being 'sampled'. |
| Hardware Specification | No | The paper mentions "a hardware donation by NVIDIA through the Academic Grant Program" in the acknowledgments, but does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using a "publicly released standard Pytorch implementation" but does not specify the version of PyTorch or of any other software libraries used (e.g., Python, CUDA, NumPy). |
| Experiment Setup | Yes | All hyperparameters of SAC are fixed across tasks and shown in Table B.1. Table B.1 (shared hyperparameters across tasks): replay buffer capacity 100,000; optimizer Adam; critic learning rate 10⁻³; critic Q-function EMA 0.01; critic target update frequency 2; actor learning rate 10⁻³; actor update frequency 2; actor log-stddev bounds [-10, 2]; temperature learning rate 10⁻³; initial steps 1000; discount 0.99; initial temperature 0.1; learning rate for ϕo, go, qψ, dυ and fo 10⁻⁴; encoder and projection model EMA τ 0.05; coefficient α 0.1; coefficient λ 0.001. Table B.2 (task-specific hyperparameters) lists the action repeat and batch size for each task. |
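
The shared hyperparameters reported in Table B.1 can be collected as a plain config dict for reference; this is a minimal sketch, and the key names are illustrative assumptions rather than the paper's actual configuration schema:

```python
# Shared SAC/ROTOC hyperparameters as reported in Table B.1 of the paper.
# Key names are illustrative, not taken from the authors' code (which is
# not publicly released).
shared_hparams = {
    "replay_buffer_capacity": 100_000,
    "optimizer": "Adam",
    "critic_lr": 1e-3,
    "critic_q_function_ema": 0.01,
    "critic_target_update_freq": 2,
    "actor_lr": 1e-3,
    "actor_update_freq": 2,
    "actor_log_stddev_bounds": (-10, 2),
    "temperature_lr": 1e-3,
    "initial_steps": 1000,
    "discount": 0.99,
    "initial_temperature": 0.1,
    # Learning rate shared by the auxiliary models phi_o, g_o, q_psi,
    # d_upsilon, and f_o.
    "aux_model_lr": 1e-4,
    "encoder_projection_ema_tau": 0.05,
    "coefficient_alpha": 0.1,
    "coefficient_lambda": 0.001,
}
```

Per Table B.2, the action repeat and batch size would be added on top of this dict per task.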