Rollout Total Correlation for Deep Reinforcement Learning

Authors: Bang You, Huaping Liu, Jan Peters, Oleg Arenz

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluations on a set of challenging image-based simulated control tasks show that our method achieves better sample efficiency, and robustness to both white noise and natural video backgrounds compared to leading baselines."
Researcher Affiliation | Academia | Bang You (School of Information Engineering, Wuhan University of Technology); Huaping Liu (Department of Computer Science and Technology, Tsinghua University); Jan Peters (Intelligent Autonomous Systems, Technische Universität Darmstadt; German Research Center for AI (DFKI); Hessian Centre for Artificial Intelligence (Hessian.AI); Centre for Cognitive Science (CogSci)); Oleg Arenz (Intelligent Autonomous Systems, Technische Universität Darmstadt)
Pseudocode | Yes | Section B.10 (Algorithm): "The training procedure of MTC is presented in Algorithm 1." Algorithm 1: Training Algorithm for ROTOC.
Open Source Code | No | The paper does not provide an explicit statement or link to source code for the ROTOC method it describes. It mentions using a "publicly released standard Pytorch implementation (Yarats et al., 2021b) of SAC" for the baselines, but not for the authors' own method.
Open Datasets | Yes | "We evaluate the ROTOC on a set of challenging standard Mujoco tasks from the Deepmind control suite (Tassa et al., 2018)... the background of the Mujoco tasks is replaced by natural videos (Zhang et al., 2020) sampled from the Kinetics dataset (Kay et al., 2017)."
Dataset Splits | No | The paper describes using tasks from the Deepmind control suite in standard, noisy, and natural-video settings for training and evaluation. It does not specify explicit train/validation/test splits for a static dataset, nor does it detail how the Kinetics videos are partitioned for background usage beyond being "sampled".
Hardware Specification | No | The paper acknowledges "a hardware donation by NVIDIA through the Academic Grant Program", but does not report the specific hardware used for the experiments, such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions using a "publicly released standard Pytorch implementation" but does not specify version numbers for PyTorch or any other software dependencies (e.g., Python, CUDA, NumPy).
Experiment Setup | Yes | "All hyperparameters of SAC are fixed across tasks and shown in Table B.1." Table B.1 (shared hyperparameters across tasks): Replay buffer capacity 100,000; Optimizer Adam; Critic learning rate 10⁻³; Critic Q-function EMA 0.01; Critic target update frequency 2; Actor learning rate 10⁻³; Actor update frequency 2; Actor log-stddev bounds [-10, 2]; Temperature learning rate 10⁻³; Initial steps 1,000; Discount 0.99; Initial temperature 0.1; Learning rate for ϕo, go, qψ, dυ and fo 10⁻⁴; Encoder and projection model EMA τ 0.05; Coefficient α 0.1; Coefficient λ 0.001; Chunk length 2. Table B.2 lists task-specific hyperparameters (action repeat and batch size for each task).
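For reference, the shared hyperparameters reported in Table B.1 can be collected into a plain configuration dictionary. This is an illustrative sketch only: the key names and the `make_task_config` helper are assumptions (the authors' code and config format are not released); only the values come from the reported tables.

```python
# Shared hyperparameters from Table B.1 of the paper, expressed as a
# plain Python dict. Key names are illustrative assumptions; only the
# values come from the reported table.
SHARED_HPARAMS = {
    "replay_buffer_capacity": 100_000,
    "optimizer": "Adam",
    "critic_lr": 1e-3,
    "critic_q_function_ema": 0.01,
    "critic_target_update_freq": 2,
    "actor_lr": 1e-3,
    "actor_update_freq": 2,
    "actor_log_stddev_bounds": (-10, 2),
    "temperature_lr": 1e-3,
    "initial_steps": 1000,
    "discount": 0.99,
    "initial_temperature": 0.1,
    "aux_models_lr": 1e-4,  # learning rate for phi_o, g_o, q_psi, d_upsilon, f_o
    "encoder_projection_ema_tau": 0.05,
    "coef_alpha": 0.1,
    "coef_lambda": 0.001,
    "chunk_length": 2,
}


def make_task_config(shared, action_repeat, batch_size):
    """Merge the shared Table B.1 values with per-task settings.

    Table B.2 supplies action repeat and batch size per task; this
    helper (a hypothetical name) layers them on top of the shared dict.
    """
    cfg = dict(shared)
    cfg.update(action_repeat=action_repeat, batch_size=batch_size)
    return cfg
```

Keeping the shared values in one dict and layering only the two task-specific fields on top mirrors how the paper separates Table B.1 from Table B.2.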