Maximum Total Correlation Reinforcement Learning
Authors: Bang You, Puze Liu, Huaping Liu, Jan Peters, Oleg Arenz
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our algorithm on simulated robotic control tasks and show that the learned policies induce more periodic and better compressible trajectories that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks. ... 5. Experimental Evaluation |
| Researcher Affiliation | Academia | 1School of Information Engineering, Wuhan University of Technology, Wuhan, China 2Department of Computer Science, Tsinghua University, Beijing, China 3Intelligent Autonomous Systems Lab, Technische Universität Darmstadt, Darmstadt, Germany 4Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany 5Hessian Centre for Artificial Intelligence (Hessian.AI) 6Centre for Cognitive Science (Cog Sci). Correspondence to: Huaping Liu <EMAIL>. |
| Pseudocode | No | The paper describes the proposed algorithm, MTC-RL, and its components, objective functions, and optimization strategies in detail within Sections 3 and 4, and Appendix A. However, it does not present a formal pseudocode block or algorithm box. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/BangYou01/MTC. |
| Open Datasets | Yes | We performed experiments to investigate how our total correlation objective compares to vanilla soft actor-critic (Haarnoja et al., 2018) and the closely related alternative methods RPC (Eysenbach et al., 2021), LZ-SAC (Saanum et al., 2023) and SPAC (Saanum et al., 2023) in terms of performance on the original RL objective (Sec. 5.1 and Sec. 5.4), robustness to noise, dynamics mismatch and spurious correlation (Sec. 5.2), and consistency of the resulting trajectories (Sec. 5.3). ... eight continuous control tasks from the DeepMind Control (DMC) suite (Tassa et al., 2018), ... eight robotic manipulation tasks from the Metaworld benchmark (Yu et al., 2020). ... six image-based DMC tasks from the PlaNet benchmark (Hafner et al., 2019). |
| Dataset Splits | Yes | We initialize the replay buffer with 5000 samples from the initial policy and train all agents for 1 million steps. We evaluate the agent every 20000 steps. ... For each task, the episode length is set to 1000 steps, and the action vector is bounded into [-1, 1]. ... Each run includes 30 evaluation trajectories. ... For each run, we collect 10 evaluation episodes. |
| Hardware Specification | Yes | We performed every experiment on an Intel(R) Xeon(R) E5-2620 CPU with a GeForce GTX 2080 Ti graphics card and used approximately one day for training. |
| Software Dependencies | No | We implement our algorithm on top of the common PyTorch implementation of the SAC algorithm (Yarats et al., 2021). We use the official implementation provided by Saanum et al. (2023) to obtain the results for LZ-SAC, since the official implementation is based on the same codebase as SAC and the hyperparameters have been tuned to achieve good results on DMC tasks. ... The LSTM module is implemented using the common nn.LSTM class provided by PyTorch. ... We measure the compressibility of trajectories using the bzip2 algorithm, which is easily available through the common bz2 Python package. While PyTorch and bzip2 are mentioned, specific version numbers for these software components or Python itself are not provided. |
| Experiment Setup | Yes | We use the default hyperparameters from that implementation unless specified otherwise. Detailed descriptions of the SAC implementation are available in (Yarats et al., 2021). ... Table 2. Hyperparameters used in MTC. ... Table 3. Hyperparameters used in Image-based tasks. ... B.2. Implementation Details |
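The bzip2-based compressibility measure quoted under Software Dependencies can be sketched with Python's standard-library `bz2` module. The serialization (rounding values to a fixed precision before encoding) is an assumption for illustration; the paper's exact encoding of trajectories is not specified here.

```python
import bz2
import math
import random


def compressibility(trajectory, precision=2):
    """Ratio of bzip2-compressed size to raw size for a trajectory.

    trajectory: a sequence of floats (e.g. a flattened action sequence).
    precision: decimals to round to before serializing -- an assumed
    preprocessing step, not taken from the paper. Lower ratios mean the
    trajectory is more compressible (e.g. more periodic).
    """
    raw = ",".join(f"{x:.{precision}f}" for x in trajectory).encode()
    return len(bz2.compress(raw)) / len(raw)


# A periodic trajectory should compress better than a noisy one.
random.seed(0)
periodic = [math.sin(0.1 * t) for t in range(1000)]
noisy = [random.uniform(-1.0, 1.0) for _ in range(1000)]
assert compressibility(periodic) < compressibility(noisy)
```

This matches the intuition the paper tests in Sec. 5.3: policies that induce more periodic trajectories yield smaller compressed sizes under bzip2.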