Episodic Novelty Through Temporal Distance
Authors: Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo XU, Chongjie Zhang, Qianchuan Zhao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmark tasks demonstrate that ETD significantly outperforms state-of-the-art methods, highlighting its effectiveness in enhancing exploration in sparse reward CMDPs. ... We validated this observation using the MiniGrid DoorKey 16x16 experiment, as shown in Figure 1. ... Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency. |
| Researcher Affiliation | Academia | 1Tsinghua University 2The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences 3Washington University in St. Louis |
| Pseudocode | Yes | Algorithm 1 Episodic Novelty through Temporal Distance |
| Open Source Code | Yes | Code is available at https://github.com/Jackory/ETD. |
| Open Datasets | Yes | Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency. ... We further conduct experiments in DeepMind Control (Tassa et al., 2018) (DMC), Meta-World (Yu et al., 2020), and HalfCheetah Vel Sparse. |
| Dataset Splits | No | The paper mentions using procedurally generated environments for MiniGrid and MiniWorld, but does not report explicit train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It focuses on software implementation and hyperparameters. |
| Software Dependencies | Yes | In the experiments, all methods are implemented based on PPO. We primarily follow the implementation of DEIR, which is based on Stable Baselines 3 (version 1.1.0). |
| Experiment Setup | Yes | E.2 HYPERPARAMETERS We found that applying batch normalization to all non-RNN layers could significantly boost the learning speed, especially in environments with stable observations, a finding also noted in the DEIR paper. We use the Adam optimizer with ε = 1e-5, β1 = 0.9, β2 = 0.999. ... Tables 2-11 provide detailed hyperparameters for ETD and baselines in MiniGrid, Crafter, MiniWorld, DMC, Meta-World, and HalfCheetah environments, including PPO rollout steps, learning rates, mini-batch sizes, entropy coefficients, and intrinsic reward coefficients. |
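As a concrete illustration of the Adam settings quoted in the Experiment Setup row (ε = 1e-5, β1 = 0.9, β2 = 0.999), the sketch below implements one step of the standard Adam update rule for a scalar parameter. This is a minimal textbook sketch, not code from the ETD repository; the learning rate `lr=3e-4` is a placeholder assumption, since the actual per-environment learning rates are listed in the paper's Tables 2-11.

```python
def adam_step(theta, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-5):
    """One Adam update for a scalar parameter; returns updated (theta, m, v).

    Hyperparameters mirror those quoted from the paper (eps=1e-5,
    beta1=0.9, beta2=0.999); lr=3e-4 is an assumed placeholder.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for m
    v_hat = v / (1 - beta2 ** t)                 # bias correction for v
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, grad=2.0 * theta, m=m, v=v, t=t)
```

In practice the paper's experiments rely on Stable Baselines 3's PPO, which applies this same optimizer internally via `torch.optim.Adam`; the sketch only makes the quoted hyperparameters concrete.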