Episodic Novelty Through Temporal Distance

Authors: Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo XU, Chongjie Zhang, Qianchuan Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various benchmark tasks demonstrate that ETD significantly outperforms state-of-the-art methods, highlighting its effectiveness in enhancing exploration in sparse reward CMDPs. ... We validated this observation using the MiniGrid DoorKey 16x16 experiment, as shown in Figure 1. ... Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency.
Researcher Affiliation | Academia | 1. Tsinghua University; 2. The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; 3. Washington University in St. Louis
Pseudocode | Yes | Algorithm 1: Episodic Novelty through Temporal Distance
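Algorithm 1 itself is not reproduced in this report. As a rough illustration only, the sketch below shows the general shape of an episodic-novelty bonus computed from temporal distances to an episodic memory. The function names, the `min` aggregation, and the first-state default of 1.0 are assumptions for illustration, not the paper's exact formulation.

```python
def episodic_novelty(embedding, memory, distance_fn):
    """Sketch of an episodic novelty bonus: the intrinsic reward is the
    smallest temporal distance between the current state embedding and
    the embeddings stored so far in this episode's memory.

    distance_fn stands in for a learned temporal-distance model; the
    min aggregation and the first-state default are assumptions."""
    if not memory:
        return 1.0  # no past states this episode: treat as maximally novel
    return min(distance_fn(embedding, past) for past in memory)


def run_episode(states, embed, distance_fn):
    """Accumulate per-step novelty bonuses over one episode."""
    memory, bonuses = [], []
    for s in states:
        e = embed(s)
        bonuses.append(episodic_novelty(e, memory, distance_fn))
        memory.append(e)
    return bonuses
```

With a toy scalar "embedding" and absolute difference as the distance, `run_episode([0, 2, 1.5], float, lambda a, b: abs(a - b))` yields a large bonus for the far-away second state and a small one for the third, which lies close to the memory.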
Open Source Code | Yes | Code is available at https://github.com/Jackory/ETD.
Open Datasets | Yes | Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency. ... We further conduct experiments in DeepMind Control (Tassa et al., 2018) (DMC), Meta-World (Yu et al., 2020), and HalfCheetahVel-Sparse.
Dataset Splits | No | The paper mentions using procedurally generated environments for MiniGrid and MiniWorld, and does not report explicit train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It focuses on software implementation and hyperparameters.
Software Dependencies | Yes | In the experiments, all methods are implemented based on PPO. We primarily follow the implementation of DEIR, which is based on Stable Baselines 3 (version 1.1.0).
Experiment Setup | Yes | E.2 HYPERPARAMETERS: We found that applying batch normalization to all non-RNN layers could significantly boost the learning speed, especially in environments with stable observations, a finding also noted in the DEIR paper. We use the Adam optimizer with ε = 1e-5, β1 = 0.9, β2 = 0.999. ... Tables 2-11 provide detailed hyperparameters for ETD and baselines in MiniGrid, Crafter, MiniWorld, DMC, Meta-World, and HalfCheetah environments, including PPO rollout steps, learning rates, mini-batch sizes, entropy coefficients, and intrinsic reward coefficients.
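For concreteness, here is a minimal pure-Python sketch of a single Adam update using the optimizer hyperparameters stated above (ε = 1e-5, β1 = 0.9, β2 = 0.999). The learning rate default is a placeholder, since the actual per-environment learning rates are given in the paper's Tables 2-11.

```python
import math

def adam_step(theta, grad, m, v, t, lr=3e-4,
              beta1=0.9, beta2=0.999, eps=1e-5):
    """One Adam update over lists of scalar parameters.

    lr is a placeholder value; beta1, beta2, and eps match the
    settings quoted from the paper's appendix."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)              # bias correction
        v_hat = vi / (1 - beta2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Note the relatively large ε = 1e-5 (versus PyTorch's default 1e-8), a common choice in PPO implementations for numerical stability.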