Episodic Novelty Through Temporal Distance

Authors: Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo XU, Chongjie Zhang, Qianchuan Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various benchmark tasks demonstrate that ETD significantly outperforms state-of-the-art methods, highlighting its effectiveness in enhancing exploration in sparse reward CMDPs. ... We validated this observation using the MiniGrid DoorKey 16x16 experiment, as shown in Figure 1. ... Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency.
Researcher Affiliation | Academia | 1. Tsinghua University; 2. The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; 3. Washington University in St. Louis
Pseudocode | Yes | Algorithm 1: Episodic Novelty through Temporal Distance
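Algorithm 1 itself is not reproduced in this report. As a rough illustration only, the sketch below shows the general shape of an episodic-novelty bonus computed from temporal distances to an episodic memory. The function names, the `min` aggregation, and the first-state default of 1.0 are assumptions for illustration, not the paper's exact formulation.

```python
def episodic_novelty(embedding, memory, distance_fn):
    """Sketch of an episodic novelty bonus: the intrinsic reward is the
    smallest temporal distance between the current state embedding and
    the embeddings stored so far in this episode's memory.

    distance_fn stands in for a learned temporal-distance model; the
    min aggregation and the first-state default are assumptions."""
    if not memory:
        return 1.0  # no past states this episode: treat as maximally novel
    return min(distance_fn(embedding, past) for past in memory)


def run_episode(states, embed, distance_fn):
    """Accumulate per-step novelty bonuses over one episode."""
    memory, bonuses = [], []
    for s in states:
        e = embed(s)
        bonuses.append(episodic_novelty(e, memory, distance_fn))
        memory.append(e)
    return bonuses
```

With a toy scalar "embedding" and absolute difference as the distance, `run_episode([0, 2, 1.5], float, lambda a, b: abs(a - b))` yields a large bonus for the far-away second state and a small one for the third, which lies close to the memory.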
Open Source Code | Yes | Code is available at https://github.com/Jackory/ETD.
Open Datasets | Yes | Through extensive experiments on various CMDP benchmark tasks, including MiniGrid (Chevalier-Boisvert et al., 2023), Crafter (Hafner, 2022), and MiniWorld (Chevalier-Boisvert et al., 2023), we show that ETD significantly outperforms state-of-the-art methods, improving exploration efficiency. ... We further conduct experiments in DeepMind Control (Tassa et al., 2018) (DMC), Meta-World (Yu et al., 2020), and HalfCheetahVel-Sparse.
Dataset Splits | No | The paper mentions using procedurally generated environments for MiniGrid and MiniWorld, and does not report explicit train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It focuses on software implementation and hyperparameters.
Software Dependencies | Yes | In the experiments, all methods are implemented based on PPO. We primarily follow the implementation of DEIR, which is based on Stable Baselines 3 (version 1.1.0).
Experiment Setup | Yes | E.2 HYPERPARAMETERS: We found that applying batch normalization to all non-RNN layers could significantly boost the learning speed, especially in environments with stable observations, a finding also noted in the DEIR paper. We use the Adam optimizer with ε = 1e-5, β1 = 0.9, β2 = 0.999. ... Tables 2-11 provide detailed hyperparameters for ETD and baselines in MiniGrid, Crafter, MiniWorld, DMC, Meta-World, and HalfCheetah environments, including PPO rollout steps, learning rates, mini-batch sizes, entropy coefficients, and intrinsic reward coefficients.
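For concreteness, here is a minimal pure-Python sketch of a single Adam update using the optimizer hyperparameters stated above (ε = 1e-5, β1 = 0.9, β2 = 0.999). The learning rate default is a placeholder, since the actual per-environment learning rates are given in the paper's Tables 2-11.

```python
import math

def adam_step(theta, grad, m, v, t, lr=3e-4,
              beta1=0.9, beta2=0.999, eps=1e-5):
    """One Adam update over lists of scalar parameters.

    lr is a placeholder value; beta1, beta2, and eps match the
    settings quoted from the paper's appendix."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)              # bias correction
        v_hat = vi / (1 - beta2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Note the relatively large ε = 1e-5 (versus PyTorch's default 1e-8), a common choice in PPO implementations for numerical stability.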