Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
Authors: Kaiyan Zhao, Yiming Wang, Yuyang Chen, Yan Li, Leong Hou U, Xiaoguang Niu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments. |
| Researcher Affiliation | Academia | 1School of Computer Science, Wuhan University, Wuhan, China 2State Key Laboratory of Internet of Things for Smart City, University of Macau, Macao, China 3School of Professional Education, Northwestern University, USA 4School of Artificial Intelligence, Shenzhen Polytechnic University, China |
| Pseudocode | Yes | Algorithm 1 EDER |
| Open Source Code | No | Details are available at https://arxiv.org/abs/2410.20487. (Explanation: This is a link to an arXiv preprint, not a code repository, and there is no explicit statement about code release.) |
| Open Datasets | Yes | Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. We evaluate EDER in three environments from the Habitat-Matterport 3D Research Dataset (HM3D) [Ramakrishnan et al., 2021] |
| Dataset Splits | No | The paper uses reinforcement learning environments (Mujoco, Atari, Habitat) where data is generated through agent-environment interaction. It does not provide specific training/test/validation splits for a static dataset. |
| Hardware Specification | No | Work partially performed on the supercomputing system at the Supercomputing Center of Wuhan University and at SICC (supported by SKL-IOTSC, University of Macau). (Explanation: This statement is too general, mentioning "supercomputing system" without specific hardware details like GPU/CPU models.) |
| Software Dependencies | No | The paper mentions several algorithms (DDPG, DQN) and environments (Mujoco, Atari, Habitat) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In our framework, we partition T into multiple partial trajectories of length b, denoted as τj, each covering the state transitions from t = js to t = js + b − 1, where s represents the sliding step length. The trajectories are obtained by sliding a window with step s = b; this fine-grained segmentation allows us to analyze the behavioral patterns of agents at different stages. The specific formula is as follows: ... For clarity, we set the window length to b = 2 in this part, while other values are explored in the ablation studies. |
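The trajectory partitioning quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `segment_trajectory` and the list-based trajectory representation are assumptions, and only the indexing rule (segment τj covers transitions t = js through js + b − 1) comes from the paper.

```python
def segment_trajectory(trajectory, b, s):
    """Partition a trajectory into partial trajectories of length b.

    Segment tau_j covers transitions from t = j*s to t = j*s + b - 1,
    where s is the sliding step length (illustrative sketch, not the
    authors' implementation).
    """
    segments = []
    j = 0
    while j * s + b <= len(trajectory):
        segments.append(trajectory[j * s : j * s + b])
        j += 1
    return segments

# With b = 2 and s = b (non-overlapping windows, as in the paper's
# default setting), 6 transitions yield 3 partial trajectories.
transitions = list(range(6))
print(segment_trajectory(transitions, b=2, s=2))  # [[0, 1], [2, 3], [4, 5]]
```

Setting s < b would instead produce overlapping windows, which is presumably what the ablation studies on other values explore.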