Dynamic Bottleneck for Robust Self-Supervised Exploration

Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with the DB bonus outperforms several state-of-the-art exploration methods in noisy environments. We evaluate SSE-DB on Atari games and conduct experiments to compare the following methods.
Researcher Affiliation | Collaboration | Chenjia Bai (Harbin Institute of Technology), Lingxiao Wang (Northwestern University), Lei Han (Tencent Robotics X), Animesh Garg (University of Toronto, Vector Institute, NVIDIA), Jianye Hao (Tianjin University), Peng Liu (Harbin Institute of Technology), Zhaoran Wang (Northwestern University)
Pseudocode | Yes | The paper refers to Appendix B for the pseudocode of training the DB model (Algorithm 1: SSE-DB).
Open Source Code | Yes | The code is available at https://github.com/Baichenjia/DB.
Open Datasets | Yes | All methods are evaluated on Atari games with high-dimensional observations; the 18 selected games are frequently used in prior work on efficient exploration.
Dataset Splits | No | The paper evaluates on Atari games but does not specify explicit training/validation/test splits; agents are trained without extrinsic rewards and evaluated in the same environments.
Hardware Specification | No | The paper does not describe the hardware used for its experiments; the acknowledgements mention 'computation resources' but give no specific models or specifications.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The main text describes the overall approach and model architecture but provides no specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings; implementation details are deferred to Appendix D, which is outside the analyzed main text.
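The Pseudocode row above notes that Algorithm 1 (SSE-DB) trains a DB model and explores using its intrinsic bonus. As a rough illustration only, and not the authors' exact DB objective, a generic intrinsic-bonus computation based on prediction error in a learned latent space might look like the sketch below; the encoder, dynamics model, and all names here are hypothetical stand-ins:

```python
# Illustrative sketch (NOT the paper's exact DB bonus): a generic
# self-supervised exploration bonus that rewards states where a learned
# latent dynamics model predicts poorly. All names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W_enc):
    """Map a raw observation to a compact latent code (linear encoder stub)."""
    return np.tanh(obs @ W_enc)

def intrinsic_bonus(obs, action, next_obs, W_enc, W_dyn):
    """Squared error between predicted and actual next latent state."""
    z, z_next = encode(obs, W_enc), encode(next_obs, W_enc)
    pred = np.tanh(np.concatenate([z, action]) @ W_dyn)
    return float(np.sum((pred - z_next) ** 2))

# Toy dimensions and randomly initialized (untrained) weights.
obs_dim, act_dim, lat_dim = 8, 2, 4
W_enc = rng.normal(size=(obs_dim, lat_dim))
W_dyn = rng.normal(size=(lat_dim + act_dim, lat_dim))

obs = rng.normal(size=obs_dim)
action = rng.normal(size=act_dim)
next_obs = rng.normal(size=obs_dim)
bonus = intrinsic_bonus(obs, action, next_obs, W_enc, W_dyn)
print(bonus >= 0.0)  # squared prediction error is non-negative
```

In an actual loop this bonus would replace the extrinsic reward during self-supervised training, and the encoder/dynamics weights would be updated online; the point of the DB approach is that a bottlenecked representation makes such a bonus robust to dynamics-irrelevant observation noise, which a raw prediction-error bonus like this one is not.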