Contrastive Active Inference

Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt

NeurIPS 2021

Each entry below lists a reproducibility variable, its result, and the LLM response that supports it.
Research Type: Experimental
LLM Response: "In this section, we compare the contrastive AIF method to likelihood-based AIF and MBRL in high-dimensional image-based settings. Our experimentation aims to answer the following questions: (i) is it possible to achieve high-dimensional goals with AIF-based methods? (ii) what is the difference in performance between RL-based and AIF-based methods? (iii) does contrastive AIF perform better than likelihood-based AIF? (iv) in what contexts are contrastive methods more desirable than likelihood-based methods? (v) are AIF-based methods resilient to variations in the environment background?"

Researcher Affiliation: Academia
LLM Response: "Pietro Mazzaglia, IDLab, Ghent University, EMAIL; Tim Verbelen, IDLab, Ghent University, EMAIL; Bart Dhoedt, IDLab, Ghent University, EMAIL"

Pseudocode: Yes
LLM Response: "The training routine, which alternates updates to the models with data collection, is shown in Algorithm 1."

Open Source Code: No
LLM Response: The paper mentions external resources such as gym-minigrid and the DeepMind Control Suite, but does not provide a link or an explicit statement for its own source code.

Open Datasets: Yes
LLM Response: "We performed experiments on the Empty 6×6 and the Empty 8×8 environments from the MiniGrid suite [8]... We performed continuous-control experiments on the Reacher Easy and Hard tasks from the DeepMind Control (DMC) Suite [48] and on Reacher Easy from the Distracting Control Suite [47]."

Dataset Splits: No
LLM Response: The paper describes how data is collected during training episodes and how performance is evaluated on trajectories, but does not specify fixed train/validation/test splits in terms of percentages or counts for reproducibility.

Hardware Specification: No
LLM Response: "Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix."

Software Dependencies: No
LLM Response: "Relevant parameterization for the experiments can be found in the next section, while hyperparameters and a detailed description of each network are left to the Appendix."

Experiment Setup: Yes
LLM Response: "For the 6×6 task, the world model is trained by sampling B = 50 trajectories of length L = 7, while the behavior model is trained by imagining H = 6 steps long trajectories. For the 8×8 task, we increased the length L to 11 and the imagination horizon H to 10. For both tasks, we first collected R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after collecting a new trajectory. ... For both tasks, the world model is trained by sampling B = 30 trajectories of length L = 30, while the behavior model is trained by imagining H = 10 steps long trajectories. We first collect R = 50 random episodes, to populate the replay buffer, and train for U = 100 steps after every new trajectory."
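The quoted setup describes an alternating collect/train loop of the kind referenced as Algorithm 1. A minimal Python sketch under the MiniGrid 6×6 settings (B = 50, L = 7, H = 6, R = 50, U = 100); all class and function names here are illustrative placeholders, not the authors' implementation:

```python
from collections import deque
import random

B, L, H, R, U = 50, 7, 6, 50, 100  # batch, seq. length, imagination horizon, seed episodes, updates

class ReplayBuffer:
    """Episode store from which fixed-length sub-trajectories are sampled."""

    def __init__(self, capacity=10_000):
        self.episodes = deque(maxlen=capacity)

    def add(self, episode):
        self.episodes.append(episode)

    def sample(self, batch_size, seq_len):
        # Draw batch_size sub-trajectories of length seq_len.
        batch = []
        for _ in range(batch_size):
            ep = random.choice(self.episodes)
            start = random.randrange(max(1, len(ep) - seq_len + 1))
            batch.append(ep[start:start + seq_len])
        return batch

def collect_episode(env_step, policy, length=20):
    # Roll out one fixed-length episode; env_step and policy are stubs here.
    return [env_step(policy(t)) for t in range(length)]

def train(world_model_update, behavior_update, env_step, policy, n_episodes=10):
    buffer = ReplayBuffer()
    for _ in range(R):                      # seed the buffer with R random episodes
        buffer.add(collect_episode(env_step, lambda t: None))
    for _ in range(n_episodes):
        buffer.add(collect_episode(env_step, policy))
        for _ in range(U):                  # U gradient updates per collected episode
            batch = buffer.sample(B, L)     # B sequences of length L
            world_model_update(batch)       # fit the world model on real sequences
            behavior_update(batch, H)       # improve the policy on H-step imagined rollouts
```

The same skeleton covers the DMC tasks by swapping in B = 30, L = 30, H = 10 as quoted above.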
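On question (iii), the "contrastive" variant contrasts latent states against observation embeddings instead of scoring observations under a reconstruction likelihood. As a point of reference only, a standard InfoNCE-style contrastive loss (the paper's exact objective may differ; all names below are illustrative) can be sketched as:

```python
import numpy as np

def info_nce(z_states, z_obs, temperature=1.0):
    """InfoNCE-style contrastive loss between latent states and observation
    embeddings. Matched pairs (row i with row i) are positives; every other
    row in the batch serves as a negative. Both inputs: shape (batch, dim)."""
    # Cosine-similarity logits between every state/observation pair.
    z_states = z_states / np.linalg.norm(z_states, axis=1, keepdims=True)
    z_obs = z_obs / np.linalg.norm(z_obs, axis=1, keepdims=True)
    logits = z_states @ z_obs.T / temperature            # (batch, batch)
    # Log-softmax across each row; keep the diagonal (positive pairs).
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Because this loss only needs similarity scores in embedding space, it avoids reconstructing pixels, which is the usual motivation for preferring contrastive objectives when backgrounds vary (question (v)).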