Influence-Augmented Online Planning for Complex Environments
Authors: Jinke He, Miguel Suau de Castro, Frans Oliehoek
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment. We perform online planning experiments with the POMCP planner (Silver and Veness, 2010). |
| Researcher Affiliation | Academia | Jinke He, Department of Intelligent Systems, Delft University of Technology, EMAIL; Miguel Suau, Department of Intelligent Systems, Delft University of Technology, EMAIL; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, EMAIL |
| Pseudocode | Yes | Algorithm 1: Influence-Augmented Online Planning |
| Open Source Code | Yes | Our codebase was implemented in C++, including a POMCP planner and several benchmarking domains available at https://github.com/INFLUENCEorg/IAOP |
| Open Datasets | No | The paper describes creating datasets by sampling from a global simulator ('To obtain an approximate influence predictor ˆIθ, we sample a dataset D of 1000 episodes from the global simulator Gglobal'), but does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training an RNN ('train a variant of RNN called Gated Recurrent Units (GRU) on D until convergence') but does not provide specific details on dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper states 'We ran each of our experiments for many times on a computer cluster with the same amount of computational resources' but does not provide specific hardware details such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'Our codebase was implemented in C++' and training a 'Gated Recurrent Units (GRU)' but does not provide specific version numbers for any software libraries, frameworks, or compilers used. |
| Experiment Setup | Yes | We perform planning with different simulators in games of {5, 9, 17, 33, 65, 129} agents for a horizon of 10 steps, where a fixed number of 1000 Monte Carlo simulations are performed per step. To obtain an approximate influence predictor ˆIθ, we sample a dataset D of 1000 episodes from the global simulator Gglobal with a uniform random policy and train a variant of RNN called Gated Recurrent Units (GRU) (Cho et al., 2014) on D until convergence. The traffic light in the center is controlled by planning, with the goal to minimize the total number of vehicles in this intersection for a horizon of 30 steps. We train an influence predictor with a RNN and evaluate the performance of all three simulators Grandom IALM , Gθ IALM and Gglobal in settings where the allowed planning time is fixed per step. |
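The setup above fixes a budget of Monte Carlo simulations per planning step. As a rough illustration of that idea (not the paper's C++ POMCP implementation, which builds a search tree over belief states), a flat Monte Carlo planner can be sketched as follows; the `simulate` interface and the round-robin action selection are assumptions for this sketch:

```python
import random

def monte_carlo_plan(simulate, actions, state, n_sims=1000, horizon=10):
    """Pick the action with the best average rollout return.

    Simplified stand-in for POMCP-style planning: a fixed number of
    Monte Carlo simulations are run per step, with a uniform random
    rollout policy after the first action.
    """
    returns = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for i in range(n_sims):
        a0 = actions[i % len(actions)]   # round-robin over first actions
        s, total, a = state, 0.0, a0
        for _ in range(horizon):
            s, r = simulate(s, a)        # one step of the (local) simulator
            total += r
            a = random.choice(actions)   # random rollout policy
        returns[a0] += total
        counts[a0] += 1
    return max(actions, key=lambda a: returns[a] / max(counts[a], 1))
```

With a faster local simulator, more such simulations fit into a fixed per-step planning time, which is the trade-off the experiments measure.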
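The dataset collection step quoted above (1000 episodes from the global simulator under a uniform random policy, later used to train a GRU influence predictor) can be sketched like this; the `global_step` interface and the per-step training-pair format are assumptions for illustration only:

```python
import random

def sample_dataset(global_step, init_state, actions, n_episodes=1000, horizon=30):
    """Collect episodes from a global simulator under a uniform random policy.

    Each episode is a list of (local_obs, influence_src) pairs; in the
    paper's pipeline, a GRU is then trained on such data to predict the
    influence sources from local observation histories.
    """
    dataset = []
    for _ in range(n_episodes):
        s = init_state
        episode = []
        for _ in range(horizon):
            a = random.choice(actions)                 # uniform random policy
            s, local_obs, influence_src = global_step(s, a)
            episode.append((local_obs, influence_src))  # one training pair
        dataset.append(episode)
    return dataset
```

Training the GRU itself until convergence is omitted here; any sequence-model library would fit that role.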