Accelerated Quality-Diversity through Massive Parallelism
Authors: Bryan Lim, Maxime Allard, Luca Grillotti, Antoine Cully
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments aim to answer the following questions: (1) How does massive parallelization affect the performance of Quality-Diversity (MAP-Elites) algorithms? (2) How significant is the number of iterations/learning steps in QD algorithms? (3) What magnitude of speed-up does massive parallelization offer over existing implementations? (4) How does this differ across hardware accelerators? (Figure 3 caption: Performance metrics across domains. The plots show the QD-Score against the number of evaluations, iterations, and run-times for each batch size; the rightmost column shows the final archives. The Ant and Hopper Uni results are presented in the Appendix. Bold lines and shaded areas represent the median and interquartile range over 10 replications.) |
| Researcher Affiliation | Academia | Bryan Lim (EMAIL), Department of Computing, Imperial College London; Maxime Allard (EMAIL), Department of Computing, Imperial College London; Luca Grillotti (EMAIL), Department of Computing, Imperial College London; Antoine Cully (EMAIL), Department of Computing, Imperial College London |
| Pseudocode | Yes | Algorithm 1 MAP-Elites (N_B: batch size). 1: for iteration i ∈ ⟦1, I⟧ do; 2: if first iteration then; 3: B ← random solutions; 4: else; 5: B ← select solutions from archive A; 6: B̃ = (θ_j)_{j ∈ ⟦1, N_B⟧} ← variation(B); 7: for j ∈ ⟦1, N_B⟧ do; 8: run episode of π_{θ_j}, get return R(θ_j) and trajectory τ(θ_j); 9: c ← get grid cell of descriptor d(τ(θ_j)); 10: θ_c ← get content of cell c; 11: if cell c is empty then; 12: add θ_j to cell c; 13: else if R(θ_j) > R(θ_c) then; 14: replace θ_c with θ_j in cell c; 15: else; 16: discard θ_j; 17: return archive A |
| Open Source Code | Yes | (3) We release QDax, an open-source accelerated Python framework for Quality-Diversity algorithms (MAP-Elites) which enables massive parallelization on a single machine. This makes QD algorithms more accessible to a wider range of practitioners and researchers. The source code of QDax is available at https://github.com/adaptive-intelligent-robotics/QDax. |
| Open Datasets | Yes | We use the Hopper, Walker2D, Ant and Humanoid gym locomotion environments made available in Brax (Freeman et al., 2021) for these tasks. |
| Dataset Splits | No | The paper specifies a fixed number of evaluations for different tasks (e.g., "5 million evaluations for all QD-RL environments and 20 million evaluations for rastrigin and sphere") rather than defining explicit dataset splits for training, validation, or testing. The environments themselves (Gym, Brax) provide the tasks, not pre-split datasets. |
| Hardware Specification | Yes | We use a single A100 GPU to perform our experiments. We also test our implementation on two different GPU devices, a more accessible RTX2080 local device and a higher-performance A100 on Google Cloud. |
| Software Dependencies | No | The paper mentions software components like "Python", "JAX (Bradbury et al., 2018)", and "Brax (Freeman et al., 2021)" but does not provide specific version numbers for these or any other libraries or frameworks used. |
| Experiment Setup | Yes | We use fully connected neural network controllers with two hidden layers of size 64 and tanh output activation functions as policies across all QD-RL environments and tasks. We use 5 million evaluations for all QD-RL environments and 20 million evaluations for rastrigin and sphere. In all our experiments, we use the iso-line variation operator (Vassiliades & Mouret, 2018) (Appendix Algo. 2). |
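The MAP-Elites loop quoted in the Pseudocode row can be illustrated with a minimal, self-contained sketch. This is not the QDax implementation; it is a plain-NumPy toy with hypothetical names (`fitness_fn`, `descriptor_fn`), a scalar descriptor binned into a 1-D grid, and Gaussian mutation standing in for the paper's iso-line variation operator:

```python
import numpy as np

def map_elites(fitness_fn, descriptor_fn, n_cells, n_iters, batch_size, dim, rng):
    """Toy MAP-Elites: archive maps a cell index to its (fitness, solution) elite."""
    archive = {}
    for it in range(n_iters):
        if not archive:
            # First iteration: random solutions (Algorithm 1, line 3).
            batch = rng.uniform(-1.0, 1.0, size=(batch_size, dim))
        else:
            # Select parents uniformly from the archive, then mutate (lines 5-6).
            keys = rng.choice(list(archive), size=batch_size)
            batch = np.stack([archive[k][1] for k in keys])
            batch = batch + 0.1 * rng.standard_normal((batch_size, dim))
        for sol in batch:
            f = fitness_fn(sol)
            d = descriptor_fn(sol)  # scalar descriptor, nominally in [0, 1)
            cell = min(max(int(d * n_cells), 0), n_cells - 1)
            # Add if the cell is empty, replace only if strictly better (lines 11-14).
            if cell not in archive or f > archive[cell][0]:
                archive[cell] = (f, sol)
    return archive

rng = np.random.default_rng(0)
arch = map_elites(
    fitness_fn=lambda x: -np.sum(x ** 2),          # sphere task (maximized)
    descriptor_fn=lambda x: (x[0] + 1.0) / 2.0,    # first coordinate mapped to [0, 1)
    n_cells=20, n_iters=50, batch_size=32, dim=4, rng=rng)
```

Each archive cell keeps only its best-so-far solution, which is what produces the QD-Score (sum of elite fitnesses) tracked in the paper's Figure 3.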
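The massive parallelization the Research Type row asks about comes from evaluating the whole batch on the accelerator in one call. A minimal sketch of that idea with JAX's `vmap` and `jit` on a toy fitness function (this is not the QDax API, just the underlying mechanism):

```python
import jax
import jax.numpy as jnp

# Toy fitness: negative sphere, written for a single solution.
def fitness(x):
    return -jnp.sum(x ** 2)

# vmap vectorizes over the batch dimension; jit compiles one kernel for the
# whole batch, so evaluation cost grows sub-linearly with batch size on a GPU.
batched_fitness = jax.jit(jax.vmap(fitness))

key = jax.random.PRNGKey(0)
batch = jax.random.uniform(key, (8192, 4), minval=-1.0, maxval=1.0)
scores = batched_fitness(batch)  # shape (8192,), computed in one device call
```

Scaling the leading batch dimension (the paper sweeps batch sizes) changes only the array shape, not the code, which is why hardware accelerators absorb very large batches cheaply.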
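The Experiment Setup row specifies the controllers: fully connected networks with two hidden layers of size 64 and a tanh output activation. A minimal NumPy sketch of that architecture; the hidden-layer activation and the initialization scheme are assumptions, since the quoted setup only fixes the output activation:

```python
import numpy as np

def init_policy(obs_dim, act_dim, hidden=64, rng=None):
    """Two hidden layers of 64 units, as in the quoted setup."""
    rng = rng or np.random.default_rng(0)
    sizes = [obs_dim, hidden, hidden, act_dim]
    return [(rng.standard_normal((i, o)) * np.sqrt(1.0 / i), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def policy_forward(params, obs):
    x = obs
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)   # hidden activation (assumed tanh)
    W, b = params[-1]
    return np.tanh(x @ W + b)    # tanh output keeps actions in [-1, 1]

params = init_policy(obs_dim=8, act_dim=2)
action = policy_forward(params, np.zeros(8))  # one action vector in [-1, 1]^2
```

The flattened parameter list of such a network is what plays the role of a "solution" θ_j in the MAP-Elites archive for the QD-RL environments.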