Learning Compositional Neural Programs with Recursive Tree Search and Planning
Authors: Thomas Pierrot, Guillaume Ligner, Scott E. Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that when deploying our neural network policies, it is advantageous to do planning with guided Monte Carlo tree search. |
| Researcher Affiliation | Collaboration | Thomas Pierrot InstaDeep EMAIL Guillaume Ligner InstaDeep EMAIL Scott Reed DeepMind EMAIL Olivier Sigaud Sorbonne Université EMAIL Nicolas Perrin CNRS, Sorbonne Université EMAIL Alexandre Laterre InstaDeep EMAIL David Kas InstaDeep EMAIL Karim Beguir EMAIL Nando de Freitas DeepMind EMAIL |
| Pseudocode | Yes | The search approach is depicted in Figure 3 for a Tower of Hanoi example, see also the corresponding Figure 2 of Silver et al. [2017]. A detailed description of the search process, including pseudo-code, appears in Appendix A. |
| Open Source Code | Yes | The code is available at https://github.com/instadeepai/AlphaNPI |
| Open Datasets | No | The paper discusses experiments on "sorting tasks" (Bubble Sort) and "Tower of Hanoi puzzle" using instances of varying lengths/disks (e.g., "lists of length 2 to 7", "problem instances with 2 disks"). However, it does not provide concrete access information (links, DOIs, formal citations) to specific publicly available datasets used for these problems. |
| Dataset Splits | No | The paper states "We validated on lists of length 7" and "After each Adam update, we perform validation on all tasks for n_val episodes," and mentions using "randomly generated lists" for testing. However, it does not provide specific dataset split information (e.g., exact percentages or sample counts) for fixed training, validation, and test sets, as the data is generated during training/validation episodes. |
| Hardware Specification | No | The paper discusses training models but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and "LSTM" but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | During a training iteration, the agent selects a program i to learn. It plays n_ep episodes (see Appendix E for specific values) using the tree search in exploration mode with a large budget of simulations. ... The agent is trained with the Adam optimizer on this data... We trained AlphaNPI to learn the sorting library of programs on lists of length 2 to 7. ... We validated on lists of length 7 and stopped when the minimum averaged validation reward, among all programs, reached the curriculum threshold. |
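The training procedure quoted in the "Experiment Setup" row can be summarized as an outer loop: pick a program to learn, collect n_ep exploration episodes with guided MCTS, update the network with Adam, then validate every task for n_val episodes and stop once the minimum averaged validation reward reaches a threshold. The sketch below illustrates only that loop structure; all names (`run_episode`, `train`, the default values of `n_ep`, `n_val`, `reward_threshold`) are hypothetical stand-ins, and the MCTS search and Adam update themselves are stubbed out, so this is not the authors' actual implementation.

```python
import random

def run_episode(program_id, explore, n_simulations):
    """Placeholder for one guided-MCTS episode on the given program.
    Returns the final reward (1.0 on success, 0.0 otherwise); the real
    search statistics used as policy targets are omitted here."""
    return random.choice([0.0, 1.0])

def train(programs, n_ep=5, n_val=10, n_simulations=50,
          reward_threshold=0.9, max_iterations=1000):
    """Outer training loop mirroring the quoted setup: collect exploration
    episodes for a selected program, (pretend to) run an Adam update, then
    validate on all tasks and stop once the minimum averaged validation
    reward crosses the threshold. Returns the stopping iteration."""
    for it in range(max_iterations):
        program_id = random.choice(programs)      # agent selects a program to learn
        episodes = [run_episode(program_id, True, n_simulations)
                    for _ in range(n_ep)]         # exploration mode, large budget
        # ... one Adam update on the collected search data would go here ...
        val_rewards = {p: sum(run_episode(p, False, n_simulations)
                              for _ in range(n_val)) / n_val
                       for p in programs}         # validate on all tasks
        if min(val_rewards.values()) >= reward_threshold:
            return it                             # curriculum threshold reached
    return max_iterations
```

Because rewards here are random, the stopping criterion rarely fires; in the real system the validation rewards come from the trained policy, so the loop terminates once every program in the library is reliably solved.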