Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning
Authors: Wilka Torrico Carvalho, Andrew Kyle Lampinen, Kyriacos Nikiforou, Felix Hill, Murray Shanahan
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study task suites in both 2D and 3D environments and find that FARM better generalizes compared to competing architectures that leverage attention or multiple modules. We present the training and generalization success rates in Figure 4. |
| Researcher Affiliation | Collaboration | Wilka Carvalho (EMAIL), Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University; Andrew K. Lampinen (EMAIL); Kyriacos Nikiforou (EMAIL); Felix Hill (EMAIL); Murray Shanahan (EMAIL), Google DeepMind |
| Pseudocode | No | The paper provides a schematic overview of the architecture in Figure 2 and summarizes computations with equations (1)-(6), but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | Work done during internship. Codebase: https://github.com/wcarvalho/farm. |
| Open Datasets | Yes | We study this with the Ballet grid-world (Lampinen et al., 2021) shown in Figure 1 (a). Here, we study the 3D Unity environment from Hill et al. (2020) shown in Figure 1 (b). To study this, we create the Keybox environment depicted in Figure 1 (c). |
| Dataset Splits | Yes | Training tasks always consist of seeing m = {2, 4} dancers; testing tasks always consist of seeing m = {8} dancers. During training the agent sees A D and B C in a 4m × 4m room with 4 distractors, along with A C and B D in a 3m × 3m room with 0 distractors. We test the agent on A C and B D in a 4m × 4m room with 4 distractors. Learning tasks include levels 1 to n_max = 10. Test tasks only use levels 2·n_max and 3·n_max. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, cloud/cluster specifications) are mentioned in the paper for running the experiments. |
| Software Dependencies | No | The paper mentions using ResNet, Convolutional LSTM, LSTM, multihead-attention, IMPALA algorithm, and Adam optimizer, along with citations to their original works. However, it does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used for implementation. |
| Experiment Setup | Yes | We tune hyperparameters for all architectures with the Place X next to Y task from the Baby AI environment (Chevalier-Boisvert et al., 2019) (Appendix B.2). We expand on implementation details in Appendix D. For details on hyperparameters, see Appendix E. All agents learn with a sample budget of 2 billion frames. |
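The table above references the architecture summarized by the paper's equations (1)-(6): multiple recurrent modules, each attending over spatial features. As a rough illustration of that idea (not the paper's exact equations), the sketch below uses NumPy; the module count, state sizes, and the simplified additive state update are illustrative assumptions, standing in for the paper's LSTM-based modules and multi-head attention.

```python
# Illustrative sketch of feature-attending recurrent modules: several small
# recurrent modules, each using its own state as an attention query over a set
# of spatial feature vectors (e.g. CNN outputs). All sizes and the update rule
# are assumptions for illustration, not the paper's implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FeatureAttendingModule:
    def __init__(self, state_dim, feat_dim, rng):
        self.Wq = rng.standard_normal((state_dim, feat_dim)) * 0.1
        self.Wu = rng.standard_normal((state_dim + feat_dim, state_dim)) * 0.1
        self.state = np.zeros(state_dim)

    def step(self, feats):
        # feats: (num_positions, feat_dim) spatial features
        q = self.state @ self.Wq                      # query from module state
        attn = softmax(feats @ q / np.sqrt(len(q)))   # scaled dot-product weights
        read = attn @ feats                           # attention-weighted readout
        # simple additive recurrent update (stand-in for an LSTM update)
        self.state = np.tanh(np.concatenate([self.state, read]) @ self.Wu)
        return self.state

rng = np.random.default_rng(0)
modules = [FeatureAttendingModule(state_dim=8, feat_dim=16, rng=rng)
           for _ in range(4)]
feats = rng.standard_normal((25, 16))       # e.g. a 5x5 grid of feature vectors
states = [m.step(feats) for m in modules]   # each module attends independently
policy_input = np.concatenate(states)       # concatenated states feed the policy
print(policy_input.shape)                   # (32,)
```

Each module maintains an independent state and reads from the shared feature map through its own attention weights, so different modules can specialize in different features of the observation.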