Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning
Authors: Wilka Torrico Carvalho, Andrew Kyle Lampinen, Kyriacos Nikiforou, Felix Hill, Murray Shanahan
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study task suites in both 2D and 3D environments and find that FARM better generalizes compared to competing architectures that leverage attention or multiple modules. We present the training and generalization success rates in Figure 4. |
| Researcher Affiliation | Collaboration | Wilka Carvalho (EMAIL), Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University; Andrew K. Lampinen (EMAIL); Kyriacos Nikiforou (EMAIL); Felix Hill (EMAIL); Murray Shanahan (EMAIL), Google DeepMind |
| Pseudocode | No | The paper provides a schematic overview of the architecture in Figure 2 and summarizes computations with equations (1)-(6), but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | Work done during internship. Codebase: https://github.com/wcarvalho/farm. |
| Open Datasets | Yes | We study this with the Ballet grid-world (Lampinen et al., 2021) shown in Figure 1 (a). Here, we study the 3D Unity environment from Hill et al. (2020) shown in Figure 1 (b). To study this, we create the Keybox environment depicted in Figure 1 (c). |
| Dataset Splits | Yes | Training tasks always consist of seeing m = {2, 4} dancers; testing tasks always consist of seeing m = {8} dancers. During training the agent sees A D and B C in a 4m × 4m room with 4 distractors, along with A C and B D in a 3m × 3m room with 0 distractors. We test the agent on A C and B D in a 4m × 4m room with 4 distractors. Learning tasks include levels 1 to n_max = 10. Test tasks only use levels 2·n_max and 3·n_max. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, cloud/cluster specifications) are mentioned in the paper for running the experiments. |
| Software Dependencies | No | The paper mentions using ResNet, Convolutional LSTM, LSTM, multihead-attention, IMPALA algorithm, and Adam optimizer, along with citations to their original works. However, it does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used for implementation. |
| Experiment Setup | Yes | We tune hyperparameters for all architectures with the Place X next to Y task from the Baby AI environment (Chevalier-Boisvert et al., 2019) (Appendix B.2). We expand on implementation details in Appendix D. For details on hyperparameters, see Appendix E. All agents learn with a sample budget of 2 billion frames. |
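The table above references the architecture summarized by the paper's equations (1)-(6): multiple recurrent modules, each attending over spatial features. As a rough illustration of that idea (not the paper's exact equations), the sketch below uses NumPy; the module count, state sizes, and the simplified additive state update are illustrative assumptions, standing in for the paper's LSTM-based modules and multi-head attention.

```python
# Illustrative sketch of feature-attending recurrent modules: several small
# recurrent modules, each using its own state as an attention query over a set
# of spatial feature vectors (e.g. CNN outputs). All sizes and the update rule
# are assumptions for illustration, not the paper's implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FeatureAttendingModule:
    def __init__(self, state_dim, feat_dim, rng):
        self.Wq = rng.standard_normal((state_dim, feat_dim)) * 0.1
        self.Wu = rng.standard_normal((state_dim + feat_dim, state_dim)) * 0.1
        self.state = np.zeros(state_dim)

    def step(self, feats):
        # feats: (num_positions, feat_dim) spatial features
        q = self.state @ self.Wq                      # query from module state
        attn = softmax(feats @ q / np.sqrt(len(q)))   # scaled dot-product weights
        read = attn @ feats                           # attention-weighted readout
        # simple additive recurrent update (stand-in for an LSTM update)
        self.state = np.tanh(np.concatenate([self.state, read]) @ self.Wu)
        return self.state

rng = np.random.default_rng(0)
modules = [FeatureAttendingModule(state_dim=8, feat_dim=16, rng=rng)
           for _ in range(4)]
feats = rng.standard_normal((25, 16))       # e.g. a 5x5 grid of feature vectors
states = [m.step(feats) for m in modules]   # each module attends independently
policy_input = np.concatenate(states)       # concatenated states feed the policy
print(policy_input.shape)                   # (32,)
```

Each module maintains an independent state and reads from the shared feature map through its own attention weights, so different modules can specialize in different features of the observation.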