Grounding Language for Transfer in Deep Reinforcement Learning
Authors: Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
JAIR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. |
| Researcher Affiliation | Academia | Karthik Narasimhan Department of Computer Science Princeton University 35 Olden Street, Princeton, NJ 08540 USA; Regina Barzilay Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 32 Vassar Street, Cambridge, MA 02139 USA; Tommi Jaakkola Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 32 Vassar Street, Cambridge, MA 02139 USA |
| Pseudocode | Yes | Algorithm 1 MULTITASK TRAIN (E) Algorithm 2 EPS-GREEDY (s, Q, Z, ϵ) |
| Open Source Code | Yes | Code for the experiments in this paper is available at https://github.com/karthikncode/Grounded-RL-Transfer. |
| Open Datasets | Yes | We perform experiments on a series of 2-D environments within the GVGAI framework (Perez-Liebana et al., 2016), which is used in an annual video game AI competition. |
| Dataset Splits | No | The paper discusses 'source tasks' and 'target tasks' for transfer learning, and 'level variants' for different map layouts (e.g., 'These three games have five level variants each'). However, it does not provide specific training/test/validation dataset splits in terms of percentages or sample counts for any single dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments. It only implies the use of computational resources for deep learning models. |
| Software Dependencies | No | The paper mentions using the Adam optimization scheme and describes neural network architectures (CNNs, DQN, VIN) but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch, CUDA). |
| Experiment Setup | Yes | For all models, we set γ = 0.8, |D| = 250k, and the embedding size d = 10. We used the Adam (Kingma & Ba, 2014) optimization scheme with a learning rate of 10⁻⁴, annealed linearly to 5 × 10⁻⁵. The minibatch size was set to 32. ϵ was annealed from 1 to 0.1 in the source tasks and set to 0.1 in the target tasks. For the value iteration module (VIN), we experimented with different levels of recurrence, k ∈ {1, 2, 3, 5}, and found k = 1 or k = 3 to work best. For DQN, we used two convolutional layers followed by a single fully connected layer, with ReLU non-linearities. The CNNs in the VIN had filters and strides of length 3. The CNNs in the model-free component used filters of sizes {4, 2} and corresponding strides of size {3, 2}. All embeddings are initialized at random. |
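The hyperparameters quoted above can be collected into a minimal configuration sketch. This is an illustration, not the authors' code: the constant names and the `linear_anneal` helper are assumptions, and the paper does not state the number of steps over which the learning rate or ϵ is annealed, so the schedule length below is a placeholder.

```python
# Hyperparameters reported in the paper's experiment setup.
GAMMA = 0.8             # discount factor γ
REPLAY_SIZE = 250_000   # replay memory capacity |D|
EMBED_DIM = 10          # embedding size d
BATCH_SIZE = 32         # minibatch size
LR_START, LR_END = 1e-4, 5e-5   # Adam learning rate, annealed linearly
EPS_START, EPS_END = 1.0, 0.1   # ϵ-greedy exploration (source tasks)
EPS_TARGET = 0.1                # fixed ϵ in target tasks

def linear_anneal(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from `start` to `end` over `total_steps`
    steps, then hold at `end`. The schedule length is not specified in
    the paper and must be chosen by the implementer."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)
```

For example, with a hypothetical 100k-step schedule, `linear_anneal(EPS_START, EPS_END, 50_000, 100_000)` gives ϵ = 0.55 halfway through training, and the value stays at 0.1 once `step` exceeds `total_steps`.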