Grounding Language for Transfer in Deep Reinforcement Learning
Authors: Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
JAIR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model outperforms prior work on both transfer and multi-task scenarios in a variety of different environments. For instance, we achieve up to 14% and 11.5% absolute improvement over previously existing models in terms of average and initial rewards, respectively. |
| Researcher Affiliation | Academia | Karthik Narasimhan Department of Computer Science Princeton University 35 Olden Street, Princeton, NJ 08540 USA; Regina Barzilay Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 32 Vassar Street, Cambridge, MA 02139 USA; Tommi Jaakkola Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 32 Vassar Street, Cambridge, MA 02139 USA |
| Pseudocode | Yes | Algorithm 1 MULTITASK TRAIN (E) Algorithm 2 EPS-GREEDY (s, Q, Z, ϵ) |
| Open Source Code | Yes | Code for the experiments in this paper is available at https://github.com/karthikncode/Grounded-RL-Transfer. |
| Open Datasets | Yes | We perform experiments on a series of 2-D environments within the GVGAI framework (Perez-Liebana et al., 2016), which is used in an annual video game AI competition. |
| Dataset Splits | No | The paper discusses 'source tasks' and 'target tasks' for transfer learning, and 'level variants' for different map layouts (e.g., 'These three games have five level variants each'). However, it does not provide specific training/test/validation dataset splits in terms of percentages or sample counts for any single dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments. It only implies the use of computational resources for deep learning models. |
| Software Dependencies | No | The paper mentions using the Adam optimization scheme and describes neural network architectures (CNNs, DQN, VIN) but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch, CUDA). |
| Experiment Setup | Yes | For all models, we set γ = 0.8, |D| = 250k, and the embedding size d = 10. We used the Adam (Kingma & Ba, 2014) optimization scheme with a learning rate of 10⁻⁴, annealed linearly to 5 × 10⁻⁵. The minibatch size was set to 32. ϵ was annealed from 1 to 0.1 in the source tasks and set to 0.1 in the target tasks. For the value iteration module (VIN), we experimented with different levels of recurrence, k ∈ {1, 2, 3, 5}, and found k = 1 or k = 3 to work best. For DQN, we used two convolutional layers followed by a single fully connected layer, with ReLU non-linearities. The CNNs in the VIN had filters and strides of length 3. The CNNs in the model-free component used filters of sizes {4, 2} and corresponding strides of size {3, 2}. All embeddings are initialized at random. |
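The hyperparameters quoted above can be collected into a minimal configuration sketch. This is an illustration, not the authors' code: the constant names and the `linear_anneal` helper are assumptions, and the paper does not state the number of steps over which the learning rate or ϵ is annealed, so the schedule length below is a placeholder.

```python
# Hyperparameters reported in the paper's experiment setup.
GAMMA = 0.8             # discount factor γ
REPLAY_SIZE = 250_000   # replay memory capacity |D|
EMBED_DIM = 10          # embedding size d
BATCH_SIZE = 32         # minibatch size
LR_START, LR_END = 1e-4, 5e-5   # Adam learning rate, annealed linearly
EPS_START, EPS_END = 1.0, 0.1   # ϵ-greedy exploration (source tasks)
EPS_TARGET = 0.1                # fixed ϵ in target tasks

def linear_anneal(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from `start` to `end` over `total_steps`
    steps, then hold at `end`. The schedule length is not specified in
    the paper and must be chosen by the implementer."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)
```

For example, with a hypothetical 100k-step schedule, `linear_anneal(EPS_START, EPS_END, 50_000, 100_000)` gives ϵ = 0.55 halfway through training, and the value stays at 0.1 once `step` exceeds `total_steps`.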