Online Multi-Task Learning for Policy Gradient Methods
Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance. |
| Researcher Affiliation | Academia | Haitham Bou Ammar EMAIL Eric Eaton EMAIL University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA 19104 USA Paul Ruvolo EMAIL Olin College of Engineering, Needham, MA 02492 USA Matthew E. Taylor EMAIL Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164 USA |
| Pseudocode | Yes | Algorithm 1 PG-ELLA (k, λ, µ) |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | No | The paper describes benchmark dynamical systems and explains how tasks were generated by varying system parameters, but it provides no concrete access information (link, DOI, author/year citation, or reference to a standard public dataset) for any publicly available dataset used in training. For example, it states: 'We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1.' |
| Dataset Splits | No | The paper mentions 'The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks.' However, it does not provide specific data splits (percentages or counts) for training, validation, or testing for the main experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The paper mentions 'eNAC' but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The stepsize for each task domain was determined by a line search after gathering 10 trajectories of length 150. |
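For orientation, the per-session update that Algorithm 1 (PG-ELLA) performs can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function name `pgella_session` and all variable names are our own, `alpha_t` stands in for the single-task policy-gradient solution that eNAC would produce from the session's trajectories, and the L1 penalty on the task coefficients `s_t` from the paper is replaced here by a ridge penalty purely to keep the sketch closed-form.

```python
import numpy as np

def pgella_session(L, s, A, b, alpha_t, Gamma_t, t, mu=0.1, lam=0.1):
    """One simplified PG-ELLA learning session for task t.

    L       : (d, k) shared latent basis
    s       : dict mapping task id -> (k,) task-specific coefficients
    A, b    : accumulated sufficient statistics for updating L
    alpha_t : (d,) single-task policy-gradient solution (e.g., from eNAC)
    Gamma_t : (d, d) Hessian of the PG objective around alpha_t
    """
    d, k = L.shape
    # Task-specific step: fit s_t so that L @ s_t approximates alpha_t
    # under the Gamma_t metric. (The paper uses an L1 penalty here;
    # ridge is substituted to keep this sketch closed-form.)
    G = L.T @ Gamma_t @ L + mu * np.eye(k)
    s[t] = np.linalg.solve(G, L.T @ Gamma_t @ alpha_t)
    # Accumulate sufficient statistics for the shared-basis update
    # (column-major vectorization of L).
    A = A + np.kron(np.outer(s[t], s[t]), Gamma_t)
    b = b + np.kron(s[t], Gamma_t @ alpha_t)
    # Closed-form ridge update of the shared basis L.
    T = len(s)
    L_vec = np.linalg.solve(A / T + lam * np.eye(d * k), b / T)
    return L_vec.reshape(d, k, order="F"), s, A, b
```

The constraint quoted above (50 or 20 trajectories of 150 steps per session) applies to the trajectory collection that produces `alpha_t` and `Gamma_t` before this update runs; the hyperparameters `mu` and `lam` correspond to the regularization terms in the paper's objective, with values chosen arbitrarily here.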