Online Multi-Task Learning for Policy Gradient Methods
Authors: Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PG-ELLA on four dynamical systems, including an application to quadrotor control, and show that PG-ELLA outperforms standard policy gradients both in the initial and final performance. |
| Researcher Affiliation | Academia | Haitham Bou Ammar EMAIL Eric Eaton EMAIL University of Pennsylvania, Computer and Information Science Department, Philadelphia, PA 19104 USA Paul Ruvolo EMAIL Olin College of Engineering, Needham, MA 02492 USA Matthew E. Taylor EMAIL Washington State University, School of Electrical Engineering and Computer Science, Pullman, WA 99164 USA |
| Pseudocode | Yes | Algorithm 1 PG-ELLA (k, λ, µ) |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | No | The paper describes benchmark dynamical systems and explains how tasks were generated by varying system parameters, but it provides no concrete access information (link, DOI, author/year citation, or reference to a standard public dataset) for any publicly available dataset used in training. For example, it states: 'We first generated 30 tasks for each domain by varying the system parameters over the ranges given in Table 1.' |
| Dataset Splits | No | The paper mentions 'The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks.' However, it does not provide specific data splits (percentages or counts) for training, validation, or testing for the main experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The paper mentions 'eNAC' but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | At each learning session, PG-ELLA was limited to 50 trajectories (for SM & CP) or 20 trajectories (for 3CP) with 150 time steps each to perform the update. Learning ceased once PG-ELLA had experienced at least one session with each task. To configure PG-ELLA, we used eNAC (Peters & Schaal, 2008) as the base policy gradient learner. The dimensionality k of the latent basis L was chosen independently for each domain via cross-validation over 10 tasks. The stepsize for each task domain was determined by a line search after gathering 10 trajectories of length 150. |
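For orientation, the per-session update that Algorithm 1 (PG-ELLA) performs can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function name `pgella_session` and all variable names are our own, `alpha_t` stands in for the single-task policy-gradient solution that eNAC would produce from the session's trajectories, and the L1 penalty on the task coefficients `s_t` from the paper is replaced here by a ridge penalty purely to keep the sketch closed-form.

```python
import numpy as np

def pgella_session(L, s, A, b, alpha_t, Gamma_t, t, mu=0.1, lam=0.1):
    """One simplified PG-ELLA learning session for task t.

    L       : (d, k) shared latent basis
    s       : dict mapping task id -> (k,) task-specific coefficients
    A, b    : accumulated sufficient statistics for updating L
    alpha_t : (d,) single-task policy-gradient solution (e.g., from eNAC)
    Gamma_t : (d, d) Hessian of the PG objective around alpha_t
    """
    d, k = L.shape
    # Task-specific step: fit s_t so that L @ s_t approximates alpha_t
    # under the Gamma_t metric. (The paper uses an L1 penalty here;
    # ridge is substituted to keep this sketch closed-form.)
    G = L.T @ Gamma_t @ L + mu * np.eye(k)
    s[t] = np.linalg.solve(G, L.T @ Gamma_t @ alpha_t)
    # Accumulate sufficient statistics for the shared-basis update
    # (column-major vectorization of L).
    A = A + np.kron(np.outer(s[t], s[t]), Gamma_t)
    b = b + np.kron(s[t], Gamma_t @ alpha_t)
    # Closed-form ridge update of the shared basis L.
    T = len(s)
    L_vec = np.linalg.solve(A / T + lam * np.eye(d * k), b / T)
    return L_vec.reshape(d, k, order="F"), s, A, b
```

The constraint quoted above (50 or 20 trajectories of 150 steps per session) applies to the trajectory collection that produces `alpha_t` and `Gamma_t` before this update runs; the hyperparameters `mu` and `lam` correspond to the regularization terms in the paper's objective, with values chosen arbitrarily here.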