reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Policy Search with High-Dimensional Context Variables

Authors: Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

AAAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the proposed method on three problems. We start by studying C-MORE behavior in a scenario where we know the true reward model and the true low-dimensional context. Subsequently, we focus our attention on two simulated robotic ball hitting tasks. In the first task, a toy 2-DoF planar robot arm has to hit a ball placed on a plane. In the second task, a simulated 6-DoF robot arm has to hit a ball placed in a three-dimensional space.
Researcher Affiliation	Academia	Voot Tangkaratt The University of Tokyo, 113-0033 Tokyo, Japan EMAIL Herke van Hoof McGill University, 3480 Rue University, Montreal, Canada Technical University of Darmstadt, 64289 Darmstadt, Germany Simone Parisi Technical University of Darmstadt, 64289 Darmstadt, Germany EMAIL Gerhard Neumann University of Lincoln, LN6 7TS Lincoln, United Kingdom Technical University of Darmstadt, 64289 Darmstadt, Germany EMAIL Jan Peters MPI for Intelligent Systems, 72076 Tuebingen, Germany Technical University of Darmstadt, 64289 Darmstadt, Germany EMAIL Masashi Sugiyama The University of Tokyo, 277-8561 Chiba, Japan RIKEN AIP Center, 351-0198 Saitama, Japan EMAIL
Pseudocode	Yes	Algorithm 1: C-MORE
Open Source Code	No	The paper does not provide any specific links or statements about the availability of its source code.
Open Datasets	No	The paper uses a "synthetic task with known ground truth" and "robotic ball hitting tasks based on camera images" where the images were collected or generated by the authors. No concrete access information (link, DOI, formal citation to a public dataset) is provided for these datasets.
Dataset Splits	Yes	For C-MORE Nuc. Norm, C-MORE LASSO and C-MORE PCA, we perform 5-fold cross-validation every 100 policy updates to choose the values of regularization parameter for nuclear norm, regularization parameter for ℓ1 norm, and dimension dz, respectively.
Hardware Specification	No	The paper describes simulated robot arms and tasks but does not specify the hardware (e.g., CPU, GPU models) on which these simulations were run.
Software Dependencies	No	The paper mentions software like IPOPT and APG but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	We set γ = 0.99 and H0 = 150. The sampling Gaussian distribution is initialized with random mean and covariance Q = 10,000I. For learning, we collect 35 new samples and keep track of the samples collected during the last 20 iterations to stabilize the policy update. The learning is performed for a maximum of 100 iterations. If the KL divergence is lower than 0.1, then the learning is considered to be converged and the policy is not updated anymore.
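
The quoted Experiment Setup can be read as a generic search loop, sketched below. This is not the authors' C-MORE implementation: the context variables, γ, and H0 are omitted, `update_fn` and `evaluate_fn` are hypothetical callbacks standing in for the policy update and the task reward, and both the direction of the KL divergence and the choice to measure it between successive search distributions are assumptions, since the quoted text does not specify them.

```python
from collections import deque

import numpy as np


def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """KL(p || q) between two multivariate Gaussians (closed form)."""
    k = mean_p.shape[0]
    diff = mean_q - mean_p
    cov_q_inv = np.linalg.inv(cov_q)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k + logdet_q - logdet_p
    )


def run_search(update_fn, evaluate_fn, dim, n_new=35, n_keep_iters=20,
               max_iters=100, kl_tol=0.1, init_var=10_000.0, seed=0):
    """Hypothetical loop using the quoted settings; update_fn is a placeholder."""
    rng = np.random.default_rng(seed)
    mean = rng.standard_normal(dim)   # random initial mean
    cov = init_var * np.eye(dim)      # Q = 10,000 I
    buffer = deque(maxlen=n_keep_iters)  # samples from the last 20 iterations

    for it in range(max_iters):
        params = rng.multivariate_normal(mean, cov, size=n_new)  # 35 new samples
        rewards = np.array([evaluate_fn(p) for p in params])
        buffer.append((params, rewards))

        all_params = np.concatenate([b[0] for b in buffer])
        all_rewards = np.concatenate([b[1] for b in buffer])
        new_mean, new_cov = update_fn(mean, cov, all_params, all_rewards)

        # Assumed convergence test: KL between the new and old search distribution.
        kl = gaussian_kl(new_mean, new_cov, mean, cov)
        mean, cov = new_mean, new_cov
        if kl < kl_tol:
            return mean, cov, it + 1  # converged; stop updating
    return mean, cov, max_iters
```

With these defaults, the update sees at most 20 × 35 = 700 samples once the buffer is full.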