Policy Gradient with Kernel Quadrature
Authors: Satoshi Hayakawa, Tetsuro Morimura
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the theoretical background of this procedure as well as its numerical illustrations in MuJoCo tasks. ... To demonstrate the effectiveness of our proposed methods, we conducted experiments on MuJoCo tasks since they are widely recognized as standard benchmarks in RL, even though the reward calculation for them is lightweight. |
| Researcher Affiliation | Collaboration | Satoshi Hayakawa EMAIL Mathematical Institute, University of Oxford Tetsuro Morimura EMAIL CyberAgent, Inc. |
| Pseudocode | Yes | Algorithm 1 Policy gradient Algorithm 2 Vanilla PGKQ Algorithm 3 PGKQ with non-centered GP |
| Open Source Code | No | The paper mentions using a third-party library ('machina3 library' with a GitHub link) and the implementation of specific components ('mψ and kψ') but does not provide a direct link or explicit statement about the availability of the authors' own source code for the PGKQ methodology described. |
| Open Datasets | Yes | We used MuJoCo (v2.1.0, Todorov et al., 2012) with the Gymnasium API (Towers et al., 2023). |
| Dataset Splits | No | The paper reports batch sizes (N = 64 and n = 8 episodes) and a maximum episode length (1000) but, as is typical for online RL, does not describe training/validation/test splits of a fixed dataset. |
| Hardware Specification | Yes | All the experiments with MuJoCo were conducted with a Google Cloud Vertex AI notebook with an NVIDIA T4 (16-core vCPU, 60 GB RAM). |
| Software Dependencies | No | All the experiments were conducted by using PyTorch (Paszke et al., 2019) and Adam (Kingma & Ba, 2015). ... We used the implementation of the machina3 library. |
| Experiment Setup | Yes | The learning rates of the policy, baseline, and GP-related networks were all set to 3 × 10−4. ... The discount rate was γ = 0.995. ... In all the experiments, we used three-layer fully connected ReLU neural networks (NNs) for each of mψ and kψ, where kψ(z, z′) was computed by passing the NN-embeddings of state-action pairs z and z′ to the Gaussian kernel with additional scale and noise parameters. |
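The kernel construction quoted in the Experiment Setup row can be sketched as follows. The paper's own implementation uses PyTorch and is not released, so this is a framework-agnostic NumPy sketch under assumptions: layer widths, embedding dimension, and initial scale/noise values are illustrative, not the authors' settings. It shows the stated structure — a three-layer fully connected ReLU network embeds state-action pairs z, and kψ(z, z′) is a Gaussian kernel on those embeddings with scale and noise parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden=64, emb_dim=16):
    """Random weights for a three-layer fully connected ReLU embedding
    network (hidden widths are assumptions, not from the paper)."""
    dims = [in_dim, hidden, hidden, emb_dim]
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def embed(params, z):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    h = z
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def gaussian_kernel(params, z1, z2, log_scale=0.0, log_noise=-2.0):
    """k(z, z') = scale * exp(-||e(z) - e(z')||^2 / 2) on NN embeddings,
    with diagonal noise added when evaluating a Gram matrix on one batch."""
    e1, e2 = embed(params, z1), embed(params, z2)
    sq = ((e1[:, None, :] - e2[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    K = np.exp(log_scale) * np.exp(-0.5 * sq)
    if z1 is z2:
        K = K + np.exp(log_noise) * np.eye(len(z1))
    return K
```

For example, evaluating the kernel on a batch of 8 state-action vectors yields a symmetric, positive-definite 8×8 Gram matrix, which is what a kernel-quadrature step would operate on.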