Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

Authors: François G. Ged, Maria Han Veiga

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental As a proof of concept, we evaluate MPG numerically on standard test benchmarks. We successfully train agents on simple standard tasks without relying on RL tricks, and confirm our theoretical findings (see Section 4).
Researcher Affiliation Academia François G. Ged (1,2), EMAIL — (1) Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; (2) Dynamical Systems in Biomathematics, University of Vienna, Vienna, Austria. Maria Han Veiga, EMAIL — Department of Mathematics, The Ohio State University, Columbus, USA. All authors are affiliated with universities: École Polytechnique Fédérale de Lausanne, University of Vienna, and The Ohio State University.
Pseudocode Yes Algorithm 1: MPG implementation for an N-horizon task
Open Source Code No The paper does not contain an explicit statement about releasing source code or provide a link to a code repository for the methodology described. It refers to numerical experiments but does not offer access to their implementation.
Open Datasets Yes Then, we study two benchmarks from OpenAI: the Frozen Lake game and the Cart Pole.
Dataset Splits No The paper mentions "Number of episodes: 1000." for training the models on the Frozen Lake and Cart Pole tasks. However, it does not specify explicit training/testing/validation splits for the data, which is typical for simulated reinforcement learning environments where data is generated dynamically.
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies No The paper mentions using a "deep neural network" and "ReLU activation function" and describes the model architecture. However, it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup Yes Algorithm 1:
Input: initial temperature τ_0, initial learning rate η_0, final temperature τ_T, final learning rate η_T
τ ← τ_0; η ← η_0
for t = 1, ..., episodes do
    generate a trajectory from the policies {π^n_t, π^{n−1}_t, ..., π^1_t}: {(s_i, s_{i+1}, a_i, r_i)}_{i=0}^{n−1}
    for i = 1, ..., n do
        C_i = Σ_{ℓ=n−i}^{n−1} (r_ℓ − τ log π^{(n−ℓ)}_t)
        θ^{(i)}_{t+1} = θ^{(i)}_t + η C_i ∇ log π^{(i)}_t(a_{n−i} | s_{n−i})
    end for
    decay τ, η using d_τ = τ_T^{1/episodes} and d_η = η_T^{1/episodes}
end for
Furthermore, the paper provides tables detailing the "Hyper-parameters for Frozen Lake" and "Hyper-parameters for balancing cart pole task", including initial and terminal learning rates and temperatures.
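The training loop described above can be sketched in plain NumPy. Everything below is an illustrative assumption, not the paper's implementation: the toy chain environment, the tabular softmax policy heads, the horizon, and the hyper-parameter values are made up for the sketch. Only the overall shape follows Algorithm 1 — one policy head per remaining-horizon index, an entropy-regularized return-to-go C_i per head, and geometric decay of the temperature τ and learning rate η from their initial to their terminal values.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                       # horizon (illustrative)
n_states, n_actions = 5, 2  # toy chain: move right to reach a reward
episodes = 1000

tau0, tauT = 1.0, 0.01      # initial / final temperature (assumed values)
eta0, etaT = 0.5, 0.05      # initial / final learning rate (assumed values)

# One tabular softmax head theta[i] per remaining-horizon index i
theta = np.zeros((n, n_states, n_actions))

def policy(i, s):
    """Softmax policy of head i at state s."""
    logits = theta[i, s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(s, a):
    """Toy dynamics: action 1 moves right; reward 1 at the last state."""
    s2 = min(s + a, n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

tau, eta = tau0, eta0
# Geometric decay reaching tauT / etaT after `episodes` steps
d_tau = (tauT / tau0) ** (1.0 / episodes)
d_eta = (etaT / eta0) ** (1.0 / episodes)

for t in range(episodes):
    # Roll out one trajectory with the nested policies pi^n, ..., pi^1
    s, traj = 0, []
    for i in range(n):
        p = policy(n - 1 - i, s)            # head with n - i steps to go
        a = rng.choice(n_actions, p=p)
        s2, r = step(s, a)
        traj.append((s, a, r, np.log(p[a])))
        s = s2
    # One update per head: entropy-regularized return-to-go C_i
    for i in range(1, n + 1):
        C = sum(r - tau * logp for (_, _, r, logp) in traj[n - i:])
        s_i, a_i, _, _ = traj[n - i]
        p = policy(i - 1, s_i)
        grad = -p
        grad[a_i] += 1.0                    # grad of log-softmax w.r.t. logits
        theta[i - 1, s_i] += eta * C * grad
    tau *= d_tau
    eta *= d_eta
```

The per-head update mirrors θ^{(i)}_{t+1} = θ^{(i)}_t + η C_i ∇ log π^{(i)}_t(a_{n−i} | s_{n−i}), with the gradient taken exactly for the tabular softmax instead of by backpropagation through a neural network.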