reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Policy Gradient Algorithms Implicitly Optimize by Continuation

Authors: Adrien Bolland, Gilles Louppe, Damien Ernst

TMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper provides a new theoretical interpretation and justification of these algorithms. Our main contributions are twofold. First, we define a continuation for the return of policies and formulate direct policy optimization in the optimization by continuation framework. Second, based on this framework, we study different formulations, i.e., policy parameterization and entropy regularization, of direct policy optimization.
Researcher Affiliation	Academia	Adrien Bolland (Montefiore Institute, University of Liège); Gilles Louppe (Montefiore Institute, University of Liège); Damien Ernst (Montefiore Institute, University of Liège; LTCI, Telecom Paris, Institut Polytechnique de Paris)
Pseudocode	Yes	Algorithm 1: Optimization by Continuation. 1: Provide a sequence of I functions p_0, p_1, ..., p_{I-1}. 2: Provide an initial variable value x_0 ∈ X for the local search. 3: for i = 0, 1, ..., I-1 do 4: x_{i+1} ← Optimize the continuation f_{p_i} by local search initialized at x_i. 5: end for. 6: return x_I.
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described. There are no explicit statements about code release or links to repositories.
Open Datasets	No	The paper describes a 'car environment' in Appendix B to illustrate theoretical concepts, but it does not use or provide access to any publicly available dataset for empirical evaluation.
Dataset Splits	No	The paper does not use any specific datasets with defined splits for empirical evaluation. The 'car environment' is a theoretical example, not a dataset.
Hardware Specification	No	The paper does not provide specific hardware details used for running experiments. The work is theoretical and uses a conceptual environment.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names). The work is theoretical.
Experiment Setup	No	The paper does not provide specific experimental setup details such as hyperparameter values or training configurations. The work is theoretical and uses a conceptual environment for illustration.
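
The loop quoted in the Pseudocode row (Algorithm 1) can be illustrated with a minimal Python sketch. Everything beyond the loop itself is an assumption made for illustration and is not taken from the paper: the toy objective -x^2 + A*cos(W*x), the use of Gaussian smoothing as the continuation, the schedule of smoothing widths, and the finite-difference gradient ascent used as the local search. The helper names (optimize_by_continuation, gradient_ascent, smoothed_objective) are hypothetical.

```python
import math

def optimize_by_continuation(continuations, x0, local_search):
    """Algorithm 1: optimize each continuation in order, warm-starting
    every local search at the solution of the previous one."""
    x = x0
    for f_p in continuations:      # i = 0, 1, ..., I-1
        x = local_search(f_p, x)   # x_{i+1} from f_{p_i}, initialized at x_i
    return x                       # x_I

def gradient_ascent(f, x, lr=0.01, steps=2000, h=1e-5):
    """Assumed local search: gradient ascent with a central-difference gradient."""
    for _ in range(steps):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x += lr * grad
    return x

# Assumed toy objective f(x) = -x^2 + A*cos(W*x): global maximum at x = 0,
# with many local maxima around it.
A, W = 2.0, 5.0

def smoothed_objective(sigma):
    """Closed form of E[f(x + eps)] for eps ~ N(0, sigma^2).
    sigma = 0 recovers the original objective."""
    damping = math.exp(-0.5 * (W * sigma) ** 2)
    def f(x):
        return -(x * x + sigma * sigma) + A * damping * math.cos(W * x)
    return f

# Smoothing widths shrink to zero, so the last continuation is the objective itself.
sigmas = [1.0, 0.5, 0.2, 0.0]
continuations = [smoothed_objective(s) for s in sigmas]

x_plain = gradient_ascent(smoothed_objective(0.0), 3.0)           # no continuation
x_cont = optimize_by_continuation(continuations, 3.0, gradient_ascent)
print("plain local search:", x_plain)
print("with continuation: ", x_cont)
```

In this toy setting, the plain local search started at x = 3 settles in a nearby local maximum, while the continuation schedule first optimizes a nearly concave smoothed objective and then tracks its maximizer as the smoothing is removed. This is only meant to convey the mechanism behind Algorithm 1, not the policy-gradient analysis of the paper.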