Policy Gradient Algorithms Implicitly Optimize by Continuation

Authors: Adrien Bolland, Gilles Louppe, Damien Ernst

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper provides a new theoretical interpretation and justification of these algorithms. Our main contributions are twofold. First, we define a continuation for the return of policies and formulate direct policy optimization in the optimization by continuation framework. Second, based on this framework, we study different formulations, i.e., policy parameterization and entropy regularization, of direct policy optimization.
Researcher Affiliation | Academia | Adrien Bolland (Montefiore Institute, University of Liège); Gilles Louppe (Montefiore Institute, University of Liège); Damien Ernst (Montefiore Institute, University of Liège; LTCI, Telecom Paris, Institut Polytechnique de Paris)
Pseudocode | Yes | Algorithm 1 Optimization by Continuation. 1: Provide a sequence of $I$ functions $p_0, p_1, \dots, p_{I-1}$. 2: Provide an initial variable value $x_0 \in X$ for the local search. 3: for $i = 0, 1, \dots, I-1$ do 4: $x_{i+1} \leftarrow$ Optimize the continuation $f^{p_i}$ by local search initialized at $x_i$. 5: end for 6: return $x_I$
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no explicit statements about code release or links to repositories.
Open Datasets | No | The paper describes a 'car environment' in Appendix B to illustrate theoretical concepts, but it does not use or provide access to any publicly available dataset for empirical evaluation.
Dataset Splits | No | The paper does not use any specific datasets with defined splits for empirical evaluation. The 'car environment' is a theoretical example, not a dataset.
Hardware Specification | No | The paper does not provide specific hardware details used for running experiments. The work is theoretical and uses a conceptual environment.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names). The work is theoretical.
Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values or training configurations. The work is theoretical and uses a conceptual environment for illustration.
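
Since no source code is released, the following is only a minimal sketch of the loop in Algorithm 1, not the authors' implementation. It makes several assumptions that do not come from the paper: each p_i is taken to be a Gaussian centred at x with standard deviation sigma_i, the continuation is the expectation of f under that Gaussian (approximated on a fixed quadrature grid so the run is deterministic), the local search is plain gradient ascent with finite-difference gradients, and the objective f and all function names are hypothetical.

```python
import math

# Fixed quadrature grid for a standard normal: nodes z_k and normalized weights w_k.
_Z = [-4.0 + 0.1 * k for k in range(81)]
_W = [math.exp(-0.5 * z * z) for z in _Z]
_W_SUM = sum(_W)
_W = [w / _W_SUM for w in _W]


def f(x):
    """Hypothetical objective: local maximum near x = -2, global maximum near x = 3."""
    return math.exp(-(x - 3.0) ** 2) + 0.5 * math.exp(-((x + 2.0) ** 2) / 0.5)


def continuation(f, sigma):
    """Return x -> E[f(y)] with y ~ N(x, sigma^2), approximated on the fixed grid."""
    def f_p(x):
        return sum(w * f(x + sigma * z) for z, w in zip(_Z, _W))
    return f_p


def local_search(g, x, lr=0.5, steps=500, h=1e-4):
    """Plain gradient ascent on g using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (g(x + h) - g(x - h)) / (2.0 * h)
        x += lr * grad
    return x


def optimize_by_continuation(f, x0, sigmas):
    """Loop of Algorithm 1: optimize each continuation, warm-started at the last iterate."""
    x = x0
    for sigma in sigmas:  # one p_i per entry; here p_i = N(x, sigma^2) (assumption)
        x = local_search(continuation(f, sigma), x)
    return x


x0 = -2.5
x_direct = local_search(f, x0)  # local search on f itself, no continuation
x_cont = optimize_by_continuation(f, x0, sigmas=[3.0, 1.5, 0.75, 0.3, 0.0])
print(f"direct local search: {x_direct:.3f}, continuation: {x_cont:.3f}")
```

In this toy setting, the direct search started at x0 = -2.5 settles on the nearby local maximum, whereas the continuation sequence first smooths that maximum away and ends close to the global one. The decreasing schedule of sigma values is an illustrative choice; the paper's pseudocode only requires a sequence of I functions.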