On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Authors: Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvári, Mengdi Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this experiment, we aim to evaluate the performance of the TSIVR-PG algorithm for maximizing the cumulative sum of reward. As the benchmarks, we also implement the SVRPG [49], the SRVRPG [48], the HSPGA [33], and the REINFORCE [47] algorithms. Our experiment is performed on benchmark RL environments including the Frozen Lake, Acrobot and Cartpole that are available from Open AI gym [8]. |
| Researcher Affiliation | Academia | Junyu Zhang, Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore, 119077; Chengzhuo Ni, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, 08544; Zheng Yu, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, 08544; Csaba Szepesvári, Department of Computer Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8; Mengdi Wang, Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, 08544 |
| Pseudocode | Yes | Algorithm 1: The TSIVR-PG Algorithm |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Our experiment is performed on benchmark RL environments including the Frozen Lake, Acrobot and Cartpole that are available from Open AI gym [8], which is a well-known toolkit for developing and comparing reinforcement learning algorithms. |
| Dataset Splits | No | The paper describes using standard RL environments but does not provide specific details on train/validation/test dataset splits, percentages, or methodologies for partitioning data to reproduce the experiment's data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'Open AI gym [8]' but does not provide specific version numbers for this or any other software dependencies, which would be necessary for reproducibility. |
| Experiment Setup | Yes | For all the algorithms, their batch sizes are chosen according to their theory. In detail, let ϵ be any target accuracy. For both TSIVR-PG and SRVR-PG, we set N = Θ(ϵ^{-2}), B = m = Θ(ϵ^{-1}). For SVRPG, we set N = Θ(ϵ^{-2}), B = Θ(ϵ^{-4/3}) and m = Θ(ϵ^{-2/3}). For HSPGA, we set B = Θ(ϵ^{-1}); other parameters are calculated according to formulas in [33] given B. For REINFORCE, we set the batch size to be N = Θ(ϵ^{-2}). The parameter ε and the stepsize/learning rate are tuned for each individual algorithm using a grid search. For both environments, we use a neural network with two hidden layers of width 64 to model the policy. We choose σ = 0.125 in our experiment. |
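The batch-size orders quoted in the Experiment Setup row can be sketched as a small helper. Note the caveat: the paper only reports Θ(·) orders, so the unit constants inside each Θ(·) (and the helper names themselves) are assumptions for illustration, not the authors' actual values.

```python
def tsivr_pg_batches(eps: float) -> tuple[int, int, int]:
    """Hypothetical batch-size schedule for TSIVR-PG / SRVR-PG from the
    reported orders N = Θ(ε^-2), B = m = Θ(ε^-1).
    Θ-constants are assumed to be 1 (not specified in the paper)."""
    N = max(1, round(eps ** -2))  # outer (reference-gradient) batch size
    B = max(1, round(eps ** -1))  # inner mini-batch size
    m = B                         # epoch length, same order as B
    return N, B, m


def svrpg_batches(eps: float) -> tuple[int, int, int]:
    """Hypothetical schedule for SVRPG: N = Θ(ε^-2), B = Θ(ε^-4/3),
    m = Θ(ε^-2/3), again with Θ-constants assumed to be 1."""
    N = max(1, round(eps ** -2))
    B = max(1, round(eps ** (-4.0 / 3.0)))
    m = max(1, round(eps ** (-2.0 / 3.0)))
    return N, B, m
```

For example, at target accuracy ε = 0.1 this sketch gives N = 100 and B = m = 10 for TSIVR-PG/SRVR-PG, illustrating why the variance-reduced methods query far fewer trajectories per inner step than the N = Θ(ϵ^{-2}) batches REINFORCE draws at every update.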