Closing the gap between SVRG and TD-SVRG with Gradient Splitting

Authors: Arsenii Mustafin, Alex Olshevsky, Ioannis Paschalidis

TMLR 2024

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. "Our theoretical findings are supported by a set of experiments. We conducted experimental studies demonstrating that our theoretically derived batch size and learning rate achieve geometric convergence and outperform other algorithms that rely on parameters selected via grid search, as detailed in Section 6 and Appendix J."
Researcher Affiliation: Academia. Arsenii Mustafin (Department of Computer Science, Boston University); Alex Olshevsky (Department of Electrical and Computer Engineering, Boston University); Ioannis Ch. Paschalidis (Department of Electrical and Computer Engineering, Boston University).
Pseudocode: Yes. "Algorithm 1: TD-SVRG for the finite sample case."
Open Source Code: Yes. "Instructions and code for reproducing the experiments can be found in our GitHub repository."
Open Datasets: Yes. "Databases of size 5,000 are generated from 4 environments: Random MDP (Dann et al., 2014), and the Acrobot, Cart Pole and Mountain Car OpenAI Gym environments (Brockman et al., 2016)."
Dataset Splits: No. The paper discusses how samples are drawn during the algorithm run (e.g., "first, a trajectory of length N is drawn from an MDP following Markovian sampling and forms dataset D = {(s_t, a_t, r_t, s_{t+1})}_{t=1}^{N}. Then TD(0) proceeds by drawing samples from this dataset"), but it does not provide predefined training/validation/test splits of a fixed dataset for reproducible model evaluation.
Hardware Specification: No. No specific hardware details (e.g., GPU/CPU models, memory amounts) are mentioned in the paper for running the experiments.
Software Dependencies: No. No software dependencies with version numbers are provided. The paper mentions OpenAI Gym and the DQN algorithm, but without version details.
Experiment Setup: Yes. "Hyperparameters for the algorithms are selected as follows: for TD-SVRG, our theoretically justified parameters are selected, with the learning rate set to α = 1/8 and the update batch size to M = 16/λ_A; for GTD2, the best-performing parameters were α = 0.125 and β = 0.25; for vanilla TD, a decreasing learning rate α = 1/√t is used; for PD-SVRG, the parameters are set to σ_θ = 0.1/(L_ρ κ(Ĉ)), σ_w = 0.1/λ_max(C), and the batch size is twice the size of the dataset, i.e., M = 2N. Each algorithm for each setting was run 10 times and the geometric average performance is presented."
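The referenced pseudocode (Algorithm 1, TD-SVRG for the finite sample case) follows the standard SVRG pattern applied to the TD(0) update direction: a full-batch anchor computed once per epoch, then variance-reduced stochastic steps. The following is a minimal sketch under that reading, not the authors' implementation; the function signature, feature map `phi`, dataset layout, and defaults are assumptions.

```python
import numpy as np

def td_svrg(D, phi, alpha=1/8, M=64, epochs=30, gamma=0.99, seed=0):
    """Sketch of TD-SVRG for a finite dataset D of (s, r, s_next) transitions.

    alpha = 1/8 matches the paper's theoretical learning rate; the paper's
    batch size is M = 16/lambda_A, which is problem-dependent, so M is left
    as a plain argument here.
    """
    rng = np.random.default_rng(seed)
    N = len(D)
    theta = np.zeros(phi(D[0][0]).shape[0])

    def g(theta, t):
        # TD(0) update direction (a "pseudo-gradient") for one transition
        s, r, s_next = t
        f, f_next = phi(s), phi(s_next)
        return (r + gamma * f_next @ theta - f @ theta) * f

    for _ in range(epochs):
        theta_ref = theta.copy()
        # full-batch anchor direction at the reference point
        g_bar = np.mean([g(theta_ref, t) for t in D], axis=0)
        for _ in range(M):
            t = D[rng.integers(N)]
            # variance-reduced stochastic TD step
            theta = theta + alpha * (g(theta, t) - g(theta_ref, t) + g_bar)
    return theta
```

On a small example the iterates approach the TD fixed point, e.g. a two-state chain with one-hot features, unit rewards, and γ = 1/2, whose fixed point is V = (2, 2).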
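The theoretically justified settings above (α = 1/8, M = 16/λ_A) could be derived directly from a dataset rather than grid-searched; a sketch of such a helper is below. The function name is hypothetical, and taking λ_A as the smallest eigenvalue of the symmetrized matrix A = (1/N) Φᵀ(Φ − γΦ′) is an assumption about the paper's definition.

```python
import numpy as np

def tdsvrg_hyperparams(features, next_features, gamma):
    """Derive TD-SVRG's theoretical settings from feature matrices.

    features:      (N, d) array of phi(s_t) rows
    next_features: (N, d) array of phi(s_{t+1}) rows
    Returns alpha = 1/8 and M = ceil(16 / lambda_A), where lambda_A is
    (assumed to be) the smallest eigenvalue of the symmetrized A.
    """
    N = features.shape[0]
    A = features.T @ (features - gamma * next_features) / N
    lam_A = np.linalg.eigvalsh((A + A.T) / 2).min()
    return {"alpha": 1 / 8, "M": int(np.ceil(16 / lam_A))}
```

For the two-state chain with one-hot features and γ = 1/2, the symmetrized A has smallest eigenvalue 1/4, giving M = 64.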