Revisiting a Design Choice in Gradient Temporal Difference Learning

Authors: Xiaochi Qian, Shangtong Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6 EXPERIMENTS We now empirically compare (A^t TD) with a few other TD algorithms with linear function approximation... We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995)... We report the square root of the mean squared projected Bellman error (RMSPBE) at each time step."
Researcher Affiliation | Academia | "Xiaochi Qian, Department of Computer Science, University of Oxford, EMAIL; Shangtong Zhang, Department of Computer Science, University of Virginia, EMAIL"
Pseudocode | No | The algorithms (1), (3), (GTD), and (A^t TD) are presented as mathematical equations (e.g., "w_{t+1} ≐ w_t + α_t (R_{t+1} + γ x_{t+1}^⊤ w_t − x_t^⊤ w_t) x_t (1)" and "w_{t+1} ≐ w_t + α_t ρ_{t+f(t)} (x_{t+f(t)} − γ x_{t+f(t)+1}) x_{t+f(t)}^⊤ ρ_t δ_t x_t (A^t TD)"). There are no explicit "Algorithm" or "Pseudocode" blocks.
Open Source Code | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines, not the authors' own code for A^t TD; no other explicit statement or link to their code is provided.
Open Datasets | Yes | "We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020)."
Dataset Splits | No | "We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995)... For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)." No explicit dataset split information is provided in the main text.
Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU type, memory) are mentioned for running the experiments. The mention of a "3 GHz CPU" appears in a theoretical argument about memory cost, not in the experimental setup.
Software Dependencies | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines and does not specify software versions for the authors' own code or environment.
Experiment Setup | Yes | "For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)."
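The RMSPBE quoted in the Research Type row has a well-known closed form under linear function approximation: MSPBE(w) = (b − Aw)^⊤ C^{-1} (b − Aw), with A = X^⊤ D (I − γP) X, b = X^⊤ D r, and C = X^⊤ D X. A minimal sketch (not the authors' code), assuming the MDP quantities (feature matrix X, transition matrix P, expected rewards r, state distribution d) are known:

```python
import numpy as np

def rmspbe(w, X, P, r, d, gamma):
    """Root mean squared projected Bellman error of the linear value
    estimate v = X @ w, computed in closed form from known MDP matrices.

    X: (S, k) feature matrix, P: (S, S) transition matrix,
    r: (S,) expected rewards, d: (S,) state distribution, gamma: discount.
    """
    D = np.diag(d)
    A = X.T @ D @ (np.eye(len(d)) - gamma * P) @ X   # key matrix A
    b = X.T @ D @ r                                  # vector b
    C = X.T @ D @ X                                  # feature covariance
    err = b - A @ w                                  # = E[delta * x]
    return np.sqrt(err @ np.linalg.solve(C, err))    # sqrt(err^T C^{-1} err)
```

With full-rank tabular features the TD fixed point w* = A^{-1} b drives this error to zero, which gives a quick sanity check.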
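The update labeled (1) in the Pseudocode row is standard linear TD(0). A minimal illustration of one step, not the paper's implementation:

```python
import numpy as np

def td0_update(w, x, r, x_next, alpha, gamma):
    """One linear TD(0) step, matching update (1):
    w <- w + alpha * (R + gamma * x'^T w - x^T w) * x."""
    delta = r + gamma * x_next @ w - x @ w  # TD error
    return w + alpha * delta * x
```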
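The tuning protocol in the Experiment Setup row amounts to a one-dimensional grid search over learning rates. A sketch of that protocol, where `run_experiment` is a hypothetical stand-in that trains one algorithm at a given learning rate and returns its per-step RMSPBE curve:

```python
def tune_learning_rate(run_experiment, rates=None):
    """Sweep learning rates 2^-20, ..., 2^-1, 1 and keep the one that
    minimizes the error at the last step, as in the reported protocol."""
    if rates is None:
        rates = [2.0 ** -k for k in range(20, 0, -1)] + [1.0]
    results = {alpha: run_experiment(alpha) for alpha in rates}
    best = min(rates, key=lambda alpha: results[alpha][-1])  # last-step RMSPBE
    return best, results[best]
```

Note that selecting by the final-step value (rather than, say, area under the curve) can favor rates that converge slowly but end low.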