Revisiting a Design Choice in Gradient Temporal Difference Learning
Authors: Xiaochi Qian, Shangtong Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "6 EXPERIMENTS We now empirically compare (A_t TD) with a few other TD algorithms with linear function approximation... We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995)... We report the square root of the mean squared projected Bellman error (RMSPBE) at each time step." |
| Researcher Affiliation | Academia | Xiaochi Qian Department of Computer Science University of Oxford EMAIL Shangtong Zhang Department of Computer Science University of Virginia EMAIL |
| Pseudocode | No | The algorithms (1), (3), (GTD), and (A_t TD) are presented as mathematical equations (e.g., "w_{t+1} ≐ w_t + α_t (R_{t+1} + γ x_{t+1}^⊤ w_t − x_t^⊤ w_t) x_t (1)", "w_{t+1} ≐ w_t + α_t ρ_{t+f(t)} (x_{t+f(t)} − γ x_{t+f(t)+1}) x_{t+f(t)}^⊤ ρ_t δ_t x_t (A_t TD)"). There are no explicit "Algorithm" or "Pseudocode" blocks. |
| Open Source Code | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines, not the authors' own code for A_t TD. No other explicit statement or link for their code is provided. |
| Open Datasets | Yes | "We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020)." |
| Dataset Splits | No | "We consider two benchmark tasks, Boyan's chain (Boyan, 2002) and Baird's counterexample (Baird, 1995), which are also used in Ghiassian et al. (2020)... For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)." No explicit dataset split information is provided in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned for running the experiments. The mention of "3 GHz CPU" is within a theoretical argument about memory cost, not the experimental setup. |
| Software Dependencies | No | "We base our implementation on the open-sourced implementation from Ghiassian et al. (2020)." This refers to the baselines and doesn't provide specific software versions for the authors' own work or the environment. |
| Experiment Setup | Yes | "For each algorithm, we tune its learning rate in {2^-20, ..., 2^-1, 1} and report the results with the best learning rate (in terms of minimizing RMSPBE at the last step)." |
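For orientation, the linear TD(0) update quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal illustrative reconstruction, not the authors' implementation: the 3-state chain, its features, rewards, and the chosen step size are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: a 3-state deterministic cycle with random features.
n_states, n_features = 3, 2
X = rng.standard_normal((n_states, n_features))  # row s is the feature vector x_s
gamma = 0.9
alpha = 2 ** -4  # one point on the paper's {2^-20, ..., 2^-1, 1} learning-rate grid

w = np.zeros(n_features)
s = 0
for t in range(1000):
    s_next = (s + 1) % n_states        # deterministic toy transition
    r = 1.0 if s_next == 0 else 0.0    # toy reward
    # TD error: delta_t = R_{t+1} + gamma * x_{t+1}^T w_t - x_t^T w_t
    delta = r + gamma * X[s_next] @ w - X[s] @ w
    # TD(0) update from equation (1): w_{t+1} = w_t + alpha_t * delta_t * x_t
    w += alpha * delta * X[s]
    s = s_next
```

In the paper's actual protocol, a sweep over the learning-rate grid would be run and the rate minimizing RMSPBE at the final step selected; the loop above shows only a single on-policy run at one step size.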