Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
Authors: Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and by doing policy evaluation of pre-trained policies for Atari 2600 games. We evaluate our algorithms on two partially observable benchmarks to estimate values (prediction). |
| Researcher Affiliation | Academia | Khurram Javed EMAIL Haseeb Shah EMAIL Richard S. Sutton EMAIL Martha White EMAIL Alberta Machine Intelligence Institute (Amii) Department of Computing Science, University of Alberta, Edmonton, AB, Canada |
| Pseudocode | Yes | Algorithm 1: TD(λ) for online prediction. Require: a differentiable learner vθ; step-size parameter α; discount factor γ. Initialize x as first observation; initialize eligibility trace z to 0; y = vθ(x). While true: observe next x′ and the cumulant c; y′ = vθ(x′); δ = c + γy′ − y; z = λγz + ∇vθ(x); θ = θ + αδz; y = y′; x = x′. |
| Open Source Code | No | The paper describes implementation details but does not provide a statement of open-sourcing the code or a link to a repository: "We implement all methods in C++. For columnar, constructive, and CCN approaches, we use the update equations derived in Appendix B. We verify the correctness of the gradients computed by our derived equations, and our implementation of T-BPTT by comparing them to the gradients computed by PyTorch for networks initialized to have the same parameters." |
| Open Datasets | Yes | First, we use an existing animal-learning benchmark (Rafiee et al., 2022), which has low-dimensional inputs and a focus on the need for memory: the only way to make accurate predictions is to remember information from many steps in the past. Second, to test the algorithms in more complex image-based environments, we make a new benchmark based on ALE (Arcade Learning Environment) (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes an online learning setting and continuous evaluation rather than traditional dataset splits: "Traditionally, learning performance is evaluated on a held-out test set. While the train and test distinction is important in offline learning, when the learner has access to the complete data set, it is unnecessary when it sees the data online and is always evaluated on the next unseen data point before using it for learning." |
| Hardware Specification | No | The paper mentions clusters but lacks specific hardware models: "We run all experiments on large CPU clusters. A single run of the trace patterning task for 10 million steps takes around 1 minute on a single CPU, whereas a single run on Atari for 30 million steps takes around 2 hours. Both experiments take less than 2 GB of RAM per run. We used 1,000 CPUs spread across the Cedar, Narval, and Beluga clusters provided by The Digital Research Alliance of Canada for running the experiments." |
| Software Dependencies | No | The paper mentions software but without specific version numbers: "We implement all methods in C++. For columnar, constructive, and CCN approaches, we use the update equations derived in Appendix B. We verify the correctness of the gradients computed by our derived equations, and our implementation of T-BPTT by comparing them to the gradients computed by PyTorch for networks initialized to have the same parameters." |
| Experiment Setup | Yes | We use TD(λ) (Sutton, 1984, 1988; Tesauro, 1995) for learning. The full algorithm is in Appendix A.2. We set the per-step compute budget to 4,000 floating point operations and treat multiplication, addition, division, and subtraction as one operation each. We use λ = 0.99 and γ = 0.90, and report the learning curves for 10 million steps. At each point in the curve, we plot the error over the previous 100,000 data points; that is, we plot L(t − 100,000, t) as a function of t. For each method, we individually tune the step-size, ϵ, steps-per-stage, features-per-stage, and the truncation length; we report the results for the best performing configuration. Details of hyperparameter tuning are in Appendix A.1. |
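The TD(λ) pseudocode quoted in the Pseudocode row can be sketched in Python. This is a minimal, hedged illustration, not the paper's C++ implementation: a linear value function `v_theta(x) = theta @ x` stands in for the paper's generic differentiable learner (so `∇vθ(x)` is just `x`), and the observation/cumulant stream is a placeholder, not one of the paper's benchmarks.

```python
import numpy as np

def td_lambda_online(stream, n_features, alpha=0.01, lam=0.99, gamma=0.90):
    """Online TD(lambda) prediction over an iterator of (observation, cumulant) pairs.

    Follows Algorithm 1's loop: observe x', compute the TD error
    delta = c + gamma * v(x') - v(x), accumulate the eligibility trace
    z = lam * gamma * z + grad v(x), and update theta with alpha * delta * z.
    """
    theta = np.zeros(n_features)
    z = np.zeros(n_features)           # eligibility trace, initialized to 0
    x, _ = next(stream)                # first observation (its cumulant is unused)
    y = theta @ x                      # y = v_theta(x)
    for x_next, c in stream:           # observe next x' and the cumulant c
        y_next = theta @ x_next        # y' = v_theta(x')
        delta = c + gamma * y_next - y # TD error
        z = lam * gamma * z + x        # grad of the linear v_theta at x is x itself
        theta = theta + alpha * delta * z
        x, y = x_next, y_next          # shift x <- x', y <- y'
    return theta
```

With a constant observation and constant cumulant c = 1, the prediction should approach the discounted fixed point 1 / (1 − γ) = 10, which gives a quick sanity check of the update equations.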
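The Experiment Setup row evaluates learning with a trailing-window error, plotting L(t − 100,000, t) at each t. A small sketch of such a running windowed mean, assuming a squared-error loss per step (the loss form and window size are illustrative assumptions, not taken from the paper's appendix):

```python
from collections import deque

def windowed_error(errors, window=100_000):
    """Yield the mean squared error over the most recent `window` steps.

    Maintains a running sum so each step is O(1): subtract the value about
    to be evicted from the fixed-size deque, then add the newest one.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for e in errors:
        if len(buf) == buf.maxlen:     # deque is full: evict the oldest term
            total -= buf[0]
        sq = e * e
        buf.append(sq)                 # maxlen deque drops buf[0] automatically
        total += sq
        yield total / len(buf)
```

Plotting these values against the step index t reproduces the kind of online learning curve the paper reports, with no held-out test set needed, matching the Dataset Splits row's point that online evaluation happens on each next unseen data point.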