Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
Authors: Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and by doing policy evaluation of pre-trained policies for Atari 2600 games. We evaluate our algorithms on two partially observable benchmarks to estimate values (prediction). |
| Researcher Affiliation | Academia | Khurram Javed EMAIL Haseeb Shah EMAIL Richard S. Sutton EMAIL Martha White EMAIL Alberta Machine Intelligence Institute (Amii) Department of Computing Science, University of Alberta, Edmonton, AB, Canada |
| Pseudocode | Yes | Algorithm 1: TD(λ) for online prediction. Require: a differentiable learner vθ; step-size parameter α; discount factor γ. Initialize x as first observation; initialize eligibility trace z to 0; y = vθ(x). While true: observe next x′ and the cumulant c; y′ = vθ(x′); δ = c + γy′ − y; z = λγz + ∇vθ(x); θ = θ + αδz; y = y′; x = x′. |
| Open Source Code | No | The paper describes implementation details but does not provide a statement of open-sourcing the code or a link to a repository: "We implement all methods in C++. For columnar, constructive, and CCN approaches, we use the update equations derived in Appendix B. We verify the correctness of the gradients computed by our derived equations, and our implementation of T-BPTT by comparing them to the gradients computed by PyTorch for networks initialized to have the same parameters." |
| Open Datasets | Yes | First, we use an existing animal-learning benchmark (Rafiee et al., 2022), which has low-dimensional inputs and a focus on the need for memory: the only way to make accurate predictions is to remember information from many steps in the past. Second, to test the algorithms in more complex image-based environments, we make a new benchmark based on ALE (Arcade Learning Environment) (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes an online learning setting and continuous evaluation rather than traditional dataset splits: "Traditionally, learning performance is evaluated on a held-out test set. While the train and test distinction is important in offline learning, when the learner has access to the complete data set, it is unnecessary when it sees the data online and is always evaluated on the next unseen data point before using it for learning." |
| Hardware Specification | No | The paper mentions clusters but lacks specific hardware models: "We run all experiments on large CPU clusters. A single run of the trace patterning task for 10 million steps takes around 1 minute on a single CPU, whereas a single run on Atari for 30 million steps takes around 2 hours. Both experiments take less than 2 GB of RAM per run. We used 1,000 CPUs spread across the Cedar, Narval, and Beluga clusters provided by The Digital Research Alliance of Canada for running the experiments." |
| Software Dependencies | No | The paper mentions software but without specific version numbers: "We implement all methods in C++. For columnar, constructive, and CCN approaches, we use the update equations derived in Appendix B. We verify the correctness of the gradients computed by our derived equations, and our implementation of T-BPTT by comparing them to the gradients computed by PyTorch for networks initialized to have the same parameters." |
| Experiment Setup | Yes | We use TD(λ) (Sutton, 1984, 1988; Tesauro, 1995) for learning. The full algorithm is in Appendix A.2. We set the per-step compute budget to 4,000 floating point operations and treat multiplication, addition, division, and subtraction as one operation each. We use λ = 0.99 and γ = 0.90, and report the learning curves for 10 million steps. At each point in the curve, we plot the error over the previous 100,000 data points; that is, we plot L(t − 100,000, t) as a function of t. For each method, we individually tune the step-size, ϵ, steps-per-stage, features-per-stage, and the truncation length; we report the results for the best performing configuration. Details of hyperparameter tuning are in Appendix A.1. |
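The TD(λ) pseudocode quoted in the Pseudocode row can be sketched in Python. This is a minimal, hedged illustration, not the paper's C++ implementation: a linear value function `v_theta(x) = theta @ x` stands in for the paper's generic differentiable learner (so `∇vθ(x)` is just `x`), and the observation/cumulant stream is a placeholder, not one of the paper's benchmarks.

```python
import numpy as np

def td_lambda_online(stream, n_features, alpha=0.01, lam=0.99, gamma=0.90):
    """Online TD(lambda) prediction over an iterator of (observation, cumulant) pairs.

    Follows Algorithm 1's loop: observe x', compute the TD error
    delta = c + gamma * v(x') - v(x), accumulate the eligibility trace
    z = lam * gamma * z + grad v(x), and update theta with alpha * delta * z.
    """
    theta = np.zeros(n_features)
    z = np.zeros(n_features)           # eligibility trace, initialized to 0
    x, _ = next(stream)                # first observation (its cumulant is unused)
    y = theta @ x                      # y = v_theta(x)
    for x_next, c in stream:           # observe next x' and the cumulant c
        y_next = theta @ x_next        # y' = v_theta(x')
        delta = c + gamma * y_next - y # TD error
        z = lam * gamma * z + x        # grad of the linear v_theta at x is x itself
        theta = theta + alpha * delta * z
        x, y = x_next, y_next          # shift x <- x', y <- y'
    return theta
```

With a constant observation and constant cumulant c = 1, the prediction should approach the discounted fixed point 1 / (1 − γ) = 10, which gives a quick sanity check of the update equations.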
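The Experiment Setup row evaluates learning with a trailing-window error, plotting L(t − 100,000, t) at each t. A small sketch of such a running windowed mean, assuming a squared-error loss per step (the loss form and window size are illustrative assumptions, not taken from the paper's appendix):

```python
from collections import deque

def windowed_error(errors, window=100_000):
    """Yield the mean squared error over the most recent `window` steps.

    Maintains a running sum so each step is O(1): subtract the value about
    to be evicted from the fixed-size deque, then add the newest one.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for e in errors:
        if len(buf) == buf.maxlen:     # deque is full: evict the oldest term
            total -= buf[0]
        sq = e * e
        buf.append(sq)                 # maxlen deque drops buf[0] automatically
        total += sq
        yield total / len(buf)
```

Plotting these values against the step index t reproduces the kind of online learning curve the paper reports, with no held-out test set needed, matching the Dataset Splits row's point that online evaluation happens on each next unseen data point.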