Foundations of Multivariate Distributional Reinforcement Learning
Authors: Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Mark Rowland
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, with the aid of our technical results and simulations, we identify tradeoffs between distribution representations that influence the performance of multivariate distributional RL in practice." See also Section 6.1, "Simulations: Distributional Successor Features". |
| Researcher Affiliation | Collaboration | Harley Wiltzer (Mila Québec AI Institute, McGill University, EMAIL); Jesse Farebrother (Mila Québec AI Institute, McGill University, EMAIL); Arthur Gretton (Google DeepMind; Gatsby Unit, University College London, EMAIL); Mark Rowland (Google DeepMind, EMAIL) |
| Pseudocode | Yes | Algorithm 1 Projected Categorical Dynamic Programming |
| Open Source Code | No | The NeurIPS Paper Checklist states 'Code will be provided.', which is a future promise, not a current release of the code for the work described in the paper. |
| Open Datasets | No | The paper describes using '100 random MDPs, with transitions drawn from Dirichlet priors and 2-dimensional cumulants drawn from uniform priors.' This indicates custom-generated data rather than a specific, named, publicly available dataset with a concrete access link or formal citation. |
| Dataset Splits | No | The paper does not explicitly provide details about training/test/validation dataset splits, nor does it reference predefined splits or cross-validation setups for the MDP data used in experiments. |
| Hardware Specification | Yes | TD-learning experiments were conducted on an NVIDIA A100 80GB GPU to parallelize experiments. |
| Software Dependencies | No | The paper mentions software like 'Jax [BFH+18]' and 'Jax Opt [BBC+21]' and the 'Julia programming language [BEKS17]', but it does not provide specific version numbers for these software components (e.g., 'Jax 0.x' or 'Julia 1.x'). |
| Experiment Setup | Yes | SGD was used for optimization, with an annealed learning-rate schedule (λ_k)_{k≥0} where λ_k = k^(−3/5), satisfying the conditions of Lemma 10. |
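The pseudocode row above refers to the paper's Algorithm 1 (Projected Categorical Dynamic Programming), which is not reproduced here. As a rough illustration of the core operation such algorithms rely on, the sketch below implements the standard one-dimensional categorical projection from categorical distributional RL, assuming an evenly spaced support; the function name and interface are hypothetical, not taken from the paper.

```python
import math

def categorical_projection(support, atoms, probs):
    """Project a discrete distribution (atoms, probs) onto a fixed,
    evenly spaced categorical support by splitting each atom's mass
    between its two nearest support points (linear interpolation)."""
    dz = support[1] - support[0]                  # support spacing (assumed uniform)
    out = [0.0] * len(support)
    for x, p in zip(atoms, probs):
        x = min(max(x, support[0]), support[-1])  # clamp atom into the support range
        b = (x - support[0]) / dz                 # fractional index of x on the support
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:                              # atom lies exactly on a support point
            out[lo] += p
        else:                                     # otherwise split mass linearly
            out[lo] += p * (hi - b)
            out[hi] += p * (b - lo)
    return out
```

Mass is conserved by construction: each atom's probability is either assigned whole or split between two neighbors.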
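On the experiment-setup row: a step size of the form λ_k = k^(−3/5) is a standard stochastic-approximation choice, since Σ λ_k diverges while Σ λ_k² converges (3/5 ≤ 1 < 6/5), the kind of condition convergence lemmas for SGD typically require. A minimal sketch of such a schedule, with a hypothetical function name:

```python
def lr_schedule(k):
    """Annealed step size λ_k = k^(-3/5) for step k >= 1.
    Decays slowly enough that the step sizes sum to infinity,
    yet fast enough that their squares are summable."""
    return k ** (-3 / 5)
```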