Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics

Authors: Tyler Kastner, Mark Rowland, Yunhao Tang, Murat A. Erdogdu, Amir-Massoud Farahmand

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally empirically validate our theoretical results and perform an empirical investigation into the relative strengths of using KL losses, and derive a number of actionable insights for practitioners."
Researcher Affiliation | Collaboration | ¹University of Toronto, ²Vector Institute, ³Google DeepMind, ⁴Meta Platforms, Inc. (work done while at Google DeepMind), ⁵Polytechnique Montréal, ⁶Mila. Correspondence to: Tyler Kastner <EMAIL>, Mark Rowland <EMAIL>.
Pseudocode | No | The paper describes algorithms through mathematical equations, such as Equations (5) and (6), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository or mention code in supplementary materials.
Open Datasets | Yes | Garnet domain: "We sample a sparse Garnet MDP transition structure (Archibald et al., 1995)."
Dataset Splits | No | The empirical evaluation section describes different MDP environments and simulation parameters (e.g., "1,000 asynchronous updates", "10,000 independent seeds") but does not discuss training/test/validation splits of a dataset.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models or processor types, used for running the experiments.
Software Dependencies | Yes | "We simulate the cumulative effect of expected KL-CTD updates with small learning rate by numerically solving the flow ∂_t φ_t = (T^π − I) p_{φ_t}, using the default scipy.integrate.solve_ivp method (Virtanen et al., 2020)."
Experiment Setup | Yes | Cramér-CTD and KL-CTD are both run using 40 atoms uniformly spaced on [−30, 30]; a learning rate of 4 × 10⁻³ was used for TD and Cramér-CTD, and a learning rate of 1 × 10⁻¹ was used for KL-CTD.
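The ODE-flow simulation mentioned under Software Dependencies can be sketched with `scipy.integrate.solve_ivp`. This is a minimal illustration of the API only: the linear operator `A` below is a hypothetical stand-in for the paper's (T^π − I) dynamics, not the actual categorical operator acting on the parameters φ.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical linear stand-in for the (T^pi - I)-style dynamics;
# its eigenvalues are negative, so the flow contracts toward zero.
A = np.array([[-1.0, 0.5],
              [0.5, -1.0]])

def flow(t, phi):
    # Right-hand side of the ODE d(phi)/dt = A @ phi.
    return A @ phi

# Integrate the flow from t=0 to t=10 with solve_ivp's default
# method (RK45), as the paper reports using the default solver.
sol = solve_ivp(flow, t_span=(0.0, 10.0), y0=np.array([1.0, -1.0]))
print(sol.y[:, -1])  # state of the flow at t = 10
```

In the paper's setting, `flow` would instead evaluate the expected KL-CTD update direction at the current parameters; the `solve_ivp` call pattern is the same.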
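The reported experiment setup translates to a small configuration sketch. Assumption: "40 atoms uniformly spaced on [−30, 30]" is taken to include both endpoints, as with `numpy.linspace`; the variable names below are illustrative, not from the paper.

```python
import numpy as np

# Categorical support: 40 atoms uniformly spaced on [-30, 30]
# (assumption: endpoints inclusive, np.linspace convention).
NUM_ATOMS = 40
V_MIN, V_MAX = -30.0, 30.0
atoms = np.linspace(V_MIN, V_MAX, NUM_ATOMS)
gap = atoms[1] - atoms[0]  # spacing between adjacent atoms, 60/39

# Learning rates reported in the setup.
LR_TD_CRAMER = 4e-3  # TD and Cramér-CTD
LR_KL = 1e-1         # KL-CTD
```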