Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence

Authors: Adwait Datar, Nihat Ay

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We complement our theoretical results with empirical studies in Section 5, extending the analysis to practical settings where only finite samples from the target distribution are available."
Researcher Affiliation | Academia | ¹Institute for Data Science Foundations, Hamburg University of Technology, 21073 Hamburg, Germany; ²Santa Fe Institute, Santa Fe, NM 87501, USA; ³Leipzig University, 04109 Leipzig, Germany
Pseudocode | No | "The dynamics described by equation 30, equation 31 and equation 32 can be written in the general form x(k + 1) = (I − αQ) x(k) + αQx*, where Q is a symmetric positive definite matrix." This is a mathematical description of the dynamics, not structured pseudocode or an algorithm block.
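The quoted iteration is a standard linear fixed-point scheme with fixed point x*; a minimal runnable sketch, with Q, x*, and the step size α chosen illustratively (none of these values are from the paper):

```python
import numpy as np

# Sketch of the quoted dynamics: x(k+1) = (I - alpha*Q) x(k) + alpha*Q x*,
# with Q symmetric positive definite. Q, x_star, and alpha are illustrative.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3 * np.eye(3)          # symmetric positive definite by construction
x_star = np.array([1.0, -2.0, 0.5])  # the iteration's fixed point

alpha = 1.0 / np.linalg.eigvalsh(Q).max()  # safe step size: alpha < 2 / lambda_max(Q)
x = np.zeros(3)
for _ in range(200):
    x = (np.eye(3) - alpha * Q) @ x + alpha * Q @ x_star

print(x)  # converges to x_star, since all eigenvalues of I - alpha*Q lie in (0, 1)
```

Substituting x(k) = x* shows x* is indeed stationary: (I − αQ)x* + αQx* = x*.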
Open Source Code | No | No concrete access to source code is provided. The paper does not contain any statements about releasing code or links to repositories.
Open Datasets | No | "Given a data sequence D sampled with respect to q, we can estimate this expectation using the empirical mean..." No specific public datasets or access details are provided for the data sequence D.
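The quoted empirical-mean estimator can be sketched in a few lines; the distribution q and the function f below are illustrative stand-ins, not the paper's target distribution:

```python
import numpy as np

# Sketch of the quoted estimator: given a data sequence D sampled from q,
# approximate E_q[f(X)] by the empirical mean over D. Here q = N(0, 1) and
# f(x) = x^2 are illustrative, so the true expectation is Var(X) = 1.
rng = np.random.default_rng(42)
D = rng.normal(loc=0.0, scale=1.0, size=100_000)  # stand-in data sequence from q
f = lambda x: x**2
estimate = np.mean(f(D))
print(estimate)  # close to 1.0 by the law of large numbers
```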
Dataset Splits | No | "This section aims to bridge the gap between our theoretical analysis and practical implementations by investigating empirical versions of the KL divergence and their associated optimization dynamics. ... where at each iteration k, a mini-batch Dk ⊆ D is drawn uniformly at random..." The paper refers to mini-batching for SGD, but does not provide specific training/test/validation dataset splits.
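The quoted mini-batch scheme (draw Dk uniformly at random from D at each iteration k) can be sketched as follows; the dataset, loss, and step size are illustrative placeholders rather than the paper's setup:

```python
import numpy as np

# Illustrative SGD sketch: at each iteration k, a mini-batch D_k ⊆ D is drawn
# uniformly at random. The scalar squared loss is a placeholder objective.
rng = np.random.default_rng(1)
D = rng.standard_normal(1000)   # stand-in data sequence sampled from q
theta = 5.0                     # scalar parameter, illustrative starting point
batch_size, alpha = 32, 0.1

for k in range(500):
    D_k = rng.choice(D, size=batch_size, replace=False)  # mini-batch D_k ⊆ D
    grad = np.mean(2 * (theta - D_k))  # gradient of mean squared loss on D_k
    theta -= alpha * grad

# theta ends up near the empirical mean of D, the minimizer of the full loss
print(theta, D.mean())
```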
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or machine configurations) are mentioned in the paper.
Software Dependencies | No | No specific software versions or libraries are mentioned in the paper.
Experiment Setup | Yes | "This is illustrated in Figure 8, where we set αη = αθ = αng = 0.01 for n = 2 (left) and αη = αθ = αng = 0.001 for n = 10 (right)." Table 1 (Optimal Learning Rates and Convergence Times from Figure 9) reports:

coordinates       | optimal learning rate  | optimal convergence time
η coordinates     | αη = 0.0036            | k = 29
natural gradient  | αng ∈ [0.7141, 1.16]   | k = 2
θ coordinates     | αθ ∈ [11.96, 14.24]    | k = 12
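The notion of an optimal learning rate in Table 1 matches the classical analysis of linear iterations of the form x(k+1) = (I − αQ)x(k) + αQx*: the error e(k) = x(k) − x* evolves as e(k+1) = (I − αQ)e(k), so the per-step contraction factor is max_i |1 − αλ_i(Q)|, minimized at α = 2/(λ_min + λ_max). A toy check with an illustrative Q (not the paper's Fisher matrix):

```python
import numpy as np

# Classical optimal step size for e(k+1) = (I - alpha*Q) e(k):
# minimize max_i |1 - alpha*lam_i(Q)| over alpha. Q here is a toy SPD matrix.
Q = np.diag([1.0, 4.0])                    # illustrative, eigenvalues 1 and 4
lam = np.linalg.eigvalsh(Q)
alpha_opt = 2.0 / (lam.min() + lam.max())  # = 2/5 = 0.4
rho = np.abs(1 - alpha_opt * lam).max()    # optimal contraction factor, = 0.6
print(alpha_opt, rho)
```

With this α both extreme eigenvalues contract at the same rate |1 − αλ| = 0.6, which is why the optimum balances λ_min against λ_max.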