Density Estimation in Infinite Dimensional Exponential Families
Authors: Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage of the proposed estimator grows as d increases." (from Abstract) Section 6, Numerical Simulations: "We have proposed an estimator of p0 that is obtained by minimizing the regularized empirical Fisher divergence and presented its consistency along with convergence rates. As discussed in Section 1, however, one can simply ignore the structure of P and estimate p0 in a completely non-parametric fashion, for example using the kernel density estimator (KDE). In fact, consistency and convergence rates of KDE are also well-studied (Tsybakov, 2009, Chapter 1), and the kernel density estimator is very simple to compute, requiring only O(n) computations, compared to the proposed estimator, which is obtained by solving a linear system of size nd × nd. This raises questions about the applicability of the proposed estimator in practice, though it is well known that KDE performs poorly for moderate to large d (Wasserman, 2006, Section 6.5). In this section, we numerically demonstrate that the proposed score matching estimator performs significantly better than the KDE, and in particular, that the advantage of the proposed estimator grows as d gets large." |
| Researcher Affiliation | Academia | Bharath Sriperumbudur (EMAIL), Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA; Kenji Fukumizu (EMAIL), The Institute of Statistical Mathematics, 10-3 Midoricho, Tachikawa, Tokyo 190-8562, Japan; Arthur Gretton (EMAIL, ORCID 0000-0003-3169-7624), Gatsby Computational Neuroscience Unit, University College London, Sainsbury Wellcome Centre, 25 Howland Street, London W1T 4JG, UK; Aapo Hyvärinen (EMAIL), Gatsby Computational Neuroscience Unit, University College London, Sainsbury Wellcome Centre, 25 Howland Street, London W1T 4JG, UK; Revant Kumar (EMAIL), College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332, USA. |
| Pseudocode | No | The paper describes steps for an estimator, such as f_{λ,n} = (Ĉ + λI)^{-1} ξ̂, and refers to solving linear systems or QCQP problems, but it does not present these procedures in a structured pseudocode or algorithm-block format. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of open-source code for the methodology described, nor does it include a direct link to a code repository. It mentions that the proposed estimator has been used by other works, but this does not constitute releasing the code for this paper's methodology. |
| Open Datasets | No | The paper uses synthetic data generated from a standard normal distribution on R^d, N(0, I_d), and a mixture of Gaussians, (1/2)φ_d(x; α1_d, I_d) + (1/2)φ_d(x; β1_d, I_d), for its numerical simulations. It does not provide access information (links, DOIs, or citations) for any publicly available or open dataset. |
| Dataset Splits | No | The paper mentions using i.i.d. random samples (X_a)_{a=1}^n drawn from an unknown density p0 for estimation and evaluates accuracy based on 10000 random samples drawn i.i.d. from p0(x). However, it does not provide details on how the dataset used for estimation is split into training, testing, or validation sets for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, processor types, memory amounts) used for running its numerical simulations in Section 6 or elsewhere. |
| Software Dependencies | No | The paper mentions using 'non-parametric kernel density estimator (KDE)' and 'Gaussian kernel' but does not specify any particular software packages, libraries, or their version numbers used for implementation. |
| Experiment Setup | Yes | "In our simulations, we chose r = 0.1, c = 0.5, α = 4 and β = 4. The base measure of the exponential family is N(0, 10^2 I_d). The bandwidth parameter σ is chosen by cross-validation (CV) of the objective function Ĵ_λ (see Theorem 4(iv)) within the parameter set {0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6} × σ_med, where σ_med is the median of pairwise distances of the data, and the regularization parameter λ is set as λ = 0.1 n^{-1/3} with sample size n. For KDE, the Gaussian kernel is used for the smoothing kernel, and the bandwidth parameter is chosen by CV from {0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0} × σ_med, where for both methods 5-fold CV is applied." |
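The KDE side of the experiment setup quoted above (Gaussian smoothing kernel, bandwidth grid scaled by the median pairwise distance, 5-fold CV) can be sketched in plain numpy. This is a minimal illustrative reconstruction, not the authors' code: the fold assignment, random seed, sample size, and the use of held-out log-likelihood as the CV criterion for KDE are assumptions, and the α/β values here are placeholders for the paper's mixture parameters.

```python
import numpy as np

def median_pairwise_distance(X):
    # Median heuristic: median of pairwise Euclidean distances (sigma_med above).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    upper = sq[np.triu_indices(len(X), k=1)]
    return np.sqrt(np.median(upper))  # sqrt is monotone, so this is the median distance

def kde_log_likelihood(X_train, X_test, h):
    # Held-out log-likelihood of a Gaussian KDE with bandwidth h.
    d = X_train.shape[1]
    sq = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    log_k = -sq / (2 * h**2) - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return float(np.sum(m.ravel() + np.log(np.mean(np.exp(log_k - m), axis=1))))

def select_bandwidth_cv(X, grid, n_folds=5, seed=0):
    # 5-fold CV over the grid {c * sigma_med}, as described in the setup row.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    sigma_med = median_pairwise_distance(X)
    best_h, best_ll = None, -np.inf
    for c in grid:
        h = c * sigma_med
        ll = 0.0
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            ll += kde_log_likelihood(X[train], X[test], h)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h

# Synthetic data in the spirit of the paper's mixture of Gaussians:
# (1/2) N(alpha*1, I_d) + (1/2) N(beta*1, I_d); alpha/beta here are illustrative.
d, n, alpha, beta = 2, 200, 4.0, -4.0
rng = np.random.default_rng(1)
means = np.where(rng.random(n)[:, None] < 0.5, alpha, beta) * np.ones((n, d))
X = means + rng.standard_normal((n, d))

grid = [0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
h_star = select_bandwidth_cv(X, grid)
```

The score-matching estimator itself would additionally require assembling and solving the nd × nd linear system mentioned in the Pseudocode row, which is omitted here.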