reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An asymptotic analysis of distributed nonparametric methods

Authors: Botond Szabó, Harry van Zanten

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To see that interesting things can happen it is exemplifying to compare the results of a distributed and a non-distributed (Bayesian) analysis of simulated data. Concretely, we consider a true signal θ consisting of the Fourier coefficients of the function shown in Figure 1. For this signal we simulate data according to (2.2), with σ = 1, m = 40 and n = 120 × 40 = 4800. Then for every local observer a Bayesian procedure is carried out with a Gaussian prior on θ, postulating that the coordinates θ_i are independent and N(0, i^{−1−2α})-distributed. The hyperparameter α, which describes the regularity of the prior, is determined using a distributed version of maximum marginal likelihood, as described in Section 4. This analysis leads to m = 40 local posterior distributions. These are then combined to produce an overall posterior distribution for the signal. The precise procedure is described in Section 4. The resulting estimator for the signal, together with pointwise 95% credible intervals, is shown in the left plot in Figure 2. The corresponding non-distributed result is obtained by first aggregating all local data as in (2.1) and then carrying out the same Bayesian procedure on these complete data. The resulting non-distributed reconstruction of the signal is shown on the right in Figure 2. ... Simulations further illustrate the theoretical results.
Researcher Affiliation	Academia	Botond Szabó EMAIL Mathematical Institute Leiden University 2333 CA Leiden The Netherlands Harry van Zanten EMAIL Korteweg-de Vries Institute for Mathematics University of Amsterdam Science Park 105-107 1098 XG Amsterdam The Netherlands
Pseudocode	No	The paper describes various statistical methods and their mathematical properties but does not contain explicit pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide any explicit statements about the availability of source code, nor does it include links to repositories or supplementary materials for code.
Open Datasets	No	The paper analyzes a 'distributed version of the classical signal-in-Gaussian-white-noise model' and states 'For this signal we simulate data according to (2.2)'. The data used in the paper's examples are simulated, not derived from a publicly available dataset with concrete access information.
Dataset Splits	No	The paper describes theoretical analysis and simulations based on a signal-in-Gaussian-white-noise model. It does not mention training, validation, or test dataset splits, as it does not involve machine learning model training on pre-existing datasets.
Hardware Specification	No	The paper does not provide specific details about the hardware used to conduct its simulations or analyses, such as CPU/GPU models, memory, or specific computing environments.
Software Dependencies	No	The paper focuses on mathematical analysis and theoretical performance of statistical methods. It does not mention any specific software, libraries, or frameworks with version numbers that would be required to reproduce the work.
Experiment Setup	Yes	To see that interesting things can happen it is exemplifying to compare the results of a distributed and a non-distributed (Bayesian) analysis of simulated data. Concretely, we consider a true signal θ consisting of the Fourier coefficients of the function shown in Figure 1. For this signal we simulate data according to (2.2), with σ = 1, m = 40 and n = 120 × 40 = 4800. Then for every local observer a Bayesian procedure is carried out with a Gaussian prior on θ, postulating that the coordinates θ_i are independent and N(0, i^{−1−2α})-distributed. The hyperparameter α, which describes the regularity of the prior, is determined using a distributed version of maximum marginal likelihood, as described in Section 4. ... Simulations further illustrate the theoretical results. We have considered a true signal θ consisting of the Fourier coefficients of the function shown in the left panel of Figure 3. This is a signal which has regularity β = 1 in the sense of (3.3). For this signal we simulated data according to (2.2), with σ = 1, n = 4800 and m = 40, i.e. we considered a distributed setting with m = 40 machines.
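
The quoted setup (σ = 1, n = 4800, m = 40, prior variances i^{−1−2α}) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: equation (2.2) is not reproduced on this page, so the local model Y[j, i] = θ_i + σ·sqrt(m/n)·Z[j, i] is assumed here; the signal `theta`, the truncation at 500 coefficients, and the fixed value `alpha = 1.0` are hypothetical stand-ins (the paper selects α by a distributed marginal likelihood method and combines local posteriors as described in its Section 4, neither of which is attempted below).

```python
import numpy as np


def simulate_distributed(theta, n, m, sigma=1.0, seed=0):
    """Simulate local data for m machines (assumed form of the local model).

    Row j holds Y[j, i] = theta[i] + sigma * sqrt(m / n) * Z[j, i],
    with Z[j, i] independent standard normal variables.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((m, theta.size))
    return theta + sigma * np.sqrt(m / n) * noise


def local_posterior_mean(y_local, alpha, n, m, sigma=1.0):
    """Posterior mean of theta on one machine for a fixed alpha.

    Prior: theta_i ~ N(0, i^(-1 - 2 * alpha)), independent across i.
    With Gaussian noise of variance sigma^2 * m / n this is the usual
    conjugate shrinkage of each observed coordinate towards zero.
    """
    i = np.arange(1, y_local.shape[-1] + 1, dtype=float)
    prior_var = i ** (-1.0 - 2.0 * alpha)
    noise_var = sigma**2 * m / n
    return y_local * prior_var / (prior_var + noise_var)


# Values quoted in the table above.
n, m, sigma = 4800, 40, 1.0

# Hypothetical signal; the paper uses the Fourier coefficients of the
# function in its Figure 1, which is not available on this page.
idx = np.arange(1, 501, dtype=float)
theta = idx**-1.5 * np.sin(idx)

Y = simulate_distributed(theta, n, m, sigma)

# Averaging the local data gives pooled data with noise variance sigma^2 / n.
pooled = Y.mean(axis=0)

# Local posterior means for a hypothetical fixed alpha.
local_means = local_posterior_mean(Y, alpha=1.0, n=n, m=m, sigma=sigma)
```

Averaging the m local observations reduces the per-coordinate noise variance from σ²m/n to σ²/n, which is why the pooled data can serve as the non-distributed benchmark in this sketch.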