Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

Authors: Rajarshi Guhaniyogi, Cheng Li, Terrance D. Savitsky, Sanvesh Srivastava

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section evaluates the performance of methods based on the divide-and-conquer technique for inference and predictions in Bayesian VCMs using a simulation study and a real data analysis.
Researcher Affiliation | Academia | Rajarshi Guhaniyogi, Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA; Cheng Li, Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore; Terrance D. Savitsky, U.S. Bureau of Labor Statistics, Office of Survey Methods Research, Washington, DC 20212, USA; Sanvesh Srivastava, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA
Pseudocode | Yes | In summary, the sampling algorithm for drawing from the posterior distribution of $(\alpha, \Gamma, \tau^2, \theta_1, \ldots, \theta_q)$ starts from an initial value of parameters $(\alpha^{(0)}, \Gamma^{(0)}, \tau^{2(0)}, \theta_1^{(0)}, \ldots, \theta_q^{(0)})$ and cycles through the following four steps for $t = 0, 1, \ldots$: (a) draw $\nu^{(t+1)}$ given $y, \alpha^{(t)}, \Gamma^{(t)}, \tau^{2(t)}, \theta_1^{(t)}, \ldots, \theta_q^{(t)}$ from $N(\mu_\nu, \Sigma_\nu)$, where $\mu_\nu, \Sigma_\nu$ are defined in (28); (b) draw $\tau^{2(t+1)}$ given $y, \nu^{(t+1)}$ using (32); (c) draw $(\alpha^{(t+1)}, \gamma^{(t+1)})$ given $y, \nu^{(t+1)}, \tau^{2(t+1)}$ using (33) and the vectorization of $\gamma^{(t+1)}$ is reversed to obtain $\Gamma^{(t+1)}$; and (d) draw $\theta_1^{(t+1)}, \ldots, \theta_q^{(t+1)}$ given $\nu^{(t+1)}$ using ESS, the likelihood in (36), and the relation between $\theta_a$ and $\theta_a$ in (35).
Open Source Code | No | Since there is no open source implementation available for the Bayesian VCMs with bivariate response vector, we implement it by ourselves following the DA-type algorithm discussed in Section 2.3. Additionally, Finley and Banerjee (2020) offer spSVC in the spBayes R package for fitting spatial VCMs, which are special cases of (1) with d = 2, and fits to our simulation settings.
Open Datasets | Yes | The data on SST and SSS are obtained from the Hadley center observations under the met office in UK (www.metoffice.gov.uk/hadobs, more description available in Kennedy et al. (2011)).
Dataset Splits | Yes | In both simulations, the cardinality of the set of indexes U, where the function estimation and prediction are evaluated, is set at 300.
Hardware Specification | No | The paper mentions computational efficiency of algorithms and uses terms like 'massive data applications' and 'computationally challenging' but does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions 'coda R package (Plummer, 2003)' for computing effective sample size and 'spBayes R package' as a competitor's tool. However, it does not provide specific version numbers for these or any other software dependencies used in the authors' own experimental setup.
Experiment Setup | Yes | The sampling algorithm in each subset runs for 10,000 iterations, and the Markov chain is thinned by collecting every fifth posterior sample after discarding the first 5,000 posterior samples as burn-in. ... The sampling algorithm uses a sparse GP based on the FITC approximation with r = 400 inducing points (Quiñonero-Candela and Rasmussen, 2005; Alvarez et al., 2012).
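
The burn-in and thinning schedule quoted in the Experiment Setup row implies a fixed number of retained draws per subset. The sketch below works out that count. It is only an illustration: the function name retained_indices and the 0-based indexing are assumptions made here, and the paper does not state which post-burn-in draw is kept first, so the offset is a guess; the count of retained draws does not depend on it.

```python
def retained_indices(n_iter=10_000, burn_in=5_000, thin=5):
    """Return 0-based indices of the draws kept after burn-in and thinning.

    Hypothetical helper: the first retained index (burn_in) is an assumed
    offset; only the number of retained draws follows from the quoted setup.
    """
    return list(range(burn_in, n_iter, thin))


kept = retained_indices()
# 5,000 post-burn-in draws, keeping every fifth one
print(len(kept))
```

With the quoted settings this leaves (10,000 - 5,000) / 5 = 1,000 posterior samples per subset for inference and prediction.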