Divide-and-Conquer Fusion

Authors: Ryan S.Y. Chan, Murray Pollock, Adam M. Johansen, Gareth O. Roberts

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we consider a number of models applied to a variety of datasets, and suppose the dataset is randomly split into C (disjoint) subsets. We compare the performance of our Fusion methodologies (GBF and D&C-Fusion) with other established (approximate) methodologies. To compare performance, we consider their computational run-times and Integrated Absolute Distance (IAD)."
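The IAD quoted above measures the discrepancy between an approximate posterior and the full-data benchmark. A minimal sketch, assuming IAD is computed as (1/2)∫|f̂(x) − f(x)|dx via histogram density estimates on a shared grid; the function name, grid choice, and use of Python (rather than the authors' R) are all illustrative, not the paper's implementation:

```python
import numpy as np

def integrated_absolute_distance(samples_a, samples_b, grid_size=200):
    """Approximate IAD = (1/2) * integral |f_a(x) - f_b(x)| dx between
    two 1-D samples, using histogram density estimates on shared bins.
    Illustrative sketch only; the paper's estimator may differ."""
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    edges = np.linspace(lo, hi, grid_size + 1)
    f_a, _ = np.histogram(samples_a, bins=edges, density=True)
    f_b, _ = np.histogram(samples_b, bins=edges, density=True)
    dx = edges[1] - edges[0]
    # Riemann-sum approximation of the L1 distance, halved so the
    # result lies in [0, 1]: 0 for identical densities, 1 for disjoint.
    return 0.5 * np.sum(np.abs(f_a - f_b)) * dx
```

With this normalisation, identical samples give 0 and samples with disjoint support give (approximately) 1, which makes the metric easy to compare across experiments.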
Researcher Affiliation | Academia | Ryan S.Y. Chan (The Alan Turing Institute, London, NW1 2DB, UK); Murray Pollock (School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, NE1 7RU, UK); Adam M. Johansen (Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK); Gareth O. Roberts (Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK)
Pseudocode | Yes | Algorithm 1: gbf(C, {{x_{0,i}^{(c)}, w_i^{(c)}}_{i=1}^{M}, Λ_c}_{c ∈ C}, N, P), Generalised Bayesian Fusion (GBF); Algorithm 2: D&C-Fusion(v, N, P), Divide-and-Conquer Fusion (D&C-Fusion); Algorithm 3: Computing regular mesh P.
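Algorithm 3 constructs the regular temporal mesh P. A hedged sketch, assuming a regular mesh simply partitions the fusion time horizon [0, T] into n equal intervals (the function name and signature are hypothetical, and the paper's adaptive variant refines the mesh rather than using equal spacing):

```python
import numpy as np

def regular_mesh(T, n):
    """Regular temporal partition P = {t_0, ..., t_n} of [0, T] with
    t_j = j * T / n. Illustrative only; Algorithm 3 in the paper may
    parameterise the mesh differently."""
    return np.linspace(0.0, T, n + 1)
```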
Open Source Code | Yes | All code developed for producing numerical results can be found on GitHub at https://github.com/rchan26/DCFusion.
Open Datasets | Yes | The data used in this paper are openly available from the following links: the code to reproduce the simulated data used for the simulation studies in Sections 5.1 and 5.4.1 can be found at https://github.com/rchan26/DCFusion; the Combined Cycle Power Plant dataset (Section 5.2) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant; the Bike Sharing dataset (Section 5.3) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset; the nycflights13 dataset (Section 5.4.2) is available through the nycflights13 R package on CRAN, at https://github.com/tidyverse/nycflights13.
Dataset Splits | No | The paper describes splitting the data among C cores (e.g., 'each of which is assigned a random split of the data') to generate sub-posteriors in a distributed-computing context, rather than providing explicit training/validation/test splits for model evaluation.
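The random, disjoint split of the data across C cores described above can be sketched as follows. This is illustrative Python rather than the authors' R code, and `random_disjoint_split` is a hypothetical helper, not a function from the DCFusion repository:

```python
import numpy as np

def random_disjoint_split(n, C, seed=0):
    """Randomly partition indices 0..n-1 into C disjoint, near-equal
    subsets, one per core, as in the distributed sub-posterior setting.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)       # random order, so subsets are random
    return np.array_split(perm, C)  # C disjoint index blocks
```

Each subset then defines one sub-posterior; the union of the C subsets recovers the full dataset exactly once.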
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | Yes | Statistical computations for this paper were written in R (R Core Team, 2022), C++ and Rcpp (Eddelbuettel, 2013). As a benchmark for the target f, Stan (Carpenter et al., 2017) is used to implement an MCMC sampler for the target posterior distribution using the full dataset. The established methodologies considered include Consensus Monte Carlo (CMC) (Scott et al., 2016), implemented using the parallelMCMCcombine package in R, available from https://github.com/cran/parallelMCMCcombine (Miroshnikov and Conlon, 2014).
Experiment Setup | Yes | "In implementing D&C-Fusion we set N = 10000, ζ = 0.5, ζ = 0.05, and consider both the regular and adaptive mesh variants of the temporal partition, P. We resample if the ESS falls below 0.5N" (with similar settings for the other examples). Prior settings are also specified, e.g. "with µ_j = 0 and σ²_{β_j} = 10C for j = 0, . . . , p".
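The ESS-triggered resampling rule quoted above (resample when ESS falls below 0.5N) can be sketched as follows. The functions are illustrative Python, not the paper's SMC implementation, and multinomial resampling is just one common choice of scheme:

```python
import numpy as np

def ess(weights):
    """Effective sample size of a weight vector: 1 / sum of squared
    normalised weights. Equals N for uniform weights, 1 when all
    mass sits on a single particle."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights, threshold=0.5, rng=None):
    """Multinomial resampling triggered when ESS < threshold * N
    (threshold = 0.5 matches the 0.5N rule quoted above). Returns the
    (possibly resampled) particles and their weights. Sketch only."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(weights)
    if ess(weights) < threshold * N:
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        # After resampling, weights are reset to uniform.
        return particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```

Resampling only when the ESS degrades avoids the extra Monte Carlo variance that resampling at every step would introduce.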