Divide-and-Conquer Fusion
Authors: Ryan S.Y. Chan, Murray Pollock, Adam M. Johansen, Gareth O. Roberts
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we consider a number of models applied to a variety of datasets, and suppose each dataset is randomly split into C (disjoint) subsets. We compare the performance of our Fusion methodologies (GBF and D&C-Fusion) with other established (approximate) methodologies, comparing their computational run-times and Integrated Absolute Distance (IAD). |
| Researcher Affiliation | Academia | Ryan S.Y. Chan, The Alan Turing Institute, London, NW1 2DB, UK; Murray Pollock, School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, NE1 7RU, UK; Adam M. Johansen, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK; Gareth O. Roberts, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK |
| Pseudocode | Yes | Algorithm 1: gbf(C, {{x_{0,i}^{(c)}, w_i^{(c)}}_{i=1}^{M}, Λ_c}_{c∈C}, N, P), Generalised Bayesian Fusion (GBF); Algorithm 2: D&C-Fusion(v, N, P), Divide-and-Conquer Fusion (D&C-Fusion); Algorithm 3: Computing regular mesh P. |
| Open Source Code | Yes | All code developed for producing numerical results can be found on GitHub at https://github.com/rchan26/DCFusion. |
| Open Datasets | Yes | The data used in this paper are openly available from the following links: the code to reproduce the simulated data used for the simulation studies in Sections 5.1 and 5.4.1 can be found at https://github.com/rchan26/DCFusion; the Combined Cycle Power Plant dataset (Section 5.2) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant; the Bike Sharing dataset (Section 5.3) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset; the nycflights13 dataset (Section 5.4.2) is available through the nycflights13 R package on CRAN available at https://github.com/tidyverse/nycflights13. |
| Dataset Splits | No | The paper describes splitting the data among C cores (e.g., 'each of which is assigned a random split of the data') for generating sub-posteriors in a distributed computing context, rather than providing explicit training/test/validation splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | Yes | Statistical computations for this paper were written in R (R Core Team, 2022), C++ and Rcpp (Eddelbuettel, 2013). As a benchmark for the target f we use Stan (Carpenter et al., 2017) to implement an MCMC sampler for the target posterior distribution using the full dataset. The established methodologies we consider include Consensus Monte Carlo (CMC) (Scott et al., 2016), implemented using the parallelMCMCcombine package in R available from https://github.com/cran/parallelMCMCcombine (Miroshnikov and Conlon, 2014). |
| Experiment Setup | Yes | In implementing D&C-Fusion we set N = 10000, ζ = 0.5, ζ = 0.05, and consider both the regular and adaptive mesh variants of the temporal partition P. We resample if the ESS falls below 0.5N (similar settings are used for the other examples). Prior settings are also specified, e.g. µ_{β_j} = 0 and σ²_{β_j} = 10C for j = 0, . . . , p. |
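The resampling rule quoted in the setup row (resample whenever the effective sample size falls below 0.5N) is the standard ESS-triggered criterion used in particle methods. A minimal sketch of that criterion follows; this is an illustration only, not code from the paper's R repository, and the function names are our own:

```python
import numpy as np

def effective_sample_size(weights):
    """ESS of a weight vector: (sum w)^2 / sum w^2 after normalisation.

    Equals N for uniform weights and 1 when all mass sits on one particle.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights, rng, threshold=0.5):
    """Multinomial resampling, triggered only when ESS < threshold * N.

    Returns (particles, weights); after resampling, weights are reset
    to uniform (1/N each), as is standard for this scheme.
    """
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if effective_sample_size(w) < threshold * n:
        idx = rng.choice(n, size=n, p=w)       # draw indices ∝ weights
        return particles[idx], np.full(n, 1.0 / n)
    return particles, w
```

With threshold = 0.5 this reproduces the "resample if ESS < 0.5N" rule; uniform weights leave the particle set untouched, while a degenerate weight vector triggers resampling and resets the weights to 1/N.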