Divide-and-Conquer Fusion

Authors: Ryan S.Y. Chan, Murray Pollock, Adam M. Johansen, Gareth O. Roberts

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we consider a number of models applied to a variety of datasets, and suppose the dataset is randomly split into C (disjoint) subsets. We compare the performance of our Fusion methodologies (GBF and D&C-Fusion) with other established (approximate) methodologies. To compare performance, we consider their computational run-times and Integrated Absolute Distance (IAD)."
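The IAD quoted above measures the discrepancy between an approximate posterior and the full-data benchmark. A minimal sketch, assuming IAD is computed as (1/2)∫|f̂(x) − f(x)|dx via histogram density estimates on a shared grid; the function name, grid choice, and use of Python (rather than the authors' R) are all illustrative, not the paper's implementation:

```python
import numpy as np

def integrated_absolute_distance(samples_a, samples_b, grid_size=200):
    """Approximate IAD = (1/2) * integral |f_a(x) - f_b(x)| dx between
    two 1-D samples, using histogram density estimates on shared bins.
    Illustrative sketch only; the paper's estimator may differ."""
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    edges = np.linspace(lo, hi, grid_size + 1)
    f_a, _ = np.histogram(samples_a, bins=edges, density=True)
    f_b, _ = np.histogram(samples_b, bins=edges, density=True)
    dx = edges[1] - edges[0]
    # Riemann-sum approximation of the L1 distance, halved so the
    # result lies in [0, 1]: 0 for identical densities, 1 for disjoint.
    return 0.5 * np.sum(np.abs(f_a - f_b)) * dx
```

With this normalisation, identical samples give 0 and samples with disjoint support give (approximately) 1, which makes the metric easy to compare across experiments.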
Researcher Affiliation | Academia | Ryan S.Y. Chan (The Alan Turing Institute, London, NW1 2DB, UK); Murray Pollock (School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, NE1 7RU, UK); Adam M. Johansen (Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK); Gareth O. Roberts (Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK)
Pseudocode | Yes | Algorithm 1: gbf(C, {{x_{0,i}^{(c)}, w_i^{(c)}}_{i=1}^{M}, Λ_c}_{c ∈ C}, N, P), Generalised Bayesian Fusion (GBF); Algorithm 2: D&C-Fusion(v, N, P), Divide-and-Conquer Fusion (D&C-Fusion); Algorithm 3: Computing regular mesh P.
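Algorithm 3 constructs the regular temporal mesh P. A hedged sketch, assuming a regular mesh simply partitions the fusion time horizon [0, T] into n equal intervals (the function name and signature are hypothetical, and the paper's adaptive variant refines the mesh rather than using equal spacing):

```python
import numpy as np

def regular_mesh(T, n):
    """Regular temporal partition P = {t_0, ..., t_n} of [0, T] with
    t_j = j * T / n. Illustrative only; Algorithm 3 in the paper may
    parameterise the mesh differently."""
    return np.linspace(0.0, T, n + 1)
```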
Open Source Code | Yes | All code developed for producing numerical results can be found on GitHub at https://github.com/rchan26/DCFusion.
Open Datasets | Yes | The data used in this paper are openly available from the following links: the code to reproduce the simulated data used for the simulation studies in Sections 5.1 and 5.4.1 can be found at https://github.com/rchan26/DCFusion; the Combined Cycle Power Plant dataset (Section 5.2) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant; the Bike Sharing dataset (Section 5.3) can be accessed via the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset; the nycflights13 dataset (Section 5.4.2) is available through the nycflights13 R package on CRAN, at https://github.com/tidyverse/nycflights13.
Dataset Splits | No | The paper describes splitting the data among C cores (e.g., 'each of which is assigned a random split of the data') to generate sub-posteriors in a distributed-computing context, rather than providing explicit training/validation/test splits for model evaluation.
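The random, disjoint split of the data across C cores described above can be sketched as follows. This is illustrative Python rather than the authors' R code, and `random_disjoint_split` is a hypothetical helper, not a function from the DCFusion repository:

```python
import numpy as np

def random_disjoint_split(n, C, seed=0):
    """Randomly partition indices 0..n-1 into C disjoint, near-equal
    subsets, one per core, as in the distributed sub-posterior setting.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)       # random order, so subsets are random
    return np.array_split(perm, C)  # C disjoint index blocks
```

Each subset then defines one sub-posterior; the union of the C subsets recovers the full dataset exactly once.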
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | Yes | Statistical computations for this paper were written in R (R Core Team, 2022), C++ and Rcpp (Eddelbuettel, 2013). As a benchmark for the target f, Stan (Carpenter et al., 2017) is used to implement an MCMC sampler for the target posterior distribution using the full dataset. The established methodologies considered include Consensus Monte Carlo (CMC) (Scott et al., 2016), implemented using the parallelMCMCcombine package in R, available from https://github.com/cran/parallelMCMCcombine (Miroshnikov and Conlon, 2014).
Experiment Setup | Yes | "In implementing D&C-Fusion we set N = 10000, ζ = 0.5, ζ = 0.05, and consider both the regular and adaptive mesh variants of the temporal partition, P. We resample if the ESS falls below 0.5N" (with similar settings for the other examples). Prior settings are also specified, e.g. "with µ_j = 0 and σ²_{β_j} = 10C for j = 0, . . . , p".
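The ESS-triggered resampling rule quoted above (resample when ESS falls below 0.5N) can be sketched as follows. The functions are illustrative Python, not the paper's SMC implementation, and multinomial resampling is just one common choice of scheme:

```python
import numpy as np

def ess(weights):
    """Effective sample size of a weight vector: 1 / sum of squared
    normalised weights. Equals N for uniform weights, 1 when all
    mass sits on a single particle."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def maybe_resample(particles, weights, threshold=0.5, rng=None):
    """Multinomial resampling triggered when ESS < threshold * N
    (threshold = 0.5 matches the 0.5N rule quoted above). Returns the
    (possibly resampled) particles and their weights. Sketch only."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(weights)
    if ess(weights) < threshold * N:
        idx = rng.choice(N, size=N, p=weights / weights.sum())
        # After resampling, weights are reset to uniform.
        return particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```

Resampling only when the ESS degrades avoids the extra Monte Carlo variance that resampling at every step would introduce.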