Bayesian Distance Clustering

Authors: Leo L. Duan, David B. Dunson

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data." Keywords: Distance-based clustering, Mixture model, Model-based clustering, Model misspecification, Pairwise distance matrix, Partial likelihood
Researcher Affiliation | Academia | Leo L. Duan, Department of Statistics, University of Florida, Gainesville, FL 32611, USA; David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27708, USA
Pseudocode | Yes | Algorithm 1 gives the pseudocode of the No-U-Turn Hamiltonian Monte Carlo sampler for Bayesian distance clustering.
Open Source Code | No | No explicit statement or link open-sourcing the code described in this paper is provided. The paper mentions the "hamiltorch package (Cobb and Jalaian, 2020)", which is a third-party tool, not the authors' own implementation of their methodology.
Open Datasets | Yes | "To assess the performance, we use the MNIST data of hand-written digits of 0–9, with each image having p = 28 × 28 pixels."
Dataset Splits | No | For the MNIST data: "In each experiment, we take n = 10,000 random samples to fit the clustering models, among which each digit has approximately 1000 samples, and we repeat experiments 10 times." No train/test/validation splits or random seeds are given for reproducing the exact sample partitioning. For the brain data: "We take the mid-coronal section of 41 × 58 voxels. Excluding the empty ones outside the brain, they have a sample size of n = 1781," which describes the data selection but not experimental splits.
Hardware Specification | Yes | "To provide some running time, using a quad-core i7 CPU, at n = 1000, the HMC algorithm takes about 20 minutes for running 10,000 iterations."
Software Dependencies | No | The paper mentions the "BFGS optimization algorithm (implemented in the PyTorch package)" and the "No-U-Turn Sampler (NUTS-HMC) algorithm (Hoffman and Gelman, 2014) implemented in the hamiltorch package (Cobb and Jalaian, 2020)", but gives no version numbers for PyTorch or hamiltorch.
Experiment Setup | Yes | "To favor small values for the mode while accommodating a moderate degree of uncertainty, we use a Gamma prior αh ~ Gamma(1.5, 1.0). For conjugacy, we choose an inverse-gamma prior for σh with E(σh) = βσ, σh ~ Inverse-Gamma(2, βσ), βσ = 1. In this article, we use t = 0.1 as a balance between the approximation accuracy and the numeric stability of the algorithm. To run the HMC sampler, we use the No-U-Turn Sampler (NUTS-HMC) algorithm (...) implemented in the hamiltorch package (Cobb and Jalaian, 2020), which also automatically tunes the other two working parameters ε and L. For clustering, we use an over-fitted mixture with k = 20 and small Dirichlet concentration parameter α = 1/20."
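The Pseudocode row above notes that the paper's Algorithm 1 is a NUTS-HMC sampler. As a rough, self-contained illustration of the Hamiltonian Monte Carlo family that algorithm belongs to (not the authors' algorithm, and not NUTS itself, which additionally auto-tunes the step size ε and path length L), a minimal leapfrog HMC sampler for a standard-normal target can be sketched as:

```python
import math
import random

def hmc_sample(log_prob_grad, theta0, n_samples=1000, eps=0.1, L=20, seed=0):
    """Minimal 1-D HMC sampler, for illustration only.

    log_prob_grad(theta) must return (log p(theta), d/dtheta log p(theta)).
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_samples):
        # Draw an auxiliary momentum and simulate Hamiltonian dynamics.
        p = rng.gauss(0.0, 1.0)
        theta_new, p_new = theta, p
        lp, g = log_prob_grad(theta_new)
        for _ in range(L):  # leapfrog integration: half/full/half steps
            p_new += 0.5 * eps * g
            theta_new += eps * p_new
            lp_new, g = log_prob_grad(theta_new)
            p_new += 0.5 * eps * g
        # Metropolis correction for the discretisation error.
        log_accept = (lp_new - 0.5 * p_new ** 2) - (lp - 0.5 * p ** 2)
        if math.log(rng.random()) < log_accept:
            theta = theta_new
        samples.append(theta)
    return samples

# Target: standard normal, log p(x) = -x^2/2 up to a constant.
draws = hmc_sample(lambda x: (-0.5 * x * x, -x), theta0=0.0)
```

The draws should concentrate around the target's mean 0 with variance near 1; NUTS replaces the fixed `eps` and `L` with automatically tuned values, as the hamiltorch package used in the paper does.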
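The Dataset Splits row points out that no random seeds are reported for the repeated n = 10,000 MNIST subsamples. A generic, seeded subsampling scheme (a sketch of how such partitioning could be made reproducible, not the authors' code; `base_seed` is an arbitrary choice) might look like:

```python
import random

def draw_subsamples(n_total=70000, n=10000, repeats=10, base_seed=2021):
    """Draw `repeats` reproducible subsamples of size n from n_total items.

    One seed per repetition makes each of the 10 experiments replayable.
    """
    runs = []
    for r in range(repeats):
        rng = random.Random(base_seed + r)  # fixed seed for repetition r
        runs.append(sorted(rng.sample(range(n_total), n)))
    return runs

runs = draw_subsamples()
```

Recording `base_seed` (or the per-run index lists themselves) would be enough to reproduce the exact sample partitioning the report finds missing.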