Bayesian Distance Clustering
Authors: Leo L. Duan, David B. Dunson
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data. Keywords: Distance-based clustering, Mixture model, Model-based clustering, Model misspecification, Pairwise distance matrix, Partial likelihood |
| Researcher Affiliation | Academia | Leo L Duan EMAIL Department of Statistics University of Florida Gainesville, FL 32611, USA David B Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708, USA |
| Pseudocode | Yes | Algorithm 1: The pseudocode of the No-U-Turn Hamiltonian Monte Carlo sampler for the Bayesian distance clustering. |
| Open Source Code | No | No statement or link releasing the code for the method described in this paper is provided. The paper mentions use of the 'hamiltorch package (Cobb and Jalaian, 2020)', which is a third-party tool, not the authors' own implementation of their methodology. |
| Open Datasets | Yes | To assess the performance, we use the MNIST data of hand-written digits of 0–9, with each image having p = 28 × 28 pixels. |
| Dataset Splits | No | For the MNIST data, 'In each experiment, we take n = 10,000 random samples to fit the clustering models, among which each digit has approximately 1000 samples, and we repeat experiments 10 times.' No specific train/test/validation splits or random seeds are provided for reproducibility of the exact sample partitioning. For the brain data, the paper states 'We take the mid-coronal section of 41 × 58 voxels. Excluding the empty ones outside the brain, they have a sample size of n = 1781.' which describes the data selection but not experimental splits. |
| Hardware Specification | Yes | To provide some running time, using a quad-core i7 CPU, at n = 1000, the HMC algorithm takes about 20 minutes for running 10,000 iterations. |
| Software Dependencies | No | The paper mentions 'BFGS optimization algorithm (implemented in the PyTorch package)' and 'No-U-Turn Sampler (NUTS-HMC) algorithm (Hoffman and Gelman, 2014) implemented in the hamiltorch package (Cobb and Jalaian, 2020)'. However, specific version numbers for PyTorch or hamiltorch are not provided. |
| Experiment Setup | Yes | To favor small values for the mode while accommodating a moderate degree of uncertainty, we use a Gamma prior αh ∼ Gamma(1.5, 1.0). For conjugacy, we choose an inverse-gamma prior for σh with E(σh) = βσ, σh ∼ Inverse-Gamma(2, βσ), βσ = 1. In this article, we use t = 0.1 as a balance between the approximation accuracy and the numeric stability of the algorithm. To run the HMC sampler, we use the No-U-Turn Sampler (NUTS-HMC) algorithm (...) implemented in the hamiltorch package (Cobb and Jalaian, 2020), which also automatically tunes the other two working parameters ϵ and L. For clustering, we use an over-fitted mixture with k = 20 and small Dirichlet concentration parameter α = 1/20. |