Consistent Model-based Clustering using the Quasi-Bernoulli Stick-breaking Process
Authors: Cheng Zeng, Jeffrey W Miller, Leo L Duan
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulations and a data application of clustering brain networks, our proposed method recovers the ground-truth number of clusters, and leads to a small number of clusters. Keywords: Consistent Clustering, Exchangeable Partition Probability Function, Sparse Simplex. |
| Researcher Affiliation | Academia | Cheng Zeng, Department of Statistics, University of Florida, Gainesville, FL 32611, USA; Jeffrey W Miller, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Leo L Duan, Department of Statistics, University of Florida, Gainesville, FL 32611, USA |
| Pseudocode | Yes | 3. Posterior Sampling Algorithm. Since the quasi-Bernoulli mixture model involves a small modification to the classic stick-breaking construction, we can use an efficient slice sampling algorithm [Kalli et al. (2011), the improved version of Walker (2007)] for posterior inference. We use a sequence of decreasing positive constants ξ1, ξ2, . . . that converges to zero; in this article, we choose ξi = 0.5^i for i ≥ 1, as suggested in Kalli et al. (2011). Given ci (the component assignment), consider a latent uniform ui ~ Uniform(0, ξ_{ci}); then we have a joint likelihood proportional to ∏_{i=1}^n 1(ui < ξ_{ci}) (w_{ci}/ξ_{ci}) f_{θ_{ci}}(yi). We define the state of the Markov chain to be (c, θ, w, u), and the target distribution is the posterior p(c, θ, w, u \| y), where y = y_{1:n}, c = c_{1:n}, θ = θ_{1:∞}, and w = w_{1:∞}. The slice sampler iterates the following steps: 1. Sample c from its full conditional. For i = 1, . . . , n: sample ci ~ Categorical(w̃_k), where w̃_k = (w_k/ξ_k) f_{θ_k}(yi) / Σ_{l: ξ_l > ui} (w_l/ξ_l) f_{θ_l}(yi), for k ∈ {l : ξ_l > ui}. Since the sequence ξ1, ξ2, . . . converges to zero, the index set {l : ξ_l > ui} is finite. Compute n_k := Σ_i 1(ci = k) and m_k := Σ_i 1(ci > k). 2. Sample u from its full conditional. For i = 1, . . . , n: sample ui from the uniform distribution on the interval (0, ξ_{ci}). 3. Sample w from its full conditional. For k ∈ ∪_{i=1}^n {l : ξ_l > ui}: sample b_k ~ q δ_1(·) + (1 − q) δ_ε(·), where q = p / (p + (1 − p) ε^α I_ε(m_k + α, n_k + 1)); sample β_k by drawing X ~ Beta_{(0,b_k)}(m_k + α, n_k + 1) and setting β_k = X/b_k, where Beta_{(0,ε)} denotes a Beta distribution truncated to the interval (0, ε); compute w_k from b_{1:k} and β_{1:k} using Equation (2). 4. Sample θ from its full conditional. For k ∈ ∪_{i=1}^n {l : ξ_l > ui}: sample θ_k from the distribution proportional to g(θ_k) ∏_{i: ci = k} f_{θ_k}(yi), where g is the density of the base measure G. |
| Open Source Code | Yes | A software implementation and the steps needed to replicate the results in this paper are provided on https://github.com/zengcheng/quasi-bernoulli-stick-breaking. |
| Open Datasets | Yes | To demonstrate the ease of using our model in an advanced data analysis, we apply it to cluster multiple brain networks, collected from n = 812 subjects in the human connectome project (Marcus et al., 2011). |
| Dataset Splits | No | We first generate data with sample sizes n ∈ {50, 100, 250, 1000, 2500} from a three-component univariate Gaussian mixture distribution: 0.3 N(−4, 1^2) + 0.3 N(0, 1^2) + 0.4 N(5, 1^2). (Section 4.1) To demonstrate the ease of using our model in an advanced data analysis, we apply it to cluster multiple brain networks, collected from n = 812 subjects in the human connectome project (Marcus et al., 2011). (Section 5) |
| Hardware Specification | Yes | The algorithms are implemented in R, and run on a 4.0 GHz processor. |
| Software Dependencies | No | The algorithms are implemented in R, and run on a 4.0 GHz processor. |
| Experiment Setup | Yes | We set p = 0.9 for the quasi-Bernoulli probability in Equation (2), yielding a prior mean of no more than 1/(1 − p) = 10 components with mixture weights larger than ϵ. ... We set α = 1 and ϵ = 1/n^2.1, which satisfies the theoretical condition that ϵ = o(1/n^2) in Theorem 5. ... For each experiment, we run the Markov chain for 50,000 iterations, discard the first 20,000 as burn-in, and use thinning by keeping only every 50th iteration. (Section 4) For the quasi-Bernoulli prior on w, we use p = 0.9, α = 1 and ϵ = 1/n^2.1. We run the MCMC sampler from Section 3 for 30,000 iterations and discard the first 10,000 as burn-in. (Section 5) |
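The paper's implementation is in R; as a rough illustration of the prior construction underlying step 3 of the sampler, here is a Python sketch. It assumes (hypothetically; the exact parameterization is given by Equation (2) in the paper) that the quasi-Bernoulli weights follow a standard stick-breaking map w_k = v_k ∏_{l<k}(1 − v_l), with each stick v_k = b_k β_k either "full" (b_k = 1, with probability p) or shrunk below ϵ (b_k = ϵ), and β_k ~ Beta(1, α):

```python
import random

def qb_stick_weights(K, p=0.9, alpha=1.0, eps=1e-4, rng=random):
    """Sketch of quasi-Bernoulli stick-breaking weights (hypothetical
    reading of Equation (2)): v_k = b_k * beta_k, where b_k = 1 with
    probability p and b_k = eps otherwise, and beta_k ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(K):
        b = 1.0 if rng.random() < p else eps
        beta = rng.betavariate(1.0, alpha)
        v = b * beta          # stick proportion, either full or at most eps
        weights.append(remaining * v)
        remaining *= (1.0 - v)  # length of the unbroken remainder
    return weights
```

With p = 0.9, on average only about 1/(1 − p) = 10 of the first sticks are "full", matching the prior-mean interpretation quoted in the Experiment Setup row.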
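Step 1 of the slice sampler (sampling each assignment c_i over the finite active set {k : ξ_k > u_i}) can be sketched as follows; the Gaussian density f and the 0-based indexing are illustrative choices, not the paper's code:

```python
import math
import random

def sample_assignment(y_i, u_i, weights, thetas):
    """Slice-sampler step 1: draw c_i from Categorical with probabilities
    proportional to (w_k / xi_k) * f_theta_k(y_i), restricted to the
    finite set {k : xi_k > u_i}, with xi_k = 0.5**k as in the paper."""
    def f(y, theta):
        # Illustrative unit-variance Gaussian kernel density
        return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)
    xi = lambda k: 0.5 ** k  # decreasing slice constants, xi_k -> 0
    active = [k for k in range(len(weights)) if xi(k + 1) > u_i]
    probs = [weights[k] / xi(k + 1) * f(y_i, thetas[k]) for k in active]
    total = sum(probs)
    return random.choices(active, weights=[q / total for q in probs])[0]
```

Because ξ_k = 0.5^k decreases to zero, the active set is always finite, which is what makes the infinite mixture tractable to sample.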
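The Section 4.1 simulation data (quoted in the Dataset Splits row) can be regenerated with a few lines; this is a generic Python sketch of the stated three-component mixture, not the authors' R script:

```python
import random

def simulate_mixture(n, seed=None):
    """Draw n points from the Section 4.1 mixture:
    0.3 N(-4, 1^2) + 0.3 N(0, 1^2) + 0.4 N(5, 1^2)."""
    rng = random.Random(seed)
    means, probs = [-4.0, 0.0, 5.0], [0.3, 0.3, 0.4]
    data = []
    for _ in range(n):
        k = rng.choices(range(3), weights=probs)[0]  # pick a component
        data.append(rng.gauss(means[k], 1.0))        # unit-sd Gaussian draw
    return data
```

The paper runs this for each sample size n ∈ {50, 100, 250, 1000, 2500}.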