Determine the Number of States in Hidden Markov Models via Marginal Likelihood
Authors: Yang Chen, Cheng-Der Fuh, Chu-Lan Michael Kao
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a consistent method for determining the number of hidden states of HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and hidden states. Moreover, we show that the model selection problem of HMM includes the order selection problem of finite mixture models as a special case. We give rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computation method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method. Keywords: hidden Markov models, model selection, marginal likelihood, consistency, normalizing constant |
| Researcher Affiliation | Academia | Yang Chen EMAIL Department of Statistics University of Michigan 1085 South University Ann Arbor, MI 48109-1107, USA Cheng-Der Fuh EMAIL Graduate Institute of Statistics National Central University No. 300, Zhongda Rd., Zhongli District, Taoyuan City 320317, Taiwan Chu-Lan Michael Kao EMAIL Institute of Statistics National Yang-Ming Chiao-Tung University 1001 University Road, Hsinchu 30010, Taiwan |
| Pseudocode | Yes | We now give our procedure for estimating the marginal likelihood p_K(y_1:n) for each candidate K. 1. Obtain posterior samples. Sample from p(φ_K ∣ y_1:n) using a preferred MCMC algorithm, and denote the samples by {φ_K^(i)}_{i=1}^N (where N is often a few thousand). 2. Find a good importance function. Fit a mixture model to the samples {φ_K^(i)}_{i=1}^N, where the number of mixing components is given by either (a) any clustering algorithm, or (b) a pre-fixed number that is large enough. Construct the importance function g(·) by fitting a Gaussian mixture, or use a heavier-tailed density as the mixture component; for example, a Student-t distribution with a small degree of freedom, such as 2 or 3, with the same location and scale parameters as the fitted Gaussian mixture components. 3. Choose a finite region. Choose Ω_K to be a bounded subset of the parameter space such that 1/2 < ∫_{Ω_K} g(φ_K) dφ_K < 1. This can be achieved by finding an appropriate finite region for each mixing component of g(·), avoiding the tail parts. 4. Estimate p_K(y_1:n) in either of two ways. Reciprocal importance sampling: approximate p_K(y_1:n) by p̂_K^(RIS)(y_1:n) = [∫_{Ω_K} g(φ_K) dφ_K] / [(1/N) Σ_{i=1}^N (g(φ_K^(i)) / p(y_1:n, φ_K^(i))) 1_{Ω_K}(φ_K^(i))], where 1_Ω(x) is 1 if x ∈ Ω, and zero otherwise. Importance sampling: (a) draw M independent samples from g(·), denoted by {ψ_K^(j)}_{1≤j≤M}; (b) approximate p_K(y_1:n) by p̂_K^(IS)(y_1:n) = (1/(M·P_Ω)) Σ_{j=1}^M (p(y_1:n, ψ_K^(j)) / g(ψ_K^(j))) 1_{Ω_K}(ψ_K^(j)) (Eq. 17), where P_Ω = #S/N with S = {i : φ_K^(i) ∈ Ω_K, 1 ≤ i ≤ N}. |
| Open Source Code | Yes | (4) An easy-to-use R package, HMMmlselect, which implements our algorithm, is provided and publicly available at CRAN (https://cran.r-project.org/package=HMMmlselect). |
| Open Datasets | Yes | We apply the proposed marginal likelihood method to a set of single-molecule experimental data studied in Chen et al. (2016b), where single-molecule experiments are conducted to study the co-translational protein targeting process |
| Dataset Splits | No | The paper describes simulation studies where data is generated (e.g., "Simulate n observations from the HMM") and real-world data analysis from a prior study, but it does not specify any training/test/validation dataset splits for reproducibility. The simulation setup describes how data is created, not partitioned. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to conduct the numerical experiments or simulations. |
| Software Dependencies | No | The paper mentions an "R package, HMMmlselect" and refers to the "Baum-Welch (EM) algorithm" but does not specify version numbers for R or any other software dependencies. |
| Experiment Setup | Yes | In the numerical experiments, we fix the mean parameters of a K-state HMM to be µ = (1, 2, . . . , K) and vary the variances σ² = (σ², . . . , σ²) in the first set of simulation studies. The equal-variances assumption is adopted here for simplicity of presentation, but it is not part of the model assumptions; we relax this assumption in the second set of simulations. We consider four kinds of transition matrices, corresponding to flat (P_K^(1)), moderately and strongly diagonal (P_K^(2), P_K^(3)), and strongly off-diagonal (P_K^(4)) cases: ... The number of observations, n, varies from 200 to 2000, and the true number of hidden states, K, ranges from 3 to 5. ... We conduct m = 200 repeated simulations, each of which compares the marginal likelihood method with the BIC as follows. ... The prior means for the mean parameters of the K states are set as the K equally spaced quantile levels between 0.05 and 0.95 of the observed trajectory, and the corresponding prior variances are set to 100². The priors for each row of the transition matrix are flat, that is, Dirichlet with hyperparameters all equal to 1. For the scaled-inverse chi-square prior on the variance parameters, we set the degrees-of-freedom parameter ν = 3 and the scale parameter s² via s = (quantile(y, 0.75) − quantile(y, 0.25))/(2K) |
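The four-step estimation procedure quoted in the Pseudocode row can be sketched numerically. The block below is a minimal Python illustration, not the HMMmlselect R implementation: it replaces the HMM joint p(y_1:n, φ_K) with a toy one-dimensional density whose normalizing constant is known (`Z_TRUE`), so both estimators can be sanity-checked; all variable and function names here are invented for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy stand-in for log p(y_1:n, phi_K): Z_TRUE * N(phi; 0, 1),
# so the "marginal likelihood" both estimators should recover is Z_TRUE.
Z_TRUE = 2.5
def log_joint(phi):
    return np.log(Z_TRUE) - 0.5 * phi**2 - 0.5 * np.log(2 * np.pi)

# Step 1: posterior samples (drawn exactly here; MCMC in practice).
post = rng.standard_normal(5000)

# Step 2: importance function g fitted to the samples
# (a single Gaussian suffices for this unimodal toy posterior).
mu, sd = post.mean(), post.std()
def log_g(phi):
    return -0.5 * ((phi - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

# Step 3: bounded region Omega with 1/2 < int_Omega g < 1
# (the central ~90% of g, avoiding the tails).
lo, hi = mu - 1.645 * sd, mu + 1.645 * sd
g_mass = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)   # int_Omega g

# Step 4a: reciprocal importance sampling estimate.
in_omega = (post > lo) & (post < hi)
ris = g_mass / np.mean(np.exp(log_g(post) - log_joint(post)) * in_omega)

# Step 4b: importance sampling estimate, corrected by P_Omega,
# the fraction of posterior samples falling inside Omega.
M = 5000
psi = rng.normal(mu, sd, M)
in_omega_psi = (psi > lo) & (psi < hi)
p_omega = in_omega.mean()
is_est = np.mean(np.exp(log_joint(psi) - log_g(psi)) * in_omega_psi) / p_omega

print(ris, is_est)   # both should land near Z_TRUE = 2.5
```

In a real HMM application, `log_joint` would be the log of the complete marginal over hidden states (computable by the forward algorithm times the prior), and `post` would come from an MCMC sampler; the bounded region is what keeps both estimators finite-variance.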
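The data-driven prior settings quoted in the Experiment Setup row (quantile-based prior means, diffuse variance 100², flat Dirichlet rows, and scale s = IQR(y)/(2K)) can be written out directly. This is a hedged sketch on synthetic data, not code from the paper or package; the variable names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 1000)   # toy observed trajectory
K = 3                            # candidate number of hidden states

# Prior means: the K equally spaced quantile levels between 0.05 and 0.95.
levels = np.linspace(0.05, 0.95, K)
prior_means = np.quantile(y, levels)
prior_mean_var = 100.0 ** 2      # diffuse prior variance 100^2 on each mean

# Scaled-inverse chi-square prior on the state variances:
# degrees of freedom nu = 3, scale s = IQR(y) / (2K).
nu = 3
s = (np.quantile(y, 0.75) - np.quantile(y, 0.25)) / (2 * K)

# Flat Dirichlet(1, ..., 1) prior on each row of the transition matrix.
dirichlet_alpha = np.ones(K)
```

Tying the prior scale to the interquartile range spread over 2K states makes the prior adapt to the range of the observed trajectory rather than requiring hand-tuning per data set.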