Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Topological Hidden Markov Models

Authors: Adam B Kashlak, Prachi Loliencar, Giseon Heo

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the versatility of this methodology by applying it to simulated diffusion processes such as Brownian and fractional Brownian sample paths as well as the Ornstein-Uhlenbeck process. Our methodology is applied to the identification of sleep states from overnight polysomnography time series data with the aim of diagnosing Obstructive Sleep Apnea in pediatric patients. It is also applied to a series of annual cumulative snowfall curves from 1940 to 1990 in the city of Edmonton, Alberta.
Researcher Affiliation | Academia | Adam B Kashlak, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada; Prachi Loliencar, School of Dentistry, University of Alberta, Edmonton, AB, T6G 1C9, Canada; Giseon Heo, School of Dentistry, University of Alberta, Edmonton, AB, T6G 1C9, Canada
Pseudocode | Yes | Algorithm 1: The THMM Baum-Welch Algorithm; Algorithm 2: The Viterbi Algorithm
Open Source Code | Yes | R code to recreate these simulations can be found at https://github.com/cachelack/Topological-Hidden-Markov-Model.git. This also includes the THMM variants of the Baum-Welch and Viterbi algorithms.
Open Datasets | Yes | In this section, we consider 50 years (winters) of cumulative snowfall growth curves from the city of Edmonton, Alberta as recorded by the Meteorological Service of Canada (see https://climate.weather.gc.ca/).
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. For the simulated data, it mentions T = 200 sample paths. For the OSA data, it states "There were 948 epochs in patient CF050" but describes no splits. For the snowfall data, it mentions 50 years (winters).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions several R packages and functions (e.g., the mclust package, the mhsmm package, the optim function, the kmeans.fd function from the fda.usc R package, and ksmooth in the base stats package in R) but does not specify their version numbers.
Experiment Setup | Yes | For a simple setting to test the THMM algorithm, we simulate a sequence of T = 200 Brownian sample paths with 5 different states corresponding to different drift parameters... two principal components were used for the fPCA HMM approach. In this simulation, five states were once again used to generate data with means μ = (-2, 0, 4, 2, 1) and c = (4, 4, 8, 2, 20)... The sampling rate for EEG is 512 samples per second. Each signal was split into a sequence of epochs, i.e., 30-second intervals... For each of the five methods considered, 20 models were fit and the fitted model that returned the highest likelihood was kept. Table 12 computes the ARI for the predicted state sequence for each pairing of fitted models.
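The Research Type row lists the Ornstein-Uhlenbeck process among the simulated diffusions. The sketch below shows one way such paths can be generated. It is written in Python rather than the paper's R, and it uses the exact Gaussian transition of the OU process. The parameterization dX = theta (mu - X) dt + sigma dW and all parameter names are assumptions for illustration. The paper's own OU settings are not given in this excerpt.

```python
import math
import random
from typing import List


def simulate_ou(x0: float, theta: float, mu: float, sigma: float,
                n_steps: int, dt: float, seed: int = 0) -> List[float]:
    """Simulate dX = theta * (mu - X) dt + sigma dW on a regular time grid.

    Uses the exact Gaussian transition of the Ornstein-Uhlenbeck process,
    so there is no discretization bias for any step size dt.
    Parameter names are hypothetical, not taken from the paper.
    """
    if theta <= 0:
        raise ValueError("theta must be positive for a mean-reverting process")
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # conditional standard deviation of X_{t+dt} given X_t
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        # conditional mean relaxes geometrically toward mu
        mean = mu + (path[-1] - mu) * decay
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path


# Example: one path of 100 steps on [0, 1] started away from the mean.
ou_path = simulate_ou(x0=2.0, theta=3.0, mu=0.0, sigma=1.0, n_steps=100, dt=0.01)
```

With sigma = 0 the recursion reduces to the deterministic decay x0 * exp(-theta * t). This is a quick sanity check on the transition formula.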
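The Pseudocode row cites Algorithm 2, the Viterbi algorithm. As a rough illustration of what that decoding step computes, here is the textbook log-space Viterbi recursion for a generic HMM, written in Python. This is not the THMM variant in the linked repository. The emission matrix `log_emit` is a hypothetical input standing in for whatever per-state log-likelihood the THMM assigns to each observation.

```python
import math
from typing import List, Sequence


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path for a standard HMM, in log space.

    log_init[k]     : log P(state_0 = k)
    log_trans[j][k] : log P(state_t = k | state_{t-1} = j)
    log_emit[t][k]  : log-likelihood of observation t under state k
                      (hypothetical input; the THMM emission model is not
                      reproduced here)
    """
    T, K = len(log_emit), len(log_init)
    # delta[k]: best log-probability of any path ending in state k at time t
    delta = [log_init[k] + log_emit[0][k] for k in range(K)]
    backptr: List[List[int]] = []
    for t in range(1, T):
        new_delta, ptrs = [], []
        for k in range(K):
            # best predecessor j for being in state k at time t
            j_best = max(range(K), key=lambda j: delta[j] + log_trans[j][k])
            ptrs.append(j_best)
            new_delta.append(delta[j_best] + log_trans[j_best][k] + log_emit[t][k])
        delta = new_delta
        backptr.append(ptrs)
    # start from the best final state and follow the back-pointers
    path = [max(range(K), key=lambda k: delta[k])]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Toy two-state example: sticky transitions, emissions favouring state 0 then 1.
lt = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
li = [math.log(0.5), math.log(0.5)]
le = [[math.log(0.8), math.log(0.2)]] * 2 + [[math.log(0.2), math.log(0.8)]] * 2
print(viterbi(li, lt, le))  # [0, 0, 1, 1]
```

Working with log-probabilities avoids numerical underflow over long sequences such as the 948 epochs mentioned in the Dataset Splits row.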
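The Experiment Setup row describes T = 200 Brownian sample paths whose drift depends on one of 5 hidden states. A minimal Python sketch of that kind of generator follows. The drift values, the sticky transition matrix (`p_stay`), and the grid size `n_grid` are all hypothetical choices, because the paper's actual values are elided in the quoted text.

```python
import random
from typing import List, Tuple


def simulate_switching_brownian(T: int, drifts: List[float], p_stay: float,
                                n_grid: int, seed: int = 0
                                ) -> Tuple[List[int], List[List[float]]]:
    """Simulate T Brownian sample paths on [0, 1], one per step of a hidden
    Markov chain; the hidden state selects the drift of that path.

    Hypothetical setup: the chain keeps its current state with probability
    p_stay and otherwise jumps uniformly to one of the other states.
    """
    rng = random.Random(seed)
    K = len(drifts)
    dt = 1.0 / n_grid
    states: List[int] = []
    paths: List[List[float]] = []
    s = rng.randrange(K)  # uniform initial state
    for _ in range(T):
        if states and rng.random() > p_stay:
            s = rng.choice([k for k in range(K) if k != s])
        states.append(s)
        x, path = 0.0, [0.0]
        for _ in range(n_grid):
            # exact increment of X(t) = mu * t + W(t): mu * dt + N(0, dt)
            x += drifts[s] * dt + rng.gauss(0.0, dt ** 0.5)
            path.append(x)
        paths.append(path)
    return states, paths


# Hypothetical drifts for 5 states; the paper's values are not quoted here.
states, paths = simulate_switching_brownian(
    T=200, drifts=[-2.0, -1.0, 0.0, 1.0, 2.0], p_stay=0.9, n_grid=100)
```

Each element of `paths` is one functional observation, and `states` is the ground-truth sequence that a decoder's output can be scored against.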
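The EEG figures quoted in the Experiment Setup row imply some simple arithmetic. At 512 samples per second, a 30-second epoch holds 512 × 30 = 15,360 samples. The 948 epochs reported for patient CF050 therefore span 948 × 30 s = 28,440 s, or 7.9 hours. A minimal Python sketch of the segmentation follows. Dropping a trailing partial epoch is an assumption, since the excerpt does not say how the paper handles one.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 512  # EEG sampling rate quoted in the Experiment Setup row
EPOCH_SECONDS = 30      # epoch length quoted in the Experiment Setup row


def split_into_epochs(signal: Sequence[float],
                      rate_hz: int = SAMPLING_RATE_HZ,
                      epoch_s: int = EPOCH_SECONDS) -> List[List[float]]:
    """Cut a 1-D signal into consecutive, non-overlapping epochs.

    Assumption: trailing samples that do not fill a whole epoch are dropped.
    """
    n = rate_hz * epoch_s  # 512 * 30 = 15360 samples per epoch
    n_full = len(signal) // n
    return [list(signal[i * n:(i + 1) * n]) for i in range(n_full)]


samples_per_epoch = SAMPLING_RATE_HZ * EPOCH_SECONDS  # 15360
recording_hours = 948 * EPOCH_SECONDS / 3600          # 7.9
```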
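The Experiment Setup row says Table 12 reports the ARI (adjusted Rand index) between predicted state sequences for each pairing of fitted models. The ARI is a natural choice here because it ignores how each model happens to label its states. A stdlib-only Python sketch of the standard ARI formula follows. The decoded sequences in the usage lines are made-up placeholders, not results from the paper.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same items.

    Invariant to permuting label names, so state sequences decoded by
    independently fitted HMMs can be compared directly.
    """
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    n = len(a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())  # table cells
    sum_a = sum(comb(c, 2) for c in Counter(a).values())  # row marginals
    sum_b = sum(comb(c, 2) for c in Counter(b).values())  # column marginals
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster each
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Placeholder decoded sequences from three hypothetical fitted models.
decoded = {"model_a": [0, 0, 1, 1, 2], "model_b": [1, 1, 0, 0, 2],
           "model_c": [0, 1, 0, 1, 2]}
for (m1, s1), (m2, s2) in combinations(decoded.items(), 2):
    print(m1, m2, round(adjusted_rand_index(s1, s2), 3))
```

Identical partitions score 1 regardless of label names. Agreement no better than chance scores near 0, and the score can go negative.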