Abstract
In multiple time series data, clustering the component profiles can identify meaningful latent groups while also detecting interesting change points in their trajectories. Conventional time series clustering methods, however, have the drawback of requiring co-clustered units to share the same cluster membership throughout the entire time domain. In contrast to these “global” clustering methods, we develop a Bayesian “local” clustering method that allows the functions to flexibly change their cluster memberships over time. We design a Markov chain Monte Carlo (MCMC) algorithm to implement our method. We illustrate the method on several real-world datasets, where time-varying cluster memberships provide meaningful inferences about the underlying processes. These include a public health dataset, which showcases the more detailed inference our method can provide over global clustering alternatives, and a temperature dataset, which demonstrates our method’s utility as a flexible change point detection method. Supplemental materials for this article, including R code implementing the method, are available online.
Supplementary Materials
The supplementary materials detail the choice of hyperparameters and the MCMC algorithm used to sample from the posterior. We also include additional figures demonstrating the local clustering method’s ability to recover individual-specific curves. The data for our simulation experiment are available as a separate CSV file in the online supplementary materials accompanying this article. R code implementing and demonstrating the methods developed in this article is also included, along with manuals for the code and a ReadMe file that explains how data should be formatted for compatibility with it.
Acknowledgments
We thank the Editor, Dr. Robert Gramacy, an anonymous Associate Editor, and three anonymous referees for their thorough review of the originally submitted manuscript and for their many constructive comments and suggestions, which led to a significantly improved final article.
Disclosure Statement
There are no relevant financial or non-financial competing interests to report here.
Notes
1 This article concentrates specifically on the analysis of multiple time series data in which each constituent series pertains to the same variable. Such data may be obtained, for example, as (a) multiple univariate time series from a set of different but comparable sources over the same time period; or (b) multiple records collected from the same source over different recurrent time cycles of the same length. The two real datasets analyzed in Section 3 of the main article correspond to these two scenarios, one dataset for each.
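The two scenarios in this note both lead to the same rectangular layout, with one row per series and one column per time point. The following is a minimal sketch of that idea in Python with NumPy, not the article's R code. The function names `stack_sources` and `split_into_cycles` and all data values are hypothetical, chosen only for illustration. The layout actually expected by the authors' code is the one described in their ReadMe file.

```python
import numpy as np


def stack_sources(series_by_source):
    """Scenario (a): stack equally long series from several sources into rows."""
    lengths = {len(values) for values in series_by_source.values()}
    if len(lengths) != 1:
        raise ValueError("all sources must cover the same time points")
    return np.array(list(series_by_source.values()), dtype=float)


def split_into_cycles(record, cycle_length):
    """Scenario (b): cut one long record into consecutive cycles of equal length."""
    record = np.asarray(record, dtype=float)
    if record.size % cycle_length != 0:
        raise ValueError("record length must be a multiple of cycle_length")
    return record.reshape(-1, cycle_length)


# Hypothetical data: three comparable sources observed at the same 4 time points.
panel_a = stack_sources({
    "source_A": [1.0, 1.2, 2.0, 2.1],
    "source_B": [0.9, 1.1, 1.0, 1.2],
    "source_C": [2.0, 2.2, 2.1, 1.0],
})

# Hypothetical data: one source recorded over 3 recurrent cycles of length 4.
panel_b = split_into_cycles(np.arange(12), cycle_length=4)

print(panel_a.shape, panel_b.shape)
```

In both cases each row is one series measuring the same variable, so a clustering method can compare the rows at every time point.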