31
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Variance-Reduced Stochastic Optimization for Efficient Inference of Hidden Markov Models

ORCID Icon, , , , & ORCID Icon
Received 06 Oct 2023, Accepted 24 Apr 2024, Published online: 07 Jun 2024

References

  • Adam, T., Griffiths, C., Leos Barajas, V., Meese, E., Lowe, C., et al. (2019), “Joint Modelling of Multi-scale Animal Movement Data using Hierarchical Hidden Markov Models,” Methods in Ecology and Evolution, 10, 1536–1550. DOI: 10.1111/2041-210X.13241.
  • Agarwal, N., Bullins, B., and Hazan, E. (2017), “Second-order Stochastic Optimization for Machine Learning in Linear Time,” The Journal of Machine Learning Research, 18, 4148–4187.
  • Baldi, P., and Chauvin, Y. (1993), “Smooth On-Line Learning Algorithms for Hidden Markov Models,” Neural Computation, 6, 307–318. DOI: 10.1162/neco.1994.6.2.307.
  • Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970), “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains,” The Annals of Mathematical Statistics, 41, 164–171. DOI: 10.1214/aoms/1177697196.
  • Bebbington, M. S. (2007), “Identifying Volcanic Regimes using Hidden Markov Models,” Geophysical Journal International, 171, 921–942. DOI: 10.1111/j.1365-246X.2007.03559.x.
  • Boyd, S., and Vandenberghe, L. (2004), Convex Optimization, Cambridge: Cambridge University Press.
  • Cade, D. E., Gough, W. T., Czapanskiy, M. F., Fahlbusch, J. A., Kahane-Rapport, S. R., Linsky, J. M. J., et al. (2021), “Tools for Integrating Inertial Sensor Data With Video Bio-Loggers, Including Estimation of Animal Orientation, Motion, and Position,” Animal Biotelemetry, 9, 34. DOI: 10.1186/s40317-021-00256-w.
  • Chen, J., and Li, P. (2009), “Hypothesis Test for Normal Mixture Models: The EM Approach,” The Annals of Statistics, 37, 2523–2542. DOI: 10.1214/08-AOS651.
  • Defazio, A., Bach, F., and Lacoste-Julien, S. (2014), “SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives,” in Advances in Neural Information Processing Systems, NIPS ’14.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
  • Fang, C., Li, C. J., Lin, Z., and Zhang, T. (2018), “SPIDER: Near-optimal Non-convex Optimization via Stochastic Path-integrated Differential Estimator,” in Advances in Neural Information Processing Systems (Vol. 31).
  • Fletcher, R. (2000), Newton-Like Methods, Wiley.
  • Fletcher, R., and Reeves, C. M. (1964), “Function Minimization by Conjugate Gradients,” The Computer Journal, 7, 149–154. DOI: 10.1093/comjnl/7.2.149.
  • Gales, M., and Young, S. (2007), “The Application of Hidden Markov Models in Speech Recognition,” Foundations Trends Signal Processing, 1, 195–304.
  • Glennie, R., Adam, T., Leos-Barajas, V., Michelot, T., Photopoulou, T., and McClintock, B. T. (2023), “Hidden Markov Models: Pitfalls and Opportunities in Ecology,” Methods in Ecology and Evolution, 14, 43–56. DOI: 10.1111/2041-210X.13801.
  • Gotoh, Y., Hochberg, M., and Silverman, H. (1998), “Efficient Training Algorithms for HMMs Using Incremental Estimation,” IEEE Transactions on Speech and Audio Processing, 6, 539–548. DOI: 10.1109/89.725320.
  • Gürbüzbalaban, M., Ozdaglar, A., and Parrilo, P. (2021), “Why Random Reshuffling Beats Stochastic Gradient Descent,” Mathematical Programming, 186, 49–84. DOI: 10.1007/s10107-019-01440-w.
  • Jensen, F. H., Tervo, O. M., Heide-Jørgensen, M. P., and Ditlevsen, S. (2023), “Detecting Narwhal Foraging Behaviour From Accelerometer and Depth Data Using Mixed-Effects Logistic Regression,” Animal Biotelemetry, 11, 14. DOI: 10.1186/s40317-023-00325-2.
  • Johnson, R., and Zhang, T. (2013), “Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction,” in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, neurIPS, Red Hook, NY, USA: Curran Associates Inc.
  • Karimi, B., Wai, H.-T., Moulines, E., and Lavielle, M. (2019), “On the Global Convergence of (Fast) Incremental Expectation Maximization Methods,” in Advances in Neural Information Processing Systems (Vol. 32), Curran Associates, Inc.
  • Khreich, W., Granger, E., Miri, A., and Sabourin, R. (2012), “Survey of Techniques for Incremental Learning Of HMM Parameters,” Information Sciences, 197, 105–130. DOI: 10.1016/j.ins.2012.02.017.
  • Kingma, D., and Ba, J. (2015), “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations (ICLR).
  • Kleinberg, B., Li, Y., and Yuan, Y. (2018), “An Alternative View: When Does SGD Escape Local Minima?” in Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, eds. J. Dy and A. Krause, PMLR.
  • Kottaram, A., Johnston, L., Cocchi, L., Ganella, E., Everall, I., Pantelis, C., et al. (2019), “Brain Network Dynamics in Schizophrenia: Reduced Dynamism of the Default Mode Network,” Human Brain Mapping, 40, 2212–2228. DOI: 10.1002/hbm.24519.
  • Langrock, R., Adam, T., Leos Barajas, V., Mews, S., Miller, D. L., and Papastamatiou, Y. P. (2018), “Spline-Based Nonparametric Inference in General State-Switching Models,” Statistica Neerlandica, 72, 179–200. DOI: 10.1111/stan.12133.
  • Lawler, E., Whoriskey, K., Aeberhard, W. H., Field, C., and Mills Flemming, J. (2019), “The Conditionally Autoregressive Hidden Markov Model (CarHMM): Inferring Behavioural States from Animal Tracking Data Exhibiting Conditional Autocorrelation,” Journal of Agricultural, Biological and Environmental Statistics, 24, 651–668. DOI: 10.1007/s13253-019-00366-2.
  • Leos Barajas, V., Gangloff, E. J., Adam, T., Langrock, R., van Beest, F. M., Nabe-Nielsen, J., and Morales, J. M. (2017), “Multi-Scale Modeling of Animal Movement and General Behavior Data Using Hidden Markov Models with Hierarchical Structures,” Journal of Agricultural, Biological and Environmental Statistics, 22, 232–248. DOI: 10.1007/s13253-017-0282-9.
  • Li, Y. (2005), “Hidden Markov Models with States Depending on Observations,” Pattern Recognition Letters, 26, 977–984. DOI: 10.1016/j.patrec.2004.09.050.
  • Li, Z., Han, J., and Song, Y. (2020), “On the Forecasting of High-Frequency Financial Time Series Based on Arima Model Improved by Deep Learning,” Journal of Forecasting, 39, 1081–1097. DOI: 10.1002/for.2677.
  • Liu, S., Wu, H., and Meeker, W. Q. (2015), “Understanding and Addressing the Unbounded ‘Likelihood’ Problem,” The American Statistician, 69, 191–200. DOI: 10.1080/00031305.2014.1003968.
  • Mamon, R. S., and Elliott, R. J. (2007), Hidden Markov Models in Finance, New York: Springer.
  • McClintock, B. T., Langrock, R., Gimenez, O., Cam, E., Borchers, D. L., Glennie, R., and Patterson, T. A. (2020), “Uncovering Ecological State Dynamics With Hidden Markov Models,” Ecology Letters, 23, 1878–1903. DOI: 10.1111/ele.13610.
  • McRae, T., Volpov, B. E., Sidrow, S. F., Auger-Méthé, M., Heckman, N., and Trites, A. (2024), “Killer Whale Respiration Rates,” PloS One, 19, 1–26. DOI: 10.1371/journal.pone.0302758.
  • Neal, R. M., and Hinton, G. E. (1998), A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, chapter 12, pp. 355–368, Dordrecht: Springer Netherlands.
  • Nguyen, L. M., Liu, J., Scheinberg, K., and Takáč, M. (2017), “SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient,” in International Conference on Machine Learning, PMLR.
  • Patterson, T. A., Parton, A., Langrock, R., Blackwell, P. G., Thomas, L., and King, R. (2017), “Statistical Modelling of Individual Animal Movement: An Overview of Key Methods and a Discussion of Practical Challenges,” Advances in Statistical Analysis, 101, 399–438.
  • Pearl, J. (1982), “Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach,” in Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI’82, AAAI Press.
  • ———(1988), “Chapter 4: Belief Updating by Network Propagation,” in Probabilistic Reasoning in Intelligent Systems, San Francisco (CA): Morgan Kaufmann, pp. 143–237.
  • Pirotta, E., Edwards, E. W. J., New, L., and Thompson, P. M. (2018), “Central Place Foragers and Moving Stimuli: A Hidden-State Model to Discriminate the Processes Affecting Movement,” Journal of Animal Ecology, 87, 1116–1125. DOI: 10.1111/1365-2656.12830.
  • Pohle, J., Langrock, R., van Beest, M., and Schmidt, N. M. (2017), “Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement,” Journal of Agricultural, Biological and Environmental Statistics, 22, 1–24. DOI: 10.1007/s13253-017-0283-8.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: 10.1214/aoms/1177729586.
  • Schmidt, M., Le Roux, N., and Bach, F. (2017), “Minimizing Finite Sums With the Stochastic Average Gradient,” Mathematical Programming, 162, 83–112. DOI: 10.1007/s10107-016-1030-6.
  • Shamir, O. (2016), “Without-Replacement Sampling for Stochastic Gradient Methods,” in Advances in Neural Information Processing Systems (Vol. 29), Curran Associates, Inc.
  • Sidrow, E., Heckman, N., Fortune, S. M. E., Trites, A. W., Murphy, I., and Auger-Méthé, M. (2022), “Modelling Multi-Scale, State-Switching Functional Data with Hidden Markov Models,” Canadian Journal of Statistics, 50, 327–356. DOI: 10.1002/cjs.11673.
  • Tamposis, I. A., Theodoropoulou, M. C., Tsirigos, K. D., and Bagos, P. G. (2018), “Extending Hidden Markov Models to Allow Conditioning on Previous Observations,” Journal of Bioinformatics and Computational Biology, 16, 1850019. DOI: 10.1142/S0219720018500191.
  • Tennessen, J. B., Holt, M. M., Hanson, M. B., Emmons, C. K., Giles, D. A., and Hogan, J. T. (2019a), “Kinematic Signatures of Prey Capture From Archival Tags Reveal Sex Differences in Killer Whale Foraging Activity,” Journal of Experimental Biology, 222, jeb191874. DOI: 10.1242/jeb.191874.
  • Tennessen, J. B., Holt, M. M., Ward, E. J., Hanson, M. B., Emmons, C. K., Giles, D. A., and Hogan, J. T. (2019b), “Hidden Markov Models Reveal Temporal Patterns and Sex Differences in Killer Whale Behavior,” Scientific Reports, 9, 14951. DOI: 10.1038/s41598-019-50942-2.
  • Tennessen, J. B., Holt, M. M., Wright, B. M., Hanson, M. B., Emmons, C. K., Giles, D. A., Hogan, J. T., Thornton, S. J., and Deecke, V. B. (2023), “Divergent Foraging Strategies Between Populations of Sympatric Matrilineal Killer Whales,” Behavioral Ecology, 34, 373–386. DOI: 10.1093/beheco/arad002.
  • Thiesson, B., Meek, C., and Heckerman, D. (2001), “Accelerating EM For Large Databases,” Machine Learning, 45, 279–299. DOI: 10.1023/A:1017986506241.
  • Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., et al. (2019), “SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python,” arXiv e-prints, arXiv:1907.10121.
  • Viviant, M., Monestiez, P., and Guinet, C. (2014), “Can We Predict Foraging Success in a Marine Predator from Dive Patterns Only? Validation with Prey Capture Attempt Data,” PloS One, 9, 1–10. DOI: 10.1371/journal.pone.0088503.
  • Wright, B. M., Ford, J. K. B., Ellis, G. M., Deecke, V. B., Shapiro, A. D., Battaile, B. C., and Trites, A. W. (2017), “Fine-Scale Foraging Movements by Fish-Eating Killer Whales (Orcinus Orca) Relate to the Vertical Distributions and Escape Responses of Salmonid Prey (Oncorhynchus Spp.),” Movement Ecology, 5, 3. DOI: 10.1186/s40462-017-0094-0.
  • Wu, C. F. J. (1983), “On the Convergence Properties of the EM Algorithm,” The Annals of Statistics, 11, 95–103. DOI: 10.1214/aos/1176346060.
  • Ye, F. X., Ma, Y., and Qian, H. (2017), “Estimate Exponential Memory Decay in Hidden Markov Model and its Applications,” arXiv preprint arXiv:1710.06078.
  • Zhang, G., Poupart, P., and Trimponias, G. (2020), “Comparing EM with GD in Mixture Models of Two Components,” in Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 of Proceedings of Machine Learning Research, eds. R. P. Adams and V. Gogate, PMLR.
  • Zhu, R., Wang, L., Zhai, C., and Gu, Q. (2017), “High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm,” in Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, PMLR.
  • Zinkevich, M., Weimer, M., Li, L., and Smola, A. (2010), “Parallelized Stochastic Gradient Descent,” in Advances in Neural Information Processing Systems (Vol. 23), eds. J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, Curran Associates, Inc.
  • Zucchini, W., Macdonald, I. L., and Langrock, R. (2016), Hidden Markov Models for Time Series - An Introduction Using R, Boca Raton, FL: CRC Press.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.