Research Article

smashGP: Large-Scale Spatial Modeling via Matrix-Free Gaussian Processes

Received 24 Jan 2023, Accepted 03 May 2024, Published online: 13 Jun 2024
 

Abstract

Gaussian processes are essential for spatial data analysis. They allow not only the prediction of unknown values but also uncertainty quantification. However, in the era of big data, directly using Gaussian processes has become computationally infeasible because dense matrix decomposition and inversion require cubic run time. Various alternatives have been proposed to reduce the computational burden of directly fitting Gaussian processes. These alternatives rely on assumptions about the underlying structure of the covariance or precision matrices, such as sparsity or low rank. In contrast, this article uses hierarchical matrices and matrix-free methods to enable the computation of Gaussian processes for large spatial datasets by exploiting the underlying kernel properties. The proposed framework, smashGP, represents the covariance matrix as an H² matrix in O(n) time and, thanks to fast matrix-vector products, estimates the unknown parameters of the model and predicts the values of spatial observations at unobserved locations in O(n log n) time. Additionally, it can be parallelized to take full advantage of shared-memory computing environments. With simulations and case studies, we illustrate the advantage of smashGP for modeling large-scale spatial datasets. Supplementary materials for this article are available online.
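To illustrate what "matrix-free" means in the abstract, the following is a minimal Python sketch of a Gaussian process predictive mean computed with conjugate gradients, so that the covariance matrix is only ever touched through matrix-vector products. It is not smashGP's code or API: the function names, the exponential kernel, and the parameters `sigma2`, `rho`, `nugget`, and `block` are all assumptions made for illustration. The matrix-vector product here is a plain blockwise dense product costing O(n²) per call; in a framework like the one described, an H² representation would stand in for that step to bring the cost down.

```python
import numpy as np


def exp_kernel(X, Y, sigma2=1.0, rho=0.3):
    """Exponential covariance between two point sets (illustrative choice)."""
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return sigma2 * np.exp(-dist / rho)


def make_cov_matvec(X, nugget=0.1, block=256):
    """Return v -> (K + nugget * I) v without ever storing the full K.

    Assumption: kernel blocks are formed on the fly and discarded. This keeps
    memory at O(n * block) but still costs O(n^2) time per product; it is a
    stand-in for a fast hierarchical matrix-vector product.
    """
    n = X.shape[0]

    def matvec(v):
        out = nugget * v
        for start in range(0, n, block):
            rows = slice(start, start + block)
            out[rows] += exp_kernel(X[rows], X) @ v
        return out

    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=2000):
    """Solve A x = b for symmetric positive definite A given only matvec."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def predict_mean(X, y, X_new, nugget=0.1):
    """Zero-mean GP predictive mean: k(X_new, X) (K + nugget * I)^{-1} y."""
    alpha = conjugate_gradient(make_cov_matvec(X, nugget), y)
    return exp_kernel(X_new, X) @ alpha
```

A call such as `predict_mean(X, y, X_new)` with `X` of shape `(n, 2)` returns one predicted value per row of `X_new`. The point of the design is that the solver never asks for a factorization or an inverse, only for products with a vector, which is what allows a compressed covariance representation to be swapped in.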

Supplementary Materials

Code: Code used to run smashGP can be found at https://gitlab.com/libsmash_public/smashgp. See the Readme file for detailed instructions.

Supplementary Materials: Document containing: (A) derivations for GPs with nonconstant mean; (B) additional details on SMASH and its matrix-free operations; (C) definitions of the evaluation metrics used in the simulations and case studies; (D) additional details on the computational complexity of smashGP; (E) parameter estimation evaluation via simulations; (F) comparison with state-of-the-art methods for large datasets; and (G) case study for a million data points. (smashGP_supplementary.pdf, PDF file)

Acknowledgments

The authors thank the editor, associate editor, and two anonymous reviewers for their constructive comments and suggestions that have considerably improved the article.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

This work was partially supported by NSF grants CMMI-1839591 and OAC-2003683.
