Full article: Covariance–Based Rational Approximations of Fractional SPDEs for Computationally Efficient Bayesian Inference

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

The stochastic partial differential equation (SPDE) approach is widely used for modeling large spatial datasets. It is based on representing a Gaussian random field u on $R^{d}$ as the solution of an elliptic SPDE $L^{β} u = W$ where L is a second-order differential operator, $2 β \in N$ is a positive parameter that controls the smoothness of u and $W$ is Gaussian white noise. A few approaches have been suggested in the literature to extend the approach to allow for any smoothness parameter satisfying $β > d / 4$ . Even though those approaches work well for simulating SPDEs with general smoothness, they are less suitable for Bayesian inference since they do not provide approximations which are Gaussian Markov random fields (GMRFs) as in the original SPDE approach. We address this issue by proposing a new method based on approximating the covariance operator $L^{- 2 β}$ of the Gaussian field u by a finite element method combined with a rational approximation of the fractional power. This results in a numerically stable GMRF approximation which can be combined with the integrated nested Laplace approximation (INLA) method for fast Bayesian inference. A rigorous convergence analysis of the method is performed and the accuracy of the method is investigated with simulated data. Finally, we illustrate the approach and corresponding implementation in the R package rSPDE via an application to precipitation data which is analyzed by combining the rSPDE package with the R-INLA software for full Bayesian inference. Supplementary materials for this article are available online.

KEYWORDS:

1 Introduction

Handling many observations from a Gaussian random field in spatial statistics can be challenging since the related computational tasks involve factorizations of large covariance matrices which are usually dense. This is often referred to as the “big N problem” (Banerjee, Carlin, and Gelfand Citation2015), and various approaches have been suggested to handle the computational issues (see, e.g., Heaton et al. Citation2019, for a recent comparison). One of the most widely used methods is the SPDE approach by Lindgren et al. (Citation2011). This is based on the fact that a centered Gaussian random field u on the spatial domain $D = R^{d}$ with an isotropic Matérn covariance function (Matérn Citation1960), (1) $ϱ (s) = \frac{σ^{2}}{2^{ν - 1} Γ (ν)} {(κ ‖ s ‖)}^{ν} K_{ν} (κ ‖ s ‖), s \in R^{d},$ (1) can be represented as a solution to the stochastic partial differential equation (SPDE) (2) ${(κ^{2} - Δ)}^{β} (τ u) = W in D .$ (2)

Here, $Γ (\cdot)$ is the Gamma function, $K_{ν}$ is a modified Bessel function of the second kind, Δ is the Laplace operator, and $W$ is Gaussian white noise. The parameter $κ > 0$ controls the practical correlation range, $σ^{2}$ is the variance of u, $τ^{2} = Γ (ν) / (σ^{2} κ^{2 ν} {(4 π)}^{d / 2} Γ (ν + d / 2))$ , and the fractional power β is related to the smoothness parameter $ν > 0$ via the relation $2 β = ν + d / 2$ (Whittle Citation1963). Lindgren et al. (Citation2011) used this representation to construct computationally efficient Gaussian Markov Random Field (GMRF) approximations of Gaussian Matérn fields by considering the SPDE on a bounded domain $D$ , restricting the smoothness to $2 β \in N$ , and then performing a finite element method (FEM) discretization.

The SPDE approach has become widely used in applications, and has initiated a great number of extensions and generalizations (Lindgren, Bolin, and Rue Citation2022). The reason for this is not only the computational benefits, but also that it provides a flexible framework for defining more sophisticated models for spatial data. It, for example, facilitates the construction of nonstationary Gaussian random fields by allowing the parameters κ and τ to be spatially varying (Lindgren et al. Citation2011; Fuglstad et al. Citation2015), and allows for the construction of Matérn-like random fields on more general manifolds by defining such fields via the SPDE (2) posed on the manifold (Lindgren et al. Citation2011; Bolin and Lindgren Citation2011).

One of the main criticisms of the SPDE approach is the requirement $2 β \in N$ , which restricts the possible values of the corresponding smoothness parameter ν of the Matérn covariance function. Given the importance of ν when performing prediction, as shown by Stein (Citation1999) and Bolin and Kirchner (Citation2023), several methods for removing the restriction of $2 β \in N$ have been proposed. Lindgren et al. (Citation2011, Author’s response) proposed to construct a GMRF approximation by approximating the spectrum of a Gaussian Matérn field by a spectrum that is a reciprocal of a polynomial. This method is applicable for stationary models but it can not be applied to nonstationary models, and it has a fixed accuracy which may not be sufficient for certain applications. Bolin, Kirchner, and Kovács (Citation2020) proposed combining the FEM approximation of Lindgren et al. (Citation2011) with a quadrature approximation of the fractional operator to obtain a numerical method that works for any $β > d / 4$ and can be made arbitrarily accurate. That work also provided a theoretical convergence analysis of the method, which was extended in Bolin, Kirchner, and Kovács (Citation2018) and Herrmann, Kirchner, and Schwab (Citation2020). Bolin and Kirchner (Citation2020) later proposed a different type of approximation referred to as the rational SPDE approach, which has a lower computational cost.

Even though the methods that work for nonstationary models with general smoothness are computationally efficient, they are much less used than the standard SPDE approach for statistical applications. The reason for this is that nonfractional SPDE models work in combination with the integrated nested Laplace approximation (INLA) method (Rue, Martino, and Chopin Citation2009) and are implemented in the widely used R-INLA (Lindgren and Rue Citation2015) R (R Core Team Citation2022) package. This software facilitates including SPDE-based models in general Bayesian latent Gaussian models, and the great majority of all applications of the SPDE approach have been done via this software.

Unfortunately, the methods of Bolin, Kirchner, and Kovács (Citation2020) and Bolin and Kirchner (Citation2020) provide approximations which are not compatible with R-INLA. The reason is that the methods do not yield a Markov approximation, so the precision matrix obtained from the approximations are not sparse. The covariance matrix of the approximations are of the form $P Q^{- 1} P$ , where both P and Q are sparse matrices that depend on the parameters of the model. To achieve a sparse precision matrix, which is necessary for R-INLA, Bolin and Kirchner (Citation2020) showed that one can work with a latent model with sparse precision matrix Q if the projection matrix A, which connects the locations of the mesh for the FEM approximation and the observation locations (see Section 2 for details), is adjusted to $\hat{A} = AP$ . This matrix, however, depends on the model parameters, which is not allowed in R-INLA.

The main goal of this work is to solve this problem by proposing an alternative rational approximation. The main idea is to approximate the covariance operator of the random field directly, instead of first approximating the solution u and then deriving the corresponding covariance operator. This provides an approximation suitable for R-INLA, which is more numerically stable than that of the original rational SPDE approach. The proposed method is implemented in the R package rSPDE (Bolin and Simas Citation2023), which is available on CRAN and has an interface to R-INLA. Using the package, we show that the proposed method facilitates full Bayesian inference of all model parameters, including β, for latent Gaussian models based on fractional SPDEs.

The outline of the article is as follows. In Section 2, we give an overview of the model structure of the proposed approximation and show how it can be used for computationally efficient inference. The mathematical details and justifications of the method are provided in Sections 3–5. Specifically, in Section 3, we introduce the generalized Whittle–Matérn fields, which contain most of the previously proposed nonstationary SPDE-based Gaussian random fields as special cases, and for which our proposed method is applicable. In that section, we also provide the details of the FEM approximations. The new covariance-based rational approximation is introduced in Section 4, where we also prove that it provides an approximation of the covariance function of the generalized Whittle–Matérn field with an explicit rate of convergence in the L₂-norm. In Section 5, we show that the covariance-based rational approximation can be represented as a GMRF, and illustrate how this can be used for statistical inference. Some of the details of the rSPDE implementation are discussed in Section 6, and a comparison in terms of the accuracy of approximating covariance function by our method and some other methods is provided in Section 7. An application to modeling of precipitation data is presented in Section 8 and the article concludes with a discussion in Section 9. Finally, the supplementary materials contain seven appendices which provide further technical details and proofs.

2 Overview of the Approximation Strategy

As mentioned in the introduction, the main idea behind our strategy is to directly approximate the covariance operator of the random field. In this section we show the structure of the resulting approximation and also provide an illustration on how it can be used for inference in a simple application. More details will be given in later sections. The covariance-based rational approximation of the Whittle–Matérn field $u (s)$ defined in (2), whose covariance operator is ${(κ^{2} - Δ)}^{- 2 β}$ , uses a combination of the finite element method and rational approximations in order to approximate $u (s)$ as $u_{n} (s) = \sum_{j = 1}^{n} w_{j} φ_{j} (s)$ , where ${w_{j}}_{j = 1}^{n}$ are stochastic weights and ${φ_{j}}_{j = 1}^{n}$ are FEM basis functions. With our approximation, $w = {[w_{1}, \dots, w_{n}]}^{⊤}$ can be expressed as a sum of m + 1 independent GMRFs $x_{i} = {(x_{i 1} \dots x_{in})}^{⊤}$ with sparse precision matrices: (3) $w = \sum_{i = 1}^{m + 1} x_{i}, where x_{i} \sim N (0, Q_{i}^{- 1}) .$ (3)

Any linear predictor in R-INLA has this form, which means that we can perform inference in a computationally efficient manner based on the covariance-based rational approximation by using the same ideas as are used in R-INLA. For example, suppose that we observe $y_{1}, \dots, y_{N}, N \in N$ , and spatial locations $s_{1}, \dots, s_{N} \in R^{d},$ where (4) $y_{i} = u (s_{i}) + ϵ_{i}, i = 1, \dots, N,$ (4) and $ϵ = {[ϵ_{1}, \dots, ϵ_{N}]}^{⊤} \sim N (0, Q_{ϵ}^{- 1})$ for some sparse matrix $Q_{ϵ}$ , such as $Q_{ϵ} = \frac{1}{σ_{ϵ}^{2}} I_{N}$ if we have independent measurement noise. Defining $y = {[y_{1}, \dots, y_{N}]}^{⊤}$ , (4) can be written in matrix form as $y = A w + ϵ$ , where A is the projector matrix with elements $A_{ij} = φ_{j} (s_{i})$ . Let $X = {[x_{1}^{⊤}, \dots, x_{m + 1}^{⊤}]}^{⊤}$ . Then, the precision matrix of X is the block diagonal matrix (5) $Q = diag (Q_{1}, \dots, Q_{m + 1}) .$ (5)

Writing the model in terms of the weights X allows us to equivalently write the model as $y = \bar{A} X + ϵ$ where $\bar{A}$ is a block matrix of size $N \times n (m + 1)$ obtained by combining m + 1 copies of A as $\bar{A} = [\begin{matrix} A & \dots & A \end{matrix}]$ . Thus, $y | X \sim N (\bar{A} X, Q_{ϵ}^{- 1})$ and $X \sim N (0, Q^{- 1})$ , where Q is given in (5). Standard results for latent Gaussian models then give us that the posterior distribution of X is $X | y \sim N (μ_{X | y}, Q_{X | y}^{- 1})$ , where (6) $μ_{X | y} = Q_{X | y}^{- 1} {\bar{A}}^{⊤} Q_{ϵ} y and Q_{X | y} = {\bar{A}}^{⊤} Q_{ϵ} \bar{A} + Q .$ (6)

Finally, we can obtain the marginal likelihood, $l (y)$ , of y as (7) $\begin{matrix} 2 l (y) = & log | Q | + log | Q_{ϵ} | - log | Q_{X | y} | - μ_{X | y}^{⊤} Q μ_{X | y} \\ - {(y - \bar{A} μ_{X | y})}^{⊤} Q_{ϵ} (y - \bar{A} μ_{X | y}) - n log (2 π) . \end{matrix}$ (7)

The sparsity of Q is essential for computation. For instance, evaluating $log | Q |$ in the likelihood can be done efficiently based on sparse Cholesky decomposition (Rue and Held Citation2005). Sparsity of Q also facilitates computationally efficient sampling of w, and hence of u. See Appendix E for further details on the methods for sampling and likelihood evaluation.

3 Whittle–Matérn Fields and FEM Approximation

In this section we introduce the class of fractional-order SPDEs we are interested in as well as their FEM approximations. The model assumptions are presented in Section 3.1, and in Section 3.2 we introduce the FEM approximations and study their convergence.

Let us begin by introducing the notation that will be needed later on. Given a bounded domain $D \subset R^{d}$ , $d \in {1, 2, 3}$ , we denote by $L_{2} (D)$ the Lebesgue space of square-integrable real-valued functions endowed with the inner product ${(ϕ, ψ)}_{L_{2} (D)} = \int_{D} ϕ (x) ψ (x) d x$ . We denote the Sobolev space of order k by $H^{k} (D)$ : $H^{k} (D) = {w \in L_{2} (D) : D^{γ} w \in L_{2} (D), \forall γ \in N^{d}, | γ | \leq k},$ where we are using the multiindex notation for the differential operator $D^{γ}$ , and ${(\cdot, \cdot)}_{H^{k} (D)}$ is the Sobolev inner product: ${(u, v)}_{H^{k} (D)} = \sum_{γ \in N^{d} : | γ | \leq k} {(D^{γ} u, D^{γ} v)}_{L_{2} (D)} .$

We denote by $H_{0}^{1} (D)$ the closure of $C_{c}^{\infty} (D)$ in $H^{1} (D)$ , where $C_{c}^{\infty} (D)$ is the set of infinitely differentiable functions with compact support on $D$ . Additional notations needed for the theoretical analysis are given in Appendix A.

3.1. Model Assumptions

We are interested in the class of Gaussian random fields on $D$ that can be represented as solutions to SPDEs of the form (8) $L^{β} (τ u) = W in D,$ (8) where $L^{β}$ is a fractional power (in the spectral sense) of a second-order elliptic differential operator L which determines the covariance structure of u, $τ > 0$ is a constant parameter, and $W$ is Gaussian white noise on $L_{2} (D)$ . We have the following assumptions on $D$ :

Assumption 1.

The domain $D$ is an open, bounded, convex polytope with closure $\bar{D}$ .

Under Assumption 1, we may define $H_{N}^{2} (D) = {w \in H^{2} (D) : \partial w / \partial ν = 0 on \partial D},$ where ν is the outward unit normal vector to $\partial D$ . Indeed, the expression $\partial w / \partial ν = 0$ on $\partial D$ makes sense since the trace of Dw is well-defined in this case (see, e.g., Evans and Gariepy Citation2015, Theorem 4.6). Let us now describe the assumptions on the differential operator L:

Assumption 2.

The operator L is given in divergence form by $Lu = - \nabla \cdot (H \nabla u) + κ^{2} u$ , and is equipped either with homogeneous Dirichlet or Neumann boundary conditions. Furthermore, the function $H : \bar{D} \to R^{d \times d}$ is symmetric, Lipschitz continuous and uniformly positive definite, and $κ : D \to R$ is an essentially bounded function, that is, $ess {sup}_{x \in D} κ (x) = inf {a \in R : λ ({x : κ (x) > a}) = 0} < \infty .$

Under Neumann boundary conditions, we additionally require that $ess {inf}_{x \in D} κ (x) = sup {b \in R : λ ({x : κ (x) < b}) = 0} \geq κ_{0} > 0,$ where λ is the Lebesgue measure on $D$ .

The SPDE (8) under Assumptions 1 and 2 defines a class of models that have previously been considered by Bolin, Kirchner, and Kovács (Citation2020), Cox and Kirchner (Citation2020), Herrmann, Kirchner, and Schwab (Citation2020), and Bolin and Kirchner (Citation2020) and is referred to as generalized Whittle–Matérn fields. It contains many previously proposed nonstationary SPDE-based spatial Gaussian random field models as special cases, such as those by Lindgren et al. (Citation2011), Fuglstad et al. (Citation2015), Fuglstad et al. (Citation2019), and Hildeman, Bolin, and Rychlik (Citation2021), and the method that we later introduce thus also applies to those models and their fractional extensions.

In the case of Dirichlet boundary conditions, define the space $V = H_{0}^{1} (D) \subset L_{2} (D)$ , and in the case of Neumann boundary conditions let $V = H^{1} (D) \subset L_{2} (D)$ . Then, under Assumptions 1 and 2, L induces the following continuous and coercive bilinear form on V: (9) $a_{L} (v, u) = {(H \nabla u, \nabla v)}_{L_{2} (D)} + {(κ^{2} u, v)}_{L_{2} (D)}, u, v \in V .$ (9)

Remark 1.

Under Assumptions 1 and 2, if $f \in L_{2} (D)$ , then there exists a unique solution u of Lu = f and the operator L is $H^{2} (D)$ -regular, that is, $u \in H^{2} (D) \cap H_{0}^{1} (D)$ under Dirichlet boundary conditions, whereas under Neumann boundary conditions, we have $u \in H_{N}^{2} (D)$ . See, for instance, (Grisvard Citation2011, Theorem 3.2.1.2) for Dirichlet boundary conditions or (Grisvard Citation2011, Theorem 3.2.1.3) for Neumann boundary conditions.

By Remark 1, specifically by the existence and uniqueness of the solution to the equation Lu = f, we can define the inverse operator $L^{- 1} : L_{2} (D) \to L_{2} (D)$ . By Rellich-Kondrachov theorem (Evans and Gariepy Citation2015, Theorem 4.11), $L^{- 1}$ is a compact operator, and observe that $L^{- 1}$ is self-adjoint, see Appendix A for a justification. Hence, by the spectral theorem for self-adjoint and compact operators, there exists an orthonormal basis ${e_{j}}_{j \in N}$ in $L_{2} (D)$ formed by eigenvectors of L whose eigenvalues ${λ_{j}}_{j \in N}$ are nonnegative and can be arranged in a nondecreasing order.

Remark 2.

Under Assumptions 1 and 2, the operator L satisfies the Weyl’s law, that is, there exist $c, C > 0$ such that for every $j \in N$ , $c j^{2 / d} \leq λ_{j} \leq C j^{2 / d} .$ See Davies (Citation1995, Theorem 6.3.1) for the Dirichlet case. For the Neumann case, the Weyl’s law holds for the case in which H is a constant diagonal matrix (Fedosov Citation1963, Citation1964), in particular, it holds for the Neumann Laplacian. The result for a general H satisfying Assumption 2 is a direct consequence of the Weyl’s law for the Neumann Laplacian together with Proposition B.2 in Appendix B and the min–max principle.

Our goal is to obtain approximations of the covariance operator $L^{- 2 β}$ of the Gaussian random field u which solves Equationequation (8)(8) $L^{β} (τ u) = W in D,$ (8) . Let $ϱ^{β} (x, y) = \sum_{j = 1}^{\infty} λ_{j}^{- 2 β} e_{j} (x) e_{j} (y) .$

Then, one can readily check, by Steinwart and Scovel (Citation2012, Theorem 3.10), that the covariance operator $L^{- 2 β}$ is a kernel operator, with kernel $ϱ^{β} (\cdot, \cdot)$ . That is, for any $f \in L_{2} (D)$ , we have $(L^{- 2 β} f) (x) = \int_{D} ϱ^{β} (x, y) f (y) d y$ for a.e. $x \in D$ . It is well-known that there exists a centered square-integrable Gaussian random field u that solves (8) if, and only if, its covariance operator, $L^{- 2 β}$ , has finite trace (Lototsky and Rozovsky Citation2017, Theorem 3.2.5). Under Assumptions 1 and 2, one can use Weyl’s law (Remark 2) to show that $L^{- 2 β}$ has finite trace if, and only if, $β > d / 4$ . Hence, if $β > d / 4$ , then u is a centered square-integrable Gaussian random field with covariance function $ϱ^{β} (x, y) = E [u (x) u (y)],$ where the equality holds for a.e. $(x, y) \in D \times D .$

3.2. Finite Element Approximation

The goal is now to provide a convergence analysis for FEM approximations of the covariance operator $L^{- 2 β}$ . Let us start by describing the setup we will use.

Assumption 3.

Let $V_{h} \subset V$ be a finite element space that is spanned by a set of continuous piecewise linear basis functions ${φ_{j}}_{j = 1}^{n_{h}}$ (see Appendix D), with $n_{h} \in N$ , defined with respect to a triangulation $T_{h}$ of $\bar{D}$ indexed by the mesh width $h : = {max}_{T \in T_{h}} h_{T}$ , where $h_{T} : = diam (T)$ is the diameter of the element $T \in T_{h}$ . We assume that the family ${(T_{h})}_{h \in (0, 1)}$ of triangulations inducing the finite-dimensional subspaces ${(V_{h})}_{h \in (0, 1)}$ of V is quasi-uniform, that is, there exist constants $K_{1}, K_{2} > 0$ such that $ρ_{T} \geq K_{1} h_{T}$ and $h_{T} \geq K_{2} h$ for all $T \in T_{h}$ and all $h \in (0, 1)$ . Here, $ρ_{T} > 0$ is the radius of the largest ball inscribed in $T \in T_{h}$ .

We are now in a position to describe the FEM discretization of the model (8). Let $L_{h} : V_{h} \to V_{h}$ be defined in terms of the bilinear form a_L as its restriction to $V_{h} \times V_{h}$ : ${(L_{h} ϕ_{h}, ψ_{h})}_{L_{2} (D)} = a_{L} (ϕ_{h}, ψ_{h}), ϕ_{h}, ψ_{h} \in V_{h} .$

Note that L_h is a positive-definite, symmetric, linear operator on the finite-dimensional space V_h. Hence, we may arrange the eigenvalues of L_h as $0 < λ_{1, h} \leq λ_{2, h} \leq \dots \leq λ_{n_{h}, h},$ with corresponding eigenvectors ${e_{j, h}}_{j = 1}^{n_{h}}$ which are orthonormal in $L_{2} (D)$ . Let $W_{h}$ denote Gaussian white noise on V_h. That is, there exist independent standard Gaussian random variables $ξ_{1}, \dots, ξ_{n_{h}}$ such that $W_{h} = \sum_{j = 1}^{n_{h}} ξ_{j} e_{j, h}$ . Then, we refer to the following SPDE on V_h as the discrete model of (8): (10) $L_{h}^{β} u_{h} = W_{h} .$ (10)

Let u_h be a solution of (10), then the covariance operator of u_h is given by $L_{h}^{- 2 β}$ , and $ϱ_{h}^{β} (x, y) = \sum_{j = 1}^{n_{h}} λ_{j, h}^{- 2 β} e_{j, h} (x) e_{j, h} (y), for a . e . (x, y) \in D \times D,$ is the corresponding covariance function. We have the following result regarding the convergence of the FEM approximation $ϱ_{h}^{β}$ to the exact covariance function $ϱ^{β}$ in the $L_{2} (D \times D)$ -norm defined by $| | f | |_{L_{2} (D \times D)}^{2} = \int_{D} \int_{D} f {(x, y)}^{2} d xdy$ . The proof is given in Appendix B.

Proposition 1.

Under Assumptions 1, 2, and 3, for each $β > d / 4$ and each $ε > 0$ , we have (11) $| | ϱ^{β} - ϱ_{h}^{β} | |_{L_{2} (D \times D)} ≲_{ε, β, H, κ, D} h^{min {4 β - d / 2 - ε, 2}} .$ (11)

Here, and in the remainder of the article, the notation $A ≲_{θ_{1}, \dots, θ_{k}} B$ , where $k \in N$ , means that there exists a constant C depending on $θ_{1}, \dots, θ_{k}$ ( $θ_{i}, i = 1, \dots, k$ , can be a parameter, a function, a domain, etc.) such that $A \leq CB .$

Remark 3.

Cox and Kirchner (Citation2020, Theorem 1) proved the bound (11) in the case of homogeneous Dirichlet boundary conditions. They did not provide a bound for the case of homogeneous Neumann boundary conditions. Proposition 1 arrives at the same bound for the Neumann case. For this, we additionally require that $ess {inf}_{x \in D} κ (x) \geq κ_{0} > 0$ and that the domain $D$ is a convex polytope in the Neumann case. As far as we know, this is a new result. The key step in the proof is to obtain an analogous result to Cox and Kirchner (Citation2020, Lemma 2), which is given by Proposition B.2 in Appendix B.

4 Rational Approximation

Having introduced the FEM approximation, we are now ready to define the complete approximation of the covariance operator of the generalized Whittle–Matérn fields. The approximation is obtained by combining a rational approximation of the fractional power of the covariance operator with the FEM approximation. We begin by introducing the method and then provide a theoretical justification by showing an explicit rate of convergence of the approximate covariance function to the correct one in the $L_{2} (D \times D)$ -norm.

In Bolin and Kirchner (Citation2020), the authors obtained an approximation of the solution to (8), which also implicitly defines an approximation of the corresponding covariance operator. However, as we have previously mentioned, this results in an approximation that is not implementable in R-INLA. Also, for statistical applications there is usually no need to have an approximation of the solution itself, since only the corresponding distribution matters for inference. With this in mind, we propose to directly approximate the covariance operator $L^{- 2 β}$ . To this end, we first split $L_{h}^{- 2 β} = L_{h}^{- {2 β}} L_{h}^{- ⌊ 2 β ⌋}$ , where ${x} = x - ⌊ x ⌋$ is the fractional part of x. Then, we approximate $L_{h}^{- {2 β}}$ with a rational approximation. This yields an approximation (12) $L_{h}^{- 2 β} \approx L_{h, m}^{- 2 β} : = L_{h}^{- ⌊ 2 β ⌋} p (L_{h}^{- 1}) q {(L_{h}^{- 1})}^{- 1} .$ (12)

Here, $p (L_{h}^{- 1}) = \sum_{i = 0}^{m} a_{i} L_{h}^{m - i}$ and $q (L_{h}^{- 1}) = \sum_{j = 0}^{m} b_{j} L_{h}^{m - j}$ are polynomials obtained from a rational approximation of order m of the real-valued function $f (x) = x^{{2 β}}$ . That is, $x^{{2 β}} \approx \frac{\sum_{i = 0}^{m} a_{i} x^{i}}{\sum_{i = 0}^{m} b_{i} x^{i}} .$

Specifically, to obtain ${a_{i}}_{i = 0}^{m}$ and ${b_{i}}_{i = 0}^{m}$ , we approximate the function $f (x) = x^{{2 β}}$ on the interval $[λ_{n_{h}, h}^{- 1}, λ_{1, h}^{- 1}]$ , which covers the spectrum of $L_{h}^{- 1}$ . The coefficients are computed as the best rational approximation in the $L_{\infty}$ -norm, which, for example, can be obtained via the second Remez algorithm (Remez Citation1934) or by the recent, and more stable, BRASIL algorithm (Hofreither Citation2021). See Appendix F for details about this algorithm and a justification for the choice of using the best rational approximation in $L_{\infty}$ -norm.

By defining the covariance function $ϱ_{h, m}^{β} (x, y) = \sum_{j = 1}^{n_{h}} λ_{j, h}^{- ⌊ 2 β ⌋} p (λ_{j, h}^{- 1}) q {(λ_{j, h}^{- 1})}^{- 1} e_{j, h} (x) e_{j, h} (y),$ for a.e. $(x, y) \in D$ , we have that $ϱ_{h, m}^{β}$ is the kernel of the covariance operator $L_{h, m}^{- 2 β}$ . There are two sources of errors when we consider $ϱ_{h, m}^{β}$ as an approximation of the true covariance function $ϱ^{β}$ of the generalized Whittle–Matérn field: the FEM approximation and the rational approximation. The following proposition, whose proof is given in Appendix B, shows that we have control of these two sources of errors via the FEM mesh width h and the order of the rational approximation m.

Proposition 2.

Let $β > d / 4$ . Under Assumptions 1, 2, and 3, for every $ε > 0$ and for sufficiently small h, we have: (13) $| | ϱ_{h, m}^{β} - ϱ^{β} | |_{L_{2} (D \times D)} ≲_{ε, β, H, κ, D} h^{min {4 β - d / 2 - ε, 2}} + 1_{2 β \notin N} h^{- d / 2} e^{- 2 π \sqrt{{2 β} m}} .$ (13)

Remark 4.

We can calibrate the accuracy of the rational approximation with the finite element error by choosing $m \in N$ such that $m = ⌈ {(min {4 β - d / 2 - ε, 2} + d / 2)}^{2} \frac{{(log h)}^{2}}{4 π^{2} {2 β}} ⌉ .$ This ensures that the rate of convergence in (13) is $min {4 β - d / 2 - ε, 2}$ . See Section 7 for further details on the choice of m.

5 GMRF Representation

The goal of this section is to obtain a sparse matrix representation of the precision operator of the rational approximation from the previous section, so that the methods in Section 2 can be used for computationally efficient sampling and likelihood evaluation.

The solution u_h in (10) at spatial location s can be represented as $u_{h} (s) = \sum_{j = 1}^{n_{h}} w_{j} φ_{j} (s)$ , where ${w_{j}}_{j = 1}^{n_{h}}$ are stochastic weights and ${φ_{j}}_{j = 1}^{n_{h}}$ are the piecewise linear finite element basis functions. We will now show how to represent $w = {[w_{1}, \dots, w_{n_{h}}]}^{⊤}$ as a sum of independent GMRFs, each with a sparse precision matrix. The key step is to apply a partial fraction decomposition in (12): (14) $L_{h, m}^{- 2 β} = L_{h}^{- ⌊ 2 β ⌋} (\sum_{i = 1}^{m} r_{i} {(L_{h} - p_{i} I_{V_{h}})}^{- 1} + k I_{V_{h}}) .$ (14)

Here, ${r_{i}}_{i = 1}^{m}, {p_{i}}_{i = 1}^{m}$ and k are real numbers, and $I_{V_{h}}$ is the identity operator mapping the finite element space to itself. Let C be the mass matrix with elements $C_{i, j} = {(φ_{i}, φ_{j})}_{L_{2} (D)}$ , and let L be the matrix obtained by the bilinear form $a_{L} (\cdot, \cdot)$ induced by the differential operator L, which has elements $L_{i, j} = {(H \nabla φ_{i}, \nabla φ_{j})}_{L_{2} (D)} + {(κ^{2} φ_{i}, φ_{j})}_{L_{2} (D)}$ . Then, we can use (14) to obtain the covariance matrix of w as (see Appendix C for a derivation): (15) $Σ_{w} = {(L^{- 1} C)}^{⌊ 2 β ⌋} \sum_{i = 1}^{m} r_{i} {(L - p_{i} C)}^{- 1} + K_{⌊ 2 β ⌋},$ (15) where $K_{0} = k C$ and $K_{n} = k {(L^{- 1} C)}^{n - 1} L^{- 1}$ when $n \geq 1, n \in N$ . In the Matérn case, that is, when κ is a constant and H is an identity matrix, we simply have $L = G + κ^{2} C$ , where G is the stiffness matrix with elements $G_{i, j} = {(\nabla φ_{i}, \nabla φ_{j})}_{L_{2} (D)}$ .

Since we have the same degree for numerator and denominator in the rational approximation, we can use the BRASIL algorithm (Hofreither Citation2021) to compute the coefficients ${a_{i}}_{i = 0}^{m}$ and ${b_{i}}_{i = 0}^{m}$ in (12) and thus the coefficients ${r_{i}}_{i = 0}^{m}, {p_{i}}_{i = 0}^{m}$ and k in (14). Another option, commonly used in practice, is to use a “near best” rational approximation. One such option, which was used in Bolin and Kirchner (Citation2020), and which is also implemented in the rSPDE package, is the Clenshaw–Lord Chebyshev–Padé algorithm (Baker and Graves-Morris Citation1996). See Appendix F for details about this algorithm. Also, observe that the interval $[λ_{n_{h}, h}^{- 1}, λ_{1, h}^{- 1}]$ where one should compute the rational approximation may vary with the parameters κ and H, and that recomputing the coefficients ${a_{i}}_{i = 0}^{m}$ and ${b_{i}}_{i = 0}^{m}$ for different values of these parameters is not practical for implementations. To avoid this, recall from Assumption 2 that $κ_{0}^{2}$ is a lower bound for the eigenvalues of L in the case of Neumann boundary conditions and that $λ_{1} \leq λ_{1, h}$ (see Proposition B.1 in Appendix B). We can, then, re-scale the operator L_h as $L_{h} / κ_{0}^{2}$ so that we can replace the interval $[λ_{n_{h}, h}^{- 1}, λ_{1, h}^{- 1}]$ by $[δ, 1]$ , where, ideally, δ is chosen in such way that $δ \leq κ_{0}^{2} / λ_{n_{h}, h}$ for all considered mesh sizes h. In the rSPDE package, the choices δ = 0 and $δ = 10^{- (5 + m) / 2}$ are implemented. However, the difference in accuracy with respect to approximating the covariance function is negligible between these two choices.

For these options, we verified empirically that if $f_{β} (x) = x^{{2 β}}, {2 β} = 2 β - ⌊ 2 β ⌋$ , and ${\hat{f}}_{β, m}$ is the rational approximation of $f_{β}$ where the numerator and denominator have same degree m, then ${\hat{f}}_{β, m} = x^{⌊ 2 β ⌋} \sum_{j = 1}^{m} r_{i} {(x - p_{i})}^{- 1} + k,$ where ${p_{i}}_{i = 1}^{m}$ are negative real numbers and ${r_{i}}_{i = 1}^{m}$ and k are positive real numbers. This, together with the fact that the BRASIL algorithm is only implemented for rational approximations with numerator and denominator having the same degree, are the main reasons we chose to consider the numerator and denominator having the same degree m. Bolin and Kirchner (Citation2020) instead considered a rational approximation where the numerator has degree m and the denominator has degree m + 1. However, with this choice the partial fractions would not yield a decomposition into positive-definite operators in our case.

Since ${p_{i}}_{i = 1}^{m}$ are negative real numbers and ${r_{i}}_{i = 1}^{m}$ and k are positive real numbers, we have that $r_{i} {(L^{- 1} C)}^{⌊ 2 β ⌋} {(L - p_{i} C)}^{- 1}$ , for $i = 1, \dots m$ , and $K_{⌊ 2 β ⌋}$ are valid covariance matrices. Thus, w can be expressed as a sum of m + 1 independent random vectors $x_{i}$ as in (3), where $Q_{i}$ is the precision matrix of $x_{i}$ . By (15), we obtain that (16) $Q_{i} = {\begin{matrix} r_{i}^{- 1} (L - p_{i} C) {(C^{- 1} L)}^{⌊ 2 β ⌋} & i = 1, \dots, m, \\ K_{⌊ 2 β ⌋}^{- 1} & i = m + 1. \end{matrix}$ (16)

Let $X = {[x_{1}^{⊤}, \dots, x_{m + 1}^{⊤}]}^{⊤}$ . Then, the precision matrix of X is the block diagonal matrix shown in (5). The final step in order to obtain a GMRF representation is to use the mass lumping technique as for the standard SPDE approach, see Appendix C.5 in Lindgren et al. (Citation2011). Thus, the mass matrix C in (16) is replaced by a lumped mass matrix $\tilde{C}$ , where $\tilde{C}$ is a diagonal matrix with ${\tilde{C}}_{ii} = \sum_{j = 1}^{n_{h}} C_{ij}$ , for $i = 1, \dots, n_{h}$ . With this adjustment, Q in (5) is sparse and we thus have obtained a GMRF representation.

6 Implementation and the rSPDE Package

The proposed covariance-based rational approximation method has been implemented in the R package rSPDE. In the following sections, we will use this package to illustrate the performance of the method. In this section, we give a brief introduction to the package and how it can be used in combination with R-INLA for computationally efficient Bayesian inference of latent Gaussian models involving the generalized Whittle–Matérn fields.

The usual workflow of fitting standard SPDE models in R-INLA can be divided into six steps. Namely, constructing the FEM mesh, defining SPDE model, creating a projector matrix, building the INLA stack, specifying the model formula, and finally calling the function inla to fit the model. Details about this can be found in Lindgren and Rue (Citation2015). To fit a model with a fractional SPDE, this procedure remains the same. The only difference is that when defining the SPDE model, creating the projector matrix and building the index for INLA stack, we use functions from the rSPDE package. These functions are very similar to the corresponding R-INLA functions in terms of functionality. For example, a fractional SPDE model can be created with the command:

model <- rspde.matern(mesh = mesh) where mesh is a FEM mesh that can be obtained by inla.me sh.2d function from R-INLA. The default order of the rational approximation in this function is m = 2, which provides a good trade off between computational cost and accuracy, see . As for the corresponding inla.spde2.matern function that can be used to define non-fractional SPDE models in INLA, one can also set priors for κ and τ in rspde.matern. Further, we can also define a prior for the smoothness parameter ν or specify ν so that a SPDE model with a fixed smoothness parameter can be generated. This feature can be used, for example, in the case that one already knows what ν is or wants to compare two different models with different ν, as we will do in Section 8.

Figure 1. Errors in $L_{2} (D \times D)$ -norm (top) and supremum norm ( $L_{\infty} (D \times D)$ ) (bottom) on $D = {[0, 1]}^{2}$ for different practical ranges ρ for different values of ν. All methods use the same FEM mesh, with 100 equally spaced nodes in each direction.

Figure 1. Errors in L2(D×D)-norm (top) and supremum norm (L∞(D×D)) (bottom) on D=[0,1]2 for different practical ranges ρ for different values of ν. All methods use the same FEM mesh, with 100 equally spaced nodes in each direction.

The projector matrix $\bar{A}$ for a given mesh and observation locations loc is computed as

A <- rspde.make.A(mesh = mesh,

loc = loc)

As for the creation of the model, the default order of the rational approximation when creating the projector matrix is $m = 2$ , which can be changed by the user. The other arguments of the function are the same as those in the corresponding R-INLA function inla.spde.make.A. In the step of building the INLA stack, usually an index set is needed. The index can be computed with the function rspde.make.index, which replaces the R-INLA function inla.spde.make.index and has the same arguments. With these functions, the fractional models can be used as any other random effect in R-INLA. After fitting the model with the R-INLA function inla, posterior samples from a latent field and hyperparameters can be obtained by using inla.posterior.sample, and posterior distributions of the model parameters can be extracted via the rSPDE function rspde.result.

Besides the INLA-related functions, the rSPDE package also provides various utility functions. For example, once a fractional SPDE model, model, has been created with the rspde.matern function, one can simulate from it by calling simulate(model) to obtain a prior sample from a given choice of parameters, and the marginal log-likelihood from Section 2 can be computed by

l <- rSPDE.matern.loglike(model,

y, A, sigma.e)

Here, y is the observed data and sigma.e is the standard deviation of the measurement noise. In addition, if a model is fitted with this approach, then kriging and posterior sampling can be obtained by using the predict function. For further details and examples, we refer the reader to the vignettes at https://davidbolin.github.io/rSPDE.

Finally, rSPDE also provides an interface to the inlabru package (Bachl et al. Citation2019), which simplifies the construction of spatial models. This was used in the application in Section 8, where the entire code for defining and fitting the fractional model is

mesh <- inla.mesh.2d(loc = loc,

max.edge = c(0.5, 10),

cutoff = 0.35)

spde <- rspde.matern(mesh = mesh,

nu.upper.bound = 1)

res <- bru(z ∼ −1 + field(coordinates,

model = spde), data = data)

Here loc are the measurement locations and data is a data frame with the locations and observations.

7 Numerical Experiments

In this section, we compare the accuracy of the covariance–based rational approximation with the operator-based method from Bolin and Kirchner (Citation2020), and with the “parsimonious” method from Lindgren et al. (Citation2011). Since the latter method is implemented in R-INLA, we refer to it as the INLA approximation. We also note that the INLA method constructs a covariance-based Markov approximation (see also Bolin and Kirchner Citation2020, sec. 2), so it can be viewed as a 0th order covariance-based rational approximation.

For the comparison, we consider the SPDE model (2) with homogeneous zero Neumann boundary conditions on the unit square $D = {[0, 1]}^{2}$ , with τ chosen such that $σ^{2}$ in the Matérn covariance is one. The reason we consider the square domain, is that we have an explicit expression for the covariance function of the solution u. Indeed, we have, from Khristenko et al. (Citation2019, eq. (2.13)), that the covariance function of u is given by (17) $\begin{matrix} ϱ_{u}^{β} (x, y) = \sum_{k \in Z^{2}} [ϱ (| | x + 2 k - y | |) \\ + ϱ (| | (x_{1} + 2 k_{1} - y_{1}, x_{2} + 2 k_{2} + y_{2}) | |) \\ + ϱ (| | (x_{1} + 2 k_{1} + y_{1}, x_{2} + 2 k_{2} - y_{2}) | |) \\ + ϱ (| | x + 2 k + y | |)], \end{matrix}$ (17) where $| | \cdot | |$ is the Euclidean norm on $R^{2}$ and $ϱ (\cdot)$ is the Matérn covariance function in (1) with σ = 1 and $ν = 2 β - 1$ . To compare the accuracy of the covariance approximations, we evaluate the true and approximate covariance functions on a regular mesh on ${[0, 1]}^{2}$ with $N = 100$ equally spaced nodes on each axis. We will compare these approximations with respect to the $L_{2} ({[0, 1]}^{2} \times {[0, 1]}^{2})$ -norm and the supremum norm on ${[0, 1]}^{2} \times {[0, 1]}^{2}$ . Appendix G shows how we approximate the errors in these two norms in detail.

For the operator-based and covariance-based rational approximations, we consider the orders of rational approximation as $m = 1, 2, 3, 4$ . We choose smoothness parameters ranging from 0.1 to 3.1 with steps of size 0.05. Further, we test three possible values of κ. These values of κ, say $κ_{1} (ν), κ_{2} (ν)$ and $κ_{3} (ν)$ are chosen in such a way that the practical range $ρ = \sqrt{8 ν} / κ$ is fixed as $0.1, 0.5$ , and 1, respectively, for all values of ν. The resulting errors for the different methods are shown in .

We begin by observing that for smoothness parameters $ν = 1, 2$ , or 3, there is no rational approximation and the errors come from the FEM approximation. With this in mind, one should note that for smaller range parameters most of the approximation error comes from the FEM approximation, thus, yielding a small difference of errors across the different methods. However, for larger ranges, such as, in this case, practical range equal to 1, the errors have different orders of magnitude as the order of the rational approximation increases, with the errors from the operator-based and covariance-based approximations of same rational approximation order having approximately the same order of magnitude. Furthermore, we can observe numerical instabilities of the operator-based approximations of order 3 and 4 as ν increases for both practical ranges 0.5 and 1, whereas the covariance-based method is stable for all orders of approximation.

In order to further illustrate the effect of the FEM error on the rational approximation of the covariance operator we repeated the analysis from above but with a coarser FEM mesh, consisting of 50 equally spaced nodes on each axis over the domain ${[0, 1]}^{2}$ . The results are shown in in Appendix G. We now observe that for practical range 0.1, there is no visible difference between the covariance-based or operator-based rational approximations of orders 1 to 4, with a very small difference between the “parsimonious” INLA approximation and the remaining rational approximations. Further, for practical ranges 0.5 and 1, we hardly see any differences between the rational approximations of orders 2, 3, and 4. The only noticeable difference being that for large values of ν, the operator-based rational approximation becomes numerically unstable. On the other hand, it is noteworthy that for practical ranges 0.5 and 1, there is a significant difference (difference in orders of magnitude) between the rational approximations of order 0, 1 and the remaining orders.

To summarize, the results indicate that the covariance-based method generally has a similar accuracy as the operator-based method, which is higher than the accuracy provided by INLA’s method. The results also show that the covariance-based method is more numerically stable, especially for larger values of m, the order of the rational approximation.

It is important to remember that the INLA method only provides a fixed approximation, furthermore it only works in this case of stationary parameters, whereas the other methods are applicable also for nonstationary models and can be made arbitrarily precise by increasing the order m. As previously mentioned, the operator-based method is not suitable for inference in R-INLA, but the covariance-based method is. Thus, in conclusion, the covariance-based method provides a method that facilitates inference for stationary and nonstationary fractional SPDE-based models in R-INLA, which is also more accurate than the current INLA method for stationary models.

The numerical experiments in this section were implemented using the rSPDE package. All plots in this section, along with several more, for different choices of all the parameters involved, can be found in a shiny (Chang et al. Citation2021) app available at https://github.com/davidbolin/rSPDE. The results above were obtained by the Clenshaw–Lord Chebyshev–Padé algorithm with δ = 0 (see Section 5). The shiny app also contains the results by the BRASIL algorithm and the Clenshaw–Lord Chebyshev–Padé algorithm with $δ = 10^{- (5 + m) / 2}$ . We also include results on likelihood errors in Appendix G.

8 Application

In this section, we illustrate the usage of the covariance-based rational approximation method through an application to a spatial dataset of precipitation observations. The dataset, available at https://www.image.ucar.edu/Data/precip_tapering/ contains annual precipitation anomalies observed by weather stations in the United States (standardized by the long-run mean and standard deviation for each station). We study the data from the year 1962, which contains observations from 7352 stations throughout the contiguous United States. We chose this dataset because it is simple enough to use a stationary model, which allows us to highlight the advantages of the fractional model without having to construct a complicated hierarchical model. Kaufman, Schervish, and Nychka (Citation2008) also studied this data as an illustration for the covariance tapering method.

We model the data by (4) where $ϵ$ is independent Gaussian measurement noise with $Q_{ϵ} = σ_{ϵ}^{- 2} I$ and u is a Whittle-Matérn field obtained as a solution to (2), where $D$ is a bounded region (see ). The field is discretized using a finite element mesh that covers the contiguous United States with 9485 nodes. shows the mesh and the 7352 stations. Our interest is to compare the stationary SPDE models with either a fractional smoothness parameter ν (referred to as the fractional model) or a fixed parameter ν = 1 (referred to as the integer model) in terms of predictive power. In order to more easily interpret the parameters, we consider a parameterization of the Whittle–Matérn field in terms the standard deviation $σ = \sqrt{Γ (ν)} / (τ κ^{ν} \sqrt{(4 π) Γ (ν + 1)})$ , the practical correlation range $ρ = \sqrt{8 ν} / κ$ , and the smoothness ν. The prior distributions for the parameters are chosen as the default choices from the rSPDE package. That is, the priors of $log (ρ)$ and $log (σ)$ are independent Gaussian distributions with variance 10 and the mean values are chosen based on size of the domain. Further, the prior of ν is a Beta distribution on the interval (0, 1) with mean 1/2 and variance 1/16. The choice of prior for ν is motivated by the fact that we do not believe that this should be a very smooth field. We also tested with Beta distributions on larger intervals and found that this did not affect the parameter estimates or the predictive performance of the model much.

Figure 2. the finite element mesh over the contiguous U.S. And the stations shown as dots.

We fit the models using R (version 4.2.1) and the rSPDE package (version 2.2.0) combined with inlabru (version 2.7.0) and R-INLA (version 23.02.17) running on a machine with an Intel i9-12900KF CPU, 64GB RAM and an Ubuntu operating system. The complete code for the analysis can be found in the supplementary materials. The total time for fitting the fractional and integer models are 38.4 sec and 15.4 sec, respectively.

The posterior distributions of the parameters of the Gaussian field and standard deviation of measurement noise for the two models are shown in . One can note that the posterior mode of ν for the fractional model is around 0.52, which indicates that a fractional smoothness is needed. Compared to the fractional model, the integer model has a smaller σ and a larger $σ_{ϵ}$ , indicating that the latent field explains less of the variability of the data. Finally, the practical correlation range of the integer model is substantially smaller than that of the fractional model, which likely is caused by the fact that a small range is needed to better explain the short range behavior of the data if the smoothness parameter is forced to be an integer.

Figure 3. Posterior distributions of σ, ρ, ν, and $σ_{ϵ}$ .

To further compare the models, we perform two leave-group-out pseudo cross-validation studies (Liu and Rue Citation2022). In the first, for each station, we predict the value of the station based on all data except that from stations that are closer than a certain distance D (referred to as the distance of removed data). We then vary this distance and compute the accuracy of the predictions as functions of D. In the second, for each station, we instead remove the data from the k nearest stations and compute the accuracy of the predictions as functions of k. According to the screening effect (Stein Citation2002), in both cases the removed observations are the most informative. The quality of the prediction is measured in terms of Mean Squared Errors (MSE) and the negative Log-Score (LS) (Good Citation1952). Both metrics are negatively oriented, which means that a lower value indicates a better result. The results of the two cross-validation studies are shown in . We see that the fractional model outperforms the integer model in both cases. For example, the fractional model with the distance of removed data being 400 km achieves the same levels of MSE and negative LS as the integer model with the distance of removed data being 300 km (indicated by dashed lines). Also, the fractional model with 125 removed data points achieves the same levels of MSE and negative LS as the integer model with 100 removed data points. We can note that the two models have similar performance when the distance of removed data and number of removed data are close to zero. This is expected due to the mean-squared continuity of the latent fields, combined with the fact that the models have nugget effects. This means that both models will have an MSE close to the variance of measurement error when the distance of removed data or number of nearest removed data are close to zero.

Figure 4. MSE and negative Log-Score as functions of distance (in km) of removed data (top) and number of removed data (bottom).

9 Discussion

We have introduced a new rational SPDE approach which provides stable and computationally efficient approximations for the covariance structure of generalized Whittle–Matérn Gaussian random fields with general smoothness $β > d / 4$ . We further derived an explicit rate of convergence of the method, which provides a theoretical justification for the approach. Compared to the rational SPDE approach of Bolin and Kirchner (Citation2020), the main advantage is that we obtain a GMRF representation of the approximation. This allowed us to implement the method so that fractional SPDE models now can be estimated in R-INLA, where we in particular can estimate the smoothness parameter from data.

The current version of rSPDE has truncated log-normal and beta priors for the smoothness parameter ν as possible choices. A natural question for future research is how this prior should be chosen in a more systematic way. A potential way to do this is following the idea of penalized complexity priors (PC-priors) (Simpson et al. Citation2017). Fuglstad et al. (Citation2019) derived PC-priors for κ and τ of the Whittle–Matérn fields assuming a fixed value of ν. We plan to extend that work by deriving PC-priors for all three parameters. Another potential area of future work is to extend the proposed method to spatio-temporal SPDE models as those proposed by Lindgren et al. (Citation2020).

Supplementary Materials

Appendices: The seven appendices of the manuscript (Appendix.pdf).

R code and data: Code for replicating results of the application considered in Section 8 with data associated with the code (code_for_application.zip).

Supplemental material

supplementary_materials.zip

Download Zip (1,010 KB)

Acknowledgments

Our sincere thanks to Elias T. Krainski and Håvard Rue for their help with explaining some details of the internal structure of the R-INLA software and to the anonymous reviewers for insightful comments and suggestions on the article.

Disclosure Statement

The authors report there are no competing interests to declare.

References

Bachl, F. E., Lindgren, F., Borchers, D. L., and Illian, J. B. (2019), “Inlabru: An R Package for Bayesian Spatial Modelling from Ecological Survey Data,” Methods in Ecology and Evolution, 10, 760–766. DOI: 10.1111/2041-210X.13168.
Web of Science ®Google Scholar
Baker, G. A., Jr., and Graves-Morris, P. (1996), Padé Approximants (2nd ed.), Volume 59 of Encyclopedia of Mathematics and Its Applications. Cambridge Cambridge University Press.
Google Scholar
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015), Hierarchical Modeling and Analysis for Spatial Data. (2nd ed.), Volume 135 of Monographs on Statistics and Applied Probability. Boca Raton, FL CRC Press.
Google Scholar
Bolin, D., and Kirchner, K. (2020), “The Rational SPDE Approach for Gaussian Random Fields with General Smoothness,” Journal of Computational and Graphical Statistics, 29, 274–285. DOI: 10.1080/10618600.2019.1665537.
Web of Science ®Google Scholar
Bolin, D., and Kirchner, K. (2023), “Equivalence of Measures and Asymptotically Optimal Linear Prediction for Gaussian Random Fields with Fractional-Order Covariance Operators,” Bernoulli, 29, 1476–1504. DOI: 10.3150/22-BEJ1507.
Web of Science ®Google Scholar
Bolin, D., Kirchner, K., and Kovács, M. (2018), “Weak Convergence of Galerkin Approximations for Fractional Elliptic Stochastic PDEs with Spatial White Noise,” BIT Numerical Mathematics, 58, 881–906. DOI: 10.1007/s10543-018-0719-8.
Web of Science ®Google Scholar
Bolin, D., Kirchner, K., and Kovács, M. (2020), “Numerical Solution of Fractional Elliptic Stochastic PDEs with Spatial White Noise,” IMA Journal of Numerical Analysis, 40, 1051–1073. DOI: 10.1093/imanum/dry091.
Web of Science ®Google Scholar
Bolin, D., and Lindgren, F. (2011), “Spatial Models Generated by Nested Stochastic Partial Differential Equations, with an Application to Global Ozone Mapping,” Annals of Applied Statistics, 5, 523–550.
Web of Science ®Google Scholar
Bolin, D., and Simas, A. B. (2023), “rSPDE: Rational Approximations of Fractional Stochastic Partial Differential Equations,” R Package Version 2.2.0
Google Scholar
Chang, W., Cheng, J., Allaire, J., Sievert, C., Schloerke, B., Xie, Y., Allen, J., McPherson, J., Dipert, A., and Borges, B. (2021), shiny: Web Application Framework for R R package version 1.6.0.
Google Scholar
Cox, S. G., and Kirchner, K. (2020), “Regularity and Convergence Analysis in Sobolev and Hölder Spaces for Generalized Whittle-Matérn Fields,” Numerische Mathematik, 146, 819–873. DOI: 10.1007/s00211-020-01151-x.
Web of Science ®Google Scholar
Davies, E. B. (1995), Spectral Theory and Differential Operators, Cambridge Studies in Advanced Mathematics., Cambridge Cambridge University Press.
Google Scholar
Evans, L. C., and Gariepy, R. F. (2015), “Measure Theory and Fine Properties of Functions,” Textbooks in Mathematics. (Revised ed.), Boca Raton, FL CRC Press.
Google Scholar
Fedosov, B. (1963), “Asymptotic Formulas for the Eigenvalues of the Laplacian in the Case of a Polygonal Region,” Soviet Mathematics - Doklady, 4, 1092–1096.
Google Scholar
Fedosov, B. (1964), “Asymptotic Formulas for the Eigenvalues of the Laplace Operator in the Case of a Polyhedron,” Soviet Mathematics - Doklady, 5, 988–990.
Google Scholar
Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2015), “Does Non-Stationary Spatial Data Always Require Non-Stationary Random Fields?,” Spatial Statistics, 14, 505–531. DOI: 10.1016/j.spasta.2015.10.001.
Web of Science ®Google Scholar
Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2019), “Constructing Priors That Penalize the Complexity of Gaussian Random Fields,” Journal of the American Statistical Association, 114, 445–452. DOI: 10.1080/01621459.2017.1415907.
Web of Science ®Google Scholar
Good, I. J. (1952), “Rational Decisions,” Journal of the Royal Statistical Society, Series B, 14, 107–114. DOI: 10.1111/j.2517-6161.1952.tb00104.x.
Google Scholar
Grisvard, P. (2011), Elliptic Problems in Nonsmooth Domains., Volume 69 of Classics in Applied Mathematics. Philadelphia, PA Society for Industrial and Applied Mathematics (SIAM.).
Google Scholar
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., and Zammit-Mangion, A. (2019), “A Case Study Competition among Methods for Analyzing Large Spatial Data,” Journal of Agricultural, Biological, and Environmental Statistics, 24, 398–425. DOI: 10.1007/s13253-018-00348-w.
PubMed Web of Science ®Google Scholar
Herrmann, L., Kirchner, K., and Schwab, C. (2020), “Multilevel Approximation of Gaussian Random Fields: Fast Simulation,” Mathematical Models and Methods in Applied Sciences, 30, 181–223. DOI: 10.1142/S0218202520500050.
Web of Science ®Google Scholar
Hildeman, A., Bolin, D., and Rychlik, I. (2021), “Deformed SPDE Models with an Application to Spatial Modeling of Significant Wave Height,” Spatial Statistics, 42, 100449–100427. Paper NoDOI: 10.1016/j.spasta.2020.100449.
Web of Science ®Google Scholar
Hofreither, C. (2021), “An Algorithm for Best Rational Approximation Based on Barycentric Rational Interpolation,” Numerical Algorithms, 88, 365–388. DOI: 10.1007/s11075-020-01042-0.
Web of Science ®Google Scholar
Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008), “Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets,” Journal of the American Statistical Association, 103, 1545–1555. DOI: 10.1198/016214508000000959.
Web of Science ®Google Scholar
Khristenko, U., Scarabosio, L., Swierczynski, P., Ullmann, E., and Wohlmuth, B. (2019), “Analysis of Boundary Effects on PDE-Based Sampling of Whittle-Matérn Random Fields,” SIAM/ASA Journal on Uncertainty Quantification, 7, 948–974. DOI: 10.1137/18M1215700.
Web of Science ®Google Scholar
Lindgren, F., Bakka, H., Bolin, D., Krainski, E., and Rue, H. (2020), “A Diffusion-based Spatio-Temporal Extension of Gaussian Matérn fields,” arXiv: 2006.04917v2.
Google Scholar
Lindgren, F., Bolin, D., and Rue, H. (2022), “The SPDE Approach for Gaussian and non-Gaussian Fields: 10 Years and Still Running,” Spatial Statistics, 50, 100599. Paper NoDOI: 10.1016/j.spasta.2022.100599.
Web of Science ®Google Scholar
Lindgren, F., and Rue, H. (2015), “Bayesian Spatial Modelling with R-INLA,” Journal of Statistical Software, 63, 1–25. DOI: 10.18637/jss.v063.i19.
Web of Science ®Google Scholar
Lindgren, F., Rue, H., and Lindström, J. (2011), “An Explicit Link between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 73, 423–498. DOI: 10.1111/j.1467-9868.2011.00777.x.
Web of Science ®Google Scholar
Liu, Z., and Rue, H. (2022), “Leave-Group-Out Cross-validation for Latent Gaussian Models,” arXiv:2210.04482.
Google Scholar
Lototsky, S. V., and Rozovsky, B. L. (2017), Stochastic Partial Differential Equations. Universitext, Cham Springer.
Google Scholar
Matérn, B. (1960), Spatial Variation: Stochastic Models and their Application to Some Problems in Forest Surveys and Other Sampling Investigations Statens Skogsforskningsinstitut, Stockholm. Meddelanden Från Statens Skogsforskningsinstitut, Band 49, Nr. 5.
Google Scholar
R Core Team (2022), R: A Language and Environment for Statistical Computing. Vienna, Austria R Foundation for Statistical Computing.
Google Scholar
Remez, E. Y. (1934), “Sur la Détermination Des Polynômes D’Approximation de Degré Donnée,” Communications of the Kharkov Mathematical Society, 10, 41–63.
Google Scholar
Rue, H., and Held, L. (2005), Gaussian Markov Random Fields., Volume 104 of Monographs on Statistics and Applied Probability. Boca Raton, FL Chapman & Hall/CRC. Theory and Applications.
Google Scholar
Rue, H., Martino, S., and Chopin, N. (2009), “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 71, 319–392. DOI: 10.1111/j.1467-9868.2008.00700.x.
Web of Science ®Google Scholar
Simpson, D., Rue, H., Riebler, A., Martins, T. G., and Sørbye, S. H. (2017), “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors,” Statistical Science, 32, 28. DOI: 10.1214/16-STS576.
Web of Science ®Google Scholar
Stein, M. L. (1999), Interpolation of Spatial Data. Springer Series in Statistics. New York Springer-Verlag. Some Theory for Kriging.
Google Scholar
Stein, M. L. (2002), “The Screening Effect in Kriging,” The Annals of Statistics, 30, 298–323. DOI: 10.1214/aos/1015362194.
Web of Science ®Google Scholar
Steinwart, I., and Scovel, C. (2012), “Mercer’s Theorem on General Domains: On the Interaction between Measures, Kernels, and RKHSs,” Constructive Approximation, 35, 363–417. DOI: 10.1007/s00365-012-9153-3.
Web of Science ®Google Scholar
Whittle, P. (1963), “Stochastic Processes in Several Dimensions,” Bulletin of the International Statistical Institute, 40, 974–994.
Google Scholar

Covariance–Based Rational Approximations of Fractional SPDEs for Computationally Efficient Bayesian Inference

ABSTRACT

1 Introduction

2 Overview of the Approximation Strategy