
Fast, Approximate Maximum Likelihood Estimation of Log-Gaussian Cox Processes

Pages 1660-1670 | Received 30 Nov 2021, Accepted 11 Feb 2023, Published online: 06 Apr 2023

Abstract

Log-Gaussian Cox processes (LGCPs) offer a framework for regression-style modeling of point patterns that can accommodate spatial latent effects. These latent effects can be used to account for missing predictors or other sources of clustering that cannot be explained by a Poisson process. Fitting LGCP models is challenging because the marginal likelihood has no closed form and involves a high-dimensional integral over the latent Gaussian field. We propose a novel methodology for fitting LGCP models that addresses these challenges using a combination of variational approximation and reduced-rank interpolation. Additionally, we implement automatic differentiation to obtain exact gradient information, for computationally efficient optimization, and to consider integral approximation using the Laplace method. We demonstrate the method's performance through both simulations and a real data application, with promising results in terms of computational speed and accuracy compared to existing approaches. Supplementary material for this article is available online.

1 Introduction

Researchers often wish to model a spatial point pattern—a collection of event locations in some domain of interest—as a function of a set of predictors measured over the spatial domain. A motivating example in this article is a survey of gorilla nesting sites in Kagwene Gorilla Sanctuary, Cameroon (Funwi-Gabga and Mateu 2012; further detail can be found in Section 5). We would like to model this point pattern as a function of elevation and other spatial variables in order to better understand where gorillas nest and why. This can be done using a spatial point process model. However, point patterns often exhibit clustering that cannot be explained by measured predictors alone, rendering the widely used Inhomogeneous Poisson Process (IPP) model inappropriate, as we see for the gorilla data (Figure 1B, top panel).

Fig. 1 A: An observed point pattern of gorilla nesting locations in Kagwene Gorilla Sanctuary in Cameroon. B: An inhomogeneous K function for the point pattern (Baddeley and Turner 2000), with simulation envelope (computed as in Myllymäki et al. 2017), to check goodness of fit under an IPP model (top panel) and LGCP model (bottom panel). For model details see Section 5. Note there is evidence of spatial clustering additional to that accounted for by the IPP model, but not so for the LGCP.


The log-Gaussian Cox Process (LGCP; Møller, Syversveen, and Waagepetersen 1998) offers a way to incorporate additional spatial clustering into point process models that is not captured by measured covariates. This is achieved by including a Gaussian Random Field (GRF) to induce additional spatial correlation between observations—effectively acting as a spatially correlated error term in the model. LGCP models are particularly appropriate when clustering arises from missing or unmeasured environmental processes, as opposed to settings in which clustering or dispersal is due to interactions between the point events themselves (Diggle et al. 2013). Here we focus on an LGCP framework for modeling a realized point process (via its point pattern) driven by a collection of measured and unmeasured predictors throughout some spatial domain, rather than trying to model, say, the parent and offspring processes that sometimes underlie interactions between points (see also Adams, Murray, and MacKay 2009, and Gonçalves and Gamerman 2018, who fit other types of Cox process outside the log-Gaussian framework).

Fitting LGCP models is computationally challenging. The largest burden arises from the variance-covariance structure of a finite-dimensional realization of the latent Gaussian process. Taylor et al. (2013) provide an R package that fits an LGCP via Markov chain Monte Carlo (MCMC) methods, initially using Metropolis-Hastings algorithms via Gibbs samplers, although a Metropolis-adjusted Langevin algorithm (MALA) is now more common (Shirota and Gelfand 2017). These algorithms can be very costly computationally, and do not scale well as the size of point patterns increases, because of the need to sample the high-dimensional Gaussian random field at each iteration. Faster algorithms have been proposed using a few general strategies: a reduced rank approximation to the latent variance-covariance matrix (Chakraborty et al. 2011, Simpson et al. 2016); inducing sparsity using, for example, nearest-neighbor Gaussian processes (Datta et al. 2016, Shirota and Banerjee 2019); aggregating point events to counts (Johnson, Diggle, and Giorgi 2019); or approximating the posterior distribution using variational Bayes mean-field (Teng, Nathoo, and Johnson 2017) or innovative applications of the Laplace method (Rue, Martino, and Chopin 2009). Of these options, Integrated Nested Laplace Approximation (INLA; Rue, Martino, and Chopin 2009, Simpson et al. 2016) is currently the most widely used in practice, exploiting work by Lindgren, Rue, and Lindström (2011) to approximate a Gaussian Markov Random Field (GMRF) using a reduced rank approach motivated as the solution to a stochastic partial differential equation. In simulation studies, Taylor and Diggle (2014) found INLA to be generally less accurate but faster than MCMC-MALA when covariance parameters are known a priori, and Teng, Nathoo, and Johnson (2017) found INLA to have similar accuracy but substantially faster computation than variational Bayes mean-field and Hamiltonian Monte Carlo approaches. Bachl et al. (2019) wrote a software wrapper (inlabru) to facilitate the use of INLA to fit LGCP models by non-specialists.

In the machine learning literature, sparse Gaussian processes are a popular approach (Quinonero-Candela and Rasmussen 2005) closely related to other reduced-rank Gaussian processes used to induce correlation in hierarchical models (Banerjee et al. 2008, Datta et al. 2016). Variational Bayesian approximations have also been explored in this context (Titsias 2009, Hensman, Matthews, and Ghahramani 2015), including for point patterns (Lloyd et al. 2015). In this article we also propose a variational approach and exploit sparsity to reduce computation times, but we do so in the context of log-Gaussian Cox Processes, fitted by (approximate) maximum likelihood.

Within the frequentist paradigm, Waagepetersen and Guan (2009) proposed a two-step fitting procedure for this problem: first-order parameters of the mean intensity function are estimated using a Poisson likelihood score function, then higher-order parameters of the latent field are estimated by minimizing the contrast between the empirical and theoretical K functions. This idea is related to method-of-moments estimation, and while it can achieve fits in seconds for quite complex models, it does so by emphasizing pragmatism over statistical efficiency.

In this article we propose a fast, novel maximum likelihood approach to fitting LGCPs to spatial point patterns, involving three innovations outlined in Section 3. First, variational approximation (VA) permits a closed-form approximation to the marginalized log-likelihood. Second, we use a reduced-rank approximation to the large spatial variance-covariance matrices that arise. Finally, automatic differentiation is used to quickly obtain gradient information for efficient optimization and inference. Performance in fitting LGCPs is examined in comparison to INLA in simulations (Section 4) and an application (Section 5). We conclude the article with a brief discussion (Section 6).

2 Log-Gaussian Cox Processes

We denote the domain of interest by $D$, continuously indexed by spatial location $s$, and write the presence-only data as $S_n = \{s_i\}$ for $i = 1, \dots, n$, that is, a collection of locations at which a species of interest is recorded. In this article we consider $D \subset \mathbb{R}^2$; however, the methods could be applied more generally.

The target of inference for point process models is their spatially varying intensity function $\lambda(s)$, which describes the limiting number of point events per unit area in some infinitesimally small region around the point $s$. We assume the log-intensity is a linear combination of predictor variables $X(s)$ with coefficients $\beta$. For conciseness, any intercept term is absorbed into $\beta$, with a corresponding indicator variable in $X(s)$. For LGCPs, the intensity inherits stochasticity from a zero-mean latent Gaussian process $\xi(s)$, characterized by some valid covariance function $f_\xi$ governed by parameters $\theta_\xi$. Note this model assumes two stochastic components behind the observed point pattern $S_n$: the realization of the Gaussian field and the realization of the Poisson process. The LGCP intensity is given by

(1) $\ln \lambda(s) = X(s)^\top \beta + \xi(s).$

We further assume that conditional on ξ, Sn is the realization of an IPP with intensity λ(s).

We will use $\pi(\cdot)$ throughout to denote the probability density function of a random variable (or stochastic process) in general, with different probability density functions distinguished by the arguments of $\pi(\cdot)$. It can be shown that the point pattern has (conditional) probability density

(2) $\pi(S_n \mid \xi) = \left\{ \prod_{i=1}^{n} \exp\{X(s_i)^\top \beta + \xi(s_i)\} \right\} \exp\left\{ |D| - \int_D \lambda(t)\, dt \right\}$

with respect to a unit rate Poisson process, where $|D|$ denotes the size of the domain of interest; see Daley and Vere-Jones (2007) for details. The proportionality constant $\exp\{|D|\}$ is dropped in subsequent use for convenience.

The spatial integral that describes the mean is often approximated using numerical quadrature (e.g., Berman and Turner 1992, Illian, Sørbye, and Rue 2012):

(3) $\int_D \lambda(t)\, dt \approx \sum_{i=n+1}^{n+q} w_i \exp\{X(s_i)^\top \beta + \xi(s_i)\}$

where $D$ has effectively been discretized into $q$ regions represented by quadrature points $\{s_i\}_{i=n+1}^{n+q}$, in addition to the $n$ point events, so that each $i = n+1, \dots, n+q$ occupies a neighborhood of area $w_i \approx |\Delta s_i|$ such that $|D| = \sum_{i=n+1}^{n+q} w_i$. The quadrat sizes $|\Delta s_i|$, $i = n+1, \dots, n+q$, need not be constant (i.e., forming a regular grid) and could, for example, be estimated using a Dirichlet tessellation (Simpson et al. 2016). In this article, for simplicity, we choose quadrature points on a regular grid such that they have equal quadrat size $|\Delta s_i| = |D|/q$ for $i = n+1, \dots, n+q$, and we set $|\Delta s_i| = 0$ for presence points, $i = 1, \dots, n$. Quadrature weights set in this way are not a function of the observed point pattern $S_n$, which is convenient for simulation, or for analysis of multiple point patterns in a common domain.
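As an illustration, a regular quadrature grid of this kind can be constructed in a few lines of R. This is a minimal sketch for the unit square, assuming the observed events are stored as an n × 2 matrix; the function name is ours, for illustration only.

```r
# Sketch: a regular quadrature grid on the unit square, with equal
# weights |D|/q for the q quadrature points and weight 0 for the n
# observed point events, as described above. 'pts' is the n x 2 matrix
# of observed event locations (hypothetical input).
make_quadrature <- function(pts, n_side = 101) {
  centers <- seq(1 / (2 * n_side), 1 - 1 / (2 * n_side), length.out = n_side)
  quad <- as.matrix(expand.grid(x = centers, y = centers))  # q = n_side^2 points
  q <- nrow(quad)
  list(
    s = rbind(pts, quad),                    # all n + q locations, events first
    w = c(rep(0, nrow(pts)), rep(1 / q, q))  # weights: |D| = 1 on the unit square
  )
}

# Example usage with a toy point pattern:
# quadr <- make_quadrature(cbind(runif(50), runif(50)))
```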

Applying the approximation in (3) involves an $(n+q)$-dimensional realization of the Gaussian random field $\xi(s)$, at both the observed point events $S_n$ and the quadrature points, which we write as $\xi = [\xi(s_1), \dots, \xi(s_{n+q})]^\top$. This quickly becomes problematic when computing the marginal log-likelihood of the point pattern, given by

(4) $\ell(\beta) \approx \log \int \pi(S_n \mid \xi)\, \pi(\xi)\, d\xi$

where $\pi(\xi)$ is the $N(0, C)$ density; $\pi(S_n \mid \xi)$ is (2) approximated with (3); and $C$ is the variance-covariance matrix induced by the covariance function $f_\xi$. Note that $C$ has dimension $(n+q) \times (n+q)$, which becomes a computational bottleneck when either $n$ or $q$ is large. The number of quadrature points $q$ is necessarily large in settings where the predictors, hence the estimated intensity, are not spatially smooth. A further difficulty is that (4) does not have a closed form.

3 Fast Approximate Estimation of Log-Gaussian Cox Processes

Our proposed strategy is to maximize an approximation to the marginal likelihood that addresses key computational bottlenecks in a pragmatic fashion.

3.1 Approximate Marginalization

We consider addressing the intractability of (4) using two types of approximation—a Gaussian variational approximation (VA; Ormerod and Wand 2010), or a Laplace approximation.

Our variational approximation involves re-expressing the marginal likelihood (4) as

(5) $\ell(\beta) = \ln \pi(S_n) = \int \ln \pi(S_n) \cdot \pi(\xi \mid S_n)\, d\xi = \int \ln\left[\frac{\pi(S_n \mid \xi)\, \pi(\xi)}{\pi(\xi \mid S_n)}\right] \pi(\xi \mid S_n)\, d\xi$

then replacing the conditional or "posterior" density of the latent field, $\pi(\xi \mid S_n)$, with some multivariate Gaussian density function $\pi_{VA}(\xi) \sim N(m_{VA}, C_{VA})$. This leads to a closed-form approximation to (4), $\bar{\ell}_{VA}(\beta)$, which can be written as

(6) $\bar{\ell}_{VA}(\beta) = \int \ln\left(\frac{\pi(S_n \mid \xi)\, \pi(\xi)}{\pi_{VA}(\xi)}\right) \pi_{VA}(\xi)\, d\xi$
$\approx E_{VA}\left[\sum_{i=1}^{n} X(s_i)^\top \beta + \xi(s_i)\right] - E_{VA}\left[\sum_{i=n+1}^{n+q} w_i \exp\{X(s_i)^\top \beta + \xi(s_i)\}\right] + E_{VA}\left[\ln\left(\frac{\pi(\xi)}{\pi_{VA}(\xi)}\right)\right]$
$= \sum_{i=1}^{n} \left(X(s_i)^\top \beta + [m_{VA}]_i\right) - \sum_{i=n+1}^{n+q} w_i \exp\left\{X(s_i)^\top \beta + [m_{VA}]_i + \tfrac{1}{2}[C_{VA}]_{i,i}\right\} + \tfrac{1}{2}\left[\ln|C^{-1} C_{VA}| - \operatorname{Tr}(C^{-1} C_{VA}) - m_{VA}^\top C^{-1} m_{VA} + (n+q)\right]$

where $E_{VA}$ denotes expectation with respect to the variational density; $[m_{VA}]_i$ denotes the $i$th element of the vector $m_{VA}$; and $[C_{VA}]_{i,i}$ denotes the $i$th diagonal element of the matrix $C_{VA}$. We estimate parameters to satisfy $\{\hat{\beta}, \hat{m}_{VA}, \hat{C}_{VA}, \hat{C}\} = \operatorname{argmax}_{\{\beta, m_{VA}, C_{VA}, C\}} \bar{\ell}_{VA}$.

That is, we find estimates that simultaneously maximize l¯VA for the fixed effects, β, as well as the variational parameters, mVA and CVA, and the prior variance-covariance matrix C. Maximizing with respect to the variational parameters can be motivated as minimizing the Kullback-Leibler distance between πVA(ξ) and π(ξ|Sn) (Ormerod and Wand Citation2010). A more detailed derivation of these results is available in the supplementary material: Section S1.

An alternative to variational approximation is to use a Laplace approximation for the marginal likelihood. If we again re-express the marginal likelihood in (4) as $\ell(\beta) = \log\left[\text{const.} \cdot \int \exp\{g(\xi)\}\, d\xi\right]$, then its Laplace approximation is given by

(7) $\bar{\ell}_{\text{Laplace}}(\beta) \approx \log\left[\text{const.} \cdot (2\pi)^{k/2}\, |-H_g(\xi_0)|^{-1/2} \exp\{g(\xi_0)\}\right]$

where $H_g$ is the Hessian matrix of $g(\xi) = \ln \pi(S_n \mid \xi) + \ln \pi(\xi)$, assumed twice differentiable with respect to $\xi$. The approximation is centered at $\xi_0$, which maximizes $g$ and so in this case is the mode of the joint distribution of the latent field and point pattern, $\pi(S_n, \xi)$. As the approximation involves only derivatives with respect to $\xi$, we can implement this approach easily using automatic differentiation—see Section 3.3.

Note that the Laplace approximation assumes that the integrand is Gaussian and centers the approximation about the mode of the joint probability distribution, while a Gaussian variational approximation assumes the posterior distribution of the latent field (given the point pattern) is Gaussian, with the approximation centered about the mean. Assuming the posterior is Gaussian is particularly appealing because the prior distribution of the latent field is Gaussian, so this remains a plausible approximation in data-poor settings, which need not be true of the Gaussian approximation to the integrand.

We also note that Laplace approximations can perform poorly for high dimensional integrals (Shun and McCullagh 1995), and the integral in (4) would often be high-dimensional. However, in the next section we propose a rank reduction approach where we effectively reduce the dimension to $k \ll n+q$, which addresses this issue.
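To make (7) concrete, the following base-R sketch computes a Laplace approximation to $\log \int \exp\{g(u)\}\, du$ for a generic $k$-dimensional latent vector, finding the mode and Hessian numerically with optim(). In our implementation these steps are automated by TMB (Section 3.3); the function below is illustrative only.

```r
# Sketch: Laplace approximation (7) to log[ integral of exp{g(u)} du ].
# 'g' is a placeholder for ln pi(S_n | u) + ln pi(u).
laplace_logint <- function(g, u_init) {
  fit <- optim(u_init, function(u) -g(u), method = "BFGS", hessian = TRUE)
  u0 <- fit$par       # mode of g, i.e., of the joint log-density
  H  <- fit$hessian   # Hessian of -g at the mode (positive definite)
  k  <- length(u0)
  as.numeric((k / 2) * log(2 * pi) -
             0.5 * determinant(H, logarithm = TRUE)$modulus + g(u0))
}

# Toy check against a known Gaussian integral (exact answer is 0):
# g <- function(u) sum(dnorm(u, log = TRUE)); laplace_logint(g, rep(1, 3))
```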

3.2 Rank Reduction

As mentioned previously, a key computational bottleneck for both the variational and Laplace approximations is that $\xi$ has dimension $n+q$, and $n$ or $q$ is often large. We address this problem using a reduced rank approximation. That is, we assume

(8) $\xi(s) \approx Z(s)\, u$

where $Z(s) = [Z_1(s), \dots, Z_k(s)]$ is a set of $k$ basis functions and $u = [u_1, \dots, u_k]^\top \overset{iid}{\sim} N(0, \sigma^2_{\text{prior}} I)$ are random coefficients.

This approach is commonly referred to as fixed rank kriging (FRK) after Cressie and Johannesson (2008). The software package FRK (Zammit-Mangion and Cressie 2021) can assist in implementation, providing options for generating basis functions under a range of different configurations, including approximations for spatio-temporal GPs by indexing the random coefficients over time. Closely related methods are predictive processes (Banerjee et al. 2008), which construct basis functions around knots using kriging, and the GMRF mesh of Lindgren, Rue, and Lindström (2011), which represents a Gaussian process as a collection of piecewise linear basis functions. A key difference here is that the basis functions in (8) are not chosen as a function of covariance parameters, so they do not need to be updated during optimization, which has advantages in computational efficiency and stability.

The choice of the number and type of basis functions $Z_r(s)$ is a key step, as these govern the spatial dependence induced in $\xi$. For $r = 1, \dots, k$, we define a regular grid of "knots" $s_r^* \in D$ and compute local bi-square functions of the form

(9) $Z_r(s) = \begin{cases} \left[1 - \left(d(s, s_r^*)/\varphi\right)^2\right]^2 & d(s, s_r^*) \le \varphi \\ 0 & d(s, s_r^*) > \varphi \end{cases}$

where $\varphi$ is the function radius, which enforces sparsity. We have found that the use of sparse basis functions results in considerable computational savings. If the knots are organized in a regular grid with spacing $e$ then we choose $\varphi = 1.5e$, as in Zammit-Mangion and Cressie (2021). By choosing $\varphi$ as a function of the knot configuration, the only remaining tuning parameter in our basis function configuration is the number of basis functions $k$. An alternative approach could be to consider the knots as variational parameters, as in Titsias (2009).
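The following sketch evaluates the bi-square basis (9) on a regular knot grid over the unit square, with the radius tied to the knot spacing as above. The function name and arguments are ours for illustration; the FRK package provides a more general implementation.

```r
# Sketch: evaluate k = k_side^2 local bi-square basis functions (9) at
# locations 's' (an m x 2 matrix), with knots on a regular grid over the
# unit square and radius phi = 1.5 * knot spacing. Returns the m x k
# basis matrix Z, which is sparse beyond the radius.
bisquare_basis <- function(s, k_side = 7) {
  e <- 1 / k_side                                  # knot spacing
  g <- seq(e / 2, 1 - e / 2, length.out = k_side)
  knots <- as.matrix(expand.grid(g, g))            # k knots
  phi <- 1.5 * e                                   # radius, as in FRK
  d <- sqrt(outer(s[, 1], knots[, 1], "-")^2 +
            outer(s[, 2], knots[, 2], "-")^2)      # m x k distance matrix
  Z <- (1 - (d / phi)^2)^2
  Z[d > phi] <- 0                                  # enforce sparsity
  Z
}
```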

Alternative basis function configurations are obviously possible. In our explorations we found a negligible effect of the functional form of the basis functions, but the number of basis functions and their radius are important. We suggest these choices be informed by the data, as detailed later. One alternative configuration worth noting is to specify basis functions at multiple resolutions (usually two or three), with $k_l$ basis functions of radius $\varphi_l$ at level $l$. When $k_l$ varies across resolutions this configuration can capture variation at a range of scales of dependence (Cressie and Johannesson 2008).

Applying the assumption in (8), together with the quadrature approximation in (3), our model simplifies to a type of Generalized Linear Mixed Model (GLMM) with weighted observations. Specifically, the marginal likelihood we wish to maximize has the form

(10) $\ell_u(\beta) \approx \int \left\{\sum_{i=1}^{n} \ln \lambda_u(s_i) - \sum_{i=n+1}^{n+q} w_i \lambda_u(s_i)\right\} \pi(u \mid S_n)\, du + \int \ln\left[\frac{\pi(u)}{\pi(u \mid S_n)}\right] \pi(u \mid S_n)\, du$

where $\ln \lambda_u(s) = X(s)^\top \beta + Z(s) u$. Our reduced rank assumption (8) makes our likelihood approximations much more tractable, as we now marginalize over the $k$-dimensional $u$ rather than the $(n+q)$-dimensional $\xi$.

For our variational approximation to (10), we replace $\pi(u \mid S_n)$ with $N(\mu, \Sigma)$. This type of variational approximation to GLMMs has previously been studied by Ormerod and Wand (2012) and, given that our model takes such a form, we have the following result:

Theorem 1. Among all variational approximations to (10) taking the form $u \mid S_n \overset{VA}{\sim} N(\mu, \Sigma)$, the optimal lower bound is achieved when $\Sigma = \operatorname{diag}\{\sigma_1^2, \dots, \sigma_k^2\}$.

This result follows directly from Theorem 1 of Ormerod and Wand (2012), since we have assumed that $u_1, \dots, u_k$ are independent. We can then set the variational variance-covariance matrix of $u$ to be diagonal, and our variational approximation reduces to

(11) $\bar{\ell}_{u,VA}(\beta, \mu, \sigma^2, \sigma^2_{\text{prior}}) = \sum_{i=1}^{n} X(s_i)^\top \beta + Z(s_i) \mu - \sum_{i=n+1}^{n+q} w_i \exp\left\{X(s_i)^\top \beta + Z(s_i) \mu + \frac{1}{2} \sum_{r=1}^{k} \sigma_r^2 Z_r(s_i)^2\right\} + \frac{1}{2}\left[-k \ln(\sigma^2_{\text{prior}}) + \sum_{r=1}^{k} \ln(\sigma_r^2) - \frac{1}{\sigma^2_{\text{prior}}} \sum_{r=1}^{k} (\mu_r^2 + \sigma_r^2) + k\right].$

Derivation of this can be found in the supplementary material: Section S1. The assumed independence of the random coefficients is key to our use of (11). By Theorem 1, this is the optimal variational approximation if the coefficients are truly independent. Our simulation study (Section 4) aims to demonstrate the quality of the approximation empirically in cases where the true data-generating process is an LGCP with a general GRF.
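As a concrete illustration of (11), the bound can be written as a short R function and maximized with a quasi-Newton optimizer. In this article the objective is coded in C++ and fitted via TMB (Section 3.3); the base-R analogue below is a sketch only, with variances parameterized on the log scale for positivity, and it assumes a design matrix X, basis matrix Z, weights w, and event count n as defined above.

```r
# Sketch: the negative of the variational lower bound (11). Rows of X
# and Z cover all n + q locations, events first; w has zeros for events.
neg_elbo <- function(par, X, Z, w, n) {
  p <- ncol(X); k <- ncol(Z)
  beta   <- par[1:p]
  mu     <- par[p + (1:k)]
  sig2   <- exp(par[p + k + (1:k)])      # variational variances sigma_r^2
  sprior <- exp(par[p + 2 * k + 1])      # prior variance sigma_prior^2
  eta    <- drop(X %*% beta + Z %*% mu)
  quad   <- drop(Z^2 %*% sig2)           # sum_r sigma_r^2 Z_r(s_i)^2
  ll     <- sum(eta[1:n]) -
            sum(w * exp(eta + quad / 2)) +
            0.5 * (-k * log(sprior) + sum(log(sig2)) -
                   sum(mu^2 + sig2) / sprior + k)
  -ll                                    # optim() minimizes
}

# fit <- optim(rep(0, ncol(X) + 2 * ncol(Z) + 1), neg_elbo,
#              X = X, Z = Z, w = w, n = n, method = "BFGS")
```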

3.3 Automatic Differentiation

The final component of our methodology for rapid fitting of LGCP models is the use of automatic differentiation (AD; see Griewank 1989) via the Template Model Builder (TMB) package in R (Kristensen et al. 2016). This computes exact derivatives of a given function quickly, which has several benefits in the maximum likelihood context: gradient information can speed up optimization algorithms (e.g., Shanno 1970); it enables automated approximation of intractable integrals using the Laplace method, given that these are a function only of first and second derivatives as in (7); and the Hessian matrix can be obtained automatically, for likelihood-based inference about parameters.

TMB software requires the objective function of interest to be coded in C++. We implemented both approximations previously discussed—the variational approximation in (11), and also a Laplace approximation, computed automatically from the integrand using AD techniques.
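For orientation, a typical R-side TMB workflow looks as follows. This is a hedged sketch: the C++ template file name ("lgcp_lp.cpp") and its data/parameter lists are hypothetical stand-ins for our implementation, though the TMB calls themselves (compile, MakeADFun, sdreport) are the package's standard interface.

```r
# Sketch of the R-side TMB workflow for the Laplace variant, assuming
# the integrand log pi(S_n | u) + log pi(u) has been coded in a C++
# template file "lgcp_lp.cpp" (hypothetical). The 'random' argument
# invokes TMB's automatic Laplace approximation over u, with exact
# AD gradients passed to the quasi-Newton optimizer.
library(TMB)

compile("lgcp_lp.cpp")
dyn.load(dynlib("lgcp_lp"))

obj <- MakeADFun(
  data       = list(X = X, Z = Z, w = w, n = n),
  parameters = list(beta = rep(0, ncol(X)), u = rep(0, ncol(Z)),
                    log_sigma2_prior = 0),
  random     = "u",            # integrate u out via the Laplace method
  DLL        = "lgcp_lp"
)
fit <- nlminb(obj$par, obj$fn, obj$gr)   # gradient from AD
sdr <- sdreport(obj)                     # Hessian-based standard errors
```

For the variational variant, no `random` argument is needed: the bound (11) is closed form, so TMB supplies gradients (and the Hessian) of the objective directly.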

4 Simulation Study

We undertook a simulation study to compare the computational and statistical efficiency of our proposed methodology against other approximate methods for fitting LGCPs within the statistical computing language R. Computations were executed on a high performance computing cluster running a 64-bit version of the CentOS distribution of Linux (Katana, UNSW Sydney 2010) with allowances for 20–60 GB of RAM. Comparisons with exact fitting routines were not computationally feasible for the simulation settings here; for example, a MALA routine (via the R package lgcp: Taylor et al. 2013) was found to require many millions of samples in each simulation to achieve adequate mixing properties, as suggested in Taylor et al. (2015), which would take at least several hours to fit to a moderate-sized dataset, and would add months to computation time in our simulations.

Data were simulated on the unit square as a Poisson point process. The intensity function was a log-linear function of a single, deterministic covariate $X$ and a zero-mean GRF $\xi$:

$\ln \lambda(s) = \beta_0 + \beta_1 X(s) + \xi(s).$

$\xi$ was assumed to have an isotropic Gaussian covariance function, $f_\xi = \exp\{-d(s,t)^2/\rho^2\}$, where $\rho$ is a parameter that controls the correlation range. We additionally explored simulations under an exponential covariance function, $f_\xi = \exp\{-d(s,t)/\rho\}$, to reflect a broader range of latent field surfaces—see supplementary material: Section S2.1 for further details. We treated $\xi$ as an unmeasured/unobserved covariate, hence this formed an LGCP.

In our simulation design we examined the interplay between the spatial scale of the covariate $X$ and the latent random field $\xi$ using a 2 × 2 simulation design, where each of $X$ and $\xi$ was chosen to be either wiggly (W), with a correlation range of 0.05, or smooth (S), with a correlation range of 0.3 (Figure 2). We expect models to more accurately estimate the true data simulation process when the spatial scales of $X$ and $\xi$ do not coincide.


Fig. 2 The 2 × 2 simulation design showing the scenarios examined. Deterministic functions are used for the covariate (X(s)) while particular examples of the latent field, ξ(s), are shown here. “Smooth” means a range of effect ≈0.3 while “wiggly” means a range of effect ≈0.05, with the spatial domain being the unit square.

We used the spatstat package in R (Turner and Baddeley 2005) to simulate 1000 point patterns from this LGCP within each scenario. We controlled the size of the point patterns simulated through $\beta_0$, to examine our simulation scenarios with expected number of points E[N(D)] = 1000, 10,000, and 100,000. We standardized the covariate and set the marginal variance of the latent field to 1 so that the magnitudes of model components were roughly equal, to assist convergence in optimization. We fixed the single covariate effect ($\beta_1 = 1.25$), as preliminary investigation revealed no change in the relative performance of the competing models when the fixed effect size was varied. To ensure an adequately fine quadrature approximation (3), we used a regular square 101 × 101 grid of quadrature points.
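For readers wishing to reproduce a scenario of this kind, a self-contained base-R sketch is below. It draws the GRF by Cholesky factorization on a coarse grid and places points cell-by-cell; the study itself used spatstat for simulation, and the covariate, grid size, and intercept here are illustrative choices rather than those of the study.

```r
# Sketch: simulate one approximate LGCP realization on a coarse grid.
set.seed(1)
n_side <- 50; cell <- 1 / n_side
g  <- seq(cell / 2, 1 - cell / 2, length.out = n_side)
gr <- as.matrix(expand.grid(x = g, y = g))
X  <- sin(2 * pi * gr[, 1])                 # a stand-in deterministic covariate
d  <- as.matrix(dist(gr))
C  <- exp(-d^2 / 0.3^2)                     # Gaussian covariance, rho = 0.3
xi <- drop(t(chol(C + 1e-8 * diag(nrow(C)))) %*% rnorm(nrow(gr)))  # GRF draw
lambda <- exp(4 + 1.25 * X + xi)            # intercept chosen for a modest E[N]
counts <- rpois(nrow(gr), lambda * cell^2)  # Poisson counts per grid cell
pts <- gr[rep(seq_len(nrow(gr)), counts), , drop = FALSE] +
       matrix(runif(2 * sum(counts), -cell / 2, cell / 2), ncol = 2)
```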

We fitted point process regression models to the data, assuming intensity was a log-linear function of $X$ and unobserved $\xi$, using five different procedures:

INLA via the package INLA, a common approach to fitting an LGCP regression model (Lindgren, Rue, and Lindström 2011, Simpson et al. 2016). The GRF was approximated using piecewise linear basis functions generated via INLA::inla.mesh.create(), applied under default settings to our regular grid of quadrature point locations. We used penalized complexity priors on covariance parameters of the latent field following Fuglstad et al. (2019), setting Pr(σ > 5) = 0.01 and Pr(ρ < 0.025) = 0.01.

VA the proposed methodology of Section 3, using a variational approximation to the likelihood as in (11) programmed in C++ and fit using TMB and quasi-Newton optimization. FRK was used with either a sparse grid of basis functions (7 × 7) or a dense grid (14 × 14) as in (9).

Lp the proposed methodology of Section 3, using a Laplace approximation to the likelihood as in (7) with the integrand programmed in C++ and fit using TMB and quasi-Newton optimization. FRK was used with either a sparse grid of basis functions (7 × 7) or a dense grid (14 × 14) as in (9).

spNNGP (Finley, Datta, and Banerjee 2020), which uses MCMC-MALA to fit spatial regression models. Poisson models cannot be fitted in spNNGP, so we assumed binomial data with a logistic link, which approximates an LGCP when the number of quadrature points is large (Warton and Shepherd 2010). The latent field is approximated using a nearest-neighbor Gaussian process (Datta et al. 2016). Priors on the latent field's covariance parameters are somewhat informative; for details see supplementary material: Section S2.3.

IPP an inhomogeneous Poisson process, that is, with ξ omitted, to study the implications of failing to account for the missing covariate when fitting the model. This was fitted using (2) and (3), each excluding ξ, programmed in C++ and fit using TMB and quasi-Newton optimization.

We expect Lp and VA to perform better when the spatial scale of the basis functions matches the scale of $\xi$; that is, basis functions set along a regular 7 × 7 grid should perform better when $\xi$ is smooth (W,S or S,S in Figure 2), whereas basis functions set along a 14 × 14 grid should perform better when $\xi$ is wiggly (S,W or W,W). We do not expect spNNGP to perform as well because it only approximates an LGCP, and we only sampled a short chain (taking 10,000 samples after 20,000 burn-in) to keep computation time manageable in a simulation setting. As mentioned previously, we also expect difficulties teasing apart the effects of $X$ and $\xi$ when they operate at similar spatial scales (S,S or W,W).

We defined how well a method fits by: root mean square error (RMSE) of $\hat{\beta}_1$; coverage probability of Wald confidence intervals for $\hat{\beta}_1$; and Kullback-Leibler divergence between the fitted and true intensity $\lambda(D)$, which has the form

(12) $\int_D \lambda(s) \ln\left[\frac{\lambda(s)}{\hat{\lambda}(s)}\right] ds \approx \sum_{i=n+1}^{n+q} w_i\, \lambda(s_i) \ln\left[\frac{\lambda(s_i)}{\hat{\lambda}(s_i)}\right] - \sum_{i=n+1}^{n+q} w_i \left[\lambda(s_i) - \hat{\lambda}(s_i)\right].$
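A sketch of the quadrature form of (12), assuming vectors of true and fitted intensities evaluated at the quadrature points:

```r
# Sketch: quadrature approximation (12) to the Kullback-Leibler
# divergence between true intensity 'lam' and fitted intensity
# 'lam_hat' at the quadrature points, with weights 'w_q'.
kl_intensity <- function(lam, lam_hat, w_q) {
  sum(w_q * lam * log(lam / lam_hat)) - sum(w_q * (lam - lam_hat))
}
```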

The spNNGP method, fitting a different model, is biased for the intercept term and hence for $\lambda(s)$. We corrected for this manually by rescaling $\hat{\lambda}(s)$ for spNNGP such that its integral over $D$ equaled $n$, the size of the simulated point pattern. Results summarizing the various scenarios under investigation can be found in Figures 3 and 4. Full and detailed results tables can be found in supplementary material: Section S2.4.

Fig. 3 Average computation times over variously sized point patterns (E[N(D)] = 1000, 10,000, 100,000)—see supplementary material: Section S2.2 for details. Models include IPP, INLA, spNNGP and our proposed method using variational (VA) and Laplace (Lp) approximations. The latter two models were fitted using both a coarse regular grid of basis functions (7 × 7) and a fine regular grid (14 × 14). For small point patterns our methods were at least 50 times faster than INLA on average. For large point patterns they were as much as 1000 times faster than INLA on average.


Fig. 4 Performance of point and interval estimators for the various models, across simulation scenarios using either a wiggly (W) or a smooth (S) covariate and latent field (labeled with the covariate first; for example, "W,S" means the covariate is wiggly and the latent field smooth). A: Root mean squared error estimating the slope coefficient β1. B: Kullback-Leibler divergence from the true intensity field λ(D) to that fitted by each model. C: Coverage probabilities of 95% Wald intervals for the slope estimate β̂1 (dotted vertical line indicates nominal coverage). We found that, provided we chose the most appropriate basis function configuration for the scenario, our proposed methods performed comparably to INLA in point estimation of β1, and for Lp, also in estimation of intensity and interval coverage.


Simulations were very computationally intensive, with the generation and fitting of 1000 simulated datasets taking up to 48,000 hours in some simulation settings.

4.1 Simulation Results

In simulations, our methods of fitting an LGCP model were orders of magnitude faster than INLA, converting computation times in some cases from half an hour to a few seconds (Figure 3). IPP was faster still, but it does not incorporate spatially correlated errors, which came at considerable cost to performance. The number of basis functions used in our proposed methods (VA and Lp) appeared to affect computation time more than the size of the point pattern being fitted, but computation times remained substantially faster than INLA for all basis function configurations we considered. Despite fitting spNNGP with a relatively short chain, it was still the slowest method considered, taking over an hour per dataset. In the remaining results we restrict our focus to point patterns with E[N(D)] = 1000; trends in results are broadly consistent with the more computationally demanding settings—see supplementary material: Section S2.2.

Figure 4A compares point estimates across fitting techniques. It was typically easier to estimate the parameters when the covariate and latent fields were more distinct, that is, acting at different spatial scales: we found RMSE to be larger in W,W compared to W,S, and in S,S compared to S,W. Likewise for confidence intervals (Figure 4C), all models had poor coverage when the spatial scales of the covariate and latent fields were similar.

It does appear important to select an appropriate number of basis functions for the type of latent field being approximated—we needed many basis functions (14 × 14) when the latent field was wiggly (S,W or W,W) and fewer (7 × 7) when the latent field was smooth (S,S or W,S). Further, the cost of using too few basis functions (7 × 7 for S,W or W,W) exceeds the cost of using more than are needed (14 × 14 for S,S or W,S).

When a basis function configuration appropriate for the simulation setting was used, the proposed Laplace approximation method produced results generally comparable to INLA (Figure 4). The variational approximation was competitive in RMSE (Figure 4A), but noticeably less accurate as an estimator of intensity when many basis functions were used (Figure 4B), and its confidence intervals tended to be too short (Figure 4C; see supplementary material: Section S2.4 for interval widths).

While the choice of the number of basis functions ($k$) is key, the data can be used to guide this decision by selecting $k$ to maximize the likelihood. We examined how reliably such a data-driven strategy chose the more appropriate basis function configuration for each simulation scenario, and found that the more appropriate configuration was chosen approximately 99% of the time using a Laplace approximation (see supplementary material: Table S1 for further details). The variational approximation had difficulty in the W,W simulations. We also note that the Laplace approach was sensitive to non-convergence: we found approximately a 1.7% fit failure rate for point patterns with E[N(D)] = 1000—we discuss this further in Section 6.

Table 1 Four-fold cross-validated predicted conditional log-likelihoods of the various models and corresponding computation times.

5 Real Data Example: Gorilla Nesting Locations

We illustrate our proposed method by analyzing the gorilla nesting dataset (Figure 1A) provided in the R package inlabru (Bachl et al. 2019). We wish to model the intensity of gorilla nest locations as a log-linear function of three predictors—elevation above sea level, distance to nearest water source, and average temperature as a three-level ordinal factor (Figure 5).

Fig. 5 A: Spatial covariate describing elevation in meters above sea level. B: Spatial covariate describing distance in meters to nearest water source. C: Factor describing heat category: 1 = Coolest, 2 = Moderate, 3 = Warmest. D: Latent field (conditional on the data) estimated by VA method. E: Latent field (conditional on the data) estimated by INLA. F: Spatially blocked four-fold cross-validation.


We modeled the gorilla nesting point pattern using four methods for comparison—an IPP, an LGCP using the INLA package in R, and both the variational and Laplace versions of our proposed methodology. As in Section 3.2 we used a regular grid of $k$ local bi-square basis functions, but here we chose $k$ to maximize our variational approximation to the marginal likelihood, arriving at $k = 63$ (Figure 6). Note that models with different values of $k$ are not nested, because the basis function radius is linked to $k$ (Section 3.2), so when choosing $k$ to maximize the likelihood we are in effect using the data to estimate the spatial range of the latent field $\xi$.
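A sketch of such a search, reusing the hypothetical bisquare_basis() and neg_elbo() functions from Section 3 (s_all denotes all event and quadrature locations; on the non-square gorilla domain the knot grid need not be square, which is how a value such as k = 63 can arise):

```r
# Sketch: refit the VA model over increasingly dense knot grids and keep
# the configuration maximizing the variational lower bound. Illustrative
# only; relies on the bisquare_basis() and neg_elbo() sketches above.
k_side_grid <- 3:15
logliks <- sapply(k_side_grid, function(ks) {
  Z  <- bisquare_basis(s_all, k_side = ks)   # radius tied to knot spacing
  p0 <- rep(0, ncol(X) + 2 * ncol(Z) + 1)    # beta, mu, log variances
  -optim(p0, neg_elbo, X = X, Z = Z, w = w, n = n,
         method = "BFGS")$value              # maximized lower bound
})
best_k <- k_side_grid[which.max(logliks)]^2  # chosen number of basis functions
```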

Fig. 6 The variational approximation to the marginal log-likelihood of the proposed LGCP model as a function of increasingly dense regular grids of local bi-square basis functions. The vertical dashed line denotes the number of basis functions at which the maximum is found. The secondary plot and axis (gray) indicate the cumulative computational time, in seconds, taken to fit models using each of the basis function configurations.


We compared the fitting methods in terms of predictive performance, via a four-fold cross-validation (CV), holding out one fold at a time and predicting the likelihood in held-out test areas of the domain. The folds are a collection of spatially blocked regions, aligned diagonally across the domain (Figure 5F). We compared model fits using the predicted conditional log-likelihood

(13) $\sum_{h=1}^{4} \left[\sum_{i=1}^{n_h} X(s_i^{(h)})^\top \hat{\beta}_h + \hat{\xi}_h(s_i^{(h)}) - \sum_{i=n_h+1}^{n_h+q_h} w_i^{(h)} \exp\{X(s_i^{(h)})^\top \hat{\beta}_h + \hat{\xi}_h(s_i^{(h)})\}\right]$

where for fold $h$ we denote by $\{s_i^{(h)}\}_{i=1}^{n_h}$ the point events and by $\{s_i^{(h)}\}_{i=n_h+1}^{n_h+q_h}$ the quadrature points in this fold, and $w_i^{(h)}$ are their corresponding weights. The $\hat{\beta}_h$ are fixed effect parameters estimated from the data excluding fold $h$, and $\hat{\xi}_h$ is likewise the estimated latent field from the training data without fold $h$.
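A per-fold sketch of (13), assuming the linear predictor has already been evaluated at the held-out locations:

```r
# Sketch: predicted conditional log-likelihood (13) for one held-out
# fold h. 'eta_hat' holds X(s)'beta_hat + xi_hat(s) at all fold-h
# locations (the n_h events first, then the fold's quadrature points),
# and 'w_h' the corresponding quadrature weights (0 for events).
cv_fold_loglik <- function(eta_hat, w_h, n_h) {
  sum(eta_hat[seq_len(n_h)]) - sum(w_h * exp(eta_hat))
}
# The CV criterion (13) is the sum of cv_fold_loglik() over the 4 folds.
```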

We found that our proposed method reduced the computation time required for a single model fit from minutes to seconds, with negligible loss of predictive performance (Table 1). Specifically, our four-fold cross-validation procedure took over 45 min to complete with INLA using code from Krainski et al. (2018), whereas the same procedure, using a variational approximation with the appropriate number of basis functions, took just over 10 sec for a comparable predicted conditional likelihood—when run on Katana, UNSW Sydney (2010) with an allowance of 20 GB of RAM. Note that this computation time does not account for the basis function search of Figure 6, which took an additional two and a half minutes. INLA used a mesh structure to approximate the latent field, provided by Bachl et al. (2019), which acts like a set of basis functions (k = 1479) as described earlier. This larger value of k is one of the reasons for the difference in computation time, explored further in Table 2. In light of Figure 6 this number seems excessive, but we found that INLA performed poorly if given a comparable number of basis functions to ours. Conversely, our proposed method performed poorly when using k ≈ 1479, due to overfitting (Figure 6). Omitting a latent field altogether (IPP) was much faster than the other methods, but gave a very low predicted conditional likelihood.

Table 2 The computation time and likelihood for a single model fit for our proposed model for which we toggled on/off key components—a variational approximation for the marginalization (VA, vs. Laplace approximation, Lp); automatic differentiation (AD); sparse basis approximation for the latent field (Sparse/Dense Z); and large number of basis functions (Large k).

We also constructed point and interval estimates of the fixed effects parameters in all models (Figure 7). The most conspicuous difference was that the effect of elevation (β1) was estimated to be larger for IPP, with an unrealistically small standard error. Amongst the LGCP fits, parameter estimates were generally similar, although confidence intervals were wider for INLA than for Lp or VA, especially for elevation. Note that elevation was the spatially smoothest covariate (Figure 5A), varying at a spatial scale comparable to our estimated latent fields (Figure 5D and E), so spatial confounding may be an issue here (analogous to S,S in simulations). This was our worst-case scenario in simulations (Section 4), so we should not put much weight on inferences about the elevation effect from any of the methods considered.

Fig. 7 Estimated fixed effects 95% Wald confidence intervals for the various models being compared. (a) Elevation effect. (b) Distance to water effect. (c) Heat category—contrast effect of Moderate to Coolest. (d) Heat category—contrast effect of Warmest to Coolest. We see little difference in the estimated fixed effects for the LGCP models (INLA, VA, Lp), so much of the differences in predictive performance came down to the way the latent effect was modeled (or was not modeled for the IPP).


Finally, we want to get a sense of which components of our novel methodology contributed most to the substantial improvement in the speed with which models can be fitted. We have proposed three main innovations: a variational or Laplace approximation; a fixed rank kriging approximation to the latent field, using a relatively small number of sparse basis functions; and automatic differentiation. We examined the contribution of each of these innovations to computation time for the gorilla nesting LGCP model, by comparing fits for: variational versus Laplace approximations; sparse basis functions (9) versus dense basis functions using thin plate splines (as in Tzeng and Huang 2018); optimizing $\bar{\ell}_{VA}$ using automatic differentiation via TMB versus quasi-Newton optimization, using function values only, of an expression for $\bar{\ell}_{VA}$ written in R; and using a small number of basis functions (k = 63) rather than a number closer to that used by Bachl et al. (2019) (k = 1479). Examining the various component models (Table 2), we see that each contributes substantial relative gains in computation speed. The largest speed gains appear to be due to the use of AD and the use of a small number of basis functions, relative to INLA. The variational approximation yielded the most modest speed gain here, but it still offered an almost four-fold improvement in computation time compared to using a Laplace approximation.

6 Discussion

We have proposed methodology for fitting LGCP models to point patterns that is orders of magnitude faster than INLA, a common method for fitting these models in practice. This speed-up came with negligible loss in statistical efficiency when a Laplace approximation was used together with an appropriate basis function configuration. We implemented three key techniques—likelihood approximation (via a variational or Laplace approach), rank reduction, and automatic differentiation—each of which contributed to the computational speed-up (Table 2). The largest speed advantages seem to come from rank reduction and from using automatic differentiation (and the associated coding of the objective function in C++). We also found considerable computational savings when using sparse basis functions (Cressie and Johannesson 2008) in rank reduction.

The two likelihood approximations considered here had different strengths—the Laplace approach provided more reliable inferences (Figure 4), whereas the variational approach was faster and more stable (supplementary material: Table S1), especially for large k, with some suggestion that it also gave more precise point estimates (see supplementary material). The variational approximation tended to underestimate error in point estimates (Section 4), a result also noticed by Hui et al. (2019) when studying variational approximations to generalized additive models. Our approach has similarities to the Poisson family of these models, if our approximation to the latent field is viewed as a 2D penalized smoother. Hui et al. (2019) suggest using the method of Louis (1982) to obtain a more robust estimator of the observed information matrix. The Louis (1982) method was designed for missing data analysis, so by using it we would be arguing that the variational parameters represent missing data, which is somewhat defensible given that the variational parameters are used to estimate an unobserved random effect. Until such an approach is pursued, we recommend that the Laplace approximation be used for inference, while the variational approach is used primarily for exploratory work, for example, when doing a basis function search as in Figure 6 (see also supplementary material: Figure S6). A variational approach could also be used to provide warm starts for parameters in Laplace fits, to reduce non-convergence and speed up optimization; we have found this to work with some success.

A key decision when implementing our method is the choice of the number of basis functions, and we recommend using a data-driven approach to decide this, as in our application (Figure 6). In simulations, the performance of our proposed approach depended strongly on which basis function configuration was used (Figure 4), but we found that the data could be used to quite reliably identify the relevant configuration for a given simulation scenario (supplementary material: Table S1). Fitting repeated models at different basis function configurations arguably erodes some of the computational gains of our method. Choice of basis function configuration might also be important to the performance of INLA, but exploring this issue is computationally prohibitive in many applications—recall that a single INLA fit to our example dataset took more than 10 min (Table 1), compared to the two and a half minutes taken to determine an appropriate basis configuration for our approach (Figure 6).

Decisions were also required concerning the functional form of our basis functions and their knot locations. Cressie and Johannesson (2008) suggest that the precise choice of the form of basis function is not critical, nor is its location, as seemed to be the case in our exploratory analyses. The important considerations, however, are the number of basis functions and their spatial range, as flagged previously, which we determined using the data (Section 5). Cressie and Johannesson (2008) also recommended using multiple resolutions of basis functions, which could readily be implemented here. Including multiple resolutions in a single fit could be particularly useful in settings where there might be multiple missing covariates operating at different spatial scales, or in massive data situations where it is not practical to fit multiple models to search for an appropriate resolution of basis functions. In our application, we instead searched for an appropriate resolution of basis functions, and in simulations (Section 4) our latent field was stationary and Gaussian, hence it operated at a single spatial scale. Note that in our simulations we found point estimation and inference to be difficult when the scales of the covariate and latent field coincided (scenarios S,S and W,W); essentially, basis functions became confounded with the covariate. Including multiple resolutions of basis functions may only increase the chance of such spatial confounding (Hodges and Reich 2010). Perhaps the merits of including multiple resolutions depend on whether the main objective is to accurately predict the intensity of point events at certain locations, or to understand their relationship to particular measured predictors.

Our proposed approach to fitting LGCPs offers speed gains so large that we can consider using LGCPs in ways that were not previously practical. We have already exploited this speed gain to perform model selection on the number of basis functions and their range—this would not be computationally feasible using INLA, where a fit at a single basis function configuration took over 10 min. Other opportunities lie in extending the model to handle more complex data structures, such as flexible models for spatio-temporal point patterns, and integrating different data types, as is often desired when modeling species distributions in ecology (Dorazio 2014, Fithian et al. 2015).

Supplementary Materials

All supplemental files are contained in a single archive.

Fast LGCP Modeling—Supplementary.pdf: containing S1: Proofs and working for the variational approximation to the marginal log-likelihood for a proposed rank-reduced LGCP Model. S2: Complete and detailed results for the simulation study (Section 4). S3: Figure of complete basis function search for applied data analysis (Section 5).

Fast LGCP Code and Data.zip: Gorilla nesting dataset, R coding scripts, and saved objects to conduct the applied data analyses presented in the manuscript.

readme.txt: Description of the Supplementary Material archive, including directions for data access and use of code.


Acknowledgments

The authors would like to thank the anonymous reviewers for their contributions to improving the manuscript.

Disclosure Statement

The authors report there are no competing interests to declare.


Funding

This work was supported by the Australian Government Research Training Program (RTP) Scholarship, as well as the Australian Research Council Discovery Project scheme (DP180103543).

References

  • Adams, R. P., Murray, I., and MacKay, D. J. (2009), “Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities,” in Proceedings of the 26th Annual International Conference on Machine Learning. DOI: 10.1145/1553374.1553376.
  • Bachl, F. E., Lindgren, F., Borchers, D. L., and Illian, J. B. (2019), “inlabru: An R Package for Bayesian Spatial Modelling from Ecological Survey Data,” Methods in Ecology and Evolution, 10, 760–766. DOI: 10.1111/2041-210X.13168.
  • Baddeley, A., and Turner, R. (2000), “Practical Maximum Pseudolikelihood for Spatial Point Patterns” (with discussion), Australian & New Zealand Journal of Statistics, 42, 283–322. DOI: 10.1111/1467-842X.00128.
  • Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008), "Gaussian Predictive Process Models for Large Spatial Data Sets," Journal of the Royal Statistical Society, Series B, 70, 825–848. DOI: 10.1111/j.1467-9868.2008.00663.x.
  • Berman, M., and Turner, T. R. (1992), “Approximating Point Process Likelihoods with GLIM,” Journal of the Royal Statistical Society, Series C, 41, 31–38. DOI: 10.2307/2347614.
  • Chakraborty, A., Gelfand, A. E., Wilson, A. M., Latimer, A. M., and Silander, J. A. (2011), “Point Pattern Modelling for Degraded Presence-Only Data Over Large Regions,” Journal of the Royal Statistical Society, Series C, 60, 757–776. DOI: 10.1111/j.1467-9876.2011.01023.x.
  • Cressie, N., and Johannesson, G. (2008), “Fixed Rank Kriging for Very Large Spatial Data Sets,” Journal of the Royal Statistical Society, Series B, 70, 209–226. DOI: 10.1111/j.1467-9868.2007.00633.x.
  • Daley, D. J., and Vere-Jones, D. (2007), An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure, New York: Springer.
  • Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016), “Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets,” Journal of the American Statistical Association, 111, 800–812. DOI: 10.1080/01621459.2015.1044091.
  • Diggle, P. J., Moraga, P., Rowlingson, B., and Taylor, B. M. (2013), “Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm,” Statistical Science, 28, 542–563. DOI: 10.1214/13-STS441.
  • Dorazio, R. M. (2014), “Accounting for Imperfect Detection and Survey Bias in Statistical Analysis of Presence-Only Data,” Global Ecology and Biogeography, 23, 1472–1484. DOI: 10.1111/geb.12216.
  • Finley, A. O., Datta, A., and Banerjee, S. (2020), “spNNGP R Package for Nearest Neighbor Gaussian Process Models,” arXiv preprint arXiv:2001.09111.
  • Fithian, W., Elith, J., Hastie, T., and Keith, D. A. (2015), “Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species,” Methods in Ecology and Evolution, 6, 424–438. DOI: 10.1111/2041-210X.12242.
  • Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2019), “Constructing Priors that Penalize the Complexity of Gaussian Random Fields,” Journal of the American Statistical Association, 114, 445–452. DOI: 10.1080/01621459.2017.1415907.
  • Funwi-Gabga, N., and Mateu, J. (2012), “Understanding the Nesting Spatial Behaviour of Gorillas in the Kagwene Sanctuary, Cameroon,” Stochastic Environmental Research and Risk Assessment, 26, 793–811. DOI: 10.1007/s00477-011-0541-1.
  • Gonçalves, F. B., and Gamerman, D. (2018), “Exact Bayesian Inference in Spatiotemporal Cox Processes Driven by Multivariate Gaussian Processes,” Journal of the Royal Statistical Society, Series B, 80, 157–175. DOI: 10.1111/rssb.12237.
  • Griewank, A. (1989), “On Automatic Differentiation,” Mathematical Programming: Recent Developments and Applications, 6, 83–107.
  • Hensman, J., Matthews, A., and Ghahramani, Z. (2015), “Scalable Variational Gaussian Process Classification,” in Artificial Intelligence and Statistics, PMLR.
  • Hodges, J. S., and Reich, B. J. (2010), “Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love,” The American Statistician, 64, 325–334. DOI: 10.1198/tast.2010.10052.
  • Hui, F. K. C., You, C., Shang, H. L., and Müller, S. (2019), “Semiparametric Regression Using Variational Approximations,” Journal of the American Statistical Association, 114, 1765–1777. DOI: 10.1080/01621459.2018.1518235.
  • Illian, J. B., Sørbye, S. H., and Rue, H. (2012), “A Toolbox for Fitting Complex Spatial Point Process Models Using Integrated Nested Laplace Approximation (INLA),” The Annals of Applied Statistics, 6, 1499–1530. DOI: 10.1214/11-AOAS530.
  • Johnson, O., Diggle, P., and Giorgi, E. (2019), “A Spatially Discrete Approximation to Log-Gaussian Cox Processes for Modelling Aggregated Disease Count Data,” Statistics in Medicine, 38, 4871–4887. DOI: 10.1002/sim.8339.
  • Katana, UNSW Sydney (2010), “PVC (Research Infrastructure), UNSW Sydney.” DOI: 10.26190/669X-A286.
  • Krainski, E., Gómez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo, D., Simpson, D., Lindgren, F., and Rue, H. (2018), Advanced Spatial Modeling with Stochastic Partial Differential Equations using R and INLA, Boca Raton, FL: Chapman and Hall/CRC.
  • Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., and Bell, B. M. (2016), “TMB: Automatic Differentiation and Laplace Approximation,” Journal of Statistical Software, 70, 1–21. DOI: 10.18637/jss.v070.i05.
  • Lindgren, F., Rue, H., and Lindström, J. (2011), “An Explicit Link between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach,” Journal of the Royal Statistical Society, Series B, 73, 423–498. DOI: 10.1111/j.1467-9868.2011.00777.x.
  • Lloyd, C., Gunter, T., Osborne, M., and Roberts, S. (2015), “Variational Inference for Gaussian Process Modulated Poisson Processes,” in International Conference on Machine Learning, PMLR.
  • Louis, T. A. (1982), “Finding the Observed Information Matrix when using the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 44, 226–233. DOI: 10.1111/j.2517-6161.1982.tb01203.x.
  • Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998), “Log Gaussian Cox Processes,” Scandinavian Journal of Statistics, 25, 451–482. DOI: 10.1111/1467-9469.00115.
  • Myllymäki, M., Mrkvicka, T., Grabarnik, P., Seijo, H., and Hahn, U. (2017), “Global Envelope Tests for Spatial Processes,” Journal of the Royal Statistical Society, Series B, 79, 381–404. DOI: 10.1111/rssb.12172.
  • Ormerod, J. T., and Wand, M. (2010), “Explaining Variational Approximations,” The American Statistician, 64, 140–153. DOI: 10.1198/tast.2010.09058.
  • Ormerod, J. T., and Wand, M. P. (2012), “Gaussian Variational Approximate Inference for Generalized Linear Mixed Models,” Journal of Computational and Graphical Statistics, 21, 2–17. DOI: 10.1198/jcgs.2011.09118.
  • Quinonero-Candela, J., and Rasmussen, C. E. (2005), “A Unifying View of Sparse Approximate Gaussian Process Regression,” The Journal of Machine Learning Research, 6, 1939–1959.
  • Rue, H., Martino, S., and Chopin, N. (2009), "Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations" (with discussion), Journal of the Royal Statistical Society, Series B, 71, 319–392. DOI: 10.1111/j.1467-9868.2008.00700.x.
  • Shanno, D. F. (1970), “Conditioning of quasi-Newton Methods for Function Minimization,” Mathematics of Computation, 24, 647–656. DOI: 10.1090/S0025-5718-1970-0274029-X.
  • Shirota, S., and Banerjee, S. (2019), “Scalable Inference for Space-Time Gaussian Cox Processes,” Journal of Time Series Analysis, 40, 269–287. DOI: 10.1111/jtsa.12457.
  • Shirota, S., and Gelfand, A. E. (2017), “Space and Circular Time Log Gaussian Cox Processes with Application to Crime Event Data,” The Annals of Applied Statistics, 11, 481–503. DOI: 10.1214/16-AOAS960.
  • Shun, Z., and McCullagh, P. (1995), “Laplace Approximation of High Dimensional Integrals,” Journal of the Royal Statistical Society, Series B, 57, 749–760. DOI: 10.1111/j.2517-6161.1995.tb02060.x.
  • Simpson, D., Illian, J., Lindgren, F., Sørbye, S. H., and Rue, H. (2016), “Going Off Grid: Computationally Efficient Inference for log-Gaussian Cox Processes,” Biometrika, 103, 49–70. DOI: 10.1093/biomet/asv064.
  • Taylor, B., Davies, T., Rowlingson, B., and Diggle, P. (2015), “Bayesian Inference and Data Augmentation Schemes for Spatial, Spatiotemporal and Multivariate log-Gaussian Cox Processes in R,” Journal of Statistical Software, 63, 1–48. DOI: 10.18637/jss.v063.i07.
  • Taylor, B. M., Davies, T. M., Rowlingson, B. S., and Diggle, P. J. (2013), “lgcp: An R Package for Inference with Spatial and Spatio-Temporal Log-Gaussian Cox Processes,” Journal of Statistical Software, 52, 1–40. DOI: 10.18637/jss.v052.i04.
  • Taylor, B. M., and Diggle, P. J. (2014), “INLA or MCMC? A Tutorial and Comparative Evaluation for Spatial Prediction in Log-Gaussian Cox Processes,” Journal of Statistical Computation and Simulation, 84, 2266–2284. DOI: 10.1080/00949655.2013.788653.
  • Teng, M., Nathoo, F., and Johnson, T. D. (2017), “Bayesian Computation for Log-Gaussian Cox Processes: A Comparative Analysis of Methods,” Journal of Statistical Computation and Simulation, 87, 2227–2252. DOI: 10.1080/00949655.2017.1326117.
  • Titsias, M. (2009), “Variational Learning of Inducing Variables in Sparse Gaussian Processes,” in Artificial Intelligence and Statistics, PMLR.
  • Turner, R., and Baddeley, A. (2005), “SPATSTAT: An R Package for Analyzing Spatial Point Patterns,” Journal of Statistical Software, 12, 1–42. DOI: 10.18637/jss.v012.i06.
  • Tzeng, S., and Huang, H.-C. (2018), “Resolution Adaptive Fixed Rank kriging,” Technometrics, 60, 198–208. DOI: 10.1080/00401706.2017.1345701.
  • Waagepetersen, R., and Guan, Y. (2009), “Two-Step Estimation for Inhomogeneous Spatial Point Processes,” Journal of the Royal Statistical Society, Series B, 71, 685–702. DOI: 10.1111/j.1467-9868.2008.00702.x.
  • Warton, D. I., and Shepherd, L. C. (2010), “Poisson Point Process Models Solve the “Pseudo-Absence Problem” for Presence-Only Data in Ecology,” The Annals of Applied Statistics, 4, 1383–1402. DOI: 10.1214/10-AOAS331.
  • Zammit-Mangion, A., and Cressie, N. (2021), “FRK: An R Package for Spatial and Spatio-Temporal Prediction with Large Datasets,” Journal of Statistical Software, 98, 1–48. DOI: 10.18637/jss.v098.i04.