
Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localization

Pages 101-111 | Received 21 Jul 2022, Accepted 28 May 2023, Published online: 20 Jul 2023

Abstract

Likelihood-free methods are an essential tool for performing inference for implicit models which can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to high-dimensional likelihood-free inference involves estimating low-dimensional marginal posteriors by conditioning only on summary statistics believed to be informative for the low-dimensional component, and then combining the low-dimensional approximations in some way. In this article, we demonstrate that such low-dimensional approximations can be surprisingly poor in practice for seemingly intuitive summary statistic choices. We describe an idealized low-dimensional summary statistic that is, in principle, suitable for marginal estimation. However, a direct approximation of the idealized choice is difficult in practice. We thus suggest an alternative approach to marginal estimation which is easier to implement and automate. Given an initial choice of low-dimensional summary statistic that might only be informative about a marginal posterior location, the new method improves performance by first crudely localizing the posterior approximation using all the summary statistics to ensure global identifiability, followed by a second step that hones in on an accurate low-dimensional approximation using the low-dimensional summary statistic. We show that the posterior this approach targets can be represented as a logarithmic pool of posterior distributions based on the low-dimensional and full summary statistics, respectively. The good performance of our method is illustrated in low to moderate dimensional examples. Computer code to implement the methods for the examples of this article is available at https://github.com/cdrovandi/ABC-marginal-approximations.

1 Introduction

Likelihood-free statistical methods such as approximate Bayesian computation (ABC, Sisson, Fan, and Beaumont Citation2018) and Bayesian synthetic likelihood (BSL, Wood Citation2010; Price et al. Citation2018) are now commonly applied to conduct inference on the parameters of computationally expensive models for which simulation of synthetic data is easy, but likelihood computation is impractical. Such approaches aim to find values of the model parameters for which simulated and observed data are close, typically on the basis of a set of summary statistics.

Standard likelihood-free techniques such as ABC and BSL are known to scale poorly to a large number of summary statistics (Blum Citation2010; Frazier et al. Citation2022). Although BSL methods scale more readily to higher-dimensional summaries (Priddle et al. Citation2022), their performance can still degrade as the dimension of the summaries increases. In spite of this, the use of a high-dimensional summary statistic might seem desirable for several reasons. First, a high-dimensional summary statistic might be necessary to mitigate the inevitable information loss incurred by reducing the full dataset. Second, when the model contains a large number of parameters, a large number of summary statistics is required by default: when the dimension of the summary statistic is less than that of the parameter, it is not possible to point-identify all parameters. From a computational perspective, however, matching of simulated and observed summary statistics and posterior exploration become more difficult in higher dimensions.

In the ABC framework, a promising approach for performing likelihood-free analyses for high-dimensional parameter problems is marginal adjustment (Nott et al. Citation2014) and the closely related copula ABC approach (Li et al. Citation2017; Chen and Gutmann Citation2019). These approaches estimate low-dimensional (e.g., one or two dimensional) marginal posterior distributions, and then combine them in some way to obtain an approximation to the joint posterior distribution. The idea is that each parameter, or pair of parameters, is likely to be informed by only a small number of summary statistics. Thus, in principle, it is possible to accurately estimate these low-dimensional posterior distributions, and avoid the curse of dimensionality associated with the use of high-dimensional summary statistics. Other reasons why accurate marginal approximations might be important are that researchers often report inferences for individual parameters and that in initial modeling phases the identifiability of the model can be explored.

The purpose of this article is to demonstrate that the approximation of low-dimensional marginal posteriors based on intuitive low-dimensional summary statistic choices can be surprisingly poor in some cases, and less accurate than the corresponding approximation using all the summary statistics. Motivated by this observation, we describe a summary statistic choice for marginal posterior estimation that would, in principle, deliver precise marginal posterior inferences. Unfortunately, this statistic is not practically feasible to construct in the situations to which ABC is generally applied. Our more practical approach is based on crudely localizing the posterior approximation using all the summary statistics, and then subsequently conducting marginal inferences by matching low-dimensional summary statistics that are informative for different marginal parameters. We show that our two-stage approach can be thought of as sampling a logarithmic pool of posterior distributions based on low-dimensional and full summary statistics, respectively. In Appendix A of the supplementary material, we describe how our article is related to other parts of the literature, including other high-dimensional likelihood-free methods, machine learning approaches and how our approach is relevant beyond the ABC setting. We demonstrate our findings and method on low to moderate dimensional simulated examples. We leave the significant challenges of extending this work to high-dimensional models and model misspecification for future research.

2 Marginal Approximations in Likelihood-Free Inference

2.1 Approximate Bayesian Computation

We denote the parameter of the statistical model of interest as θ ∈ Θ ⊆ ℝ^p, where p is the number of parameters. The observed data is given by y = (y1, …, yn) ∈ 𝒴^n, where n denotes the number of observations. Ideally we wish to base our statistical inferences on the posterior density π(θ | y) ∝ p(y | θ)π(θ),

where p(y|θ) is the likelihood function and π(θ) the prior density. However, for many complex models of interest, p(y|θ) may be too computationally expensive to permit the application of exact methods, up to Monte Carlo error, for approximating expectations with respect to the desired posterior. In such situations, we can resort to likelihood-free methods, which replace likelihood evaluations with model simulations, to obtain an approximation to the posterior.

Perhaps the most popular statistical likelihood-free method is ABC. In ABC, we often first choose a summary statistic function, S : 𝒴^n → 𝒮, where 𝒮 ⊆ ℝ^d and d is the number of summary statistics, to map the data to a lower dimensional space. We then compare observed data y and simulated data x ∈ 𝒴^n on the basis of the corresponding observed and simulated summary statistics. Overloading notation, we write the summary statistic function evaluated at a generic value for the data simply by S. When it is necessary to distinguish between the summary statistic function evaluated at the observed data y and simulated data x, we denote these by Sy = S(y) and Sx = S(x), respectively.

Write π(θ | Sy) ∝ π(θ)p(Sy | θ) for the partial posterior distribution conditioned on the summary statistic Sy, where p(S | θ) is the summary statistic likelihood evaluated at S ∈ 𝒮. In ABC, the posterior density is approximated as follows:

π(θ | y) ≈ π(θ | Sy) ≈ π_{ϵ}(θ | Sy) ∝ π(θ) p_{ϵ}(Sy | θ),   (1)

where π_{ϵ}(θ | Sy) denotes the ABC posterior, and

p_{ϵ}(Sy | θ) = ∫_{𝒮} p(Sx | θ) K_{ϵ}{ρ(Sy, Sx)} dSx,   (2)

with ρ : 𝒮 × 𝒮 → ℝ₊ a distance function that compares observed and simulated data, and K_{ϵ} : ℝ₊ → ℝ₊ a kernel function that is designed to be relatively large when ρ is small. We will refer to p_{ϵ}(Sy | θ) defined in (2) as the ABC likelihood. It is a kernel smoothed version of the true summary statistic likelihood p(Sy | θ). Ultimately, for a given S and ρ, the bandwidth parameter ϵ of the kernel, also referred to as the ABC tolerance, defines when the observed and simulated data are considered close. ABC can be more easily understood when using the indicator kernel function, K_{ϵ}{ρ(Sy, Sx)} = I{ρ(Sy, Sx) ≤ ϵ}, which is equal to one when ρ(Sy, Sx) ≤ ϵ and 0 otherwise. The integral in (2) for a given θ can be unbiasedly estimated by taking a single draw x ∼ p(· | θ) from the model and evaluating K_{ϵ}{ρ(Sy, Sx)}.
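To make the role of the indicator kernel and the single-draw estimate concrete, the following is a minimal sketch of rejection ABC in Python. It is not the sampler used in this article (which is described next); the toy model, prior, tolerance and function names are illustrative choices only.

```python
import numpy as np

def rejection_abc(prior_sampler, simulator, summary, s_obs, rho, eps, n_sims, rng):
    """Keep theta whenever rho(S_y, S_x) <= eps, using one model simulation per
    proposed parameter (the unbiased single-draw estimate of the ABC likelihood)."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sampler(rng)
        x = simulator(theta, rng)
        if rho(s_obs, summary(x)) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy illustration: normal model with unknown mean, summary = sample mean.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=100)
samples = rejection_abc(
    prior_sampler=lambda r: r.normal(0.0, 5.0),            # N(0, 25) prior on mu
    simulator=lambda th, r: r.normal(th, 1.0, size=100),   # known variance 1
    summary=np.mean,
    s_obs=np.mean(y),
    rho=lambda a, b: abs(a - b),
    eps=0.1,
    n_sims=20_000,
    rng=rng,
)
print(len(samples), samples.mean(), samples.std())
```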

Although there are many algorithms for sampling from the ABC posterior (Sisson and Fan Citation2018), in this article we use the sequential Monte Carlo (SMC) ABC algorithm of Drovandi and Pettitt (Citation2011a). This method generates samples (often referred to as particles in the SMC context) from an adaptive sequence of ABC posteriors with decreasing ABC thresholds. It does this by eliminating a proportion of particles with the highest discrepancy at each iteration. The population of particles is rejuvenated via a resampling and move step, the latter achieved with an MCMC ABC kernel to preserve the distribution of particles. The number of MCMC iterations applied to each particle adapts with the overall MCMC acceptance rate. In this article we stop the SMC ABC algorithm when the acceptance rate in the MCMC step drops below 1%, since it implies that a great deal more computation is required to progress to smaller values of ϵ. The proposal distribution in the MCMC ABC kernel is a Gaussian random walk, with a covariance tuned based on the population of SMC particles, which is reasonably efficient for the examples considered in this article.

2.2 Marginal Approximations

It is well known that simple ABC methods struggle to produce accurate approximations of the (partial) posterior as the dimension of the summary statistic increases. A Monte Carlo estimate for the ABC likelihood (2) results in a conditional kernel density estimator of the likelihood for the summaries, the statistical behavior of which is known to severely degrade as the dimension of the summary statistic increases, even if the ABC tolerance is favorably chosen. Results in Blum (Citation2010), Blum and François (Citation2010), and Barber, Voss, and Webster (Citation2015) make the effect of the dimension on simple ABC algorithms more precise.

Further theoretical results for ABC methods are described in Li and Fearnhead (Citation2018b) and Frazier et al. (Citation2018), while Li and Fearnhead (Citation2018a) consider regression adjusted ABC methods. Li and Fearnhead (Citation2018a, Citation2018b) obtain interesting results about both point estimation for the ABC posterior mean, and accurate uncertainty quantification of the ABC posterior, for rejection and importance sampling ABC algorithms with and without regression adjustment for an appropriately chosen sample-size dependent tolerance. With an adaptively chosen and informative proposal distribution, so long as the tolerance goes to zero fast enough, the authors demonstrate that ABC regression adjustment methods can control the Monte Carlo error of the posterior approximation and provide correct uncertainty quantification. However, it is not so easy to disentangle the impact of the summary statistic dimension in this theory, since the construction of the required proposal becomes more difficult as the parameter dimension increases, which is one situation when high-dimensional summary statistics are needed. Furthermore, in the case of simple accept/reject ABC, results in Frazier et al. (Citation2018) demonstrate that to control the Monte Carlo error the tolerance used in ABC must be chosen as an increasing function of the summary statistic dimension. We note that, in the BSL framework, Frazier et al. (Citation2022) obtain similar results to those of Li and Fearnhead (Citation2018a) for regression adjusted ABC, showing that the two methods behave similarly from a computational standpoint.

An appealing approach for overcoming issues associated with likelihood-free methods in high dimensions is to approximate low-dimensional posterior distributions (Nott et al. Citation2014; Li et al. Citation2017). Let θj be some component of θ. Note that θj may consist of more than one parameter, but for the purposes of this article it suffices to treat θj as a scalar. We note that θj may be informed by a relatively small number of summary statistics. We denote the corresponding summary statistic function as Sj : 𝒴^n → 𝒮j, where 𝒮j ⊆ 𝒮, and the observed statistic as Sj,y = Sj(y). The motivation for this is that π_{ϵj}(θj | Sj,y) could be a much better approximation to π(θj | Sy) compared to π_{ϵ}(θj | Sy), for two inter-related reasons: first, since Sj is lower dimensional, it is easier to find matches between observed and simulated summary statistic vectors within a given tolerance region; second, since we are attempting to match summaries in a lower-dimensional space, we can more readily control the approximation error introduced through the use of a positive tolerance (i.e., it is generally the case that the tolerance value ϵj can be much smaller than ϵ). For these reasons, there are meaningful benefits to employing low-dimensional summaries when targeting marginal inference in ABC.

In this article we demonstrate that, even with an intuitively reasonable choice of Sj, sometimes the approximation πϵj(θj|Sj,y) can be less accurate than πϵ(θj|Sy), and substantially so. As an example of an intuitively reasonable choice, the semi-automatic ABC method of Fearnhead and Prangle (Citation2012) produces summary statistics which are estimated posterior means for θ, and Sj could be obtained by extracting estimates for θj. That such an intuitive choice can perform badly is perhaps well-known, but it is worthwhile to demonstrate this in a simple normal example where using sufficient statistics to produce posterior approximations may not produce accurate inferences for θj.

2.2.1 The Normal Case

Write N(μ, ϕ) for a normal distribution with mean μ and variance ϕ, and write N(z; μ, ϕ) for the value of its density at z. Let y = (y1, y2, …, yn) be a random sample drawn from a N(μ, ϕ) distribution with unknown μ and ϕ. We assume that μ and ϕ are independent a priori and allocate priors μ ∼ N(μ0, ϕ0) and ϕ ∼ IG(α, β), where IG(α, β) denotes the inverse-gamma distribution with shape α and scale β, and μ0, ϕ0, α, β are fixed. A minimal and Bayes sufficient statistic for estimating (μ, ϕ) jointly is (ȳ, s²), where ȳ = (1/n)Σ_{i=1}^n yi and s² = (1/n)Σ_{i=1}^n (yi − ȳ)², that is, π(μ, ϕ | y) = π(μ, ϕ | ȳ, s²). Since ȳ and s² are direct point estimates of μ and ϕ, respectively, it might be tempting to consider approximating the marginals of these parameters by conditioning only on their respective point estimate statistics.

A certain realization of y with n = 10, μ = 0, ϕ = 1, μ0 = 0, ϕ0 = 1, α = 1, β = 1 produces the results shown in Figure 1.¹ It is evident that π(μ | ȳ) produces a diffuse approximation of the actual marginal posterior, whereas the approximation π(ϕ | s²) is highly accurate for the actual ϕ-marginal.

Fig. 1. Posteriors for toy normal example.

To understand the findings in Figure 1, first rewrite the normal likelihood as

N(y; μ, ϕ) ∝ p1(s² | ϕ) p2(ȳ | μ, ϕ) = {ϕ^{−(n−1)/2} exp(−n s²/(2ϕ))} · {(2πϕ)^{−1/2} exp(−n(ȳ − μ)²/(2ϕ))}.

For inference on μ, the above decomposition makes clear that the marginal posterior for μ will depend on s²: under the inverse-gamma prior for ϕ we have

π(μ | s², ȳ) ∝ π(μ)[β + n{(μ − ȳ)² + s²}/2]^{−α−n/2}.

In contrast, if we take as our “likelihood” for inference on μ the term exp{−n(ȳ − μ)²/(2ϕ)}, then under the same prior π(ϕ) we have the marginal posterior

π(μ | ȳ) ∝ π(μ)[β + n(μ − ȳ)²/2]^{−α−1/2}.

While the two posterior locations are similar, their scales are not, with the use of the simpler likelihood resulting in a loss of precision; see Figure 1.
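As a quick numerical check of the two expressions above, the following sketch evaluates both unnormalized densities on a grid and normalizes them numerically (note 1 describes the trapezoidal scheme used for the article's figures; a simple grid sum is used here). The realization of y generated below is illustrative and is not the one underlying Figure 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu0, phi0, alpha, beta = 10, 0.0, 1.0, 1.0, 1.0
y = rng.normal(0.0, 1.0, size=n)                   # an illustrative realization
ybar, s2 = y.mean(), np.mean((y - y.mean()) ** 2)

grid = np.linspace(-4.0, 4.0, 4001)
dx = grid[1] - grid[0]
prior = np.exp(-(grid - mu0) ** 2 / (2 * phi0))    # N(mu0, phi0) prior, up to a constant

# pi(mu | ybar, s^2): phi integrated out of the full likelihood
post_both = prior * (beta + 0.5 * n * ((grid - ybar) ** 2 + s2)) ** (-(alpha + n / 2))
# pi(mu | ybar): phi integrated out of the ybar contribution only
post_ybar = prior * (beta + 0.5 * n * (grid - ybar) ** 2) ** (-(alpha + 0.5))

post_both /= post_both.sum() * dx                  # normalize on the grid
post_ybar /= post_ybar.sum() * dx

def grid_sd(dens):
    m = np.sum(grid * dens) * dx
    return np.sqrt(np.sum((grid - m) ** 2 * dens) * dx)

print("sd of mu | ybar, s^2:", grid_sd(post_both))
print("sd of mu | ybar     :", grid_sd(post_ybar))  # typically noticeably larger
```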

Now we turn to the marginal approximation of ϕ. The term p2(ȳ | μ, ϕ) is a nonconstant function of ϕ except in the case where ȳ = μ. Hence, the likelihood contribution p2(ȳ | μ, ϕ) is informative for inference on ϕ; see, for example, Zhu and Reid (Citation1994) for additional discussion of this example.

However, the marginal posterior for ϕ | s² is π(ϕ | s²) ∝ p1(s² | ϕ)π(ϕ), while the marginal posterior for ϕ | s², ȳ is

π(ϕ | s², ȳ) ∝ p1(s² | ϕ)π(ϕ) ∫ (2πϕ)^{−1/2} exp{−n(μ − ȳ)²/(2ϕ)} π(μ) dμ.

While the two posteriors appear different, the integral in π(ϕ | s², ȳ) can be shown to be constant as a function of ϕ under a diffuse prior for μ. Hence, even though the likelihood contribution containing ȳ, that is, p2(ȳ | μ, ϕ), does carry meaningful information about ϕ, the “likelihood” that results from integrating out μ under a diffuse prior ensures that the integrated component ∫ p2(ȳ | μ, ϕ)π(μ) dμ has no information about ϕ that is not already contained in p1(s² | ϕ). This explains the accurate results for ϕ in Figure 1. In more substantive problems pertinent to likelihood-free inference, the integrated likelihood term that results from marginalization is unlikely to simplify, and we should not expect accurate marginal approximations even for a seemingly obvious choice of low-dimensional summary statistic.

On a more intuitive level, s2 is obviously highly informative about ϕ, and its distribution does not depend on μ. This might be the best situation that we can hope for when conditioning on a low-dimensional summary statistic consisting of point estimates of parameters in likelihood-free inference. In contrast, even though y¯ is informative about μ, its distribution depends on ϕ.

2.2.2 A General Issue

The simple normal example clearly illustrates the potential inaccuracy of using marginal approximations based on summary statistics, and, thus, why the use of marginal approximations in ABC may produce inaccurate inferences. Therefore, it is helpful to give a more general formulation of the lessons of this example.

Consider that we wish to conduct inference on the unknown parameter θ = (ψ, λ) conditional on a vector of observed summary statistics Sy, which we partition as Sy = (S1,y, S2,y), and which has dimension at least as large as that of θ. Furthermore, consider the unrealistic scenario where the likelihood for the observed summary Sy can be factorized as

p(Sy | θ) ∝ p1(S1,y | ψ) p2(S2,y | S1,y, ψ, λ).   (3)

The factorization in (3) allows us to write the joint posterior as π(θ | Sy) ∝ π(ψ)π(λ) p1(S1,y | ψ) p2(S2,y | S1,y, ψ, λ).

We define the “integrated likelihood” for ψ, evaluated at S2,y, and conditional on S1,y, as

p2^{λ}(S2,y | ψ, S1,y) := ∫ p2(S2,y | S1,y, ψ, λ) π(λ) dλ,

where the notation p2^{λ}(S2,y | ψ, S1,y) clarifies that we have integrated out λ from the likelihood contribution p2(S2,y | S1,y, ψ, λ) using the prior π(λ). The marginal posterior π(ψ | Sy) can then be written as

π(ψ | Sy) = π(ψ) p1(S1,y | ψ) p2^{λ}(S2,y | ψ, S1,y) / ∫ π(ψ) p1(S1,y | ψ) p2^{λ}(S2,y | ψ, S1,y) dψ.

Even if (3) is valid, π(ψ | Sy) still depends on S2,y through p2^{λ}(S2,y | ψ, S1,y). The only way that the posterior π(ψ | Sy) does not depend on S2,y is when, for the observed value Sy, the term p2^{λ}(S2,y | ψ, S1,y) is a constant function of ψ. The latter requirement is essentially the requirement that the likelihood contribution p2(S2,y | S1,y, ψ, λ) is S-nonformative (short for non-informative), a concept due to Barndorff-Nielsen (Citation1976), when conducting inference on ψ.

Furthermore, the toy normal example makes clear that even in linear exponential families where the resulting decomposition in (3) is satisfied, the likelihood p2(S2,y|S1,y,ψ,λ) can depend on ψ. That is, while linear exponential families have minimally sufficient summaries, those same summaries are not generally S-nonformative when considering inference for a sub-vector of parameters, for example, ψ. Given this finding, there is no reason to suspect that marginal approximations by themselves will yield accurate partial posterior inference unless the summary statistics are chosen very carefully. The example in Appendix B of the supplementary material demonstrates how such a careful summary selection can lead to accurate marginal approximations. However, as we demonstrate in Section 4, finding summary statistics that satisfy the conditions outlined in this section is not an easy task.

3 Accurate Marginal Approximations via Localization

In Appendix C of the supplementary materials we show that ideal summary statistics for estimating the marginal partial posterior of θj are its corresponding marginal posterior moments. For example, suppose we are interested in estimating the one-dimensional parameter θj, and define an additional two-dimensional summary statistic vector S̃j = (E(θj | S), var(θj | S)). Conforming to our previous notation, we write S̃x,j and S̃y,j for the value of S̃j for simulated data x and observed data y, respectively. We show that the posterior mean and variance of θj | S̃y,j are exactly the posterior mean and variance of θj | Sy. That is, we can match the mean and variance of the partial posterior with a much lower dimensional summary statistic, and this result extends to higher order moments. However, we also explain in Appendix C of the supplementary materials why it is practically difficult to approximate these statistics in the likelihood-free setting, especially for higher-order moments. Thus, as a more practical method, we propose the localization approach below.

3.1 Localization Approach

We show how the issue described in Section 2.2.2 can be mitigated by first localizing the approximation using the full set of summary statistics S and then focusing on matching the summary subset Sj. More specifically, we consider the following approximate posterior for estimating the θj-marginal:

π_{ϵj}(θ, Sx | Sy) ∝ π(θ) p(Sx | θ) I{ρ(Sy, Sx) ≤ ϵ0} · I{ρ(Sj,y, Sj,x) ≤ ϵj}.   (4)

We abuse notation here and write ρ(·,·) for the metric on both full and reduced summary statistic spaces. Here, the role of ϵ0>ϵj is to provide an initial crude approximation of the posterior, so that the parameters are more tightly constrained compared to the prior. Note that we do not use the data twice; in the second constraint we simply impose a tighter restriction on Sj that is designed to be informative about θj. Our approach is related to other ABC methods which use a pilot run to truncate the prior such as those of Fearnhead and Prangle (Citation2012) and Blum and François (Citation2010). However, the key difference in our approach is that different summary statistics are used at the pilot and analysis stage.

We now provide some intuition on why this idea is helpful for marginal approximations. The joint selection condition is needed to ensure that values of θ are in the high density region for the approximation only if they produce reasonable global agreement for the entire vector of summaries; while the marginal selection condition for Sj allows us to focus on specific regions of the marginal parameter space where Sj,y is particularly informative about the unknown θj. However, in many cases the likelihood for the marginal summary statistic will be a complex function of all the parameters, and the joint selection step is critical, as otherwise conducting inference using only the marginal selection condition could lead to diffuse posteriors, or, at worst, a complete lack of point identification. We later present examples of both types of behavior, which emphasize the importance of including the joint selection condition when conducting marginal inferences.

To implement this in practice, we use a pilot run of SMC ABC with the full set of statistics S for a relatively short time, until the MCMC acceptance rate drops below some threshold much greater than 1%. Then we perform a second SMC ABC step, initialized at the pilot approximation, which aims to produce closer matches based on Sj (i.e., reduce ϵj) until the MCMC acceptance rate drops below 1%. Note that in the second stage we also check whether a proposal satisfies the pilot constraint I{ρ(Sy, Sx) ≤ ϵ0}; if it does not, the proposal is rejected even if it matches the current constraint based on Sj. It is important that the acceptance rate threshold for the pilot run is set relatively high. First, we do not want to use much computation time on the pilot run. Second, in the continuation run, we want to reduce the tolerance associated with the marginal summary statistics greatly, and this will be computationally difficult if it is already difficult to match on the pilot tolerance. In our experiments, we use a pilot acceptance-rate stopping rule of 10%–30%, depending on the parameter dimensionality. Another possible stopping rule for the pilot run is when a certain number of model simulations has been exceeded, which may be a more explicit way to control the computational effort of the pilot run. The second stage starts from the pilot result, and does not require many additional SMC iterations as it aims to match only on a low-dimensional statistic. The approach is summarized in Algorithm 1.

We note that there is no guarantee that our approach will improve on marginal ABC approximations using all the statistics. Ultimately, there is still more information contained in S about θj compared to Sj, and it will depend on how useful Sj is. Furthermore, the two stage approach is computationally partly hindered by having to match also on the pilot tolerance. However, in the examples of this article and the supplementary material, we obtain very good empirical performance in the large majority of cases.

Algorithm 1

SMC ABC approach for accurate estimation of π(θj|Sy).

Inputs: Acceptance rate thresholds for the pilot, p0,acc, and continued, pj,acc, SMC ABC runs. Note that p0,acc > pj,acc. Observed data y, prior distribution π(θ), discrepancy functions ρ(Sy, Sx) and ρ(Sj,y, Sj,x).

Outputs: Tolerances ϵ0 and ϵj. Samples {θ^{(i)}, ρj^{(i)}, Sj,x^{(i)}}_{i=1}^N from the approximate posterior proportional to π(θ) p(Sx | θ) I{ρ(Sy, Sx) ≤ ϵ0} · I{ρ(Sj,y, Sj,x) ≤ ϵj}.

1: Run SMC ABC, reducing the tolerance with respect to the discrepancy function ρ(Sy, Sx) until the acceptance rate falls below p0,acc. This produces a sample {θ^{(i)}, ρ^{(i)}, Sx^{(i)}}_{i=1}^N from π_{ϵ0}(θ, Sx | Sy), where ρ^{(i)} = ρ(Sy, Sx^{(i)}) and ϵ0 = max{ρ^{(i)}}_{i=1}^N.

2: Compute ρj^{(i)} = ρ(Sj,y, Sj,x^{(i)}) for i = 1, …, N. This produces the initial value of ϵj = max{ρj^{(i)}}_{i=1}^N. We can now think of {θ^{(i)}, ρ^{(i)}, Sx^{(i)}}_{i=1}^N as an ABC sample from π(θ) p(Sx | θ) I{ρ(Sy, Sx) ≤ ϵ0} · I{ρ(Sj,y, Sj,x) ≤ ϵj}.

3: Continue running SMC ABC, reducing the tolerance ϵj with respect to the discrepancy function ρ(Sj,y, Sj,x), whilst still requiring I{ρ(Sy, Sx) ≤ ϵ0} for the fixed ϵ0, until the acceptance rate drops below pj,acc. This produces samples {θ^{(i)}, ρj^{(i)}, Sj,x^{(i)}}_{i=1}^N based on the approximate posterior with kernel I{ρ(Sy, Sx) ≤ ϵ0} · I{ρ(Sj,y, Sj,x) ≤ ϵj}.
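The following is a simplified, rejection-style sketch of the joint matching condition behind Algorithm 1. It is not the article's implementation: the article uses the adaptive SMC ABC sampler described in Section 2.1, and the quantile-based tolerances and function names below are illustrative assumptions.

```python
import numpy as np

def two_stage_marginal_abc(prior_sampler, simulator, summary, summary_j,
                           s_obs, sj_obs, rho, n_sims, pilot_quantile,
                           final_quantile, rng):
    """Rejection-style version of the two-stage localization idea.

    Stage 1 (pilot): keep parameters whose full summary S_x is within a loose
    tolerance eps0 of S_y (here, a quantile of the simulated discrepancies).
    Stage 2: among draws satisfying the pilot constraint, keep those whose
    low-dimensional summary S_{j,x} is within a much tighter tolerance epsj
    of S_{j,y}.  This conveys the joint matching condition in (4)/(7); the
    article instead continues an SMC ABC run from the pilot approximation.
    """
    thetas, d_full, d_marg = [], [], []
    for _ in range(n_sims):
        theta = prior_sampler(rng)
        x = simulator(theta, rng)
        thetas.append(theta)
        d_full.append(rho(s_obs, summary(x)))
        d_marg.append(rho(sj_obs, summary_j(x)))
    thetas = np.asarray(thetas)
    d_full = np.asarray(d_full)
    d_marg = np.asarray(d_marg)

    eps0 = np.quantile(d_full, pilot_quantile)          # loose pilot tolerance
    pilot = d_full <= eps0                              # joint matching condition
    epsj = np.quantile(d_marg[pilot], final_quantile)   # tight marginal tolerance
    keep = pilot & (d_marg <= epsj)
    return thetas[keep], eps0, epsj
```

A call such as two_stage_marginal_abc(..., pilot_quantile=0.2, final_quantile=0.01, ...) mimics the loose pilot tolerance followed by a much more stringent tolerance on the marginal summary.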

3.2 Interpretation in Terms of Logarithmic Pooling

The above approach has an interesting interpretation in terms of logarithmic pooling of posterior densities conditioned on the summary statistic vectors Sy and Sj,y, respectively. For two densities f1(w) and f2(w), their log-pooled density with weight α, 0 ≤ α ≤ 1, is

fp(w) = C(α) f1(w)^{α} f2(w)^{1−α},

where C(α) is a normalizing constant, and we note that this definition can be extended to accommodate any finite number of densities (Genest, McConway, and Schervish Citation1986). Logarithmic pools and other related pooling methods are often used for constructing consensus priors when prior distributions from multiple individuals are available, in the literature on optimal (Bayesian) decision making (see Genest and Zidek Citation1986 for a review), and in the literature on combinations of forecast distributions (see Clements and Harvey Citation2011 for a review).²
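As a concrete instance of this definition (an illustrative calculation, not taken from the article), logarithmically pooling two normal densities gives another normal: with f1(w) = N(w; μ1, σ1²) and f2(w) = N(w; μ2, σ2²),

fp(w) ∝ f1(w)^{α} f2(w)^{1−α} ∝ N(w; μp, σp²), where 1/σp² = α/σ1² + (1 − α)/σ2² and μp = σp²{αμ1/σ1² + (1 − α)μ2/σ2²},

so the pooled density only places appreciable mass where both component densities do.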

A key feature of logarithmic pooling is that a high density value in the pooled density must necessarily correspond to a high density value in all the densities being pooled. We now argue that this feature can be used to obtain the joint matching condition in the SMC algorithm described above. Consider once again the ABC posterior density (1). Suppose we use the uniform kernel, so that K_{ϵ}{ρ(Sy, Sx)} = I{ρ(Sy, Sx) ≤ ϵ}. Considering the summary statistic Sy, and denoting the tolerance in our approximation by ϵ0, the joint ABC density of parameter and summary is given by

π_{ϵ0}(θ, Sx | Sy) ∝ π(θ) p(Sx | θ) I{ρ(Sy, Sx) ≤ ϵ0}.   (5)

Integrating out Sx in (5) gives the θ-marginal density given by (1). A similar ABC posterior density on θ and Sx, conditional on Sj,y and with tolerance denoted ϵj, is

π_{ϵj}(θ, Sx | Sj,y) ∝ π(θ) p(Sx | θ) I{ρ(Sj,y, Sj,x) ≤ ϵj}.   (6)

It is immediate that, for any 0 < α < 1, the log-pooled density of (5) and (6) is

π_{p,ϵ}(θ, Sx | Sy) ∝ π(θ) p(Sx | θ) I{ρ(Sy, Sx) ≤ ϵ0} · I{ρ(Sj,y, Sj,x) ≤ ϵj},   (7)

and the SMC ABC algorithm described above targets sampling from the density (7), which involves the joint matching condition. Note that if we had integrated out Sx first in (5) and (6) and then pooled, this would not give the same result as pooling (5) and (6) and integrating out Sx afterwards. Based on the earlier interpretation of logarithmic pooling, a parameter value will only have high density under the pooled posterior if it has high density under both the pilot ABC posterior and the ABC posterior where only Sj is matched.

The above makes it clear that, at least in the case of the uniform kernel, so long as α ∈ (0, 1), the weights in the logarithmic pool do not influence the resulting posterior. The same result does not hold for other bounded kernel functions such as the triangular or Epanechnikov kernels. However, in the case of the commonly used Gaussian kernel, the effect of the pooling parameter α is simply to change the effective kernel bandwidths, since raising a Gaussian density to a power changes its scale, up to a normalizing factor. Hence, a joint matching condition for the Gaussian kernel also has an interpretation in terms of logarithmic pooling. While logarithmic pooling gives an interpretation of the posterior distribution sampled by Algorithm 1, for both uniform and Gaussian kernels, we stress that it is unnecessary to choose a pooling weight in either case.
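To see the Gaussian-kernel claim concretely (a one-line check, written here for a scalar summary): K_{ϵ}(s − sy) ∝ exp{−(s − sy)²/(2ϵ²)}, so K_{ϵ}(s − sy)^{α} ∝ exp{−(s − sy)²/(2ϵ²/α)}, which is again a Gaussian kernel but with bandwidth ϵ/√α. Pooling the two ABC densities with weights α and 1 − α therefore only rescales the tolerances ϵ0 and ϵj, rather than changing the form of the joint matching condition.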

4 Examples

In the first two examples we consider using only a small number of summary statistics, so it is possible to use ABC to generate a gold standard approximation of the posterior distribution (and its corresponding marginals) conditional on these statistics. This is intentional, so that we can properly assess the performance of the various marginal approximations. In the examples we also show the distribution of the simulated summary statistics for the accepted SMC ABC posterior samples, so that we can compare how well the different approaches can match on individual summary statistics. For simplicity we refer to such distributions as the posterior distributions of the summaries. We provide two additional low-dimensional examples in Appendices E and F of the supplementary material, respectively. These examples have qualitatively similar results to the g-and-k example in Section 4.2. In the third example in this section we show that our approach can outperform SMC ABC with all summaries on a moderate dimensional example. All examples support our assertion that seemingly obvious choices of low-dimensional summary statistics do not necessarily lead to accurate marginal approximations without localization.

4.1 MA(2) Example

Consider data y = (y1, y2, …, yn) generated from the following moving average model of order 2 (MA(2)):

yt = et + θ1 e_{t−1} + θ2 e_{t−2},   (8)

where, say, et ∼ N(0, 1), and the unknown parameters θ = (θ1, θ2) are assumed to obey the invertibility conditions −2 < θ1 < 2, θ1 + θ2 > −1, θ1 − θ2 < 1. Our prior information on θ = (θ1, θ2) is uniform over the invertibility region. A useful choice of summary statistics for the MA(2) model is the set of sample autocovariances ηj(y) = (1/T)Σ_{t=1+j}^{T} yt y_{t−j}, for j = 0, 1, 2. Then we have Sj = η_{j−1} for j = 1, 2, 3. The binding functions corresponding to these summary statistics (expected values of the summary statistics as functions of θ) are given by b1(θ) = 1 + θ1² + θ2², b2(θ) = θ1 + θ1θ2, b3(θ) = θ2.
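A short simulation sketch (illustrative only; the helper names and seed are our own choices, and the parameter values match the dataset considered below) showing how the observed summaries line up with the binding functions for a large sample:

```python
import numpy as np

def simulate_ma2(theta1, theta2, n, rng):
    """Simulate y_t = e_t + theta1*e_{t-1} + theta2*e_{t-2}, with e_t ~ N(0, 1)."""
    e = rng.normal(0.0, 1.0, size=n + 2)
    return e[2:] + theta1 * e[1:-1] + theta2 * e[:-2]

def ma2_summaries(y):
    """Sample autocovariances at lags 0, 1, 2, giving the summaries S1, S2, S3."""
    n = len(y)
    return np.array([np.sum(y[j:] * y[: n - j]) / n for j in range(3)])

rng = np.random.default_rng(2)
theta1, theta2 = 0.9, -0.05
y = simulate_ma2(theta1, theta2, n=10_000, rng=rng)
print("summaries S1, S2, S3  :", ma2_summaries(y))
print("binding fns b1, b2, b3:", np.array([1 + theta1 ** 2 + theta2 ** 2,
                                           theta1 + theta1 * theta2,
                                           theta2]))
```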

Since θ1 only appears in b1(θ) and b2(θ), it is tempting to approximate the marginal for θ1 by matching only on S1 and S2. This turns out to be a poor choice in general, as there are two solutions when we solve for θ1 and θ2 as functions of b1 and b2 in the binding function; this suggests that the posterior conditional on S1 and S2 is likely to concentrate around two values in the parameter space. Regarding estimation of the θ2-marginal, it may be sensible to match only on S3.

We consider a dataset of size n = 10,000 generated with θ1 = 0.9 and θ2 = −0.05. When using only S1 and S2 as summary statistics, there is a second value of (θ1, θ2) matching the values of b1 and b2 obtained at the true parameter value, namely (θ1, θ2) ≈ (0.4860, 0.7591), which is well separated from the true parameter. We consider six ABC approximations: (1) matching all three summary statistics S1, S2, and S3; (2) matching only on S1 and S2; (3) matching only on S3; (4) a pilot ABC approximation matching S1, S2, and S3; (5) continuing from the pilot approximation matching only S1 and S2; and (6) continuing from the pilot approximation matching only S3. We stop the pilot run when the MCMC acceptance rate drops below 30%.

The results are shown in Figure 2. We first focus on estimating the θ1-marginal (left plot). It can be seen that the marginal approximation of θ1 is poor when conditioning on only S1 and S2. The posterior is multi-modal, as expected. The pilot run (shown as the blue dashed line) produces a diffuse approximation but effectively eliminates the second mode. Then, continuing from the pilot run with S1 and S2 produces an accurate marginal approximation of θ1.

Fig. 2. Univariate ABC posterior approximations for θ1 from various approaches for the MA(2) example. Here Sj indicates which statistic is used in the corresponding ABC approximation (the solid black lines are based on all three summary statistics), “pilot” indicates the pilot run with all statistics, and “pilot + x” indicates the continuation of the pilot run with statistics x. The observed summary statistic is Sy≈(1.7999,0.8363,−0.0658)⊤. The × symbol on the x-axis represents the true value of the parameter.

Next we focus on the θ2-marginal (right plot). This time, the standard marginal approximation based on only S3 produces an accurate marginal approximation: S3 is a highly informative statistic for θ2, and the posterior for θ1 is close to the prior, with little posterior dependence between θ1 and θ2 (results not shown). As we see with the θ1-marginal results and the examples below, obtaining an accurate marginal approximation without the pilot is the exception rather than the rule. Here pilot + S3 gives a reasonable estimate of the θ2-marginal, but it is not highly accurate. We suggest that this is a result of having to match on both the pilot and marginal tolerances, so the pilot approximation may still have some influence. We will see in the examples below that continuing from a pilot ABC run is crucial to obtaining accurate marginal approximations.

As alluded to above, the MA(2) example clearly demonstrates the identification issues that can arise when doing marginal adjustments in an ad hoc fashion. Namely, since the distribution of the summaries S1, S2 results in a multi-modal likelihood function as a consequence of the multiplicity of roots discussed above, the information in these summaries is not by itself enough to identify the true value of θ1. However, once we have adequately restricted the parameter space for θ2, which can be achieved through a pilot selection step based on all the summaries, the multiplicity of roots is alleviated, and the summaries S1, S2 are highly informative for the true value of the unknown parameter θ1.

Since the parameter θ2 clearly influences the distribution of S1, S2, there is no hope of obtaining marginal sufficiency for θ1 when we base our inference for θ1 on the summaries S1, S2 alone. Consequently, a joint selection step based on all the summaries will be required if we are to have any hope of identifying the unknown value of θ1.

In contrast, the asymptotic distribution of the third summary statistic is (in the limit) independent of θ1, and hence inference for θ2 based solely on S3 is likely to deliver reasonable results, in accordance with the results in Figure 2. In particular, we have that

S3,x = n^{−1} Σ_t yt y_{t−2} = θ2 n^{−1} Σ_t e_{t−2}² + θ1 n^{−1} Σ_t e_{t−1} e_{t−2} + ⋯ = θ2 n^{−1} Σ_t e_{t−2}² + o_p(1),

where the remainder is asymptotically negligible since terms such as e_{t−1} e_{t−2} have mean zero (e_{t−1} being independent of e_{t−2} for all t). Thus, for n large, S3,x | θ is approximately distributed as (θ2/n) χ²(n), which has mean θ2, variance 2θ2²/n, and does not depend on θ1.³ Hence, in large samples, p(S | θ) = p1(S3 | θ1, θ2) p2(S1, S2 | θ, S3) ≈ p1(S3 | θ2) p2(S1, S2 | θ, S3).

Therefore, if we conduct inference on θ2 using just S3 via the likelihood p1(S3|θ2), the results are likely to be quite accurate, especially in the case where there is little information about θ2 in the conditional distribution p2(S1,S2|θ,S3).

In general, the marginal posterior for θ2 | S in this example is given by

π(θ2 | S) = π(θ2) p1(S3 | θ2) p2^{θ1}(S1, S2 | θ2, S3) / ∫ π(θ2) p1(S3 | θ2) p2^{θ1}(S1, S2 | θ2, S3) dθ2.

In the extreme case where p2^{θ1}(S1, S2 | θ2, S3) is S-nonformative for θ2, the integrated likelihood is constant in θ2, and the posterior reduces to π(θ2 | S3) ∝ π(θ2) p1(S3 | θ2). The latter posterior is plotted in Figure 2 as the green dotted curve, and is virtually identical to the marginal posterior π(θ2 | S) based on the full set of summaries.

4.2 Univariate g-and-k Example

The g-and-k distribution is a popular test example for ABC (e.g., Drovandi and Pettitt Citation2011b; Fearnhead and Prangle Citation2012), and is defined by its quantile function

Q(z(p) | θ) = a + b(1 + c tanh[g z(p)/2])(1 + z(p)²)^{k} z(p),   (9)

where −∞ < a < ∞, b > 0, −∞ < g < ∞, k > −0.5, and z(p) = Φ^{−1}(p) is the standard normal quantile function. The parameters θ = (a, b, g, k) control the location, scale, skewness and kurtosis, respectively. It is common practice to set c = 0.8. As with previous studies, we consider analyzing a simulated dataset of length n = 10,000 with a = 3, b = 1, g = 2 and k = 0.5. Writing U(a, b) for the uniform distribution on [a, b], the prior on each component of θ is set as U(0, 10), with independent components.

Drovandi and Pettitt (Citation2011b) consider using as summary statistics robust measures of location, scale, skewness, and kurtosis, S = (S1, S2, S3, S4), with

S1 = L2, S2 = L3 − L1, S3 = (L3 + L1 − 2L2)/S2, S4 = (E7 − E5 + E3 − E1)/S2,

where Li denotes the ith quartile and Ei denotes the ith octile. Given the natural interpretation of the g-and-k parameters, it might be intuitively appealing to use, for each component of θ, the corresponding robust summary statistic. That is, we use S1, S2, S3, and S4 for estimating the approximate posterior marginals of a, b, g, and k, respectively. For the pilot run we use an MCMC acceptance rate threshold of 20% (results are similar for thresholds of 15% and 10%, see Appendix D of the supplementary materials).
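A brief sketch of how data can be simulated from (9) and the robust summaries computed (an illustration under our own naming conventions and seed, not the article's code):

```python
import numpy as np

def gk_sample(a, b, g, k, n, rng, c=0.8):
    """Simulate from the g-and-k distribution by pushing standard normal draws
    z(p) = Phi^{-1}(p), p ~ U(0, 1), through the quantile function (9)."""
    z = rng.standard_normal(n)
    return a + b * (1 + c * np.tanh(g * z / 2)) * (1 + z ** 2) ** k * z

def robust_summaries(x):
    """Octile-based location, scale, skewness and kurtosis summaries S1-S4."""
    E = np.percentile(x, [12.5, 25, 37.5, 50, 62.5, 75, 87.5])  # octiles E1,...,E7
    L1, L2, L3 = E[1], E[3], E[5]                               # quartiles
    S2 = L3 - L1
    return np.array([L2, S2, (L3 + L1 - 2 * L2) / S2, (E[6] - E[4] + E[2] - E[0]) / S2])

rng = np.random.default_rng(4)
x = gk_sample(a=3.0, b=1.0, g=2.0, k=0.5, n=10_000, rng=rng)
print(robust_summaries(x))   # observed summaries S_y for this simulated dataset
```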

The results are shown in Figure 3. Figure 3(a) shows the univariate marginal posterior approximations for the parameters based on the various approaches, whereas Figure 3(b) shows the corresponding results for the four summary statistics. The solid red densities in Figure 3(a) show the typical ABC approximation based on all the statistics S. Since there are only a small number of summary statistics, this approximation should be fairly close to π(θ | Sy). The purple dashed results show the marginal approximations when only using the Sj statistic. Apart from the parameter k, these marginal approximations are substantially worse than the marginal approximations based on S, despite producing closer matches to Sj, as demonstrated in Figure 3(b). The pilot approximation results are shown as the green dashed lines. Our approach, using the pilot run as the initial approximation followed by close matching with Sj, produces the results shown as the blue dashed lines, with univariate posterior approximations in agreement with the gold standard typical ABC approximation. Figure 3(b) shows that this new approach results in very close matches with the Sj statistics.

Fig. 3. Univariate ABC posterior densities for the (a) parameters and (b) summary statistics obtained from various approaches for the g-and-k example. Here “all summs” means ABC with all four summary statistics, “pilot (all summs)” means ABC with all four summary statistics but using a relatively large tolerance, “summ” means ABC using the summary statistic relevant for its corresponding parameter and “pilot + summ” means ABC using the summary statistic relevant for its corresponding parameter but also satisfying the tolerance from “pilot (all summs)”. The dot on the x-axis shows the true parameter value in (a) and the observed summary statistic in (b).

One might expect the posterior of a given S1 to be accurate, since the median of the g-and-k distribution is exactly a. Yet Figure 3(a) reveals a poor approximation. We demonstrate in Appendix D of the supplementary material that the poor approximation arises because the distribution of S1 also depends on b, and because there is dependence between a and b in the posterior given S1.

4.3 Sparse Vector Autoregression Example

Here we consider the sparse vector autoregression example proposed in Thomas et al. (Citation2020). The model is given by yt = X y_{t−1} + ηt, where yt ∈ ℝ^k is the k-dimensional observation of the time series at time t, X ∈ ℝ^{k×k} is the transition matrix, and ηt ∼ N(0, σ²I) is a k-dimensional noise vector with σ being the noise parameter. Thomas et al. (Citation2020) consider a sparse matrix X where each variable is coupled with only one other variable, so that the only nonzero off-diagonal entries are the ones associated with these couplings. That is, if variable i is coupled with variable j, i ≠ j, then X_{i,j} ≠ 0, and we also set X_{i,j} ≠ X_{j,i}. We set the diagonal elements of X to be −0.1, which seems to ensure stability of the simulated vector autoregression.

The parameters are the nonzero off-diagonal entries of X and σ. Here we consider k = 20, which leads to p = 21 parameters, producing a moderate-dimensional example. We generate observed data with σ = 0.1, and we assign a prior σ ∼ U(0, 1). For the elements of X, we draw the couplings randomly and simulate the true value of each nonzero off-diagonal entry of X from a U(−1, 1) distribution, which we also set as the prior for these parameters. We choose uniform priors for illustrative purposes; typically a prior on σ would not be bounded from above. We simulate n = 500 observations from the k-dimensional time series model. For the summary statistic corresponding to parameter X_{i,j} ≠ 0 we use the lag 1 autocovariance, (1/n) Σ_{t=2}^{n} yt^{i} y_{t−1}^{j}, where yt^{i} is the tth observation of the ith time series. To inform σ we use the sample standard deviation of the k time series concatenated together. Thus, for each parameter, there is a corresponding informative summary Sj.
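A sketch of the data-generating process and summaries (the specific coupling layout, seed, stability check and helper names below are illustrative assumptions, not taken from Thomas et al. (Citation2020) or the article's code):

```python
import numpy as np

def build_transition(k, couplings, values, diag=-0.1):
    """Sparse transition matrix: for each coupled pair (i, j), X[i, j] and X[j, i]
    are separate coupling parameters; diagonal entries are fixed at -0.1."""
    X = np.eye(k) * diag
    for (i, j), (v_ij, v_ji) in zip(couplings, values):
        X[i, j], X[j, i] = v_ij, v_ji
    return X

def simulate_var(X, sigma, n, rng):
    """Simulate n observations of y_t = X y_{t-1} + eta_t, eta_t ~ N(0, sigma^2 I)."""
    k = X.shape[0]
    y = np.zeros((n, k))
    for t in range(1, n):
        y[t] = X @ y[t - 1] + rng.normal(0.0, sigma, size=k)
    return y

def summaries(y, couplings):
    """Lag-1 covariance between coupled series for each direction (informs X[i, j]),
    plus the pooled standard deviation (informs sigma)."""
    n = y.shape[0]
    s = []
    for (i, j) in couplings:
        s.append(np.sum(y[1:, i] * y[:-1, j]) / n)
        s.append(np.sum(y[1:, j] * y[:-1, i]) / n)
    s.append(y.reshape(-1).std())
    return np.array(s)

rng = np.random.default_rng(5)
k = 20
pairs = [(2 * m, 2 * m + 1) for m in range(k // 2)]   # an arbitrary coupling layout
while True:
    vals = [tuple(rng.uniform(-1.0, 1.0, size=2)) for _ in pairs]
    X = build_transition(k, pairs, vals)
    if np.max(np.abs(np.linalg.eigvals(X))) < 1:      # keep only a stable draw
        break
y = simulate_var(X, sigma=0.1, n=500, rng=rng)
print(summaries(y, pairs).shape)                      # 21 summaries, one per parameter
```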

Here we run SMC ABC with all 21 summaries, SMC ABC with each individual summary Sj, and also our two-stage procedure with each Sj used in the second stage, based on a pilot run with all summaries using a 10% acceptance rate as the stopping rule. The results are shown in Figure 4. It is evident that running ABC to estimate each marginal posterior using only its corresponding informative summary does not lead to accurate inferences. Unlike the previous low-dimensional examples, our two-stage approach leads to more accurate inferences than standard SMC ABC for several of the parameters, and comparable inferences for the other parameters.

Fig. 4. Univariate posterior approximations for the sparse vector autoregression example. The first four rows correspond to the parameters in X while the bottom plot corresponds to σ. Here “all summs” means ABC with all summary statistics, “pilot (all summs)” means ABC with all summary statistics but using a relatively large tolerance, “S” means ABC using the summary statistic relevant for its corresponding parameter and “pilot + S” means ABC using the summary statistic relevant for its corresponding parameter but also satisfying the tolerance from “pilot (all summs).” The true parameter value is shown as a cross on the x-axis.

5 Discussion

In this article we have presented a new approach for improving the accuracy of marginal approximations in likelihood-free inference when conditioning on a low-dimensional summary statistic. We showed through several examples and sufficiency arguments that marginal approximations can be poor unless summary statistics are chosen very carefully, even if the summaries are highly informative about the marginal posterior location. Our analysis demonstrates that, even if we have a vector of sufficient summaries, conducting ABC-based inference on a single (marginal) parameter using a well-intentioned subset of the summaries will not necessarily deliver inferences that agree with the partial posterior for the same (marginal) parameter based on the entire vector of summaries, even as ϵ → 0.

We have described some ideal summary statistics which are low-dimensional and which can achieve accurate estimation of marginal posterior means and variances given the full set of summary statistics S as the ABC tolerance goes to zero. Given that such statistics may be difficult to approximate well, in practice, we find that joint matching on the full summary statistics with a loose tolerance and low-dimensional summary statistics Sj informative about marginal posterior location with a much more stringent tolerance results in greatly improved marginal posterior estimation compared to matching using Sj alone. The joint matching criterion can be motivated from the point of view of logarithmic pooling of ABC approximations based on the full posterior and the summary statistics Sj.

Algorithmically, our approach used a crude pilot run of likelihood-free inference based on all the summary statistics in order to help identify the parameters, before honing in on the low-dimensional statistic intended to be informative for its corresponding low-dimensional parameter in a second stage. Our approach should combine well with the popular summary statistic selection approach of Fearnhead and Prangle (Citation2012), which forms summary statistics by estimating the posterior means via regressions. Fearnhead and Prangle (Citation2012) recommend performing a pilot run of ABC first in order to improve the fit of the regressions, which ultimately improves the derived summary statistics. This pilot run can be exploited in our approach, which also requires a pilot run, and then the individual posterior mean estimates from the regression can be used as low-dimensional summaries for the corresponding low-dimensional parameters.

Our article does not rigorously answer the question of how long the pilot ABC should be run. In general, we suggest to run the pilot ABC until the acceptance rate starts to dramatically drop. We want to perform the pilot run to eliminate very poor parts of the parameter space, but we do not want to run it so long that it becomes computationally difficult to satisfy the pilot discrepancy in the subsequent ABC run that focuses on matching on a low-dimensional summary statistic.

In this article, several examples used a small number of summary statistics so that we could obtain a gold standard marginal approximation to compare our method against. These examples were sufficient to illustrate the concepts and ideas of the article. We also illustrated the benefits of our approach in a moderate dimensional example. However, we suggest that the principles of this article can be extended to high-dimensional likelihood-free problems, and we leave this for future work. In truly high-dimensional problems, it may not suffice to crudely localize on the full summary statistic S: instead, and following the discussion in Appendix C of the supplementary materials, methods will be needed to define subsets of S which determine our idealized posterior mean and variance statistics for each marginal for use in the pilot run. These subsets do not need to be very low-dimensional, but should not be so high-dimensional that ABC methods are infeasible. Furthermore, the ideas of our article could be combined with machine learning approaches to likelihood-free inference (e.g., Cranmer, Pavez, and Louppe Citation2015), which are generally more scalable.

We also leave extensions to the challenging problem of misspecification in the likelihood-free setting (e.g., Frazier, Robert, and Rousseau Citation2020) for future research. We believe there is a role to play for modular inference ideas in likelihood-free inference (e.g., Chakraborty et al. Citation2023). In this case, if marginal inference were desired on some subset of the parameters, ϕ say, and it was felt that some components of the full summary statistic vector provided misleading information about that parameter, then those components would need to be excluded from the pilot run. There are diagnostics in the ABC literature for highlighting which summaries cannot be matched (e.g., Frazier et al. Citation2018); when all summaries cannot be matched, one may need to choose which summaries to match, or try to exclude only a sparse set of summaries. Which option to choose depends on the problem at hand and what the model will be used for.


Acknowledgments

The authors are grateful to three anonymous referees whose comments led to improvements in this article. CD is affiliated with the QUT Centre for Data Science. DJN is affiliated with the NUS Institute of Operations Research and Analytics, National University of Singapore.

Disclosure Statement

The authors report there are no competing interests to declare.

Funding

CD gratefully acknowledges support from the Australian Research Council Future Fellowship Award (FT210100260). DTF gratefully acknowledges support by the Australian Research Council through grant DE200101070.

Notes

1 Since most of the marginal posteriors do not have closed-form distributions, we normalize the densities using trapezoidal numerical integration for convenience, except for π(ϕ | s²), which has an inverse-gamma distribution.

2 In general, the weights for the logarithmic pool must be specified. This can be done in several ways, with the most common way being to obtain point estimates of the weights using data (Poole and Raftery 2000). Alternatively, in certain settings, a prior distribution over the weights can be specified and a posterior distribution for the weights obtained via Bayes’ theorem (see, e.g., Carvalho et al. 2022).

3 From this result we also see that, for n large, √n{S3,x − θ2} | θ is approximately distributed as N(0, 2θ2²), and the asymptotic distribution of the scaled and centered statistic only depends on θ2.

References

  • Barber, S., Voss, J., and Webster, M. (2015), “The Rate of Convergence for Approximate Bayesian Computation,” Electronic Journal of Statistics, 9, 80–105. DOI: 10.1214/15-EJS988.
  • Barndorff-Nielsen, O. (1976), “Nonformation,” Biometrika, 63, 567–571. DOI: 10.1093/biomet/63.3.567.
  • Blum, M. G. B. (2010), “Approximate Bayesian Computation: A Non-parametric Perspective,” Journal of the American Statistical Association, 105, 1178–1187. DOI: 10.1198/jasa.2010.tm09448.
  • Blum, M. G. B., and François, O. (2010), “Non-linear Regression Models for Approximate Bayesian Computation,” Statistics and Computing, 20, 63–73. DOI: 10.1007/s11222-009-9116-0.
  • Carvalho, L. M., Villela, D. A., Coelho, F. C., and Bastos, L. S. (2022), “Bayesian Inference for the Weights in Logarithmic Pooling,” Bayesian Analysis, 1, 1–29. DOI: 10.1214/22-BA1311.
  • Chakraborty, A., Nott, D. J., Drovandi, C. C., Frazier, D. T., and Sisson, S. A. (2023), “Modularized Bayesian Analyses and Cutting Feedback in Likelihood-Free Inference,” Statistics and Computing, 33, 33. DOI: 10.1007/s11222-023-10207-5.
  • Chen, Y., and Gutmann, M. U. (2019), “Adaptive Gaussian Copula ABC,” in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, Volume 89 of Proceedings of Machine Learning Research, eds. K. Chaudhuri and M. Sugiyama, pp. 1584–1592. PMLR.
  • Clements, M. P., and Harvey, D. I. (2011), “Combining Probability Forecasts,” International Journal of Forecasting, 27, 208–223. DOI: 10.1016/j.ijforecast.2009.12.016.
  • Cranmer, K., Pavez, J., and Louppe, G. (2015), “Approximating Likelihood Ratios with Calibrated Discriminative Classifiers,” arXiv:1506.02169.
  • Drovandi, C. C., and Pettitt, A. N. (2011a), “Estimation of Parameters for Macroparasite Population Evolution Using Approximate Bayesian Computation,” Biometrics, 67, 225–233. DOI: 10.1111/j.1541-0420.2010.01410.x.
  • Drovandi, C. C., and Pettitt, A. N. (2011b), “Likelihood-Free Bayesian Estimation of Multivariate Quantile Distributions,” Computational Statistics and Data Analysis, 55, 2541–2556. DOI: 10.1016/j.csda.2011.03.019.
  • Fearnhead, P., and Prangle, D. (2012), “Constructing Summary Statistics for Approximate Bayesian Computation: Semi-Automatic Approximate Bayesian Computation,” Journal of the Royal Statistical Society, Series B, 74, 419–474. DOI: 10.1111/j.1467-9868.2011.01010.x.
  • Frazier, D., Nott, D. J., Drovandi, C., and Kohn, R. (2022), “Bayesian Inference Using Synthetic Likelihood: Asymptotics and Adjustments,” Journal of the American Statistical Association (to appear), DOI: 10.1080/01621459.2022.2086132.
  • Frazier, D. T., Martin, G. M., Robert, C. P., and Rousseau, J. (2018), “Asymptotic Properties of Approximate Bayesian Computation,” Biometrika, 105, 593–607. DOI: 10.1093/biomet/asy027.
  • Frazier, D. T., Robert, C. P., and Rousseau, J. (2020), “Model Misspecification in Approximate Bayesian Computation: Consequences and Diagnostics,” Journal of the Royal Statistical Society, Series B, 82, 421–444. DOI: 10.1111/rssb.12356.
  • Genest, C., McConway, K. J., and Schervish, M. J. (1986), “Characterization of Externally Bayesian Pooling Operators,” The Annals of Statistics, 14, 487–501. DOI: 10.1214/aos/1176349934.
  • Genest, C., and Zidek, J. V. (1986), “Combining Probability Distributions: A Critique and an Annotated Bibliography,” Statistical Science, 1, 114–135. DOI: 10.1214/ss/1177013825.
  • Li, J., Nott, D., Fan, Y., and Sisson, S. (2017), “Extending Approximate Bayesian Computation Methods to High Dimensions via a Gaussian Copula Model,” Computational Statistics & Data Analysis, 106, 77–89. DOI: 10.1016/j.csda.2016.07.005.
  • Li, W., and Fearnhead, P. (2018a), “Convergence of Regression-Adjusted Approximate Bayesian Computation,” Biometrika, 105, 301–318. DOI: 10.1093/biomet/asx081.
  • Li, W., and Fearnhead, P. (2018b), “On the Asymptotic Efficiency of Approximate Bayesian Computation Estimators,” Biometrika, 105, 285–299. DOI: 10.1093/biomet/asx078.
  • Nott, D. J., Fan, Y., Marshall, L., and Sisson, S. (2014), “Approximate Bayesian Computation and Bayes Linear Analysis: Toward High-Dimensional ABC,” Journal of Computational and Graphical Statistics, 23, 65–86. DOI: 10.1080/10618600.2012.751874.
  • Poole, D., and Raftery, A. E. (2000), “Inference for Deterministic Simulation Models: The Bayesian Melding Approach,” Journal of the American Statistical Association, 95, 1244–1255. DOI: 10.1080/01621459.2000.10474324.
  • Price, L. F., Drovandi, C. C., Lee, A., and Nott, D. J. (2018), “Bayesian Synthetic Likelihood,” Journal of Computational and Graphical Statistics, 27, 1–11. DOI: 10.1080/10618600.2017.1302882.
  • Priddle, J. W., Sisson, S. A., Frazier, D. T., and Drovandi, C. (2022), “Efficient Bayesian Synthetic Likelihood with Whitening Transformations,” Journal of Computational and Graphical Statistics, 31, 50–63. DOI: 10.1080/10618600.2021.1979012.
  • Sisson, S., and Fan, Y. (2018), “ABC Samplers,” in Handbook of Approximate Bayesian Computation, eds. S. A. Sisson, Y. Fan, and M. Beaumont, pp. 87–123, New York: Chapman and Hall/CRC.
  • Sisson, S. A., Fan, Y., and Beaumont, M. (2018), Handbook of Approximate Bayesian Computation (1st ed.). Boca Raton, FL: Chapman and Hall/CRC.
  • Thomas, O., Corander, J., Sá-Leão, R., de Lencastre, H., Kaski, S., and Pesonen, H. (2020), “Split-BOLFI for Misspecification-Robust Likelihood Free Inference in High Dimensions,” arXiv preprint arXiv:2002.09377v1.
  • Wood, S. N. (2010), “Statistical Inference for Noisy Nonlinear Ecological Dynamic Systems,” Nature, 466, 1102–1107. DOI: 10.1038/nature09319.
  • Zhu, Y., and Reid, N. (1994), “Information, Ancillarity, and Sufficiency in the Presence of Nuisance Parameters,” Canadian Journal of Statistics, 22, 111–123. DOI: 10.2307/3315827.