The Block-Correlated Pseudo Marginal Sampler for State Space Models


Abstract

Pseudo Marginal Metropolis-Hastings (PMMH) is a general approach to Bayesian inference when the likelihood is intractable, but can be estimated unbiasedly. Our article develops an efficient PMMH method that scales up better to higher dimensional state vectors than previous approaches. The improvement is achieved by the following innovations. First, a novel block version of PMMH that works with multiple particle filters is proposed. Second, the trimmed mean of the unbiased likelihood estimates of the multiple particle filters is used. Third, the article develops an efficient auxiliary disturbance particle filter, which is necessary when the bootstrap disturbance filter is inefficient, but the state transition density cannot be expressed in closed form. Fourth, a novel sorting algorithm, which is as effective as previous approaches but significantly faster than them, is developed to preserve the correlation between the logs of the likelihood estimates at the current and proposed parameter values. The performance of the sampler is investigated empirically by applying it to nonlinear Dynamic Stochastic General Equilibrium models with relatively high state dimensions and with intractable state transition densities and to multivariate GARCH diffusion-driven volatility in the mean models. Although we only apply the method to state space models, the approach will be useful in a wide range of applications such as large panel data models and stochastic differential equation models with mixed effects.

1 Introduction

Pseudo marginal Metropolis-Hastings (PMMH) (Andrieu and Roberts Citation2009) is a Bayesian inference method for statistical models having an intractable likelihood, when the likelihood can be estimated unbiasedly. Our article develops a PMMH sampling method for efficiently estimating the parameters of complex state space models. The method scales better than current approaches in the number of observations and latent states and can handle state transition densities that cannot be expressed in closed form; for example, many dynamic stochastic general equilibrium (DSGE) models, which are a popular class of macroeconomic time series state space models, do not have closed form transition densities when second order expansions are used.

A key issue in efficiently estimating statistical models using a PMMH approach is that the variance of the log of the estimated likelihood grows with the number of observations and the dimension of the latent states (Deligiannidis, Doucet, and Pitt Citation2018). Pitt et al. (Citation2012) show that to balance computational time against the mixing of the Markov chain Monte Carlo (MCMC) chain, the number of particles used in the particle filter should be such that the variance of the log of the estimated likelihood is in the range 1–3, depending on the efficiency of the proposal for θ. Pitt et al. (Citation2012) also show that the efficiency of PMMH schemes deteriorates exponentially as that variance increases; we further note that in many complex statistical applications, it is computationally very expensive to keep the variance of the log of the estimated likelihood within the required range.

Deligiannidis, Doucet, and Pitt (Citation2018) propose a more efficient PMMH scheme, called the correlated pseudo-marginal (CPM) method, which correlates the random numbers in the (single) particle filters used to construct the estimated likelihoods at the current and proposed values of the parameters. This dependence induces a positive correlation between the estimated likelihoods and reduces the variance of the difference in the logs of the estimated likelihoods that appears in the Metropolis-Hastings (MH) acceptance ratio. They show that the CPM scales up better with the number of observations compared to the standard pseudo marginal method of Andrieu, Doucet, and Holenstein (Citation2010) when the state dimension is small.

Tran et al. (Citation2016) propose an alternative correlated pseudo marginal approach to that of Deligiannidis, Doucet, and Pitt, calling it the block pseudo marginal (BPM) method; the BPM divides the random numbers used in constructing likelihood estimators into blocks and then updates the parameters jointly with one randomly chosen block of the random numbers in each MCMC iteration; this induces a positive correlation between the numerator and denominator of the MH acceptance ratio, similarly to the CPM. They show that for large samples the correlation of the logs of the estimated likelihoods at the current and proposed values is close to $1 - 1/S$, where $S$ is the number of blocks. However, they do not apply the BPM method for estimating time series state space models using the particle filter.

Our article introduces a new PMMH sampler, referred to as the multiple PMMH algorithm (MPM), which extends the CPM method of Deligiannidis, Doucet, and Pitt (Citation2018) and the BPM method of Tran et al. (Citation2016). The MPM sampler is innovative and addresses the inherent estimation issues as follows: (a) The likelihood estimator is a trimmed mean of unbiased likelihood estimators from multiple independent particle filters (PFs); these PFs can be run in parallel. This approach reduces the variance of the likelihood estimator compared to using a single particle filter. The algorithm is exact when there is no trimming and approximate otherwise, but our empirical results suggest that the approximation is very close to being exact. (b) The unknown parameters, and only the random numbers used in one of the PFs, are updated jointly. This is a novel block version of PMMH that works with multiple particle filters; see Section 2.2 for details. (c) An auxiliary disturbance PF (ADPF) algorithm is proposed to estimate the likelihood efficiently for state space models such as DSGE models, where the bootstrap filter is very inefficient and the state transition density does not have a closed form so that an auxiliary PF cannot be constructed; see Section 2.3 for details. (d) A novel sorting algorithm, which is as effective as previous approaches but is significantly faster than them, is proposed to maintain the correlation between the logs of the likelihood estimates at the current and proposed values. The standard resampling step in the particle filter introduces discontinuities that break down the correlation between the logs of the likelihood estimates at the current and proposed values, even when the current and proposed parameters are close. Section 3.2 shows that the MPM sampler with the proposed sorting algorithm can maintain the correlation between the logs of the estimated likelihoods for relatively high dimensional state space models. Note that the proposed PMMH sampler works with any particle filter algorithm, including the tempered particle filter of Herbst and Schorfheide (Citation2019), the standard bootstrap filter of Gordon, Salmond, and Smith (Citation1993), the disturbance particle filter of Murray, Jones, and Parslow (Citation2013), and the proposed auxiliary disturbance filter. When the number of state dimensions is greater than the number of disturbance dimensions, a disturbance particle filter is more efficient than a state-based particle filter. Additionally, the disturbance particle filter is useful for state space models with intractable state transition densities.

We illustrate the MPM sampler empirically and compare its performance to the CPM method, using a standard linear Gaussian state space model, a multivariate GARCH diffusion-driven volatility in the mean model and two nonlinear Dynamic Stochastic General Equilibrium (DSGE) models, using simulated and real data. Our work in estimating nonlinear DSGE models is also related to Fernández-Villaverde and Rubio-Ramírez (Citation2007) and Hall, Pitt, and Kohn (Citation2014) who use standard PMMH methods. We note that the MPM sampler will also be useful for other complex statistical models, such as panel data models and stochastic differential equation mixed effects models. The rest of the article is organized as follows. Section 2 introduces the state space model and discusses the MPM sampler. Section 3 presents results from both simulated and real data. Section 4 concludes with a discussion of our major results and findings. The article has an online supplement containing some further technical and empirical results. We have also made available the Matlab code to run all the empirical examples at https://github.com/davidgun1000/Block-Correlated-Pseudo-Marginal.

2 Bayesian Inference for State Space Models

Notation

We use the colon notation for collections of variables: $a_t^{r:s} := (a_t^r, \ldots, a_t^s)$ and, for $t \le u$, $a_{t:u}^{r:s} := (a_t^{r:s}, \ldots, a_u^{r:s})$. Suppose $\{(Z_t, Y_t), t \ge 0\}$ is a stochastic process with parameter $\theta$, where the $Y_t$ are the observations and the $Z_t$ are the latent state vectors; random variables are denoted by capital letters and their realizations by lower case letters. We consider the state space model with $p(z_0|\theta)$ the density of $Z_0$, $p(z_t|z_{t-1},\theta)$ the density of $Z_t$ given $Z_{0:t-1} = z_{0:t-1}$ for $t \ge 1$, and $p(y_t|z_t,\theta)$ the density of $Y_t$ given $Z_{0:t} = z_{0:t}$, $Y_{1:t-1} = y_{1:t-1}$.

2.1 Bayesian Inference

The aim of Bayesian inference is to obtain the posterior distribution of the model parameters $\theta$ and the latent states $z_{0:T}$, given the observations $y_{1:T}$ and a prior distribution $p(\theta)$; that is,
$$p(\theta, z_{0:T}|y_{1:T}) = p(y_{1:T}|\theta, z_{0:T})\,p(z_{0:T}|\theta)\,p(\theta)/p(y_{1:T}), \tag{1}$$
where $p(y_{1:T}) = \int\!\!\int p(y_{1:T}|\theta, z_{0:T})\,p(z_{0:T}|\theta)\,p(\theta)\,dz_{0:T}\,d\theta$ is the marginal likelihood. The likelihood, $p(y_{1:T}|\theta) = \int p(y_{1:T}|\theta, z_{0:T})\,p(z_{0:T}|\theta)\,dz_{0:T}$, can be computed exactly by the Kalman filter for linear Gaussian state space (LGSS) models and for linear DSGE models, so posterior samples can be obtained using an MCMC sampling scheme. For nonlinear and non-Gaussian state space models and the nonlinear DSGE models described in Section 3.4, however, the likelihood cannot be computed exactly, although it can be estimated unbiasedly.

The bootstrap particle filter (Gordon, Salmond, and Smith Citation1993) provides an unbiased estimator of the likelihood for a general state space model. Andrieu and Roberts (Citation2009) and Andrieu, Doucet, and Holenstein (Citation2010) show that it is possible to use this unbiased estimator of the likelihood to carry out exact Bayesian inference for the model parameters. They call this MCMC approach pseudo marginal Metropolis-Hastings (PMMH).

Algorithm 1

The Multiple PMMH (MPM) algorithm

  • Set the initial value $\theta^{(0)}$ arbitrarily.

  • Sample $u_s \sim N(0, I)$ for $s=1,\ldots,S$, and run the $S$ (disturbance) particle filters (Algorithm 2 in Section 2.3) to estimate the likelihood $\overline{\hat p}_N(y|\theta, \tilde u)$ as the trimmed mean of $\hat p_N(y|\theta, u_s)$, $s=1,\ldots,S$; a 0% trimmed mean is the mean and a 50% trimmed mean is the median.

  • For each MCMC iteration $p = 1,\ldots,P$:

    • Sample $\theta'$ from the proposal density $q(\theta'|\theta)$.

    • Choose index $s$ with probability $1/S$, sample $\eta_u \sim N(0, I)$, and set $u_s' = \rho_u u_s + \sqrt{1-\rho_u^2}\,\eta_u$.

    • Run the $S$ (disturbance) particle filters (Algorithm 2 in Section 2.3) to compute the likelihood estimate $\overline{\hat p}_N(y|\theta', \tilde u')$.

    • With the probability in (4), set $\overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p)} = \overline{\hat p}(y_{1:T}|\theta',\tilde u')$, $\tilde u^{(p)} = \tilde u'$, and $\theta^{(p)} = \theta'$; otherwise, set $\overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p)} = \overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p-1)}$, $\tilde u^{(p)} = \tilde u^{(p-1)}$, and $\theta^{(p)} = \theta^{(p-1)}$.

The nonlinear (second-order) DSGE models considered in Section 3.4 belong to the class of general nonlinear state space models whose state transition density is difficult to work with or cannot be expressed in closed form. In such cases, it is useful to express the model in terms of the density of its latent disturbance variables, which can be expressed in closed form. The posterior in (1) becomes
$$p(\theta, \epsilon_{1:T}, z_0|y_{1:T}) \propto \prod_{t=1}^T p(y_t|\epsilon_{1:t}, \theta, z_0)\,p(\epsilon_t)\,p(z_0|\theta)\,p(\theta), \tag{2}$$
which Murray, Jones, and Parslow (Citation2013) call the disturbance state-space model. The standard state space model is recovered from the disturbance state space model through the deterministic function $z_t = F(z_{t-1}, \epsilon_t; \theta)$, which yields a state trajectory $z_{0:T}$ from any sample $(\epsilon_{1:T}, z_0)$. In the disturbance state-space model the target becomes the posterior distribution over the parameters $\theta$, the latent noise variables $\epsilon_{1:T}$, and the initial state $z_0$, rather than over $(\theta, z_{0:T})$.

2.2 Multiple PMMH (MPM)

This section discusses Algorithm 1, the proposed multiple PMMH (MPM) method, which uses multiple particle filters to estimate the likelihood. It also discusses the novel version of block-correlated PMMH that works with multiple particle filters to induce a high correlation between successive logs of the estimated likelihoods. Section S1 of the online supplement reviews the standard pseudo marginal Metropolis-Hastings method of Andrieu and Roberts (Citation2009). Our article focuses on MCMC inference for the parameters, but it is straightforward to also obtain posterior inference for the unobserved states; see, for example, Andrieu, Doucet, and Holenstein (Citation2010).

Let $u_s$ consist of all the random variables required to compute the unbiased likelihood estimate $\hat p_N(y|\theta, u_s)$ for the $s$th particle filter, with $p(u_s)$ the density of $u_s$. We define the joint target density of $\theta$ and $\tilde u = (u_1,\ldots,u_S)$ as
$$p(\theta, \tilde u|y_{1:T}) \propto \overline{\hat p}_N(y|\theta, \tilde u)\,p(\theta) \prod_{s=1}^S p(u_s), \tag{3}$$
where $\overline{\hat p}_N(y|\theta,\tilde u)$ is the likelihood estimate obtained from the $S$ particle filters discussed below and $p(\theta)$ is the prior of $\theta$. We update the parameters $\theta$ jointly with a single randomly selected block $u_s$ in each MCMC iteration, each block being chosen with probability $1/S$. The selected block is updated using $u_s' = \rho_u u_s + \sqrt{1-\rho_u^2}\,\eta_u$, where $\rho_u$ is the nonnegative correlation between $u_s$ and $u_s'$ and $\eta_u$ is a standard normal vector of the same length as $u_s$. This is similar to component-wise MCMC, whose convergence is well established in the literature; see, for example, Johnson, Jones, and Neath (Citation2013). Under this scheme, the acceptance probability is
$$\alpha(\theta,\tilde u;\theta',\tilde u') = \min\left(1, \frac{\overline{\hat p}_N\big(y|\theta', \tilde u' = (u_1,\ldots,u_{s-1},u_s',u_{s+1},\ldots,u_S)\big)\,p(\theta')\,q(\theta|\theta')}{\overline{\hat p}_N\big(y|\theta, \tilde u = (u_1,\ldots,u_{s-1},u_s,u_{s+1},\ldots,u_S)\big)\,p(\theta)\,q(\theta'|\theta)}\right). \tag{4}$$
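To make the block update concrete, the following is a minimal Python sketch of one MPM iteration (the paper's own code is in Matlab); `log_like_fn`, `log_prior`, and `propose` are hypothetical user-supplied functions, the proposal is assumed symmetric, and in practice the current likelihood estimate would be cached rather than recomputed.

```python
import numpy as np

def mpm_step(theta, U, log_like_fn, log_prior, propose, rho_u, rng):
    """One MPM iteration: update theta jointly with one block of PF randoms.

    U is an (S, dim_u) array; row s holds the N(0, I) random numbers driving
    the s-th particle filter. log_like_fn(theta, U) returns the log of the
    trimmed-mean likelihood estimate from the S particle filters.
    """
    S = U.shape[0]
    theta_prop = propose(theta, rng)            # theta' ~ q(.|theta), symmetric
    s = rng.integers(S)                         # pick one block with prob 1/S
    U_prop = U.copy()
    eta = rng.standard_normal(U.shape[1])
    U_prop[s] = rho_u * U[s] + np.sqrt(1.0 - rho_u**2) * eta  # CN update of block s
    # log of the acceptance ratio (4); q terms cancel for a symmetric proposal
    log_alpha = (log_like_fn(theta_prop, U_prop) + log_prior(theta_prop)
                 - log_like_fn(theta, U) - log_prior(theta))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return theta_prop, U_prop
    return theta, U
```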

Suppose that $S$ particle filters are run in parallel. It is clear that the average of the $S$ unbiased likelihood estimates from the particle filters, $\overline{\hat p}_N(y|\theta,\tilde u) = S^{-1}\sum_{s=1}^S \hat p_N(y|\theta,u_s)$, is also an unbiased estimate of the likelihood (Sherlock, Thiery, and Lee Citation2017). We now discuss a trimmed mean approach to obtain a likelihood estimate from multiple particle filters, which we then use in the MPM algorithm. For example, the 20% trimmed mean averages the middle 60% of the likelihood values. Although the trimmed mean does not give an unbiased estimate of the likelihood, we show in Section 3 that the posterior based on the trimmed means is a very good approximation to the exact posterior.
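A short sketch of the trimmed-mean combination, assuming the $S$ log-likelihood estimates are combined in log space with logsumexp to avoid underflow; the function name and interface are illustrative, not from the paper's code.

```python
import numpy as np
from scipy.special import logsumexp

def log_trimmed_mean_likelihood(log_p, trim):
    """Log of the trimmed mean of S likelihood estimates, given their logs.

    trim is the fraction cut from each tail (0.0 gives the mean, 0.5 the
    median); a 20% trim averages the middle 60% of the likelihood values.
    """
    log_p = np.sort(np.asarray(log_p))
    S = log_p.size
    k = min(int(np.floor(trim * S)), (S - 1) // 2)   # number trimmed per tail
    kept = log_p[k:S - k]
    return logsumexp(kept) - np.log(kept.size)       # log of the average
```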

Algorithm 2

The Auxiliary Disturbance Particle Filter

Input: $u_{\epsilon,1:T}^{1:N}$, $u_{A,1:T-1}^{1:N}$, $\theta$, and $N$

Output: $\epsilon_{1:T}^{1:N}$, $z_{1:T}^{1:N}$, $a_{1:T-1}^{1:N}$, and $\bar w_{1:T}^{1:N}$

For $t = 1$:

  • (1a) Generate $\epsilon_1^i$ from $m(\epsilon_1^i|u_{\epsilon,1}^i,\theta)$ and set $z_1^i = F(z_0^i, \epsilon_1^i; \theta)$, for $i=1,\ldots,N$.

  • (1b) Compute the unnormalized weights $w_1^i$, for $i=1,\ldots,N$.

  • (1c) Compute the normalized weights $\bar w_1^i = w_1^i / \sum_{j=1}^N w_1^j$, for $i=1,\ldots,N$.

For $t \ge 2$:

  • (2a) Sort the state particles $z_{t-1}^i$ or disturbance particles $\epsilon_{t-1}^i$ using the multidimensional Euclidean sorting method in Section 2.4 to obtain the sorted indices $\zeta_i$, for $i=1,\ldots,N$, and the sorted state particles, disturbance particles, and weights $\tilde z_{t-1}^i = z_{t-1}^{\zeta_i}$, $\tilde\epsilon_{t-1}^i = \epsilon_{t-1}^{\zeta_i}$, and $\tilde{\bar w}_{t-1}^i = \bar w_{t-1}^{\zeta_i}$, for $i=1,\ldots,N$.

  • (2b) Obtain the ancestor indices $\tilde a_{t-1}^{1:N}$ based on the sorted state or disturbance particles using the correlated multinomial resampling in Algorithm 3.

  • (2c) Obtain the ancestor indices $a_{t-1}^i$ based on the original order of the particles, for $i=1,\ldots,N$.

  • (2d) Generate $\epsilon_t^i$ from $m(\epsilon_t^i|u_{\epsilon,t}^i,\theta)$ and set $z_t^i = F(z_{t-1}^{a_{t-1}^i}, \epsilon_t^i; \theta)$, for $i=1,\ldots,N$.

  • (2e) Compute the unnormalized weights $w_t^i$, for $i=1,\ldots,N$.

  • (2f) Compute the normalized weights $\bar w_t^i = w_t^i / \sum_{j=1}^N w_t^j$, for $i=1,\ldots,N$.

2.3 The Auxiliary Disturbance Particle Filter

This section discusses the auxiliary disturbance particle filter (ADPF) that we use to estimate the likelihoods in the MPM sampler described in Section 2.2. It is particularly useful for state space models where the state dimension is substantially bigger than the disturbance dimension and the state transition density is intractable. Suppose that $z_1 = F(z_0, \epsilon_1; \theta)$, where $z_0$ is the initial state vector with density $p(z_0|\theta)$, and $z_t = F(z_{t-1}, \epsilon_t; \theta)$ for $t \ge 2$, where $\epsilon_t$ is a vector of latent noise variables with density $p(\epsilon_t)$. Murray, Jones, and Parslow (Citation2013) express the standard state-space model in terms of the latent noise variables $\epsilon_{1:T}$, and call
$$\epsilon_t \sim p(\epsilon_t), \qquad y_t|\epsilon_{1:t} \sim p(y_t|\epsilon_{1:t}, z_0, \theta) = p(y_t|z_t, \theta), \qquad t=1,\ldots,T,$$
the disturbance state-space model. Note that the conditional distribution of $y_t$ depends on all the latent error variables $\epsilon_{1:t}$.

Let $u$ be the vector of random variables used to obtain the unbiased estimate of the likelihood. It has two components, $u_{\epsilon,1:T}^{1:N}$ and $u_{A,1:T-1}^{1:N}$; $u_{\epsilon,t}^i$ is the random variable used to generate the particle $\epsilon_t^i$ given $\theta$. We can write
$$\epsilon_1^i \sim m(\epsilon_1^i|u_{\epsilon,1}^i,\theta), \quad z_1^i = F(z_0^i, \epsilon_1^i; \theta), \quad \text{and} \quad \epsilon_t^i \sim m(\epsilon_t^i|u_{\epsilon,t}^i,\theta), \quad z_t^i = F(z_{t-1}^{a_{t-1}^i}, \epsilon_t^i; \theta), \quad t \ge 2, \tag{5}$$
where $m(\epsilon_t^i|u_{\epsilon,t}^i,\theta)$ is the proposal density used to generate $\epsilon_t^i$, and $z_0$ is the initial state vector with density $p(z_0|\theta)$. Denote the distribution of $u_{\epsilon,t}^i$ by $\psi_{\epsilon_t}(\cdot)$. For $t \ge 2$, let $u_{A,t-1}$ be the vector of random variables used to generate the ancestor indices $a_{t-1}^{1:N}$ via the resampling scheme $M(a_{t-1}^{1:N}|\bar w_{t-1}^{1:N}, z_{t-1}^{1:N})$, and define $\psi_{A_{t-1}}(\cdot)$ as the distribution of $u_{A,t-1}$. Common choices for $\psi_{\epsilon_t}(\cdot)$ and $\psi_{A_{t-1}}(\cdot)$ are iid $N(0,1)$ and iid $U(0,1)$ random variables, respectively. The notation $a_{t-1}^{1:N}$ means that $Z_{t-1}^{A_{t-1}^i} = z_{t-1}^{a_{t-1}^i}$ is the ancestor of $Z_t^i = z_t^i$.

Algorithm 2 takes the number of particles $N$, the parameters $\theta$, the random variables $u_{\epsilon,1:T}^{1:N}$ used to generate the disturbance particles, and the random variables $u_{A,1:T-1}^{1:N}$ used in the resampling steps as inputs; it outputs the state particles $z_{1:T}^{1:N}$, disturbance particles $\epsilon_{1:T}^{1:N}$, ancestor indices $a_{1:T-1}^{1:N}$, and normalized weights $\bar w_{1:T}^{1:N}$. At $t = 1$, the disturbance particles $\epsilon_1^{1:N}$ are obtained as a function of the random numbers $u_{\epsilon,1}^{1:N}$ using (5) in step (1a), and the state particles are obtained from $z_1^i = F(z_0^i, \epsilon_1^i; \theta)$, for $i=1,\ldots,N$; the weights for all particles are then computed in steps (1b) and (1c).

Step (2a) sorts the state or disturbance particles from smallest to largest using the multidimensional Euclidean sorting procedure described in Section 2.4 to obtain the sorted disturbance particles, sorted state particles, and sorted weights. Algorithm 3 then resamples the particles using correlated multinomial resampling to obtain the sorted ancestor indices $\tilde a_{t-1}^{1:N}$. Step (2d) generates the disturbance particles $\epsilon_t^{1:N}$ as a function of the random numbers $u_{\epsilon,t}^{1:N}$ using (5), and the state particles are obtained from $z_t^i = F(z_{t-1}^{a_{t-1}^i}, \epsilon_t^i; \theta)$, for $i=1,\ldots,N$; the weights for all particles are then computed in steps (2e) and (2f). The disturbance particle filter provides an unbiased estimate of the likelihood,
$$\hat p_N(y|\theta, u) := \prod_{t=1}^T \left(N^{-1}\sum_{i=1}^N w_t^i\right), \quad \text{where} \quad w_t^i = \frac{p(y_t|z_t^i,\theta)\,p(\epsilon_t^i)}{m(\epsilon_t^i|u_{\epsilon,t}^i,\theta)}, \quad t=1,\ldots,T.$$
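The following is a compact Python sketch of Algorithm 2's main loop under these definitions; all helper functions (`F`, `log_obs`, `log_p_eps`, `sample_m`, `log_m`, `sort_fn`) are hypothetical stand-ins for the model-specific quantities defined above.

```python
import numpy as np

def disturbance_pf_loglike(y, theta, u_eps, u_A, F, log_obs, log_p_eps,
                           sample_m, log_m, sort_fn, z0):
    """Sketch of Algorithm 2: log-likelihood from a disturbance particle filter.

    u_eps: (T, N, d_eps) normals driving the proposal m; u_A: (T-1, N)
    uniforms for the correlated resampling; z0: (d,) initial state.
    """
    T, N = u_eps.shape[0], u_eps.shape[1]
    z = np.repeat(z0[None, :], N, axis=0)
    loglike = 0.0
    for t in range(T):
        if t > 0:
            order = sort_fn(z)                   # step (2a): Euclidean sort
            cdf = np.cumsum(w[order])
            cdf[-1] = 1.0                        # guard against rounding
            a = np.searchsorted(cdf, u_A[t - 1]) # steps (2b)-(2c)
            z = z[order][a]
        eps = sample_m(u_eps[t], theta)          # step (2d): propose eps_t
        z = F(z, eps, theta)
        logw = log_obs(y[t], z, theta) + log_p_eps(eps) - log_m(eps, u_eps[t], theta)
        c = logw.max()
        loglike += c + np.log(np.mean(np.exp(logw - c)))  # running log-likelihood
        w = np.exp(logw - c); w /= w.sum()       # normalized weights (2f)
    return loglike
```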

We use the defensive mixture proposal density (Hesterberg Citation1995),
$$m(\epsilon_t|u_{\epsilon,t},\theta) = \pi\,p(\epsilon_t|\theta) + (1-\pi)\,q(\epsilon_t|\theta, y_{1:T}), \quad \text{with} \quad 0 < \pi \le 1, \tag{6}$$
to generate $\epsilon_t$ in the auxiliary disturbance particle filter; here $u_{\epsilon,t}$ is the vector of random variables used to generate the particles $\epsilon_t$ given $\theta$. If the observation density $p(y_t|z_t,\theta)$ is bounded, then (6) guarantees that the weights in the disturbance particle filter algorithm are bounded. In practice, we take $q(\epsilon_t|\theta, y_{1:T}) = N(\epsilon_t|\hat\mu_t, \hat\Sigma_t)$ and set $\pi = 0.05$ in all the examples in Section 3; see Algorithm 4. If $\pi = 1$ in (6), then $m(\epsilon_t|\theta) = p(\epsilon_t|\theta)$ and we obtain the bootstrap disturbance particle filter. However, the empirical performance of this filter is usually poor because the resulting likelihood estimate is too variable.
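A minimal sketch of drawing from the defensive mixture (6), assuming $p(\epsilon_t) = N(0, I)$; the mixture indicator here uses fresh uniforms purely for illustration, whereas in the paper's scheme all of the randomness would form part of $u_{\epsilon,t}$.

```python
import numpy as np

def sample_defensive_mixture(u, mu_hat, Sigma_hat, pi_mix=0.05, rng=None):
    """Draw N disturbance particles from the mixture (6), with p(eps) = N(0, I).

    u: (N, d) standard normals (the correlated random numbers); each particle
    comes from N(0, I) with probability pi_mix and from N(mu_hat, Sigma_hat)
    otherwise.
    """
    rng = rng or np.random.default_rng()
    N = u.shape[0]
    L = np.linalg.cholesky(Sigma_hat)
    from_prior = rng.uniform(size=N) < pi_mix    # mixture component indicator
    return np.where(from_prior[:, None], u, mu_hat + u @ L.T)
```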

Algorithm 3

Correlated Multinomial Resampling Algorithm

Input: $u_{A,t-1}$, sorted states $\tilde z_{t-1}^{1:N}$, sorted disturbances $\tilde\epsilon_{t-1}^{1:N}$, and sorted weights $\tilde{\bar w}_{t-1}^{1:N}$

Output: $\tilde a_{t-1}^{1:N}$

  1. Compute the cumulative weights $\hat F_{t-1}^N(j) = \sum_{i=1}^j \tilde{\bar w}_{t-1}^i$ based on the sorted state particles $\{\tilde z_{t-1}^{1:N}, \tilde{\bar w}_{t-1}^{1:N}\}$ or the sorted disturbance particles $\{\tilde\epsilon_{t-1}^{1:N}, \tilde{\bar w}_{t-1}^{1:N}\}$.

  2. Set $\tilde a_{t-1}^i = \min\{j : \hat F_{t-1}^N(j) \ge u_{A,t-1}^i\}$ for $i=1,\ldots,N$. Note that $\tilde a_{t-1}^i$, $i=1,\ldots,N$, are the ancestor indices based on the sorted states or sorted disturbances; see the sketch below.
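A sketch of Algorithm 3 in Python; because the particles are pre-sorted and the uniforms `u_A` are held (almost) fixed across MCMC iterations, small changes in the weights move the ancestor indices only slightly.

```python
import numpy as np

def correlated_multinomial_resample(u_A, w_sorted):
    """Algorithm 3 sketch: inverse-CDF resampling driven by the uniforms u_A."""
    cdf = np.cumsum(w_sorted)           # F_hat(j) over the sorted particles
    cdf[-1] = 1.0                       # guard against rounding error
    return np.searchsorted(cdf, u_A)    # min{j : F_hat(j) >= u}, 0-based
```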

Algorithm 4

Constructing proposal for the auxiliary disturbance particle filter (ADPF)

Input: the number of particle filters $S$, the number of particles $N$ in each particle filter, particles $\epsilon_{1:S,1:T}^{1:N}$, weights $\bar w_{1:S,1:T}^{1:N}$, and ancestor indices $a_{1:S,1:T-1}^{1:N}$.

Output: the mean $\hat\mu_t$ and the covariance matrix $\hat\Sigma_t$, for $t=1,\ldots,T$.

  1. Given the particles $\epsilon_{1:S,1:T}^{1:N}$ with weights $\bar w_{1:S,1:T}^{1:N}$ and ancestor indices $a_{1:S,1:T-1}^{1:N}$ from the output of the disturbance particle filters, the ancestral tracing algorithm of Kitagawa (Citation1996) is used to sample from the particle approximation to the smoothing distribution $p(\epsilon_{1:T}|\theta, y_{1:T})$. This consists of sampling one particle trajectory from each of the $S$ particle filters in parallel. For each particle filter $s$, we sample an index $j_s$ with probability $\bar w_{s,T}^{j_s}$, trace back its ancestral lineage $b_{s,1:T}^{j_s}$ (with $b_{s,T}^{j_s} = j_s$ and $b_{s,t-1}^{j_s} = a_{s,t-1}^{b_{s,t}^{j_s}}$), and choose the particle trajectory $\epsilon_{s,1:T}^{j_s} = (\epsilon_{s,1}^{b_{s,1}^{j_s}}, \ldots, \epsilon_{s,T}^{b_{s,T}^{j_s}})$, for $s=1,\ldots,S$.

  2. The mean $\hat\mu_t$ and the covariance matrix $\hat\Sigma_t$ are
$$\hat\mu_t := \frac{1}{S}\sum_{s=1}^S \epsilon_{s,t}^{b_{s,t}^{j_s}} \quad \text{and} \quad \hat\Sigma_t := \frac{1}{S-1}\sum_{s=1}^S \left(\epsilon_{s,t}^{b_{s,t}^{j_s}} - \hat\mu_t\right)\left(\epsilon_{s,t}^{b_{s,t}^{j_s}} - \hat\mu_t\right)^\top.$$

Algorithm 4 outlines how $\hat\mu_t$ and the covariance matrix $\hat\Sigma_t$ are obtained, for $t=1,\ldots,T$, at each iteration of the MPM algorithm. It is computationally cheap to obtain $\hat\mu_t$ and $\hat\Sigma_t$ for $t=1,\ldots,T$ because they are available from the output of the $S$ disturbance particle filters run at each MCMC iteration, and the ancestral tracing method is fast. The proposal mean $\hat\mu_t$ and covariance matrix $\hat\Sigma_t$, for $t=1,\ldots,T$, obtained at iteration $p$ are used to estimate the likelihood at the next iteration. At the first iteration, we use the bootstrap filter to initialize $\hat\mu_t$ and $\hat\Sigma_t$, for $t=1,\ldots,T$. Algorithm 5 gives the MPM algorithm with the ADPF.
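A sketch of Algorithm 4's ancestral tracing and moment computation, with illustrative array shapes; it assumes the $S$ filters' particles, final weights, and ancestor indices are stored as dense arrays.

```python
import numpy as np

def ancestral_trace_moments(eps, w_T, a, rng):
    """Algorithm 4 sketch: trace one trajectory per filter, then take moments.

    eps: (S, T, N, d) disturbance particles; w_T: (S, N) final normalized
    weights; a: (S, T-1, N) ancestor indices. Returns mu_hat (T, d) and
    Sigma_hat (T, d, d) for the next iteration's ADPF proposal.
    """
    S, T, N, d = eps.shape
    traj = np.empty((S, T, d))
    for s in range(S):
        b = rng.choice(N, p=w_T[s])          # sample j_s with prob w_{s,T}^{j_s}
        for t in range(T - 1, -1, -1):       # trace the lineage backwards
            traj[s, t] = eps[s, t, b]
            if t > 0:
                b = a[s, t - 1, b]
    mu_hat = traj.mean(axis=0)
    c = traj - mu_hat
    Sigma_hat = np.einsum('sti,stj->tij', c, c) / (S - 1)
    return mu_hat, Sigma_hat
```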

2.4 Multidimensional Euclidean Sorting Algorithm

It is important that the logs of the likelihood estimates evaluated at the current and proposed values of $\theta$ and $\tilde u$ be highly correlated, to reduce the variance of $\log \overline{\hat p}_N(y|\theta',\tilde u') - \log \overline{\hat p}_N(y|\theta,\tilde u)$, because this helps the Markov chain mix well. However, the standard resampling step in the particle filter introduces discontinuities that break down this correlation even when the current parameters $\theta$ and the proposed parameters $\theta'$ are close. Sorting the particles from smallest to largest before resampling helps to preserve the correlation between the logs of the likelihood estimates at the current and proposed values (Deligiannidis, Doucet, and Pitt Citation2018). However, such simple sorting is unavailable for multidimensional state particles.

This section discusses Algorithm 6, the fast multidimensional Euclidean sorting algorithm used to sort the multidimensional state or disturbance particles in the particle filter of Algorithm 2. We describe the algorithm in terms of state particles; the same steps apply to the disturbance particles. The algorithm takes the particles $z_t^{1:N}$ and the normalized weights $\bar w_t^{1:N}$ as inputs; it outputs the sorted particles $\tilde z_t^{1:N}$, sorted weights $\tilde{\bar w}_t^{1:N}$, and sorted indices $\zeta_{1:N}$. Let $z_t^i = (z_{t,1}^i, \ldots, z_{t,d}^i)$ be the $i$th $d$-dimensional particle at time $t$, and let $d(z_t^j, z_t^i)$ be the Euclidean distance between two multidimensional particles. The first step calculates the means of the multidimensional particles, $\bar z_t^i = (1/d)\sum_{k=1}^d z_{t,k}^i$, for $i=1,\ldots,N$. The first sorting index, $\zeta_1$, is the index of the particle having the smallest value of $\bar z_t^i$; the first sorted particle $\tilde z_t^1$ is therefore $z_t^{\zeta_1}$, with associated weight $\tilde{\bar w}_t^1 = \bar w_t^{\zeta_1}$. We then calculate the Euclidean distance between this selected particle and all remaining particles, $d(\tilde z_t^1, z_t^i)$ for $i \ne \zeta_1$. The final step sorts the particles and weights by this distance, from smallest to largest, to obtain the sorted particles $\tilde z_t^{2:N}$, sorted weights $\tilde{\bar w}_t^{2:N}$, and sorted indices $\zeta_{2:N}$.

Algorithm 5

The Multiple PMMH (MPM) with ADPF algorithm

  • Set the initial value $\theta^{(0)}$ arbitrarily.

  • Sample $u_s \sim N(0, I)$ for $s=1,\ldots,S$; run the $S$ (disturbance) particle filters to estimate the likelihood $\overline{\hat p}_N(y|\theta, \tilde u)$ as the trimmed mean of $\hat p_N(y|\theta, u_s)$, $s=1,\ldots,S$ (a 0% trimmed mean is the mean and a 50% trimmed mean is the median); and run Algorithm 4 to construct the (initial) proposal for the auxiliary disturbance particle filter.

  • For each MCMC iteration $p = 1,\ldots,P$:

    • Sample $\theta'$ from the proposal density $q(\theta'|\theta)$.

    • Choose index $s$ with probability $1/S$, sample $\eta_u \sim N(0, I)$, and set $u_s' = \rho_u u_s + \sqrt{1-\rho_u^2}\,\eta_u$.

    • Run the $S$ (disturbance) particle filters given in Algorithm 2 to compute the likelihood estimate $\overline{\hat p}_N(y|\theta', \tilde u')$.

    • Run Algorithm 4 to construct the proposal for the auxiliary disturbance particle filter at iteration $p + 1$.

    • With the probability in (4), set $\overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p)} = \overline{\hat p}(y_{1:T}|\theta',\tilde u')$, $\tilde u^{(p)} = \tilde u'$, and $\theta^{(p)} = \theta'$; otherwise, set $\overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p)} = \overline{\hat p}(y_{1:T}|\theta,\tilde u)^{(p-1)}$, $\tilde u^{(p)} = \tilde u^{(p-1)}$, and $\theta^{(p)} = \theta^{(p-1)}$.

Algorithm 6 simplifies the sorting algorithm proposed by Choppala et al. (Citation2016). In Choppala et al. (Citation2016), the first sorting index is the index of the multidimensional particle having the smallest value along its first dimension. Selecting the second sorted index, $\zeta_2$, then requires calculating the Euclidean distance between the first selected particle and the remaining multidimensional particles; selecting the third sorted index, $\zeta_3$, requires calculating the Euclidean distance between the second selected particle and the remaining particles; and so on. This process is repeated $N$ times, which is expensive for a large number of particles and a long time series. In contrast, Algorithm 6 calculates the Euclidean distances between the first selected particle and the remaining particles only once. Deligiannidis, Doucet, and Pitt (Citation2018) use the Hilbert sorting method of Skilling (Citation2004) to order the multidimensional state particles, which requires calculating the Hilbert index for each multidimensional particle and is much more expensive than Algorithm 6.

Algorithm 6

Multidimensional Euclidean Sorting Algorithm

Input: $z_t^{1:N}$, $\bar w_t^{1:N}$

Output: sorted particles $\tilde z_t^{1:N}$, sorted weights $\tilde{\bar w}_t^{1:N}$, sorted indices $\zeta_{1:N}$

  • Calculate the means of the multidimensional particles, $\bar z_t^i = \frac{1}{d}\sum_{k=1}^d z_{t,k}^i$, for $i=1,\ldots,N$. The first sorting index, $\zeta_1$, is the index of the particle with the smallest $\bar z_t^i$; the first selected particle $\tilde z_t^1$ is $z_t^{\zeta_1}$, with associated weight $\tilde{\bar w}_t^1 = \bar w_t^{\zeta_1}$.

  • Calculate the Euclidean distance between the selected particle and each remaining particle, $d(\tilde z_t^1, z_t^i)$ for $i \ne \zeta_1$.

  • Sort the particles and weights by this distance, from smallest to largest, to obtain $\zeta_i$ for $i=2,\ldots,N$; the sorted particles and weights are $\tilde z_t^i = z_t^{\zeta_i}$ and $\tilde{\bar w}_t^i = \bar w_t^{\zeta_i}$ for $i=2,\ldots,N$ (a short sketch follows).
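The three steps above amount to a single argmin, one distance computation, and one argsort, as the following Python sketch (with illustrative names) shows.

```python
import numpy as np

def euclidean_sort(z, w):
    """Algorithm 6: sort (N, d) particles z with weights w in one distance pass."""
    means = z.mean(axis=1)                       # per-particle mean over dimensions
    first = np.argmin(means)                     # zeta_1: particle with smallest mean
    dist = np.linalg.norm(z - z[first], axis=1)  # one Euclidean distance computation
    dist[first] = -np.inf                        # anchor particle sorts first
    zeta = np.argsort(dist)
    return z[zeta], w[zeta], zeta
```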

Table 1 shows the CPU time (in seconds) of the three sorting methods for different numbers of particles N and state dimensions d. The table shows that the proposed sorting algorithm is much faster than the Choppala et al. (Citation2016) method and the Hilbert sorting method. For example, for state dimension d = 30 and N = 2000 particles, Algorithm 6 is 290 and 335 times faster than the Choppala et al. (Citation2016) and Hilbert sorting methods, respectively.

Table 1 Comparing different sorting methods for various combinations of N and d in terms of CPU time (in seconds): I: The fast Euclidean sorting algorithm, II: Choppala et al. (Citation2016) sorting method, and III: Hilbert sorting method.

3 Examples

Section 3.1 discusses the inefficiency measures we use to compare the performance of different particle filters or PMMH samplers. Section 3.2 investigates empirically the performance of the proposed methods for estimating a high-dimensional linear Gaussian state space model. Section 3.3 discusses a multivariate GARCH diffusion-driven volatility in the mean model. Sections 3.4.1 and 3.4.2 apply the proposed samplers to estimate nonlinear small scale DSGE and medium scale DSGE models, respectively.

3.1 Definitions of Inefficiency

We use the inefficiency factor (IF), also called the integrated autocorrelation time,
$$\mathrm{IF} := 1 + 2\sum_{j=1}^{\infty} \rho_{\psi}(j),$$
to measure the inefficiency of a PMMH sampler at estimating the posterior expectation of a univariate function $\psi(\theta)$ of $\theta$; here, $\rho_\psi(j)$ is the $j$th autocorrelation of the iterates $\psi(\theta)$ in the MCMC chain after it has converged to its stationary distribution. We estimate the IF using the CODA R package of Plummer et al. (Citation2006). A low IF estimate suggests that the Markov chain mixes well. Our measure of the inefficiency of a PMMH sampler that takes computing time (CT) into account for a given parameter $\theta$ is the time normalized inefficiency factor, $\mathrm{TNIF} := \mathrm{IF} \times \mathrm{CT}$. For a given sampler, let IFMAX and IFMEAN be the maximum and mean of the IF values over all the parameters in the model. The relative time normalized inefficiency factor (RTNIF) is the TNIF relative to a benchmark method, where the benchmark depends on the example.
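As a rough illustration, the IF can be estimated from a converged chain as follows; this simple truncation rule is an assumption made for the sketch, whereas the paper uses the CODA R package.

```python
import numpy as np

def inefficiency_factor(chain, max_lag=None):
    """Estimate IF = 1 + 2 * sum_j rho(j) from a converged univariate chain."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    acf = np.correlate(x, x, mode='full')[n - 1:] / (np.arange(n, 0, -1) * x.var())
    max_lag = max_lag or n // 10
    rho = acf[1:max_lag + 1]
    # truncate the sum at the first negative autocorrelation (a simple rule)
    cut = int(np.argmax(rho < 0)) if np.any(rho < 0) else rho.size
    return 1.0 + 2.0 * rho[:cut].sum()
```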

3.2 Linear Gaussian State Space Model

This section investigates (a) the efficiency of different approaches for obtaining likelihood estimates from $S$ particle filters, and (b) the performance of different PMMH samplers for estimating a single parameter $\theta$, regardless of the dimension of the states. We consider the linear Gaussian state space model discussed in Guarniero, Johansen, and Lee (Citation2017) and Deligiannidis, Doucet, and Pitt (Citation2018), where $\{X_t; t \ge 1\}$ and $\{Y_t; t \ge 1\}$ are $\mathbb{R}^d$-valued, with
$$Y_t = X_t + W_t, \qquad X_{t+1} = A_\theta X_t + V_{t+1},$$
$X_1 \sim N(0_d, I_d)$, $V_t \sim N(0_d, I_d)$, $W_t \sim N(0_d, I_d)$, and $A_\theta^{i,j} = \theta^{|i-j|+1}$ for $i, j = 1,\ldots,d$; the true value of $\theta$ is 0.4. Although it is possible to use the more efficient fully adapted particle filter (Pitt et al. Citation2012), we use the bootstrap filter for all methods to show that the proposed methods are useful for models where better particle filter algorithms are difficult to construct.
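A short sketch of simulating from this model, assuming the definition of $A_\theta$ above; the function name and seed are illustrative.

```python
import numpy as np

def simulate_lgss(theta, d, T, rng):
    """Simulate Y_t = X_t + W_t, X_{t+1} = A X_t + V_{t+1} with
    A[i, j] = theta**(|i - j| + 1) and X_1, V_t, W_t ~ N(0, I_d)."""
    idx = np.arange(d)
    A = theta ** (np.abs(idx[:, None] - idx[None, :]) + 1.0)
    x = rng.standard_normal(d)              # X_1 ~ N(0_d, I_d)
    y = np.empty((T, d))
    for t in range(T):
        y[t] = x + rng.standard_normal(d)   # W_t ~ N(0_d, I_d)
        x = A @ x + rng.standard_normal(d)  # V_{t+1} ~ N(0_d, I_d)
    return y

y = simulate_lgss(0.4, d=10, T=300, rng=np.random.default_rng(0))
```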

The first study compares the 0%, 5%, 10%, 25%, and 50% trimmed means of the likelihood estimates obtained from S particle filters. The simulated data is generated from the model above with T = 200 and T = 300 time periods and d = 1, 5 and 10 dimensions.

Table 2 reports the sample variance of the log of the estimated likelihood for the five likelihood estimators for the d = 10 dimensional linear Gaussian state space model with T = 300 time periods. The table shows that: (a) there is no substantial reduction in the variance of the log of the estimated likelihood for the mean (0% trimmed mean); for example, the variance decreases by a factor of only 2.31 when S increases from 20 to 1000 with N = 250; (b) the 5%, 10%, 25%, and 50% trimmed means of the individual likelihood estimates decrease the variance substantially as S and/or N increases. The 25% and 50% trimmed mean estimates have the smallest variance in all cases. Similar results are obtained for d = 5 with T = 200 and T = 300 and for d = 10 with T = 200; see Tables S3, S4, and S5 in section S4 of the online supplement.

Table 2 Comparing the variance of the log of the estimated likelihood for five different estimators of the likelihood: I: Averaging the likelihood (0% trimmed mean), II: Averaging the likelihood (5% trimmed mean), III: Averaging the likelihood (10% trimmed mean), IV: Averaging the likelihood (25% trimmed mean), and V: Averaging the likelihood (50% trimmed mean), for d = 10 dimensional linear Gaussian state space model with T = 300.

Table 3 Comparing the performance of different PMMH samplers with different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the linear Gaussian state space model using a simulated dataset with T = 300 and d = 10 dimensions.

Table 4 Comparing the performance of different PMMH samplers with different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the multivariate GARCH diffusion-driven volatility in mean model using a real dataset with T = 100 and d = 30.

Table 5 Comparing the performance of different PMMH samplers using different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the small scale DSGE model using a real dataset with T = 124 observations.

We now examine empirically the ability of the proposed MPM samplers to maintain the correlation between successive values of the log of the estimated likelihood for the 10 dimensional linear Gaussian state space model with T = 300. We ran the different MPM approaches for 1000 iterations, holding the current parameter at $\theta = 0.4$ and setting the proposed parameter to $\theta' = 0.4$, 0.399, and 0.385. At each iteration we generated $\tilde u = (u_1,\ldots,u_S)$ and $\tilde u'$, obtained $\log \overline{\hat p}_N(y|\theta, \tilde u)$ and $\log \overline{\hat p}_N(y|\theta', \tilde u')$ for each MPM approach, and computed their sample correlations.

Figure S5 in section S4 of the online supplement reports the correlation estimates of the logs of the estimated likelihoods obtained using the different MPM approaches. The figure shows that: (a) when the current and proposed values of the parameter are both 0.4, all MPM methods maintain a high correlation between the logs of the estimated likelihoods; with S = 100 particle filters the estimated correlations are about 0.99; (b) when the current parameter $\theta$ and the proposed parameter $\theta'$ are (slightly) different, the MPM methods still maintain much of the correlation between the logs of the estimated likelihoods. The MPM methods with 25% and 50% trimmed means of the likelihood are best at maintaining this correlation.

We now compare the efficiency of the different sampling schemes for estimating the parameter θ of the linear Gaussian state space model. In all the examples, we run the samplers for 25,000 iterations, with the initial 5000 iterations discarded as warm up. The particle filter and the parameter samplers are implemented in Matlab running on a high performance computer cluster with 20 CPU cores. The optimal number of CPU cores required for the MPM algorithm is equal to the number of particle filters S, which means the properties of the sampler can be easily tuned to provide maximum parallel efficiency on a large range of hardware. The adaptive random walk proposal of Roberts and Rosenthal (Citation2009) is used for all samplers.

Figure S2 in section S4 of the online supplement provides the trace plots of the parameter θ estimated using the MPM algorithm with the different trimmed means for estimating the 10 dimensional linear Gaussian state space model with T = 200. The figure shows that: (a) the MCMC chain gets stuck for the 5% trimmed mean approach with S = 100 and N = 250 and N = 500 and for the 10% trimmed mean approach with S = 100 and N = 100; (b) the 10% trimmed mean approach requires at least S = 100 and N = 250 to make the MCMC chain mix well; (c) the 25% trimmed mean approach is the most efficient sampler as its MCMC chain does not get stuck even with S = 100 and N = 100. Figure S1 in section S4 of the online supplement compares the MPM algorithms with and without blocking with S = 100 and N = 250 for estimating the 10 dimensional linear Gaussian state space model with T = 300. The figure shows that the MPM (no blocking) with 5% and 10% trimmed means get stuck and the MPM (no blocking) with 25% trimmed means mixes poorly. The IF̂ of the MPM (blocking) with 25% trimmed mean is 9.624 times smaller than the MPM (no blocking) with 25% trimmed mean. All three panels show that the MPM (blocking) algorithms using trimmed means perform better than the MPM (no blocking), demonstrating the usefulness of the MPM (blocking) sampling scheme.

Table 3 reports the IF̂, TNIF̂, and RTNIF̂ values for the parameter θ in the linear Gaussian state space model with d = 10 dimensions and T = 300 time periods, estimated using the following five MCMC samplers: (a) the correlated PMMH of Deligiannidis, Doucet, and Pitt (Citation2018); (b) the MPM with 5% trimmed mean; (c) the MPM with 10% trimmed mean; (d) the MPM with 25% trimmed mean; and (e) the MPM with 50% trimmed mean. Instead of the Hilbert sorting method, we implement the correlated PMMH using the fast multidimensional Euclidean sorting method of Section 2.4. The computing time reported in the table is the time to run a single particle filter for the CPM and S particle filters for the MPM approach. The table shows the following. (a) The correlated PMMH requires more than 50,000 particles to improve the mixing of the MCMC chain for the parameter θ. (b) The CPU time for running a single particle filter with N = 50,000 particles is 4.09 times that of running multiple particle filters with S = 100 and N = 500; the MPM method can be much faster than the CPM method if it is run using high-performance computing with a large number of cores. (c) The MPM allows a much smaller number of particles for each independent PF, and these multiple PFs can be run independently. (d) The IF̂ values for the parameter θ estimated using the correlated PMMH with N = 50,000 particles are approximately 11, 42, 46, and 56 times larger than for the 5%, 10%, 25%, and 50% trimmed mean approaches with S = 100 and N = 500 particles, respectively. (e) In terms of RTNIF̂, the 5%, 10%, 25%, and 50% trimmed means with S = 100 and N = 500 are approximately 45, 172, 189, and 231 times smaller than the correlated PMMH with N = 50,000 particles. (f) The best sampler for this example is the 50% trimmed mean approach with S = 100 and N = 250 particles. Table S6 in section S4 of the online supplement gives similar results for the 10 dimensional linear Gaussian state space model with T = 200. Figure S4 in section S4 of the online supplement plots the kernel density estimates of the parameter θ estimated using the Metropolis-Hastings algorithm with the (exact) Kalman filter and the MPM algorithm with 5%, 10%, 25%, and 50% trimmed means of the estimated likelihood. The figure shows that the approximate posteriors obtained using trimmed means are very close to the true posterior. Similar results are obtained for d = 10 dimensions and T = 200 time periods; see Figure S3 in section S4 of the online supplement. Figure S3 also shows that the approximate posteriors obtained using the MPM with the 5%, 10%, 25%, and 50% trimmed means of the likelihood are very close to the exact posterior obtained using the correlated PMMH.

In summary, the example suggests that: (a) for a high dimensional state space model, there is no substantial reduction in the variance of the log of the estimated likelihood for the method that averages the likelihoods from S particle filters as S and/or N increases, whereas methods that use trimmed means reduce the variance substantially; (b) the 25% and 50% trimmed mean approaches give the smallest variance of the log of the estimated likelihood in all cases; (c) the MPM method with 50% trimmed mean is best at maintaining the correlation between the logs of the estimated likelihoods in successive iterates; (d) the MPM (blocking) method is more efficient than the MPM (no blocking) for the same values of S and N; (e) all approximate approaches that use trimmed means of the likelihood give accurate approximations to the true posterior; (f) the best sampler for estimating the 10 dimensional linear Gaussian state space model with T = 300 is the 50% trimmed mean approach with S = 100 and N = 250 particles; and (g) the MPM allows a much smaller number of particles for each independent PF, and these multiple PFs can be run independently; the approach can be made much faster when run on a high-performance computer cluster with a large number of cores.

3.3 Multivariate GARCH Diffusion-Driven Volatility in Mean Model

This section investigates the performance of the proposed MPM samplers for estimating large multivariate GARCH diffusion-driven volatility in mean models, where each log volatility process follows a GARCH (generalized autoregressive conditional heteroscedasticity) diffusion process (Shephard, Chib, and Pitt Citation2004). The GARCH diffusion model does not have a closed form state transition density, which makes it challenging to estimate with the Gibbs-type MCMC sampler used in Cross et al. (Citation2021); it is also well known that the Gibbs sampler is inefficient for generating the parameters of a diffusion process, especially the variance parameter $\tau^2$ (Stramer and Bognar Citation2011). MPM samplers, by contrast, handle models with intractable state transition densities; this section demonstrates that they estimate this model efficiently and overcome the limitations of the Gibbs sampler.

Suppose that $P_t$ is a $d \times 1$ vector of daily stock prices and define $y_t := \log P_t - \log P_{t-1}$ as the log-return of the stocks. Let $h_{i,t}$ be the log-volatility process of the $i$th stock at time $t$, and define $h_{\cdot,t} := (h_{1,t},\ldots,h_{d,t})^\top$ and $h_{i,\cdot} := (h_{i,1},\ldots,h_{i,T})^\top$. The model for $y_t$ is
$$y_t = A h_{\cdot,t} + \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma_t), \tag{7}$$
where $A$ is a $d \times d$ matrix that captures the effects of the log-volatilities on the levels of the variables. The time-varying error covariance matrix $\Sigma_t$ depends on the unobserved latent variables $h_{\cdot,t}$ through
$$\Sigma_t := \mathrm{diag}\left(\exp(h_{1,t}),\ldots,\exp(h_{d,t})\right). \tag{8}$$

Each log-volatility $h_{i,t}$ is assumed to follow a continuous time GARCH diffusion process (Shephard, Chib, and Pitt Citation2004) satisfying
$$dh_{i,t} = \left\{\alpha\left(\mu - \exp(h_{i,t})\right)\exp(-h_{i,t}) - \frac{\tau^2}{2}\right\} dt + \tau\, dW_{i,t}, \tag{9}$$
for $i=1,\ldots,d$, where the $W_{i,t}$ are independent Wiener processes. The following Euler scheme approximates the evolution of the log-volatilities $h_{i,t}$ in (9) by placing $M-1$ evenly spaced points between times $t$ and $t+1$. We denote the intermediate volatility components by $h_{i,t,1},\ldots,h_{i,t,M-1}$, and it is convenient to set $h_{i,t,0} = h_{i,t}$ and $h_{i,t,M} = h_{i,t+1}$. The Euler evolution, starting at $h_{i,t,0}$, is (see, e.g., Stramer and Bognar Citation2011, p. 234)
$$h_{i,t,j+1}|h_{i,t,j} \sim N\left(h_{i,t,j} + \left\{\alpha\left(\mu - \exp(h_{i,t,j})\right)\exp(-h_{i,t,j}) - \frac{\tau^2}{2}\right\}\delta,\; \tau^2\delta\right), \tag{10}$$
for $j=0,\ldots,M-1$, where $\delta = 1/M$. The initial state of $h_{i,t}$ is assumed to be distributed as $N(0, 1)$ for $i=1,\ldots,d$. For this example, we assume for simplicity that the parameters $\mu$, $\alpha$, and $\tau^2$ are the same across all stocks and that $A^{i,j} = \psi^{|i-j|+1}$. This gives the same number of parameters regardless of the dimension of the stock returns, so that we can focus on comparing different PMMH samplers.
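A sketch of the Euler transition (10) for one period, under the reconstruction of the drift given above; the function name and interface are illustrative.

```python
import numpy as np

def euler_garch_step(h, alpha, mu, tau2, M, rng):
    """One-period Euler transition (10): M steps of size delta = 1/M."""
    delta = 1.0 / M
    for _ in range(M):
        drift = (alpha * (mu - np.exp(h)) * np.exp(-h) - tau2 / 2.0) * delta
        h = h + drift + np.sqrt(tau2 * delta) * rng.standard_normal(np.shape(h))
    return h
```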

We first report on a simulation study for the above model. We simulated ten independent datasets from the model with d = 20 dimensions and T = 100 observations. The true parameters are $\alpha = 2$, $\mu = 1.81$, $\tau^2 = 0.38$, and $\psi = 0.01$. The priors are $\alpha \sim G(1,1)$, $\tau^2 \sim G(0.5, 0.5)$, $\mu \sim G(1,1)$, and $\psi \sim N(0,1)$. These priors are noninformative and cover almost all values that arise in practice. We use M = 3 latent points for the Euler approximation of the state transition density. For each dataset, we run the MPM sampler with the 25% trimmed mean approach for 25,000 iterations, with the initial 5000 iterations discarded as warm up. The particle filter and the parameter samplers are implemented in Matlab running on a high performance computer cluster with 20 cores. The adaptive random walk proposal of Roberts and Rosenthal (Citation2009) is used for all samplers.

Figures S7 to S16 in section S5 of the online supplement plot the kernel density estimates of the parameters of the multivariate GARCH diffusion-driven volatility in mean model described above estimated by the MPM sampler (S = 100 and N = 250) using a 25% trimmed mean; the vertical lines show the true parameter values. The figures show that the model parameters are accurately estimated. Figure S6 in section S5 of the online supplement shows the inefficiency factors (IF̂) of the parameters over 10 simulated datasets with d = 20 dimensions and T = 100 time periods. The results suggest that the MPM sampler with only N = 250 particles and S = 100 particle filters is efficient because the IF̂ values of the parameters are reasonably small.

We now apply the methods to a sample of daily U.S. industry stock returns obtained from the Kenneth French website, using a sample from January 4, 2021 to May 26, 2021, a total of 100 observations. Table S7 in section S5 of the online supplement reports the IF̂, TNIF̂, and RTNIF̂ for the parameters of the 20 dimensional multivariate GARCH diffusion-driven volatility in mean model estimated using: (a) the correlated PMMH; (b) the MPM sampler with the 25% trimmed mean of the likelihood estimate; and (c) the MPM sampler with the 50% trimmed mean of the likelihood estimate. The table shows that: (a) the MCMC chain obtained using the correlated PMMH still gets stuck with N = 100,000 particles; (b) the CPU time of the correlated PMMH with N = 100,000 particles is 26 times that of the MPM algorithm with S = 250 and N = 100; (c) the MPM sampler with N = 250 and S = 100 is less efficient than the MPM sampler with N = 100 and S = 250, and the latter can be made more efficient still by running it on a high-performance computer cluster with many cores; and (d) the best sampler in terms of TNIF̂MEAN and TNIF̂MAX for estimating the 20 dimensional multivariate GARCH diffusion-driven volatility in mean model is the MPM sampler with a 50% trimmed mean with S = 250 and N = 100.

Table 4 reports the IF̂, TNIF̂, and RTNIF̂ for the parameters of the d = 30 dimensional multivariate GARCH diffusion-driven volatility in mean model estimated using: (a) the MPM sampler with the 25% trimmed mean of the likelihood, and (b) the MPM sampler with the 50% trimmed mean of the likelihood. We do not consider the correlated PMMH because it is very expensive to run for this high dimensional model. The table shows that: (a) increasing the number of particle filters from S = 100 to S = 200 reduces the IF̂ but increases the computational time substantially; the computational time can, however, be reduced by using a high performance computer cluster with a larger number of cores; and (b) the MPM sampler with N = 500 and S = 100 is less efficient than the MPM sampler with N = 250 and S = 200. If large computing resources are available, we therefore recommend increasing the number of particle filters S while keeping the number of particles N in each particle filter reasonably small.

3.4 DSGE Models

This section evaluates the effectiveness of the MPM samplers for estimating two nonlinear DSGE models: a stylized small-scale and a medium-scale DSGE model, described in sections S8 and S9 of the online supplement. We use the disturbance filter method introduced in Section 2.3 because the dimension of the state vector is much larger than that of the disturbances in these models. The state transition densities of the two nonlinear DSGE models are intractable. Section S3 of the online supplement describes the state space representations of these models, which are solved using Dynare.

3.4.1 Small Scale Nonlinear DSGE Model

This section reports on how well a number of PMMH samplers of interest estimate the parameters of a nonlinear (second order) small-scale DSGE model. This small-scale DSGE model is stylized and similar to the canonical three-equation New Keynesian model. The modeling framework follows Herbst and Schorfheide (Citation2019), with nominal rigidities through price adjustment costs following Rotemberg (Citation1982), and three structural shocks. There are four state variables, four control variables (eight variables in total), and three exogenous shocks in the model. We estimate this model using three observables, which means that the dimensions of $y_t$, the state vector, and the disturbance vector are 3, 8, and 3, respectively. Section S8 of the online supplement describes the model.

In this section, we use the delayed acceptance version of the MPM algorithm, calling it “delayed acceptance multiple PMMH” (DA-MPM), to speed up the computation. The delayed acceptance sampler (Christen and Fox Citation2005) avoids computing the expensive likelihood estimate if it is unlikely that the proposed draw will ultimately be accepted. The first accept-reject stage uses a cheap (or deterministic) approximation to the likelihood instead of the expensive likelihood estimate in the MH acceptance ratio. The particle filter is then used to estimate the likelihood only for a proposal that is accepted in the first stage; a second accept-reject stage ensures that detailed balance is satisfied with respect to the true posterior. We use the likelihood obtained from the central difference Kalman filter (CDKF) proposed by Norgaard, Poulsen, and Ravn (Citation2000) in the first accept-reject stage of the delayed acceptance scheme. See section S2 of the online supplement for further details. The dimension of states is usually much larger than the dimension of the disturbances for the DSGE models. Two different sorting methods are also considered. The first sorts the state particles and the second sorts the disturbance particles using the sorting algorithm in Section 2.4.
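A minimal sketch of the two-stage delayed acceptance step, assuming a symmetric proposal for θ; `log_like_cheap` stands in for the CDKF likelihood approximation and `log_like_pf` for the trimmed-mean PF estimate (both names are illustrative).

```python
import numpy as np

def delayed_acceptance_step(theta, theta_prop, log_like_cheap, log_like_pf,
                            log_prior, rng):
    """Two-stage delayed acceptance: screen with a cheap likelihood
    approximation; run the particle filters only if stage 1 accepts."""
    # Stage 1: cheap approximation in the MH ratio
    log_a1 = (log_like_cheap(theta_prop) + log_prior(theta_prop)
              - log_like_cheap(theta) - log_prior(theta))
    if np.log(rng.uniform()) >= min(0.0, log_a1):
        return theta                       # rejected without running the PFs
    # Stage 2: correct with the expensive PF estimate; the cheap-likelihood
    # terms enter in reverse so detailed balance holds for the true posterior
    log_a2 = ((log_like_pf(theta_prop) - log_like_pf(theta))
              - (log_like_cheap(theta_prop) - log_like_cheap(theta)))
    return theta_prop if np.log(rng.uniform()) < min(0.0, log_a2) else theta
```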

The PMMH samplers considered are: (a) the MPM (0% trimmed mean) with the auxiliary disturbance particle filter (ADPF), disturbance sorting, and the correlation between the random numbers used in estimating the logs of the likelihoods set to $\rho_u = 0.9$; (b) the MPM (10% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$; (c) the MPM (25% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$; (d) the MPM (25% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0$; (e) the MPM (25% trimmed mean) with the ADPF, state sorting, and $\rho_u = 0.9$; (f) the delayed acceptance MPM (DA-MPM) (25% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$; (g) the MPM (50% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$; (h) the MPM (0% trimmed mean) with the bootstrap particle filter, disturbance sorting, and $\rho_u = 0.9$; and (i) the correlated PMMH with $\rho_u = 0.9$, the bootstrap particle filter, and disturbance sorting. Each sampler ran for 25,000 iterations, with the initial 5000 iterations discarded as burn-in. We use the adaptive random walk proposal of Roberts and Rosenthal (Citation2009) for $q(\theta'|\theta)$ and the adaptive scaling approach of Garthwaite, Fan, and Sisson (Citation2015) to tune the Metropolis-Hastings acceptance probability to 20%. Section S8 of the online supplement gives details on the model specification and the model parameters. We include measurement error in the observation equation; the measurement error variances are estimated together with the other parameters. All the samplers ran on a high performance computer cluster with 20 CPU cores. The real dataset is from Herbst and Schorfheide (Citation2019) and covers 1983Q1 to 2013Q4, which includes the Great Recession, a total of 124 observations for each series.

Table 5 reports the relative time normalized inefficiency factor (RTNIF̂) of each PMMH sampler relative to the DA-MPM (25% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$ for the nonlinear small scale model with T = 124 observations. The computing time reported in the table is the time to run a single particle filter for the CPM and S particle filters for the MPM approach. The table shows that: (a) The MPM sampler (0% trimmed mean) with the ADPF, disturbance sorting, $\rho_u = 0.9$, and N = 250 particles is more efficient than the MPM sampler (0% trimmed mean) with the bootstrap particle filter, disturbance sorting, $\rho_u = 0.9$, and N = 250 particles, with the MCMC chain obtained by the latter getting stuck. (b) The running time of the MPM method with S = 100 particle filters, each with N = 100 particles and disturbance sorting, is 9.73 times faster than the CPM with N = 20,000 particles. (c) In terms of RTNIF̂MAX, the MPM samplers with 0%, 10%, 25%, and 50% trimmed means, the ADPF, disturbance sorting, and $\rho_u = 0.9$ are 3.15, 90.77, 103.04, and 91.50 times more efficient, respectively, than the correlated PMMH with N = 20,000 particles. (d) In terms of RTNIF̂MAX, the MPM (25% trimmed mean) with disturbance sorting and $\rho_u = 0.9$ is 1.38 times more efficient than the MPM (25% trimmed mean) method with state sorting, and performs similarly to the MPM (25% trimmed mean) approach with the ADPF, disturbance sorting, and $\rho_u = 0$. (e) The delayed acceptance MPM (25% trimmed mean) is slightly more efficient than the MPM (25% trimmed mean) approach. The delayed acceptance algorithm is 5 times faster on average because the target Metropolis-Hastings acceptance probability is set to 20% using the Garthwaite, Fan, and Sisson (Citation2015) approach, but it has a higher maximum IF̂ value; Table S9 in section S6 of the online supplement gives the details. (f) It is possible to use the tempered particle filter (TPF) of Herbst and Schorfheide (Citation2019) within the MPM algorithm; however, the TPF is computationally expensive because of the tempering iterations and the random walk Metropolis-Hastings mutation steps. In summary, the best sampler with the smallest RTNIF̂MAX and RTNIF̂MEAN is the delayed acceptance MPM (DA-MPM) (25% trimmed mean) with the ADPF, disturbance sorting, and $\rho_u = 0.9$.

3.4.2 Nonlinear Medium Scale DSGE Model

Section 3.4.1 evaluates the performance of different PMMH samplers in estimating the stylized three equation New Keynesian model. Most applied studies, however, require a medium-scale model with a higher dimensional state space. This section therefore applies the MPM sampler with 50% trimmed mean to estimate the parameters of the second-order medium-scale DSGE model of Gust et al. (Citation2017) discussed in section S9 of the online supplement.

The medium-scale DSGE model considered is similar to the models in Christiano, Eichenbaum, and Evans (Citation2005) and Smets and Wouters (Citation2007), with nominal rigidities in both wages and prices, habit formation in consumption, investment adjustment costs, and five structural shocks. We solve the model using perturbation methods and consider a second-order accurate solution. As before, the nonlinearity in the DSGE examples we consider stems from the second-order interaction terms between the endogenous and exogenous variables in the model. Our article does not consider occasionally binding constraints such as the zero lower bound (ZLB) on the nominal interest rate; we focus on the efficiency of the estimation algorithm and restrict our sample to the period before the ZLB constraint on the nominal interest rate binds.

The baseline model in Gust et al. (Citation2017) consists of 12 state variables, 18 control variables (30 variables in total), and 5 exogenous shocks. We estimate this model using five observables, which means that the dimensions of $y_t$, the state vector, and the disturbance vector are 5, 30, and 5, respectively. We include measurement error in the observation equation and consider two cases: the measurement error variances are (a) estimated and (b) fixed at 25% of the sample variance of the observables over the estimation period. We use quarterly U.S. data from 1983Q1 to 2007Q4, a total of 100 observations for each of the five series.

The PMMH samplers considered are: (a) the MPM (50% trimmed mean) with the auxiliary disturbance particle filter (ADPF), disturbance sorting, and the correlation between the random numbers used in estimating the logs of the likelihoods set to $\rho_u = 0.9$; and (b) the correlated PMMH of Deligiannidis, Doucet, and Pitt (Citation2018). Each sampler ran for 60,000 iterations, with the initial 30,000 iterations discarded as burn-in. We use the adaptive random walk proposal of Roberts and Rosenthal (Citation2009) for $q(\theta'|\theta)$. All the samplers ran on a high performance computer cluster with 20 CPU cores. Table 6 shows the RTNIF̂ of the correlated PMMH relative to the MPM sampler with 50% trimmed mean (ADPF, disturbance sorting, and $\rho_u = 0.9$) for the nonlinear medium scale DSGE model, for the cases where the measurement error variances are estimated and fixed at 25% of the sample variance of the observables. The computing time reported in the table is the time to run a single particle filter for the CPM and S = 250 particle filters for the MPM approach. The table shows that, in terms of RTNIF̂MAX, the MPM sampler is 26 and 14 times more efficient than the correlated PMMH for the estimated and fixed measurement error variance cases, respectively. Table S12 in section S7 of the online supplement gives the inefficiency factors for each parameter.

Table 6 Comparing the performance of different PMMH samplers using different numbers of particle filters S and different numbers of particles N in each particle filter for estimating the medium scale DSGE model using the U.S. quarterly dataset from 1983Q1 to 2007Q4.

Table S11 in section S7 of the online supplement gives the posterior means and the 2.5% and 97.5% quantile estimates of all the parameters in the medium scale DSGE model estimated using the MPM sampler with a 50% trimmed mean (ρu=0.9, disturbance sorting, ADPF); the measurement error variances are fixed at 25% of the sample variance of the observables. The table shows that the standard errors are relatively small for all parameters, indicating that the parameters are estimated accurately. Figure 1 shows the kernel density estimates of some of the parameters of the medium scale DSGE model estimated using the MPM with a 50% trimmed mean (ADPF, disturbance sorting, and ρu=0.9) and the correlated PMMH. The figure shows that the approximate posteriors obtained using the MPM with a 50% trimmed mean are very close to the exact posteriors obtained using the correlated PMMH.

Fig. 1 Kernel density estimates of the posterior densities of some parameters of the medium scale DSGE model for the U.S. quarterly dataset from 1983Q1 to 2007Q4 estimated using: (a) MPM with 50% trimmed mean (ρu=0.9, disturbance sorting, ADPF) with S = 250 and N = 100; (b) correlated PMMH with N=50,000. The measurement error variances are fixed to 25% of the variance of the observables.
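
Plots like Figure 1 are easy to reproduce from the MCMC output. A minimal Matlab sketch, assuming the Statistics Toolbox and with illustrative variable names:

```matlab
% Overlay kernel density estimates of the post-burn-in draws of one
% parameter from the two samplers, as in Figure 1 (illustrative names).
[f1, x1] = ksdensity(draws_mpm);   % MPM (50% trimmed mean) draws
[f2, x2] = ksdensity(draws_cpm);   % correlated PMMH draws
plot(x1, f1, '-', x2, f2, '--');
legend('MPM (50% trimmed mean)', 'Correlated PMMH');
```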

We also consider a variation of the medium scale DSGE model of Gust et al. (Citation2017) described in section S9 of the online supplement. To study how well our approach performs as the state dimension increases, we estimate the model with the canonical definition of the output gap, defined as the difference between output in the model with nominal rigidities and output in the model with flexible prices and wages. This variation is identical to the baseline model in Gust et al. (Citation2017) except for the Taylor rule specification, which now uses the canonical definition of the output gap. This extension of the medium scale DSGE model consists of 21 state variables, 30 control variables (51 variables combining the state and control variables), and 5 exogenous shocks. We estimate this model using five observables, so the dimensions of yt , the state vector, and the disturbance vector are 5, 51, and 5, respectively. The measurement error variances are fixed at 25% of the sample variance of the observables over the estimation period. Table S13 of the online supplement shows the posterior means and the 2.5% and 97.5% quantiles of the parameters of the extended medium scale DSGE model estimated using the MPM algorithm with a 50% trimmed mean (ADPF, disturbance sorting, and ρu=0.9) with S = 250 and N = 100. The table shows that the IF̂s of the parameters are similar to those for the simpler medium scale DSGE model, indicating that the performance of the MPM algorithm does not deteriorate for the extended model with its higher state dimension.

4 Summary and Conclusions

The article proposes a general pseudo marginal approach (MPM) for estimating the posterior density of the parameters of complex and high-dimensional state space models. It is especially useful when the single bootstrap filter is inefficient and the more efficient auxiliary particle filter cannot be used because the state transition density is computationally intractable. The MPM method is a general extension of the PMMH method and makes four innovations: (a) it is based on the mean or trimmed mean of unbiased likelihood estimators; (b) it uses a novel block version of PMMH that works with multiple particle filters; (c) it proposes an auxiliary disturbance particle filter to estimate the likelihood, which is especially useful when the dimension of the states is much larger than the dimension of the disturbances; (d) it proposes a fast Euclidean sorting algorithm to preserve the correlation between the logs of the estimated likelihoods at the current and proposed parameter values (the general idea is sketched below). The MPM samplers are applied to complex multivariate GARCH diffusion-driven volatility in mean models and DSGE models and are shown to work well.
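
To convey the idea behind innovation (d), the following is an illustrative greedy nearest-neighbor ordering of multivariate particles by Euclidean distance. It is a generic sketch of Euclidean sorting, not necessarily the authors' algorithm, and it requires Matlab R2016b or later for the implicit expansion.

```matlab
% Illustrative greedy Euclidean sort of N d-dimensional particles (the
% rows of X): a generic nearest-neighbor ordering meant only to convey
% the idea of Euclidean sorting, not necessarily the authors' algorithm.
% O(N^2) distance evaluations.
function idx = euclidean_sort(X)
    N = size(X, 1);
    idx = zeros(N, 1);
    remaining = true(N, 1);
    [~, j] = min(sum(X.^2, 2));        % start nearest the origin
    for i = 1:N
        idx(i) = j;
        remaining(j) = false;
        d = sum((X - X(j, :)).^2, 2);  % squared distances to current point
        d(~remaining) = inf;
        [~, j] = min(d);               % nearest unvisited particle
    end
end
```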

The source of nonlinearity we consider in estimating DSGE models arises from the interaction terms in the model solution. The article aims to demonstrate the proposed algorithm’s efficacy in estimating complicated state space models with a high-dimensional state space. There are other sources of nonlinearity in DSGE models that we do not consider in the present article. For instance, we do not consider the zero lower bound on nominal interest rates. Frameworks with occasionally binding constraints require global methods to solve the model before evaluating the likelihood, and global solution techniques are time-intensive. To reduce the computational time, we focus exclusively on models that can be solved using perturbation methods. Given a solution technique, the proposed algorithm can be used to estimate nonlinear DSGE models incorporating important features such as occasionally binding constraints. The other feature we do not address is the case of big shocks, such as the COVID-19 pandemic. Modeling rare-disaster shocks is challenging when solving models using perturbation methods; Fernández-Villaverde and Levintal (Citation2018) show that even at third order the model solution may suffer from accuracy issues when considering big shocks. These are interesting avenues for future work.

Supplementary Materials

The article has an online supplement. The commented Matlab code and readme.txt files are available at https://github.com/davidgun1000/Block-Correlated-Pseudo-Marginal to enable the reader to replicate all the empirical examples in the article.

Acknowledgments

We would like to thank the anonymous associate editor and two referees for their comments which helped improve both the presentation and technical content of the article.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research of Robert Kohn and David Gunawan was partially supported by an ARC Center of Excellence grant CE140100049.

References

  • Andrieu, C., Doucet, A., and Holenstein, R. (2010), “Particle Markov Chain Monte Carlo Methods,” Journal of the Royal Statistical Society, Series B, 72, 269–342. DOI: 10.1111/j.1467-9868.2009.00736.x.
  • Andrieu, C., and Roberts, G. (2009), “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations,” The Annals of Statistics, 37, 697–725. DOI: 10.1214/07-AOS574.
  • Choppala, P., Gunawan, D., Chen, J., Tran, M. N., and Kohn, R. (2016), “Bayesian Inference for State-Space Models Using Block and Correlated Pseudo Marginal Methods,” arXiv:1612.07072v1.
  • Christen, J. A., and Fox, C. (2005), “Markov Chain Monte Carlo Using an Approximation,” Journal of Computational and Graphical Statistics, 14, 795–810. DOI: 10.1198/106186005X76983.
  • Christiano, L. J., Eichenbaum, M., and Evans, C. L. (2005), “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113, 1–45. DOI: 10.1086/426038.
  • Cross, J. L., Hou, C., Koop, G., and Poon, A. (2021), “Macroeconomic Forecasting with Large Stochastic Volatility in Mean VARs,” BI Norwegian Business School. https://www.oru.se/contentassets/b70754c1297e4b65993aa6115c0f5d0c/cross-macroeconomic-forecasting-with-large-stochastic-volatility-in-mean-vars.pdf.
  • Deligiannidis, G., Doucet, A., and Pitt, M. (2018), “The Correlated Pseudo-Marginal Method,” Journal of the Royal Statistical Society, Series B, 80, 839–870. DOI: 10.1111/rssb.12280.
  • Fernández-Villaverde, J., and Levintal, O. (2018), “Solution Methods for Models with Rare Disasters,” Quantitative Economics, 9, 903–944. DOI: 10.3982/QE744.
  • Fernández-Villaverde, J., and Rubio-Ramírez, J. F. (2007), “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74, 1059–1087. DOI: 10.1111/j.1467-937X.2007.00437.x.
  • Garthwaite, P. H., Fan, Y., and Sisson, S. A. (2015), “Adaptive Optimal Scaling of Metropolis-Hastings Algorithms Using the Robbins-Monro Process,” Communications in Statistics - Theory and Methods, 45, 5098–5111. DOI: 10.1080/03610926.2014.936562.
  • Gordon, N., Salmond, D., and Smith, A. (1993), “A Novel Approach to Nonlinear and Non-Gaussian Bayesian State Estimation,” IEE Proceedings F, 140, 107–113.
  • Guarniero, P., Johansen, A. M., and Lee, A. (2017), “The Iterated Auxiliary Particle Filter,” Journal of the American Statistical Association, 112, 1636–1647. DOI: 10.1080/01621459.2016.1222291.
  • Gust, C., Herbst, E., López-Salido, D., and Smith, M. E. (2017), “The Empirical Implications of the Interest-Rate Lower Bound,” American Economic Review, 107, 1971–2006. DOI: 10.1257/aer.20121437.
  • Hall, J., Pitt, M. K., and Kohn, R. (2014), “Bayesian Inference for Nonlinear Structural Time Series Models,” Journal of Econometrics, 179, 99–111. DOI: 10.1016/j.jeconom.2013.10.016.
  • Herbst, E., and Schorfheide, F. (2019), “Tempered Particle Filtering,” Journal of Econometrics, 210, 26–44. DOI: 10.1016/j.jeconom.2018.11.003.
  • Hesterberg, T. (1995), “Weighted Average Importance Sampling and Defensive Mixture Distributions,” Technometrics, 37, 185–194. DOI: 10.1080/00401706.1995.10484303.
  • Johnson, A. A., Jones, G. L., and Neath, R. C. (2013), “Component-Wise Markov Chain Monte Carlo: Uniform and Geometric Ergodicity under Mixing and Composition,” Statistical Science, 28, 360–375. DOI: 10.1214/13-STS423.
  • Kitagawa, G. (1996), “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models,” Journal of Computational and Graphical Statistics, 5, 1–25. DOI: 10.2307/1390750.
  • Murray, L. M., Jones, E. M., and Parslow, J. (2013), “On Disturbance State-Space Models and the Particle Marginal Metropolis-Hastings Sampler,” SIAM/ASA Journal on Uncertainty Quantification, 1, 494–521. DOI: 10.1137/130915376.
  • Norgaard, M., Poulsen, N. K., and Ravn, O. (2000), “New Developments in State Estimation for Nonlinear Systems,” Automatica, 36, 1627–1638. DOI: 10.1016/S0005-1098(00)00089-3.
  • Pitt, M. K., Silva, R. S., Giordani, P., and Kohn, R. (2012), “On Some Properties of Markov Chain Monte Carlo Simulation Methods based on the Particle Filter,” Journal of Econometrics, 171, 134–151. DOI: 10.1016/j.jeconom.2012.06.004.
  • Plummer, M., Best, N., Cowles, K., and Vines, K. (2006), “CODA: Convergence Diagnosis and Output Analysis of MCMC,” R News, 6, 7–11.
  • Roberts, G. O., and Rosenthal, J. S. (2009), “Examples of Adaptive MCMC,” Journal of Computational and Graphical Statistics, 18, 349–367. DOI: 10.1198/jcgs.2009.06134.
  • Rotemberg, J. J. (1982), “Sticky Prices in the United States,” Journal of Political Economy, 90, 1187–1211. DOI: 10.1086/261117.
  • Shephard, N., Chib, S., and Pitt, M. K. (2004), “Likelihood-based Inference for Diffusion-Driven Models,” Technical Report. https://www.nuff.ox.ac.uk/economics/papers/2004/w20/chibpittshephard.pdf.
  • Sherlock, C., Thiery, A. H., and Lee, A. (2017), “Pseudo-Marginal Metropolis-Hastings Sampling Using Averages of Unbiased Estimators,” Biometrika, 104, 727–734. DOI: 10.1093/biomet/asx031.
  • Skilling, J. (2004), “Programming the Hilbert Curve,” AIP Conference Proceedings, 707, 381–387.
  • Smets, F., and Wouters, R. (2007), “Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach,” The American Economic Review, 97, 586–606. DOI: 10.1257/aer.97.3.586.
  • Stramer, O., and Bognar, M. (2011), “Bayesian Inference for Irreducible Diffusion Processes Using the Pseudo-Marginal Approach,” Bayesian Analysis, 6, 231–258. DOI: 10.1214/11-BA608.
  • Tran, M. N., Kohn, R., Quiroz, M., and Villani, M. (2016), “Block-Wise Pseudo-Marginal Metropolis-Hastings,” arXiv:1603.02485v2.