370
Views
0
CrossRef citations to date
0
Altmetric
Articles

Monte Carlo Simulation for Trading Under a Lévy-Driven Mean-Reverting Framework

&
Pages 207-230 | Received 15 Sep 2023, Accepted 04 Feb 2024, Published online: 21 Feb 2024

Abstract

We present a Monte Carlo approach to pairs trading on mean-reverting spreads modelled by Lévy-driven Ornstein-Uhlenbeck processes. Specifically, we focus on using a variance gamma driving process, an infinite activity pure jump process to allow for more flexible models of the price spread than is available in the classical model. However, this generalization comes at the cost of not having analytic formulas, so we apply Monte Carlo methods to determine optimal trading levels and develop a variance reduction technique using control variates. Within this framework, we numerically examine how the optimal trading strategies are affected by the parameters of the model. In addition, we extend our method to bivariate spreads modelled using a weak variance alpha-gamma driving process, and explore the effect of correlation on these trades.

1. Introduction

Many empirical studies have demonstrated examples of mean-reverting spreads in different markets. They can be generated by taking long and short positions in stocks and ETFs (Avellaneda and Lee Citation2010; Gatev, Goetzmann, and Rouwenhorst Citation2006; Huck and Afawubo Citation2015), futures contracts (CitationBrennan and SchwartzCitationDai, Zhong, and Kwok), physical commodity and commodity ETFs (Kanamura, Rachev, and Fabozzi Citation2010; Leung and Li Citation2016), as well as cryptocurrencies (Leung and Nguyen Citation2019).

Another popular approach, the stochastic spread method captures the path behaviour of the spread through a stochastic process with mean reversion. Given a mean-reverting spread process, we consider the problem of pairs trading, that is, extracting trading signals from the model based on the idea that a trade should be entered when the spread is far from the mean, and exited when it returns to the mean, so that the trader profits based on the tendency of the process to mean revert. This is a form of statistical arbitrage. The construction of spreads and extraction of trading signals are derived from the analysis of the underlying model. For instance, Elliott, Van Der Hoek, and Malcolm (Citation2005) propose a mean-reverting Gaussian Markov chain model to describe spread dynamics. The model's estimates are compared with observations of the spread to determine appropriate trading decisions. They solve an optimal double-stopping problem to analyse the timing of entry and exit subject to transaction costs.

Most prominently, Ornstein-Uhlenbeck (OU) processes have been used to model spreads for pairs trading. For classical OU processes, which are driven by Brownian motion, closed-form solutions to the pairs trading problem were obtained in Leung and Li (Citation2015) and Lipton and López de Prado (Citation2020).

Lévy processes have been widely applied in finance to model price processes with jumps (Cont and Tankov Citation2004; Schoutens Citation2003), which leads us to model price spreads with jumps. These Lévy models often provide a better fit than models based on Brownian motion (Madan, Carr, and Chang Citation1998; Michaelsen and Szimayer Citation2018; Schoutens Citation2003), where the first two references focus on variance gamma-based models. Jumps can also arise from hard-to-borrow stocks (Avellaneda and Lipkin Citation2009) or illiquid stocks (Gardini, Sabino, and Sasso Citation2022). This motivates going beyond the classical OU processes to consider Lévy-driven Ornstein-Uhlenbeck processes (LDOUP) to model mean-reverting spreads with jumps in pairs trading.

LDOUPs have been applied to other areas of mathematical finance such as modelling stochastic volatility (Barndorff-Nielsen and Shephard Citation2001), energy prices and energy derivatives (Benth and Schmeck Citation2014; Cummins, Kiely, and Murphy Citation2018; Sabino Citation2020), credit risk (Cariboni and Schoutens Citation2009), among others. LDOUPs also have increased flexibility for capturing skewness, kurtosis, and dependence in the multivariate setting. Parametric estimation of LDOUPs using maximum likelihood has been considered in the univariate and multivariate cases by Valdivieso, Schoutens, and Tuerlinckx (Citation2009) and Lu (Citation2022), respectively, and these methods can be applied to fit LDOUPs to spreads. However, in the context of pairs trading, LDOUPs have been applied restricting the driving process to compound Poisson processes with double exponential jumps. This setup is used since Borovkov and Novikov (Citation2008) derived an analytic formula for the expected exit time which relies on the memorylessness property of the exponential jumps to deal with overshoots. Without this assumption, overshoots present a difficult technical hurdle to finding analytic formulas for the passage times of these processes in general, which requires Monte Carlo methods to overcome. Based on Borovkov and Novikov (Citation2008) and Wu, Zang, and Zhao (Citation2020) provide an analytic value function for pairs trading, while Endres and Stübinger (Citation2019) use the analytic formula of Borovkov and Novikov (Citation2008), while ignoring overshoots, to conduct an empirical study of optimal pairs trading.

In this paper, we discuss a framework for pairs trading using Lévy-driven Ornstein-Uhlenbeck processes to model the mean-reverting spread, which works with any LDOUP that can be simulated. Specifically, we focus on a variance gamma driving process, which is a time-changed Brownian motion and a pure-jump, infinite activity Lévy process with skewness and kurtosis parameters. By using this driving process, these properties extend to the LDOUP, allowing for more flexible models of the price spread than is available in the classical OU model, and in contrast to the finite activity processes considered previously. However, the generalization comes at the cost of not having analytic formulas. Hence, we develop a Monte Carlo method to evaluate the trading strategies and determine the optimal trading levels. In particular, we introduce an array of control variates to reduce the variance of the Monte Carlo estimator of the value function. We derive a formula for the variance reduction ratio when the control variate is used to estimate the mean and variance of the profit simultaneously, which extends the usual approach where only one parameter is estimated by control variates. Furthermore, in our numerical implementation, we examine how the parameters of the model affect the optimal trading levels and optimal expected profit in various scenarios. Specifically, we demonstrate numerically how skewness in the distribution of the spread leads to asymmetry in the optimal trading levels, how the optimal exit level can differ from the stationary mean when the process starts sufficiently far away from it and the discount rate is large, and it is otherwise difficult to find other situations where this occurs. And lastly, we demonstrate how the jumps can increase the optimal entry level and optimal expected profit.

We note that all the studies mentioned above focus on the trading performance of a single spread. The recent paper by Lee, Leung, and Ning (Citation2023) introduces a framework for trading multiple mean-reverting spreads simultaneously, where capital is dynamically allocated among different spreads based on their statistical characteristics. In this paper, we also consider the trading problem involving two mean-reverting spreads modelled by an LDOUP driven by a weak variance alpha-gamma process, which is a multivariate generalization of the variance gamma process introduced in Buchmann, Lu, and Madan (Citation2019). For bivariate pairs trading, we estimate the expected profit via Monte Carlo simulation and examine the effect of the correlation between spreads on the optimal trading levels and optimal expected profit. We demonstrate an example where the optimal strategy is to effectively pick the best spread mostly ignoring correlation, and an example where having low absolute correlation is more profitable when monitoring multiple spreads with marginal components that follow the same law.

The rest of the paper is structured as follows. In Section 2, we review Lévy-driven Ornstein-Uhlenbeck processes and required properties used in this study. The trading problem is formulated in Section 3. We discuss our Monte Carlo estimation framework in Section 4 and present the numerical results in Section 5. In Section 6, we examine the trading problem involving two mean-reverting spreads. Concluding remarks are provided in Section 7.

2. Lévy-Driven Ornstein-Uhlenbeck Process

Let ZLn(μ,Σ,Z) denote an n-dimensional Lévy process with characteristic triplet (μ,Σ,Z), where μRn, ΣRn×n is a covariance matrix and Z is an n-dimensional Lévy measure. See Bertoin (Citation1996), Sato (Citation1999), and Schoutens (Citation2003) for references on Lévy processes.

A Lévy-driven Ornstein-Uhlenbeck process (LDOUP) X is defined by the stochastic differential equation (1) dX(t)=λX(t)dt+dZ(λt),X(0)=X0,t0,(1) where ZLn(μ,Σ,Z) is the background driving Lévy process (BDLP), λ>0, and X0 is a random vector independent of Z. Note that there is no loss in generality in using the drift term λX(t)dt instead of λ(μ¯X(t))dt for some μ¯Rn, since the stationary mean of X is controlled by E[Z(1)]. Also, there is no loss in generality in using the BDLP Z(λI), where I is the identity function, instead of Z, since any Lévy process Z~Ln(μ~,Σ~,Z~) can be written in the form Z~=DZ(λI), where ZLn(μ~/λ,Σ~/λ,Z~/λ). However, the former is more convenient since it leads to a stationary distribution that does not depend on λ. The exposition on LDOUP here follows Lu (Citation2022), while some additional references for LDOUPs include Masuda (Citation2004), Sato and Yamazato (Citation1983), and Sato and Yamazato (Citation1985).

The solution of (Equation1) is (2) X(t)=eλtX(0)+eλt0teλsdZ(λs),t0.(2) For t0=0,t1=Δ,,tq=qΔ, at the observation time ti, we have X(ti)=eλΔ(X(ti1)+ti1tieλsdZ(λs)),i=1,,q. The stochastic integral term is iid, and we let Z(Δ) be the random vector with this distribution, so it is interpreted as the innovation term, up to a factor eλΔ, of an AR(1) process. Specifically, we define Z(Δ)=0ΔeλsdZ(λs)=Dti1tieλsdZ(λs),Δ>0. The random vector Z(Δ) is infinitely divisible, so it is determined by its characteristic exponent, which is given by (3) ΨZ(Δ)(θ)=0λΔΨZ(etθ)dt,θRn,(3) where ΨZ is the characteristic exponent of the BLDP Z. Combining these results, the LDOUP can be written as (4) X(ti)=eλΔ(X(ti1)+Z(Δ)(i)),i=1,,q,(4) where Z(Δ)(i) is iid with the distribution of Z(Δ).

Next, we consider specific examples of LDOUP processes where the BDLP is a tractable pure-jump infinite activity Lévy process to model price spreads for pairs trading. We consider both the univariate and multivariate cases, where the BDLP is a variance gamma and a weak variance alpha-gamma process, respectively.

2.1. Univariate LDOUP Driven by a Variance Gamma Process

The variance gamma process was introduced in Madan and Seneta (Citation1990) and additional details can be found in Madan, Carr, and Chang (Citation1998). Let ΓS(a,b) denote a gamma subordinator with shape a>0 and rate b>0, and BMn(μ,Σ) denote a Brownian motion with drift μ=(μ1,,μn)Rn and covariance matrix Σ=(Σkl)Rn×n. A Lévy process VVGn(b,μ,Σ) is a variance gamma (VG) process if V=DηI+B(G,,G), where BBMn(μ,Σ) and GΓS(b,b) are independent, ηRn, and I is the identity function. The parameter η adds a drift.

We call the univariate LDOUP XOU-VG(λ,b,μ,σ2,η) an OU-VG process if its BDLP is ZVG1(b,μ,σ2,η).

2.2. Multivariate LDOUP Driven by a Weak Variance Alpha-Gamma Process

The weak variance alpha-gamma (WVAG) process was introduced in Buchmann, Lu, and Madan (Citation2019). Let n2, a>0, α=(α1,,αn)(0,1/a)n, βk:=(1aαk)/αk, k=1,,n and ηRn. Let αμ:=(α1μ1,,αnμn)Rn and αΣ:=(Σkl(αkαl))Rn×n. We define ZWVAGn(a,α,μ,Σ,η) as a weak variance alpha-gamma (WVAG) process if (5) Z=DηI+V0+(V1,,Vn),(5) where V0VGn(a,aαμ,aαΣ) and VkVG1(βk,αkβkμk,αkβkΣkk), k=1,,n, are independent. This formulation gives a direct method of simulating the WVAG process in terms of VG processes, and seeing that the process has common and idiosyncratic jumps.

The original definition of the WVAG process (with η=0), (6) Z=DBT,(6) where BBMn(μ,Σ), T an n-dimensional alpha-gamma subordinator, and ⊙ is the weak subordination operation, gives a multivariate time-change interpretation to the process, generalizing the univariate VG process. Indeed, the marginal components of the WVAG process are general univariate VG processes with ZkVG1(1/αk,μk,Σkk,ηk).

We call a multivariate LDOUP XOU-WVAGn(λ,a,α,μ,Σ,η) an OU-WVAG process if its BDLP is ZWVAGn(a,α,μ,Σ,η).

Based on the moment formulas (see Michaelsen and Szimayer Citation2018), the parameters of the WVAG process can be interpreted as follows. The marginal parameters αk,μk,Σkk,ηk are kurtosis, skewness, variance, and location parameters, respectively, that affect the marginal component distribution, while a,Σkl,kl, are joint parameters. In particular, as αk0 (equivalently, the VG parameter bk=1/αk in the kth component), the kth marginal component of the WVAG process converges in law to Brownian motion so the sample paths become closer to continuous.

2.3. Moment Formulas

For a random vector U=(U1,,Un), denote the mean by m1(U):=E[U], and the kth central moment of U by mk(U):=(E[(U1E[U1])k],,E[(UnE[Un])k]), k2.

For a general BDLP Z, Lu (Citation2022, Lemma 1) gives formulas for the moments of Z(Δ) in terms of the moments of Z. Combining this with (Equation2) and conditional on X0 (or assuming it is constant), the moments of the LDOUP X are given by m1(X(t))=eλtX0+(1eλt)m1(Z(1)),m2(X(t))=(1e2λt2)m2(Z(1)),m3(X(t))=(1e3λt3)m3(Z(1)),Cov(Xk(t),Xl(t))=(1e2λt2)Cov(Zk(1),Zl(1)),kl. The stationary distribution Y of X is the distribution such that if X0=DY, then X(t)=DY for all t0. Thus, the stationary mean, which is the mean of the stationary distribution, is μ¯=E[Z(1)]. This is the level that we expect the process to return to in pairs trading.

Now specializing to the case where XOU-WVAGn(λ,a,α,μ,Σ,η), and using the moment formulas for the WVAG process in Michaelsen and Szimayer (Citation2018, Remark 4 and Appdendix A.1), we have (7) E[X1(t)]=eλtX0+(1eλt)(η1+μ1),(7) (8) Var(X1(t))=(1e2λt2)(Σ11+α1μ12),(8) (9) Skew(X1(t))=(1e3λt3)(3Σ11μ1α1+2μ13α12)((1e2λt2)(Σ11+μ12α1))3/2(9) (10) Cov(X1(t),X2(t))=(1e2λt2)a((α1α2)Σ12+α1α2μ1μ2).(10) Furthermore, μ¯=μ+η. These formulas will be used for control variates in the Monte Carlo method. Note that (Equation7) to (Equation9) apply to OU-VG processes with b1=1/α1.

3. Pairs Trading Problem

Suppose the spread of a pair of assets or the price of a portfolio of assets is mean-reverting and follows a univariate LDOUP X.

Now we proceed to determine the levels at which trades should be entered or exited. Consider the following pairs trading strategy on the spread X: Enter a short position in X when it is above μ¯+d, and close the position by buying when it is below μ¯+c, where 0c<d. Enter a long position when it is below μ¯d, and close the position by selling when it is above μ¯c. Fix a terminal time T>0, where the trade is closed by time T if it has not been closed already. Thus, d and c are the entry and exit levels relative to the mean reversion level μ¯.

This trading strategy is depicted for a sample path of X in Figure . Note that the price at which the trade enters and exits is not the same as the entry and exit levels due to the presence of jumps. In this model when X is an OU-VG process, there is always some degree of overshoot.

Figure 1. For the spread process X (black line), a trade is entered when the price passes ±d, whichever happens first (red lines), and exited when it passes ±c=0 (blue line). The value of the spread at the entry (red point) and exit (blue point) times are shown. Here, XOU-VG(1,5,0,0.015,0).

Figure 1. For the spread process X (black line), a trade is entered when the price passes ±d, whichever happens first (red lines), and exited when it passes ±c=0 (blue line). The value of the spread at the entry (red point) and exit (blue point) times are shown. Here, X∼OU-VG(1,5,0,0.015,0).

More generally, we can consider possibly different entry levels d+, d from above and below μ¯, and their respective exit levels c+,c. These satisfy 0c±<d±.

The relevant passage times are τd+=inf{t0:X(t)>μ¯+d+}T,τc+=inf{tτd+:X(t)<μ¯+c+}T,τd=inf{t0:X(t)<μ¯d}T,τc=inf{tτd:X(t)>μ¯c}T. Let rR be a discount rate, then P is the discounted profit over one trade cycle given by (11) P:=P(c+,c,d+,d)=1{τd+τd}erτc+(X(τd+)X(τc+))+1{τd+>τd}erτc(X(τc)X(τd)).(11) For convenience, we will refer to this simply as the profit. The value function is (12) V:=V(c+,c,d+,d)=E[P]γVar(P)=E[P]γE[P2]+γE[P]2,(12) where γ0 is a parameter which penalizes trading strategies with high variance. If γ=0, the value function is the expected profit.

We want to find the optimal levels c±,d± for the pairs trading strategy to maximize the value function, that is, maxc+,c,d+,dV(c+,c,d+,d). We develop a Monte Carlo approach to simulate sample paths of the LDOUP X, to estimate V(c+,c,d+,d) on 0c±<d±, and in turn numerically solve the maximization problem for the optimal trading levels. Given the use of simulation, we also discuss control variates for variance reduction.

4. Monte Carlo Method

Let m be the number of Monte Carlo simulations. For a spread process following a univariate LDOUP X, suppose we have m simulations of X(t0),,X(tq) with a small stepsize Δ>0. Then we can compute the random variable P, and hence obtain the Monte Carlo estimate of E[P] and E[P2] to estimate the value function V in (Equation12). The same sample paths are used for all evaluations of the same value function. While the Monte Carlo method presented here works for any LDOUP that can be fitted and simulated, we specifically focus on the OU-VG process.

4.1. Simulating the Spread

Let the spread process be XOU-VG(λ,b,μ,σ2,η). We assume that the parameters are known. For example, they may be estimated using maximum likelihood as in Valdivieso, Schoutens, and Tuerlinckx (Citation2009) and Lu (Citation2022).

It is possible to exactly simulate Z(Δ) for the OU-VG process, and hence X(t0),,,X(tq) by (Equation4), using the method of Sabino (Citation2020) and Qu, Dassios, and Zhao (Citation2021). For GΓS(a,b), define the random variable G(Δ)=DeλΔ(G~+C~), where G~Γ(Δ,beλΔ) is gamma where the parameters are the shape and rate, respectively, and C~CP(aλ2Δ2/2,PJ) is compound Poisson where the parameters are the rate and jump distribution, respectively. Here, PJ is the probability law of J such that J|UExponential(beλΔU) where the parameter is the rate, and UUniform(0,1). Then Z(Δ)=Dη(eλΔ1)+G+(Δ)G(Δ), where G+ΓS(b,b+), GΓS(b,b), and b±:=2bμ2+2σ2b±μ.

4.2. Control Variates

Next, we use control variates to reduce the variance of the Monte Carlo estimators of the value function. We refer to Glasserman (Citation2003, Chapter 4) for the details of the method. Specifically, to estimate E[P], the method of control variates defines the random variable PC=Pβ(CE[C]), where C is the control variate for which E[C] has an analytic representation, and β is chosen to minimize Var(PC), resulting in an unbiased estimator with reduced variance. Since the true value of β is unknown, the control variate estimator P^C is the sample mean of PC with β estimated by β^, the least squares estimator (LSE) of the slope in a linear regression where C is the regressor and P is the response variable. Note that β^ is also the sample estimate of the true β.

This method can be extended to multiple control variates, and the control variate estimator is equivalent to the prediction of a linear regression. See Appendix for details.

We apply the control variate method to estimate both E[P] and E[P2] in the value function (Equation12). Consider the control variates X(t1)μ¯,,X(tp)μ¯, where 0,t1,,tp are equally spaced points on [0,T]. When estimating E[P2], we include the additional control variates (X(t1)μ¯)2,,(X(tp)μ¯)2. Note that E[X(t)μ¯] and E[(X(t)μ¯)2] can be analytically computed using (Equation7) and (Equation8).

Further, we may want to consider the event of entering and exiting a trade, and the event of entering and not exiting over the observations X(t1),,X(tp) as additional control variates to approximate the indicator variable appearing in the profit function (Equation11). But the event that X(t1),,X(tp) passes some level cannot be computed analytically for use as a control variate. Instead, the probability of a related event using the iid sequence of innovations Z(Δ)(i), i=1,,p can be. Specifically, this motivates the inclusion of two additional control variates 1A and 1B. Define A as the event that Z(Δ)(1),,Z(Δ)(p) “enters a trade” (even though there is no actual trading on this sequence) by passing the level eλΔ(μ¯+d)μ¯ or eλΔ(μ¯d)μ¯ and then “exits the trade” by passing the level eλΔ(μ¯+c)μ¯ or eλΔ(μ¯c)μ¯, respectively.

The modified levels for Z(Δ)(i) are the levels such that X(ti)=eλΔ(X(ti1)+Z(Δ)(i)) passes the entry levels μ±d when X(ti1)=μ¯, and similarly for the exit levels. Further, B is the same as A except without exit. Thus, A and B are the events that Z(Δ)(i), i=1,,p, enters and exits these modified levels, and enters and does not exit, respectively. Due to collinearity, the complementary event that the sequence does not enter is not included, and if p = 1, the event A, which has probability 0, is also not included. We have expressed this for symmetric levels c=c+=c and d=d+=d, but this can be modified in an obvious way for the asymmetric case.

Since Z(Δ)(i), i=1,,p, are iid, we can compute the probabilities of A and B since we can compute P(Z(Δ)x) by applying the Fourier inversion to the characteristic function of Z(Δ) determined by (Equation3). The Fourier inversion methodology is described in Michaelsen and Szimayer (Citation2018, Section 4.1).

4.3. Variance Reduction

Consider the two separate linear regression models, (13) Y1=X1β1+ϵ1,Y2=X2β2+ϵ2,(13) where Y1 and Y2 are the vector of m simulated values of P and P2, respectively. In this section only, we let X1R(p1+1)×m and X2R(p2+1)×m be the design matrices, including an intercept. As outlined in Section 4.2, X1 includes the control variates X(ti)μ¯, i=1,,p, 1A, 1B, and X2 includes the control variates in X1 in addition to (X(ti)μ¯)2, i=1,,p. Since each simulated sample path of the LDOUP is independent, we have Cov(ϵi)=Σϵ,iiI, i = 1, 2 and Cov(ϵ1,ϵ2)=Σϵ,12I for some 2×2 covariance matrix Σϵ=(Σϵ,ij)i,j=12, where I is the identity matrix.

For each of the linear regression models, i = 1, 2, let μC,i be the mean vector of the control variates computed as outlined in Section 4.2, then the predictions xi=(1,μC,i) are Y^i=xiβ^i. As explained in Appendix, the control variate estimators of E[P] and E[P2] are the predictions Y^1 and Y^2, respectively. Thus, for each argument, the control variate estimator of the value function in (Equation12) is V^C:=Y^1γY^2+γY^12=bY^+Y^AY^, where A=(γ000),b=(1γ),Y^=(Y^1Y^2). Let Y¯1,Y¯2 be the sample mean of Y1,Y2 respectively, then V^M:=Y¯1γY¯2+γY¯12 is the corresponding Monte Carlo estimator of the value function.

Given that our approach involves parameters that are estimated by a function of two control variate estimators, rather than the usual situation of only one, we now derive the variance reduction factor in this context. For a large number of Monte Carlo simulations m, the asymptotic normality of the LSE gives Y^N(μC,ΣC) approximately for some μCR2 and ΣCR2×2. Using results on multivariate linear regression, and the law of total variance, noting that X1 and X2 are random, we have μC=(x1β1x2β2),ΣC=(Σϵ,11x1E[(X1X1)1]x1Σϵ,12x1E[(X1X1)1X1X2(X2X2)1]x2Σϵ,22x2E[(X2X2)1]x2), where denotes the rest of the matrix is completed by symmetry. By the plug-in principle, these parameters are estimated using μ^C=(x1β^1x2β^2),Σ^C=(Σ^ϵ,11x1(X1X1)1x1Σ^ϵ,12x1(X1X1)1X1X2(X2X2)1x2Σ^ϵ,22x2(X2X2)1x2), where Σϵ,ij is estimated using Σ^ϵ,ij=ϵ^iϵ^jmr1, and ϵ^1,ϵ^2 are the residuals from the linear regressions in (Equation13), r=p1 for i = j = 1, and r=p2 otherwise.

Now since V^C is a quadratic form in Y^, and Y^N(μC,ΣC) holds approximately for large m, by Rencher and Schaalje (Citation2008, Theorem 5.2c–d), the variance of the control variate estimator is (14) Var(V^C)bΣCb+2tr((AΣC)2)+4μAΣCAμC+4bΣCAμC=(1+4μC,1γ+4(μC,1)2γ2)ΣC,11+2γ2(ΣC,11)22γ(1+2μC,1γ)ΣC,12+γ2ΣC,22.(14) Then estimated variance Var^(V^C) is obtained by replacing μC and ΣC with μ^C and Σ^C, respectively. By the asymptotic normality of V^M, (Equation14) is also used to compute Var^(V^M) by replacing μ^C and Σ^C with μ^M=(Y¯1,Y¯2) and Σ^M=Σ^Y/m, respectively, where Σ^YR2×2 with the sample covariance Cov^(Yi,Yj) as its (i,j)-entry.

Thus, the estimated variance reduction factor of using control variates relative to Monte Carlo without control variates is R=Var^(V^C)Var^(V^M). In the context of a single parameter estimated using control variates, while the true variance reduction factor using the true value of βi is no greater than 1, when using LSE β^i instead, R>1 is possible. This topic is discussed in Glasserman (Citation2003, Section 4.1.3).

5. Numerical Implementation and Analysis

In this section, we implement the Monte Carlo method outlined above for various scenarios. The spread XOU-VG(λ,b,μ,σ2,η) is simulated with stepsize Δ=0.01 and terminal time T = 50. In all cases, μ=η, and any changes to μ adjusts η accordingly to ensure that the stationary mean is μ¯=0. Unless otherwise stated, the initial value is X0=0, γ=0, the discount rate is r = 0.01, and it is assumed for simplicity that the entry level d=d+=d is symmetric and the exit level is fixed at c=c+=c=0. Thus, we optimize for the entry level d. All results use m = 10, 000 Monte Carlo simulations.

5.1. Effect of Control Variates

Consider two examples using control variates.

Example 5.1

Suppose the spread X has parameters (λ,b,μ,σ2,η)=(1,1,0.5,0.015,0.5), and γ=0.1. The number of time points p used in the control variate method can be chosen to obtain the optimal variance reduction factor, however, this also depends on the entry level d. We optimize this iteratively by using no control variates to optimize d, then use this value of d to optimize p with the resulting optimal value denoted p. Doing this, Figure  shows that for d = 1.086, the optimal number of time points for the control variates is p=130 when searching over p=10,20,,200. This gives a moderate optimal variance reduction factor of R=0.735.

Figure 2. Estimated variance reduction factor R as a function of the number of time points for the control variates p. The parameters are (λ,b,μ,σ2,η)=(1,1,0.5,0.015,0.5), X0=0, γ=0.1, r = 0.01.

Figure 2. Estimated variance reduction factor R as a function of the number of time points for the control variates p. The parameters are (λ,b,μ,σ2,η)=(1,1,−0.5,0.015,0.5), X0=0, γ=0.1, r = 0.01.

Using p=130, Figure  shows the estimate of the value function V(d), which has optimal entry level d=1.086. The control variate reduces Var(V^C), however, it does not change the optimal entry level d in this example because the variability is already quite low, in fact, the coefficient of variation of V^C(d), the estimate of V(d) conditional on d, is 0.0038 after control variates.

Figure 3. The estimate V^C(d) of the value function V(d) and the optimal entry level d=1.086. The parameters are (λ,b,μ,σ2,η)=(1,1,0.5,0.015,0.5), X0=0, γ=0.1, r = 0.01.

Figure 3. The estimate V^C(d) of the value function V(d) and the optimal entry level d∗=1.086. The parameters are (λ,b,μ,σ2,η)=(1,1,−0.5,0.015,0.5), X0=0, γ=0.1, r = 0.01.

Different choices in the parameters result in different variance reductions. In Table , results are shown varying b,μ (which also affects η), and for the parameters, decreasing b>0 increases R, while decreasing μ<0 reduces and then increases R.

Table 1. For various values of (b,μ), the optimal number of time points for the control variates p (searching over p=10,20,200) and the optimal variance reduction factor R are shown.

Consider the case shown in the second row of Table  where (b,μ)=(2,0.05) and p=120. Surprisingly, for large γ, such as γ=1.5, it is possible that the control variate method reduces the variance of the estimator of E[P] with an estimated variance reduction factor 0.910, and E[P2] with estimated variance reduction factor 0.816, but increases the variance of the estimator of V(d)=E[P]γE[P2]+γE[P]2 with variance reduction factor 1.139. In comparison, when γ=0.1, this does not occur, and those estimated variance reduction factors are 0.892, 0.793, 0.904, respectively.

Example 5.2

We use the same setup as Example 5.1, except X now has parameters (λ,b,μ,σ2,η)=(0.01,50,0.5,4,0.5). Since λ0, this process has little mean reversion, and is close in law to a variance gamma process, so the value function is estimated with a relatively large amount of simulation error as shown in Figure . To address this, a locally estimated scatterplot smoothing (loess) smoother is applied to the estimated value function before maximizing it. Then the optimal entry level is d=0.506 without using control variates, and d=0.456 using the optimal number of control variates, which is p=1. The optimal variance reduction factor is R=0.817 and the coefficient of variation of V^C(d) is 0.0281, which is consistent with the much larger variability of the estimated value function compared to Example 5.1, Figure . Unlike in Figure , using control variates here has an effect on the estimate of d, the size of this effect varies with the simulation. Note that R is the variance reduction of estimating V(d) for fixed d using control variates, and the method is not designed to reduce the standard error of d, although reducing the former should be expected to reduce the latter. For these optimal values, only 63.91% of simulations resulted in a trade that is entered and exited before the terminal time due to the slow mean reversion, compared to 90.88% in Figure .

Figure 4. The estimate of the value function V(d) (solid lines) and the corresponding loess smooth (dashed lines) with (red lines) and without (black lines) control variates. The optimal entry level is d=0.506 without using control variates, and d=0.456 using the optimal number of control variates p=1. The parameters are (λ,b,μ,σ2,η)=(0.01,50,0.5,4,0.5), X0=0, γ=0.1, r = 0.01.

Figure 4. The estimate of the value function V(d) (solid lines) and the corresponding loess smooth (dashed lines) with (red lines) and without (black lines) control variates. The optimal entry level is d∗=0.506 without using control variates, and d∗=0.456 using the optimal number of control variates p∗=1. The parameters are (λ,b,μ,σ2,η)=(0.01,50,0.5,4,−0.5), X0=0, γ=0.1, r = 0.01.

5.2. Asymmetry of Optimal Trading Levels

From now on, we set γ=0 and do not include control variates. In the previous examples, we fixed d=d+=d for simplicity. The parameter μ controls the asymmetry of the stationary distribution Y of the LDOUP X. If μ=0, then the stationary distribution is symmetric and hence the optimal levels are equal, d+=d. However, under asymmetry, we may want to optimize both d+ and d separately.

Here, we consider the optimal level in four cases: from a large negative skew to no skew. The other parameters of X are (λ,b,σ2)=(1,5,0.015). The results are shown in Table , where Skew(Y) is the skewness of the stationary distribution, which by (Equation9), is Skew(Y)=23/23(3σ2μ/b+2μ3/b2(σ2+μ2/b)3/2). The results in Table  show how d+ and d may differ due to skewness. For negatively skewed spreads, the trade is more likely to enter from the negative level, and so the positive level is slightly lower to compensate. Figures  and  show the sample paths of the spread when μ=0.5 (large negative skew) and μ=0 (no skew), respectively.

Figure 5. A sample path of the spread XOU-VG(1,5,0.5,0.015,0.5) using the optimal entry levels in the first row of Table .

Figure 5. A sample path of the spread X∼OU-VG(1,5,−0.5,0.015,0.5) using the optimal entry levels in the first row of Table 2.

Table 2. For various value of μ, the optimal entry levels d+ and d are shown.

In the classical OU spread model, which has no skew, Leung and Li (Citation2015) derive the optimal entry and exit levels with transaction costs, where the two levels can also be asymmetric.

5.3. Exit Level Different From the Mean

Recall that we set c=c+=c for simplicity. So far, we have set c = 0. However, it is possible to find examples where c is much greater than 0. Such a situation can occur when X0>μ¯ and the discount rate r is very large. Let X0=0.25 and r = 1 while the other parameters are as before with (λ,b,μ,σ2,η)=(1,5,0,0.015,0). We optimize c, d for 0<c<d, and find that the optimal exit level is c=0.105, while the optimal entry level d is any value c<d<0.25 as shown in Figure , which plots the estimate of the value function V(c,d).

Figure 6. The estimate of the value function V(c,d) with entry level d and exit level c. The parameters are (λ,b,μ,σ2,η)=(1,5,0,0.015,0), X0=0.25, γ=0, r = 1.

Figure 6. The estimate of the value function V(c,d) with entry level d and exit level c. The parameters are (λ,b,μ,σ2,η)=(1,5,0,0.015,0), X0=0.25, γ=0, r = 1.

Thus, the pairs trading strategy is to enter the trade immediately, and because the large discount rate greatly reduces the profit if the trade is not exited quickly, the trade is exited at c=0.105 above the stationary mean μ¯ rather than waiting for it to fall to μ¯.

Figure  shows how the optimal exit level c changes for different values of the discount rate r. For the plotted values, the point at which c>0.000 (to 3 decimal places) is r = 0.3. For all these plotted values of r, if the initial value is instead set to X0=μ¯=0, then the optimal exit level would be c=0.000. This demonstrates how having an initial value different from the stationary mean and a large discount rate can make the optimal exit level c different from 0. It is difficult to find other situations where this occurs, which suggests optimizing c may have little benefit beyond this, and fixing c = 0, which corresponds to exiting the trade when the spread passes μ¯, simplifies the optimization.

Figure 7. The optimal exit level c for various value of discount rate r=0.01,0.1,0.2,,1.5. The other parameters are (λ,b,μ,σ2,η)=(1,5,0,0.015,0), X0=0.25, γ=0.

Figure 7. The optimal exit level c∗ for various value of discount rate r=0.01,0.1,0.2,…,1.5. The other parameters are (λ,b,μ,σ2,η)=(1,5,0,0.015,0), X0=0.25, γ=0.

5.4. Effect of Jumps

Now we consider the effect of the parameter b, which affects the jumps of the LDOUP X, on the optimal entry level d. As before, let (λ,μ,σ2,η)=(1,0,0.015,0). As b, the OU-VG process for the spread converges in law to a classical OU process driven by Brownian motion, and the sample paths are closer to continuous. As b0, roughly speaking, the sample paths become more jumpy (specifically, for each t, the BDLP Z(t) converges in distribution to a constant while its kurtosis diverges). Define the overshoot for the entry level d as the random variable O={|X(τd)(μ¯+d)|ifτd<τd,|X(τd)(μ¯d)|ifτdτd. Table  shows the effect of the sample paths of the spread, roughly speaking, being very jumpy (b = 1), moderately jumpy (b = 5), and approximately continuous (b=100). A plot of a sample path in each of these cases is given in Figures (a),  and (b), respectively. The results show that the more jumpy the sample path, the higher the level at which the trade is entered to compensate for overshooting. When b = 1, the mean overshoot is relatively large (0.0640) compared to the optimal entry level d (0.246). This demonstrates the importance of accounting for jumps by using LDOUPs compared to the classical OU process. Both Figure  and Table  show that as the sample path becomes closer to continuous, the mean of the overshoots becomes smaller and the optimal expected profit V^M(d) becomes close to d as expected, although they do not converge since r is nonzero. In this example, V^M(d) increases as b decreases. Note that the stationary variance remains constant, so the increased profitability is not due to changes in the variance.

Figure 8. A sample path of the spread XOU-VG(1,b,0,0.015,0), where (a) b = 1 and (b) b = 100, using the optimal entry levels in the first and third row of Table .

Figure 8. A sample path of the spread X∼OU-VG(1,b,0,0.015,0), where (a) b = 1 and (b) b = 100, using the optimal entry levels in the first and third row of Table 3.

Table 3. For various values of b, the optimal entry level d, the estimate V^M(d) of the optimal expected profit, and overshoot statistics are shown.

6. Bivariate Pairs Trading

We now consider pairs trading in a bivariate setting, where the spread X=(X1,X2) is a bivariate LDOUP with stationary mean μ¯=(μ¯1,μ¯2). While we focus on the 2-dimensional case, the concepts here can in principle be extended to the n-dimensional case.

6.1. Paris Trading Problem

While there are many ways to set up a pairs trading strategy on multiple spreads, we use the following setup to generalize the notion of one trade cycle used in the univariate setting. A trade is entered for the first time that either spread Xk is above μ¯k+dk or below μ¯kdk for k = 1, 2. If a trade on spread Xk is entered, then exit occurs when the spread passes μ¯k+ck or μ¯kck, respectively. When a trade is entered, no further trades can be initiated, even if the other spread later passes the entry level.

More generally, it is possible to have asymmetric entry and exit levels for both spreads, where d+,1, d,1 are the levels for entering a trade on spread X1 from the positive and negative sides, respectively, and similarly for the other entry and exit levels. For each spread k = 1, 2, there are 8 relevant passage times defined by τd+,k=inf{t0:Xk(t)>μ¯k+d+,k}T,τc+,k=inf{tτd+,k:Xk(t)<μ¯k+c+,k}T,τd,k=inf{t0:Xk(t)<μ¯kd,k}T,τc,k=inf{tτd,k:Xk(t)>μ¯kc,k}T, with 0c±,k<d±,k. However, we will not need to use this much generality in the examples below.

The nonzero probability of common jumps for the WVAG process used as the BDLP, and hence for X, means that the event {τd+,1=τd+,2} also has nonzero probability, that is, both spreads can pass their corresponding entry levels simultaneously. In this case, we invest with 50% weight in each of the spreads. An alternative approach would be to invest fully in the spread furthest from the mean-reverting level. However, in the numerical examples we consider below, both approaches produce virtually the same results, since it is rare that both spreads pass the optimal entry levels simultaneously. Therefore, we will only consider the former approach.

Let τ=min{τd+,1,τd,1,τd+,2,τd,2}, c=(c+,1,c,1,c+,2,c,2) and d=(d+,1,d,1,d+,2,d,2). The profit over one trade cycle is given by P:=P(c,d)=1{τ{d+,1,d+,2}}k=1,2erτc+,kWk(Xk(τd+,k)Xk(τc+,k))+1{τ{d,1,d,2}}k=1,2erτc,kWk(Xk(τc,k)Xk(τd,k)), where the weights are (W1,W2)={(1,0)ifτd+,1<τd+,2orτd,1<τd,2,(0,1)ifτd+,1>τd+,2orτd,1>τd,2,(0.5,0.5)otherwise, and r is the discount rate.

The objective function is the value function V(c,d)=E[P]γVar(P),γ0. We want to find the optimal level c,d for the pairs trading strategy to maximize the value function, that is maxc,dV(c,d).

6.2. Simulating the Spread

Now we specifically let the spread be XOU-WVAG2(a,α,μ,Σ,η). To simulate X(t0),,X(tq), we use the Euler scheme approximation in Lu (Citation2022, Sections 4.2 and 5.1), which is based on taking a stochastic integral representation of Z(Δ), which can be approximately simulated whenever Z(t) can be simulated, in this case using (Equation5). This is in contrast to the univariate OU-VG process which was simulated exactly in Section 4 with a method that has no multivariate generalization. We now determine the optimal trading levels using Monte Carlo methods.

6.3. Numerical Implementation and Analysis

We consider two examples of bivariate pairs trading on the spread XOU-WVAG2(λ,a,α,μ,Σ,η) simulated using the approximate method above with stepsizes Δ=0.01 and Δ~=0.001, and terminal time T = 50. In all cases, the initial value is X0=0, γ=0, we do not consider control variates, and μ=η so that μ¯=0. Unless stated otherwise, we assume for simplicity that the entry levels are symmetric with dk=d+,k=d,k, k = 1, 2, and the exit levels are c=0, so we optimize the value function over d1,d2. We use m = 10,000 Monte Carlo simulations.

In our numerical examples, we explore the effect of correlation on the optimal entry level and the optimal expected profit. In the first example, the correlation has little effect, whereas in the second example, it has a larger effect. The correlation of the components of X is in part controlled by the correlation parameter ρ=Σ12/Σ11Σ22. However, note that ρ is the correlation of the Brownian motion subordinate in (Equation6), and is related to the correlation of the components of X by (15) Corr(X1(t),X2(t))=Corr(Z1(t),Z2(t))=a((α1α2)Σ12+α1α2μ1μ2)Σ11+α1μ12Σ22+α2μ22(15) for all t>0, due to (Equation10) and (Equation8).

Example 6.1

Let (16) λ=1,a=2.5,α=(0.20.3),μ=(00.2),Σ=(0.015ρΣ11Σ220.02),(16) and r = 0.01.

Table  shows the optimal entry levels d1 and d2, the optimal expected profit V^M(d1,d2), and the estimated probability of which spread is traded, and each of these quantities is approximately equal across the values of ρ considered.

Table 4. For various values of ρ, the optimal entry levels d1,d2, the estimate V^M(d1,d2) of the optimal expected profit, and the estimated probability of which spread X1,X2 is traded are shown.

When pairs trading on the univariate spreads X1 and X2, the optimal expected profit is 0.227 (see the second row of Table ) and 0.331 respectively, while it is 0.334–0.336 in this bivariate pairs trading example. Bivariate pairs trading is more profitable than univariate pairs trading on either spread, although it is approximately as profitable as trading only X2 since this accounts for 79–84% of trades. The optimal expected profit of the bivariate pairs trade must be at least that of the best univariate pairs trade, as the latter strategy can be implemented within bivariate pairs trading by setting the other spread to have an arbitrarily large optimal entry level. Indeed, this is happening to some extent here as the optimal entry level of the spread X1 has increased from 0.220 in the univariate case to 0.305–0.318 in the bivariate case. Thus, this decreases the probability of only X1 entering a trade from 96.79% to 11–16% in Table  (the latter accounts for the probability that a trade on X1 fails to be entered because a trade on X2 is entered first, while the former is univariate pairs trading and does not include this). So in this example, the method has essentially identified that X2 is the more profitable spread, regardless of correlation, and has accordingly increased the optimal entry level of X1 to reduce the instances where it is traded.

Example 6.2

Now we consider an example where different values of the correlation parameter ρ can have a larger effect on the optimal entry level and optimal expected profit. Suppose the parameters are now (17) λ=1,a=6.65,α=(0.150.15),μ=0,Σ=(0.015ρΣ11Σ220.015),(17) and r = 1.

Recall the parameter boundary a<1α11α2. In Example 6.1, the upper bound of Corr(X1(t),X2(t)) as a function of ρ(1,1) is 0.395 by (Equation15). The upper bound converges to 1 as a1αk while keeping αk,μk,Σkk,ηk constant for all k. Since a1α1=1α2 here, this example represents monitoring two spreads X1=DX2 that are equal in law for each fixed ρ, with X1 and X2 converging pathwisely as ρ1, since in the limit, Corr(X1(t),X2(t))=1 for all t with X0=0. Also, appealing to the symmetry of X(t) for each fixed t, all components of the optimal entry levels d are equal to d, say. Thus, we optimize over d, the symmetric entry level for all components.

Table  shows the results for this example over different values of ρ. Here, the optimal expected profit V^M(d) increases from 0.032 to 0.040 as ρ decreases from 0.99 to 0, this represents a 26% increase in the optimal expected profit. The case where ρ=0.99 is approximately the case of univariate pairs trading, which occurs in the limit as ρ1. Since V^M(d) is larger when ρ is much less than 1, the results demonstrate a situation where bivariate pairs trading is much more profitable than what is effectively pairs trading on a single spread, and where correlation has a larger effect.

Table 5. For various value of ρ, the optimal entry level d and the estimate V^M(d) of the optimal expected profit.

These results are consistent with intuition. Monitoring multiple spreads of the same law presents additional opportunities to enter into the trade sooner, which increases the optimal expected profit relative to monitoring only one spread, while having a larger discount rate r substantially rewards entering the trade sooner. Thus, in this example, using bivariate pairs trading by choosing pairs that are uncorrelated (as represented by the ρ=0 case) is much more profitable than either univariate pairs trading or choosing highly correlated spreads (both represented by the ρ1 case).

7. Conclusions

We have presented a Monte Carlo framework for evaluating pairs trading strategies where the mean-reverting spread follows an LDOUP. We numerically demonstrated how different parameters affect the optimal trading level and value function and capture various aspects of the trading strategy. The framework is flexible to accommodate a wide class of LDOUPs, provided that they can be fitted and simulated, as well as different ways of trading the spread. For instance, methods for fitting and simulating univariate LDOUPs with tempered stable and normal inverse Gaussian stationary distributions are provided in Valdivieso, Schoutens, and Tuerlinckx (Citation2009). Our proposed control variates for variance reduction, can also be applied to similar problems.

For future research, one useful direction is to incorporate transaction costs into the trading problems. To that end, Do and Faff (Citation2012) examine the profitability of pairs trading accounting for transaction costs, and Leung and Li (Citation2015) analyse an optimal stopping approach to pairs trading with transaction costs and stop loss, and this work could be extended to spreads following LDOUPs. Another possibility is to consider large jumps in stock prices that trigger immediate liquidation due to the need to return the stocks. This can also be incorporated into the stochastic model and value function. Additional extensions of our framework include portfolio optimization problems arising from trading multiple spreads and spread selection based on statistical and other characteristics. Theoretical results on pairs trading under LDOUP models are very limited, and additional work in this direction would be useful. However with LDOUPs, these problems are complicated and unlikely to admit analytic solutions, so simulation-based approaches can be both mathematically interesting and practically useful.

Acknowledgements

Kevin Lu thanks the Department of Applied Mathematics at the University of Washington, where the substantial majority of his work on this article was completed. We thank an anonymous referee for the useful comments which have helped improve the paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Avellaneda, M., and J.-H. Lee. 2010. “Statistical Arbitrage in the US Equities Market.” Quantitative Finance 10 (7): 761–782. https://doi.org/10.1080/14697680903124632.
  • Avellaneda, M., and M. Lipkin. 2009. “A Dynamic Model for Hard-to-Borrow Stocks.” Risk 22 (6): 92–97.
  • Barndorff-Nielsen, O. E., and N. Shephard. 2001. “Non-Gaussian Ornstein-Uhlenbeck-Based Models and Some of Their Uses in Financial Economics.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 167–241. https://doi.org/10.1111/1467-9868.00282.
  • Benth, F. E., and M. D. Schmeck. 2014. “Pricing Futures and Options in Electricity Markets.” In The Interrelationship Between Financial and Energy Markets, edited by S. Ramos and H. Veiga, 233–260. Berlin-Verlag: Springer.
  • Bertoin, J. 1996. Lévy Processes. Cambridge: Cambridge University Press.
  • Borovkov, K., and A. Novikov. 2008. “On Exit Times of Lévy-Driven Ornstein-Uhlenbeck Processes.” Statistics and Probability Letters 78 (12): 1517–1525. https://doi.org/10.1016/j.spl.2008.01.017.
  • Brennan, M. J., and E. S. Schwartz. 1990. “Arbitrage in Stock Index Futures.” Journal of Business 63 (1): S7–S31. https://doi.org/10.1086/jb.1990.63.issue-S1.
  • Buchmann, B., K. W. Lu, and D. B. Madan. 2019. “Weak Subordination of Multivariate Lévy Processes and Variance Generalised Gamma Convolutions.” Bernoulli 25 (1): 742–770. https://doi.org/10.3150/17-BEJ1004.
  • Cariboni, J., and W. Schoutens. 2009. “Jumps in Intensity Models: Investigating the Performance of Ornstein-Uhlenbeck Processes in Credit Risk Modeling.” Metrika 69:73–198. https://doi.org/10.1007/s00184-008-0213-4.
  • Cont, R., and P. Tankov. 2004. Financial Modelling with Jump Processes. Boca Raton: Chapman & Hall/CRC.
  • Cummins, M., G. Kiely, and B. Murphy. 2018. “Gas Storage Valuation Under Multifactor Lévy Processes.” Journal of Banking and Finance 95:167–184. https://doi.org/10.1016/j.jbankfin.2018.02.012.
  • Dai, M., Y. Zhong, and Y. K. Kwok. 2011. “Optimal Arbitrage Strategies on Stock Index Futures Under Position Limits.” Journal of Futures Markets 31 (4): 394–406. https://doi.org/10.1002/fut.v31.4.
  • Do, B., and R. Faff. 2012. “Are Pairs Trading Profits Robust to Trading Costs?” Journal of Financial Research 35 (2): 261–287. https://doi.org/10.1111/jfir.2012.35.issue-2.
  • Elliott, R., Van Der Hoek J., and W. Malcolm. 2005. “Pairs Trading.” Quantitative Finance 5 (3): 271–276. https://doi.org/10.1080/14697680500149370.
  • Endres, S., and Stübinger J.. 2019. “Optimal Trading Strategies for Lévy-Driven Ornstein-Uhlenbeck Processes.” Applied Economics 51 (29): 3153–3169. https://doi.org/10.1080/00036846.2019.1566688.
  • Gardini, M., P. Sabino, and E. Sasso. 2022. “The Variance Gamma++ Process and Applications to Energy Markets.” Applied Stochastic Models in Business and Industry 38 (2): 391–418. https://doi.org/10.1002/asmb.v38.2.
  • Gatev, E., W. N. Goetzmann, and K. G. Rouwenhorst. 2006. “Pairs Trading: Performance of a Relative-Value Arbitrage Rule.” Review of Financial Studies 19 (3): 797–827. https://doi.org/10.1093/rfs/hhj020.
  • Glasserman, P. 2003. Monte Carlo Methods in Financial Engineering. New York: Springer.
  • Huck, N., and K. Afawubo. 2015. “Pairs Trading and Selection Methods: Is Cointegration Superior?” Applied Economics 47 (6): 599–613. https://doi.org/10.1080/00036846.2014.975417.
  • Kanamura, T., S. Rachev, and F. Fabozzi. 2010. “A Profit Model for Spread Trading with an Application to Energy Futures.” The Journal of Trading 5 (1): 48–62. https://doi.org/10.3905/JOT.2010.5.1.048.
  • Lee, K., T. Leung, and B. Ning. 2023. “A Diversification Framework for Multiple Pairs Trading Strategies.” Risks 11 (5). https://doi.org/10.3390/risks11050093.
  • Leung, T., and X. Li. 2015. “Optimal Mean Reversion Trading with Transaction Costs and Stop-Loss Exit.” International Journal of Theoretical and Applied Finance 18 (03): 1550020. https://doi.org/10.1142/S021902491550020X.
  • Leung, T., and X. Li 2016. Optimal Mean Reversion Trading: Mathematical Analysis and Practical Applications. Modern Trends in Financial Engineering, World Scientific Publishing Company.
  • Leung, T., and H. Nguyen. 2019. “Constructing Cointegrated Cryptocurrency Portfolios for Statistical Arbitrage.” Studies in Economics and Finance 36 (3): 581–599. https://doi.org/10.1108/SEF-08-2018-0264.
  • Lipton, A., and López de Prado M.. 2020. “A Closed-Form Solution for Optimal Ornstein-Uhlenbeck Driven Trading Strategies.” International Journal of Theoretical and Applied Finance 23 (8): 2050056. https://doi.org/10.1142/S0219024920500569.
  • Lu, K. W. 2022. “Calibration for Multivariate Lévy-Driven Ornstein-Uhlenbeck Processes with Applications to Weak Subordination.” Statistical Inference for Stochastic Processes 25 (2): 365–396. https://doi.org/10.1007/s11203-021-09254-4.
  • Madan, D. B., P. P. Carr, and E. C. Chang. 1998. “The Variance Gamma Process and Option Pricing.” European Finance Review 2 (1): 79–105. https://doi.org/10.1023/A:1009703431535.
  • Madan, D. B., and E. Seneta. 1990. “The Variance Gamma (v.g.) Model for Share Market Returns.” The Journal of Business 63 (4): 511–524. https://doi.org/10.1086/jb.1990.63.issue-4.
  • Masuda, H. 2004. “On Multidimensional Ornstein-Uhlenbeck Processes Driven by a General Lévy Process.” Bernoulli 310 (1): 97–120.
  • Michaelsen, M., and A. Szimayer. 2018. “Marginal Consistent Dependence Modeling Using Weak Subordination for Brownian Motions.” Quantitative Finance 18 (11): 1909–1925. https://doi.org/10.1080/14697688.2018.1439182.
  • Nelson, B. L. 1990. “Control Variate Remedies.” Operations Research 38 (6): 974–992. https://doi.org/10.1287/opre.38.6.974.
  • Qu, Y., A. Dassios, and H. Zhao. 2021. “Exact Simulation of Gamma-Driven Ornstein–Uhlenbeck Processes with Finite and Infinite Activity Jumps.” Journal of the Operational Research Society 72 (2): 471–484. https://doi.org/10.1080/01605682.2019.1657368.
  • Rencher, A. C., and G. B. Schaalje. 2008. Linear Models in Statistics. Hoboken: John Wiley & Sons, Inc.
  • Sabino, P. 2020. “Exact Simulation of Variance Gamma-Related OU Processes: Application to the Pricing of Energy Derivatives.” Applied Mathematical Finance 27 (3): 207–227. https://doi.org/10.1080/1350486X.2020.1813040.
  • Sato, K. 1999. Lévy Processes and Infinitely Divisible Distributions. Cambridge: Cambridge University Press.
  • Sato, K., and M. Yamazato. 1983. “Stationary Processes of Ornstein-Uhlenbeck Type.” In Probability Theory and Mathematical Statistics, edited by K. Itô and J. V. Prohorov, 541–551. Berlin: Springer.
  • Sato, K., and M. Yamazato. 1985. “Completely Operator-Selfdecomposable Distributions and Operator-Stable Distributions.” Nagoya Mathematical Journal 97:71–94. https://doi.org/10.1017/S0027763000021267.
  • Schoutens, W. 2003. Lévy Processes in Finance: Pricing Financial Derivatives. East Sussex: John Wiley & Sons Ltd.
  • Valdivieso, L., W. Schoutens, and F. Tuerlinckx. 2009. “Maximum Likelihood Estimation in Processes of Ornstein-Uhlenbeck Type.” Statistical Inference for Stochastic Processes 12 (1): 1–19. https://doi.org/10.1007/s11203-008-9021-8.
  • Wu, L., X. Zang, and H. Zhao. 2020. “Analytic Value Function for a Pairs Trading Strategy with a Lévy-Driven Ornstein-Uhlenbeck Process.” Quantitative Finance 20 (8): 285–1306. https://doi.org/10.1080/14697688.2020.1736613.

Appendix. Equivalence of control variates and linear regression

Suppose we have multiple control variate C=(C1,,Cp) to estimate E[P], where (C,P) is jointly simulated as we want the control variates to be correlated with P. As explained in Glasserman (Citation2003, Section 4.1.2), the control variate estimator is P^C=Y¯β^C(X¯μC), where Y is the vector of m simulated values of P, Y¯ is its sample mean, X1,,Xp are the vectors of m simulated values of C1,,Cp, respectively, X¯ is the vector of the sample means of X1,,Xp, μC=E[C], β^C=SXX1SXY, SXXRp×p with the sample covariance Cov^(Xi,Xj) as its (i,j)-entry, and SXYRp with the sample covariance Cov^(Xi,Y) as its ith entry. This generalizes the method of Section 4.2.

Consider the linear regression model Y=Xβ+ϵ, where XR(p+1)×m is the design matrix, which includes an intercept and the control variates C1,,Cp, and βRp+1 with LSE β^. The prediction of this linear regression at the point x=(1,μC) is Y^=β^x=Y¯+β^C(μCX¯)=P^C, where the second equality follows from the centred form of linear regression by Rencher and Schaalje (Citation2008, pg 156, Equations (7.36), (7.46)). This reference also provides that β^1=β^C, where β^=(β^0,β^1).

Thus, the control variate estimator P^C is the prediction of this linear regression. This fact is well-known and mentioned in Glasserman (Citation2003, Section 4.1.1) for a single control variate, and it is unsurprising that it is also true for multiple control variates. Nelson (Citation1990) gives a different but equivalent statement where the regressors are CμC.