327
Views
0
CrossRef citations to date
0
Altmetric
Research Papers

A static replication approach for callable interest rate derivatives: mathematical foundations and efficient estimation of SIMM–MVA

, &
Received 16 Jun 2023, Accepted 20 Jan 2024, Published online: 26 Feb 2024

Abstract

The computation of credit risk measures such as exposure and Credit Value Adjustments (CVA) requires the simulation of future portfolio prices. Recent metrics, such as dynamic Initial Margin (IM) and Margin Value Adjustments (MVA) additionally require the simulation of future conditional sensitivities. For portfolios with non-linear instruments that do not admit closed-form valuation formulas, this poses a significant computational challenge. This problem is addressed by proposing a static replication algorithm for interest rate options with early-exercise features under an affine term-structure model. Under the appropriate conditions, we can find an equivalent portfolio of vanilla options that replicate these products. Specifically, we decompose the product into a portfolio of European swaptions. The weights and strikes of the portfolio are obtained by regressing the target option value with interpretable, feed-forward neural networks. Once an equivalent portfolio of European swaptions is determined, we can leverage on closed-form expressions to obtain the conditional prices and sensitivities, which serve as an input to exposure and SIMM-driven MVA quantification. For a consistent forward sensitivity estimation, this involves the differentiation of the portfolio-weights. The accuracy and convergence of the method is demonstrated through several representative numerical examples, benchmarked against the established least-square Monte Carlo method.

1. Introduction

This research contributes to the quantification of risk measures related to counterparty credit risk (CCR) for interest rate derivatives that are traded over-the-counter (OTC). CCR refers to the risk that the counterparty in a bilateral agreement may default before the final settlement of every cash-flow and will fail to meet its financial obligations (Bank for International Settlements Citation2019). This type of risk has gained a lot of attention in the aftermath of the global financial crisis of 2007–08. Since then, the regulators have introduced numerous measures to mitigate systemic credit risk that have resulted in quantifying CCR through various value adjustments (xVA) (Gregory Citation2020). The family of xVAs includes CVA, which is defined as the expected loss that is incurred from a counterparty defaulting. Other xVAs quantify the cost related to funding, collateral and capital requirements. A general introduction to the challenges related to CCR quantification can be found in Zhu and Pykhtin (Citation2007).

An important means to reduce CCR for uncleared OTC derivatives is to post collateral (Gregory Citation2020). For interest rate derivatives we can in general distinguish two types of collateral: Variation Margin (VM) and Initial Margin (IM). VM matches the current value of the underlying portfolio and needs to be updated on a regular basis. The funding cost related to VM is known as FVA and is reflected by the expected exposure of the portfolio (Burgard and Kjaer Citation2013), similar to CVA. Posting VM significantly reduces the exposure to counterparty risk but does not bring it down to zero. Typically there will be a delay between the event of default and the settlement of all outstanding positions, which is known as the margin period of risk (Gregory Citation2020). IM serves as a protection against exposure changes due to market moves during this close-out period. In general, IM is supposed to cover the potential future exposure of a contract over a time-horizon of 10 business days with a 99% confidence. In that regard, the volume of IM typically reflects the 99% value-at-risk of the 10-day move in portfolio value. A portfolio's VaR varies over time and should therefore (just like VM) be updated on a regular basis.

A VaR measure is highly model-dependent and computationally intensive to quantify. To address these problems, the International Swaps and Derivatives Association (ISDA) has developed the Standard Initial Margin Model (SIMM). Since its publication in 2016, SIMM has become an industry-standard approach to IM quantification. Benefits of such a common methodology include a transparent dispute resolution and a consistent regulatory governance of collateral (International Swaps and Derivatives Association, Inc Citation2016). In short, SIMM is a sensitivity-based approach, in which VaR measures are approximated using the instrument's sensitivity to the shock of an underlying risk-factor. For the user of SIMM, this means that IM quantification reduces to the calculation of a set of bucketed portfolio sensitivities, such as Delta and Vega.

The exchange of collateral needs to be funded, which comes at a cost for the dealer. The total expected funding cost over the life-time of a derivative contract is known as the Margin Value Adjustment (MVA) (Green and Kenyon Citation2015) and is a recent addition to the xVA-collection. An ISDA margin survey reported that market participants collected over $300 billion of IM for non-cleared derivatives at year-end 2021 (International Swaps and Derivatives Association, Inc Citation2021). Due to its volume, an adequate quantification and management of IM-induced funding cost is highly relevant. Under the assumption that the size of IM is implied by SIMM and thus by sensitivities, it follows that MVA is driven by the future distribution of portfolio sensitivities over the life-time of the trade. Within a Monte Carlo simulation framework, this means sensitivities need to be calculated along the Monte Carlo path. For vanilla derivatives, which admit closed-form Greeks, the computational cost may be manageable. However, for exotic derivatives, such as options with early-exercise features, this is a demanding problem.

In this work, we present a universal approach to the computation of prices, Greeks, exposures and sensitivities along the Monte Carlo path for callable interest rate derivatives. Our focus will be on Bermudan swaptions, which is a class of exotic OTC derivatives that is heavily traded in the market. Our methodology is an extension of the work presented in Lokeshwar et al. (Citation2022) and Hoencamp et al. (Citation2023) and relies of the concept of static replication. A static replication is a portfolio of vanilla instruments, that mirrors the value of the original exotic option, until it is either exercised or matured. The portfolio composition of a static replication is constant throughout the life-time of the trade. This is in contrast with a dynamic replication, which needs to be continuously re-balanced (such as a traditional Delta hedge), or a semi-static replication, which needs to be re-balanced on a finite number of instances. The decomposition of a complex product into a portfolio of vanilla options greatly simplifies its risk analysis and allows for an efficient price and sensitivity calculation.

In the context of CVA, methodologies that facilitate efficient exposure calculations, have received a lot of attention in the recent literature. Ordinary least-square regression techniques for American-style options embedded in a simulation framework, have been popularized by Carriere (Citation1996), Tsitsiklis and Van Roy (Citation2001) and Longstaff and Schwartz (Citation2001). For exposure purposes, this regression methodology is extended by Joshi and Kwon (Citation2016) and Feng et al. (Citation2016) to reduce regression bias and noise along the Monte Carlo path. In other works, Monte Carlo methods have been combined with finite-difference estimation (De Graaf et al. Citation2014, Simaitis et al. Citation2016), the COS-method (Shen et al. Citation2013, Feng et al. Citation2016), the Stochastic Grid Bundling Method (SGBM) (Karlsson et al. Citation2016) and Chebyshev interpolation (Glau et al. Citation2021). In recent advances, machine learning has been applied to facilitate efficiency in exposure calculations. This includes a Deep Optimal Stopping algorithm to estimate the exercise strategy (Andersson and Oosterlee Citation2021) and a Deep xVA Solver, which relies on a neural network-based BSDE solver.

The publication of SIMM and the increasing significance of MVA have been incentives to investigate methodologies that facilitate efficient sensitivity calculation. For practitioners, the default approach is ‘bump-and-reval’, due to its simplicity and straight-forward implementation. For spot calculations, this is feasible, but for an application such as MVA, it would imply path-wise model re-calibration and option re-evaluation for each scenario, time-step and sensitivity-component. Thus, the computational burden would be excessive. In Glasserman (Citation2004), unbiased Monte Carlo sensitivity estimators are proposed, which yield better convergence properties than the traditional finite-difference estimator. These are known as the likelihood ratio method, which relies on differentiating the payoff probability density, and the path-wise derivative method, which relies of differentiating the payoff function itself. Yet the computational improvement of these methods can be limited for path-dependent instruments.

The application of algorithmic differentiation (AD) to the path-wise derivative method has been shown to yield a significant efficiency gain. The concept of AD is that the estimation routine is decomposed into a series of basic operations, which are differentiated by a repeated application of the chain rule (Griewank and Walther Citation2008). The differentiation is either performed in a forward iteration (tangent mode) or backward iteration (adjoint mode, AAD). See for example Giles and Glasserman (Citation2006), Capriotti (Citation2011) or Capriotti et al. (Citation2017) for an application with American-style derivatives. The tangent mode is efficient if the number of independent variables (i.e. the dimensionality of the function input) is small. It is therefore less suitable for the calculations of large gradients. In contrast, the adjoint mode is efficient if the number of dependent variables (i.e. the dimensionality of the function output) is small. A limitation of AAD, however, is that it is demanding in terms of memory and its implementation is often technical. Others have combined adjoint differentiation with LSMC (Antonov et al. Citation2017, Caspers and Lichters Citation2018), stochastic automatic differentiation (Fries Citation2019) or SGBM (Jain et al. Citation2019). Alternative approaches to path-wise sensitivity approximations include principal component analysis (Kappen Citation2017), Chebyshev interpolation (Zeron and Ruiz Citation2018) and least-square regression (Lakhany and Zhang Citation2021).

With a focus on Bermudan swaptions under an affine term-structure model, our contribution to the literature is threefold. First we extend the semi-static replication algorithm that is presented in Lokeshwar et al. (Citation2022) and Hoencamp et al. (Citation2023) to a replication approach that is truly static. The method is based on formulating the portfolio optimization as the regression of a shallow neural network to the target's option value. As a result, the trained parameters represent the weights and strikes of the portfolio composition. Second, we show that with the proper constraints on the regression, a Bermudan can be replicated with a portfolio of European swaptions. The accuracy is demonstrated through several numerical examples. Thirdly, we derive efficient estimators for sensitivities along the Monte Carlo path, which serve as an input to dynamic IM quantification. Here we exploit the fact that Deltas and Vegas of European swaptions are available in closed-form. We also show that for complete and accurate sensitivities, a differentiation of the portfolio weights needs to be incorporated. We demonstrate the performance of the algorithm through several numerical examples and convergence analyzes, benchmarking the results to the established least-square Monte Carlo method (Longstaff and Schwartz Citation2001).

The outline of the article is as follows. In section 2, we define the modeling landscape, the considered interest rate derivatives and the risk metrics of interest. In section 3, we present the regression algorithm and show how we can achieve a static replication for Bermudan swaptions. In section 4, we derive the estimators used for price and sensitivity calculation along the Monte Carlo path. Section 5 is subject to several representative numerical examples. We conclude with a summary of the results in section 6.

2. Problem formulation

A problem overview is provided in figures .

Figure 1. Definition initial margin (IM).

Figure 1. Definition initial margin (IM).

Figure 2. Standard initial margin model (SIMM).

Figure 2. Standard initial margin model (SIMM).

Figure 3. Definition margin value adjustment (MVA).

Figure 3. Definition margin value adjustment (MVA).

Figure 4. MVA through static replication.

Figure 4. MVA through static replication.

Problem overview

  1. Initial Margin (IM) is a form of collateral, which serves as a protection against the 99% value-at-risk of the exposure change during the margin period of risk (figure ).

  2. The industry standard approach to IM quantification is the Standard Initial Margin Model (SIMM), which takes the portfolio sensitivities as input (figure ).

  3. The expected funding cost of posting IM is called Margin Value Adjustment (MVA). Quantifying MVA requires the simulation of future IM (i.e. portfolio sensitivities), which is a computational challenge (figure ).

  4. We propose a method to statically replicate a callable derivative with a portfolio of vanilla options. This simplifies the computation of path-wise sensitivities and future IM (figure ).

2.1. Model and assumptions

First, we fix a finite time-horizon [0,T¯], on which we consider a continuous-time financial market. We assume the market is frictionless and free of arbitrage. We let the probability space (Ω,F,P) represent all possible states of the economy and the filtration F=(Ft)t[0,T¯] all information generated by the economy up to time-t.

We consider the notion of a bank account or money market account. Investments in the money market are assumed to compound a continuous, risk-free interest rt. We refer to rt as the short-rate. The time-t value of a unit of currency invested in the money market at time-zero is denoted as Mt, and we assume it satisfies the following dynamics dMt=rtMtdt,M0=1 We denote by Q the risk-neutral measure equivalent to P, which is associated with Mt as the numéraire. Attainable claims denominated by the numéraire are assumed to be martingales under Q, which guarantees the absence of arbitrage (Harrison and Pliska Citation1981).

Throughout this paper we shall consider short-rate dynamics that are captured by an affine term-structure model, as introduced in Duffie and Kan (Citation1996). That means that the short-rate can be written as rt=g(xt) for some function g:RdR. The vector xt:={x1(t),,xd(t)} denotes a Markovian system of state-variables in Rd. The state-variables are assumed to satisfy an SDE of the form (1) dxt=μ(t,xt)dt+η(t)dWt(1) with Wt:={W1(t),,Wd(t)} denoting a d-dimensional Brownian motion under Q and functions μ:[0,T¯]×RdRd and η:[0,T¯]Rd×d. We let g() and μ(t,) be affine functions of xt, satisfying the standard regularity conditions, such that xt admits a strong solution. Note that we restrict our scope to the Gaussian subclass of affine short-rate models. This is done to impose an intuitive relationship between model parameters and model-implied option volatilities (see section 3.5). This will facilitate the tractability of the computations; the estimation of Vega in particular. Generalizations to state-dependent diffusion coefficients should be possible but will involve the differentiation of Fourier transforms. This increases the complexity of the problem and falls outside the scope of this paper.

A zero-coupon bond is a contract that guarantees the holder one unit of currency at a pre-specified maturity date T. We will denote its time-t value as P(t,T) for tT. An important result for affine term-structure models, is that zero-coupon bond prices are exponential affine in xt. See for example Andersen and Piterbarg (Citation2010) and Duffie and Kan (Citation1996) for details. Next to zero-coupon bonds, we will often refer to a related quantity, known as the zero-rate. We define a zero-rate as R(t,T):=logP(t,T)τ where τ denotes the year fraction between date t and T. For simplicity, we will assume that the collateral rate used for discounting and the instantaneous rate used to derive term-rates are both implied by the same short-rate rt. In other words, our set-up will be a classic single-curve model environment. As term rates, we consider the classic, forward-looking LIBOR. Similarly one could consider a backward-looking, RFR-based term rate. In certain markets, the IBOR benchmark is discontinued, in favor of such set-in-arrears term rates (see Lyashenko and Mercurio Citation2019). As forward rates of forward- and backward-looking term rates are equivalent before the start of the accrual period, their treatment within this framework would mostly be the same.

2.2. Interest rate derivatives

The focus of this work will be on modeling Bermudan swaptions. Here we briefly introduce related derivatives and notation.

  1. Interest rate swap: We consider fixed-for-floating interest rate swaps, where floating LIBOR payments are exchanged against fixed rate payments on dates T0,,TM. The year-fraction between Tj1 and Tj will be denoted τj.

  2. European swaption: A European swaption is a contract, which gives the holder the right, but not the obligation to enter an fixed-for-floating interest rate swap with pre-specified fixed rate K at a pre-specified future inception date T0. Considering the annuity A0,M(t):=m=1MτmP(t,Tm) as numéraire, the risk-neutral value of a swaption (see Filipovic Citation2009) can be expressed as (2) Vt=A0,M(t)E0,M[(δ(S0,M(T0)K)|)+|Ft](2) where the expectation above is taken under the annuity measure Q0,M. The parameter δ distinguishes between a payer (δ=1) and a receiver (δ=1) swaption. Under the assumption that the swap rate has Gaussian dynamics, the swaption price can evaluated using Bachelier's formula (see Musiela and Rutkowski Citation1997). Given a swaption has market price Vmkt(t,K), the implied normal volatility is defined as the unique scalar σimp that solves the equation VBach(t,σimp,K)=Vmkt(t,K)

  3. Bermudan swaption: A Bermudan swaption is a contract, which gives the holder the right to enter a fixed-for-floating interest rate swap with maturity TM at any of the dates T0,,TM1 at a pre-specified fixed rate K. Should the holder of the Bermudan choose to exercise at Tm, the payoff is given by h(Tm,xTm):=hm(xTm)=δAm,M(Tm;xTm)[Sm,M(Tm;xTm)K] The value of a Bermudan swaption can be represented through a dynamic programming formulation, in which the price is captured by means of a backward recursion (Glasserman Citation2004), given by V(t;xt)={max{hM1(xt), 0}if t=TM1max{hm(xt), Cm(t,xt)}if t=Tm,m=0,,M2Cm1(t,xt)if t(Tm1,Tm) where Cm represents the continuation value of the contract.

2.3. Credit risk metrics

In this work, we will consider credit exposure and forward sensitivity profiles of Bermudan swaptions. Exposure is essential for CVA computation. Product sensitivities, Delta and Vega in particular, serve as an input to ISDA SIMM calculations for initial margin.

2.3.1. Exposure and CVA

The expected positive exposure (EPE) of a financial contract with price Vt at time t0 is EPE(t):=EQ[e0trudumax{Vt,0}|F0] Expected positive exposure represents the expected loss on a claim in the event of a counterparty defaulting at a given time t0 (Gregory Citation2020). The expectation is evaluated under the risk-neutral measure Q. Exposure is a key ingredient in the quantification of counterparty risk at trade or counterparty level through CVA. CVA is the difference between the total value of a derivative in a market that is completed with counterparty risk and its default-free value (Burgard and Kjaer Citation2013). Let τ0 denote the default time of the counterparty. Assuming τ is independent of exposure (i.e. ignoring wrong- or right-way risk), CVA can be computed as follows Green (Citation2015) (3) CVA=(1RR)0TEPE(u)dQ(τ=u)LGDi=1nEPE(ti)×PD(ti1,ti)(3) with RR denoting the recovery rate, LGD the loss given default and PD the probability of default of the counterparty. A common approach in the literature is to model the default time τ as an Ftadapted stopping time using a Cox-construction (Jeanblanc and Li Citation2020). In that case, the survival function of the counterparty follows an exponential distribution, i.e. Q(τ>t)=exp{Λ(t)} with Λ(t) denoting a hazard function. The hazard function can be specified in terms of a hazard rate λ(t), which yields Λ(t)=0tλ(u)du. Therefore, the probability of default can be written as (4) PD(ti1,ti)=e0ti1λ(u)due0tiλ(u)du(4)

2.3.2. Initial margin and MVA

An important means to reduce the exposure at risk is the exchange of collateral. In this work, we will consider initial margin (IM) and its related funding costs. IM is a protection against exposure changes during the margin period of risk, the time interval after a default when outstanding positions are not yet settled, but VM is no longer updated (Green and Kenyon Citation2015). Typically, IM reflects the 99% value-at-risk of the portfolio change over a 10-day time interval. Throughout this work, we will consider the Standard Initial Margin Model (SIMM) to be the default method for IM computations. The main take-away from SIMM–IM is that it is a sensitivity-based approach. The idea is that the response of an instrument to a risk-factor shock is efficiently approximated by multiplying the instrument's risk-factor sensitivity by the corresponding shock size (International Swaps and Derivatives Association, Inc Citation2016). We will provide a brief summary of SIMM below, but refer to International Swaps and Derivatives Association, Inc (Citation2020) for details.

Consider risk-factors θ:=(θ1,,θK) and denote the 10-day risk-factor increments as Δθ1,,ΔθK. The variance of the total response of an instrument Vt to each shock Δθi is computed as Var(i=1KVtθiΔθi)=[DθVt]Σθ[DθVt] where DθVt denotes the gradient of Vt w.r.t. θ and Σθ denotes the covariance matrix of θ. If the total response defined above, is assumed to be approximately Gaussian with mean zero, the VaR can be estimated as follows VaR0.99(ΔVt)Φ1(0.99)[DθVt]Σθ[DθVt] Where Φ1 denotes the inverse of the standard normal CDF. For interest rate derivatives, ISDA distinguishes three relevant types of risk-factors, namely Delta-risk, Vega-risk and Curvature-risk, the latter referring to the second-order Gamma impact. The total IM is given by (5) IM:=MarginDelta+MarginVega+MarginCurvature(5) where each component represents a VaR estimation w.r.t. a type of underlying risk-factors, i.e. market interest rates and market implied volatilities. The risk-factors of each type are subdivided into twelve buckets, corresponding to the yields at the tenors T:={2W,1M,3M,6M,1Y,2Y,3Y,5Y,10Y,15Y,20Y,30Y}. Hence for each component one needs to compute the corresponding 12-dimensional sensitivity vector, which are specified as (see also International Swaps and Derivatives Association, Inc Citation2020):

Delta

: The PV01 of the instrument Vt w.r.t. the risk-free interest rate Rk at tenor τkT, i.e. Δk=Vt(Rk+1bp)Vt(Rk).

Vega

: The sensitivity of the instrument Vt w.r.t. volatility risk-factor σk, i.e. Vk=Vt(σk+1%)Vt(σk). Here, σk denotes the the implied at-the-money volatility of a swaption with an expiry time equal to τkT.

Curvature

: Curvature margin represents the second-order impact of the interest rate risk-factors to the VaR estimate. In SIMM, the Gamma is approximated with a Vega-Gamma relationship. Therefore, the computation of Curvature margin takes the Vega sensitivity vector as input, similar to Vega margin.

For SIMM–IM quantification in practice, the user only needs to compute the portfolio sensitivities. The other parameters are provided by ISDA in an annual publication.

A key difference between VM and IM is that the former is symmetric. VM reflects the value of the underlying trade and can be rehypothecated. This means the receiver can reuse the collateral, for example in the margin agreement of an opposite transaction. In contrast, IM reflects the risk of a trade and must be exchanged in both directions, without netting the amounts. The collateral is often required to be posted on a segregated account and therefore cannot be rehypothecated (Gregory Citation2020). As a result, IM must be funded by the dealer, which comes at a cost. The expected lifetime cost of posting IM against a portfolio is called Margin Value Adjustment (MVA) (Green and Kenyon Citation2015).

The size of MVA is driven by the dynamic IM profile of the portfolio and the funding spread that applies to the IM posting. As the volume of IM is dependent on the state of the economy, {IM(t):t[0,T]} represents a stochastic process. The expected initial margin (EIM) of a financial contract at time t0 is defined as EIM(t):=EQ[e0truduIM(t)|F0] EIM represents the expected volume of IM that needs to be posted by the dealer at a given time t0 (Gregory Citation2020). With this expression at hand, MVA can be computed as follows (6) MVA=0TFS(u)EIM(u)dui=1nFS(ti)×EIM(ti)×(titi1)(6) with FS denoting the funding spread, which reflects the spread on the collateral rate w.r.t. the risk-free rate (Green and Kenyon Citation2015).

3. The static replication method

In this section, we describe the algorithm that is used to compose a static replication portfolio for a Bermudan swaption. The algorithm is embedded in a Monte Carlo framework and relies on a regress-later technique. See Glasserman and Yu (Citation2004) for a general introduction to regress-later. The method is strongly inspired by the work of Jain and Oosterlee (Citation2015) and extends the approach that is presented in Lokeshwar et al. (Citation2022) and Hoencamp et al. (Citation2023).

In Jain and Oosterlee (Citation2015), it is shown that early-exercise options can be priced with an application of a regress-later technique. As such, the ‘later’ option price VTm+1 is regressed against the ‘later’ risk-factor realization xTm+1. The continuation value is subsequently estimated by evaluating the conditional expectation of the regressed option value, i.e. Cm(Tm)=MTmEQ[VTm+1MTm+1|FTm]MTmEQ[k=1Kαkϕk(xTm+1)MTm+1|FTm] which can be computed in (semi) closed-form for an appropriate choice of basis-functions ϕk.

In Lokeshwar et al. (Citation2022), it is proposed to perform the regression with a shallow neural network (i.e. a feed-forward neural network with a single hidden layer). By considering the appropriate network structure, it is shown that the regression can be interpreted as a portfolio of short-maturity options. As a consequence, the continuation value can be evaluated by simply pricing the replication instruments. In Hoencamp et al. (Citation2023), this approach is extended to an interest rate modeling framework. There it is shown that a Bermudan swaption can be semi-statically replicated by an options portfolio written on zero-coupon bonds. The central concept of the latter two studies is that if a portfolio perfectly reproduces the value function of a derivative security at some future time Tj for every realization of the state-variables, the no-arbitrage condition implies that this portfolio will also replicate the security at any time t<Tj, as long as no cash-flows can occur between t and Tj. Where a dynamic replicating portfolio needs to be rebalanced continuously through time, the semi-static replication only needs rebalancing on a finite number of instances.

The algorithm that is proposed in this paper builds further on the foundations that were laid by Jain and Oosterlee (Citation2015), Lokeshwar et al. (Citation2022) and Hoencamp et al. (Citation2023). We suggest two main novelties compared to the earlier studies

  • We propose to use the swap rate as the regression variable, which is an implicit risk-factor in our model set-up. As a consequence, the regression can be interpreted as the payoff of a portfolio of European swaptions. We further elaborate on this in Section 3.4.

  • We propose a new variation to the algorithm, which allows us to compose a replication that is not semi-static, but in fact truly static. In other words, this algorithm lets one compose a portfolio at time zero, which will mirror the value of the Bermudan until it is either exercised or expired, without the need of updating the portfolio at intermediate time-points. We further elaborate on this in Section 3.3.

3.1. The algorithm

The algorithm is executed in an iterative manner, moving backwards in time. Starting at the final exercise date TM1, a regression is performed to determine the weights and strikes of an options portfolio which replicates the Bermudan's payoff. This replication is subsequently used to estimate the value of the Bermudan at the preceding monitor date, yielding the target function of the consecutive regression. This process is repeated until the first monitor date T0 and a portfolio has been composed which statically replicates the Bermudan between time-zero and its final exercise date. Below we will describe the algorithm in more detail.

Start by sampling N trajectories of the underlying state-variable. For each realization of xTM1, compute the Bermudan swaption payoff at TM1 (the final exercise opportunity) fM1(xTM1)=max{hM1(xTM1),0}. This will act as the target function of the first regression. Now recursively execute the following steps for m=M1,,0:

1. Select and sample the regression variable

Select an asset zm(t)={zm,1(t,xt),,zm,d(t,xt)} taking value in Rd that will act as the independent regression variable. For compactness, we will suppress the dependency of zm on the state variable xt, when clear from the context. The asset(s) should satisfy the following conditions:

  • zm(Tm) is a square integrable random variable on the probability space (Ω,FTm,Q).

  • For each m=0,,M1, the realization fm(xTm) is uniquely determined by zm(Tm).

  • The parametrization {(zm(Tm;x),fm(x)|):xRd} is continuous.

where fm denotes a target function, which we define in (Equation8) below. Subsequently compute for each sample path the realizations of zm(Tm).

2. Regress the target function

Consider a regression function Gm:RdR that is defined as (7) Gm=ψ2A2ψ1A1(7) Here we denote by A1:RdRn, A2:RnR affine functions of the form A1(y)w1y+b1andA2(y)w2y+b2 with w1Rd×n, w2Rn×1, b1Rn and b2R. Furthermore we denote by ψ1:RnRn, ψ2:RR activation functions of the form ψ1(y)(max{y1,0},,max{yn,0})andψ2(y)y Additionally, consider a target function fm:RdR of the form (8) fm(x)={qm(x)max{hm(x),0}if m=M1qm(x)max{hm(x)C~m(Tm,x),0}if m=M2,,0(8) where qm(x) denotes a possible scaling factor. The rationale behind this target function is that we aim to regress the fraction of the option value that is lost, when an exercise date passes, but the option is not exercised. We provide an elaborate motivation and interpretation in section 3.3.

Subsequently, optimize the parameters βm={w1,w2,b1,b2}, such that Gm(zm(Tm);βm) fits the target function fm(xTm). The parameters are determined by minimizing the cost function (9) Lm(β)=EQ[(Gm(zm(Tm);β|)fm(|xTm)|)2](9) (10) 1Nn=1N(Gm(zm(n);β|)fm(|xTm(n))|)2(10) where zm(n) and xTm(n) denote sampled realizations of the asset and the state variable, respectively, on a corresponding MC path.

3. Estimate the continuation value

Finally, the continuation value of the Bermudan swaption at the preceding exercise date Tm1 is estimated by calculating (11) C~m1(Tm1,xTm1)=j=mM1MTm1EQ[Gj(zj(Tj);βj)MTjqj(xTj)|xTm1](11) With an appropriate choice of zm, this can be computed in closed-form or approximated with an efficient numerical routine.

3.2. Interpretation of the regression function

The regression function as presented in (Equation7) can be considered a shallow neural network, i.e. a feed-forward neural network with a single hidden layer. The structure is graphically represented in figure (a). The outcome of the first hidden layer is a vector in Rn, consisting of n neurons. Each individual neuron Om,i is calculated as Om,i=max{j=1dwijzm,i(Tm)+bi,0} The functional form of each hidden node thus corresponds to the payoff of an arithmetic basket option written on zm,1,,zm,d.

Figure 5. Neural network structures. (a) Fully connected. (b) One-dimensional and (c) Locally connected.

Figure 5. Neural network structures. (a) Fully connected. (b) One-dimensional and (c) Locally connected.

The outcome of the second layer is calculated as i=1nwiOm,i+b The functional form of the outcome of the neural network, hence corresponds to the payoff of a weighted portfolio of basket options, plus a zero-coupon bond maturing at Tm. Computing the conditional expectation given in (Equation11) can be interpreted as evaluating the risk-neutral price of this portfolio.

3.3. A static replication

Let Πm(t) denote an options portfolio maturing at Tm with payoff equal to (12) Πm(Tm)=1qm(xTm)Gm(zm(Tm);βm)(12) By the deliberate choice of the target function fm in (Equation8), the payoff of this portfolio yields the approximation Πm(Tm)max{hm(xTm)C~m(Tm,xTm),0}=max{hm(xTm),C~m(Tm,xTm)}C~m(Tm,xTm) which is graphically represented by the red area in figure (a). As a consequence of the no-arbitrage condition, it follows that at any time t<Tm the risk-neutral portfolio value yields the approximation Πm(t)Cm1(t,xt)Cm(t,xt). By a repeated argument, whenever t(Tm1,Tm) we have i=mM1Πi(t)Cm1(t,xt)=V(t) The combined portfolio Π(t):=i=0M1Πi(t), composed at t = 0, can be interpreted as a fully static replication of the Bermudan swaption. Whenever a monitor date Tm is reached, subportfolio Πm expires. Then, two things could happen:

  • C(Tm,xTm)hm(xTm): The holder exercises the Bermudan and the value of i=m+1M1Πi plus the payoff of Πm will equal the exercise value.

  • C(Tm,xTm)>hm(xTm): The holder holds the Bermudan, subportfolio Πm expires worthless and i=m+1M1Πi will continue to replicate the Bermudan.

Figure 6. Target functions regress-later: two approaches. (a) Fully static replication and (b) Semi-static replication.

Figure 6. Target functions regress-later: two approaches. (a) Fully static replication and (b) Semi-static replication.

Thus, portfolio Π will mirror the Bermudan swaption value until it is either matured or exercised.

The algorithm presented in this work is a variation to the semi-static replication approach presented in Lokeshwar et al. (Citation2022) and Hoencamp et al. (Citation2023). There, the target function fm is selected such that the portfolio payoff Πm yields the approximation Πm(Tm)max{hm(xTm),C~m(Tm,xTm)} which is graphically represented by the red area in figure (b). In that case, subportfolio Πm is set-up on Tm1, it replicates the Bermudan on the time-interval (Tm1,Tm) and expires at Tm. Its payoff will equal the exercise value of the Bermudan, or suffice to set-up subportfolio Πm+1 in case the Bermudan is continued. The advantage of the latter approach is that only Πm(t) needs to be priced in order to value the Bermudan, rather than i=mM1Πi(t). A disadvantage is that the replication is only semi-static and needs rebalancing on each monitor date Tm.

3.4. Estimation of the continuation value

How the continuation value as given in (Equation11) is computed, depends on the specification of the model and the selection of the regression asset. In the general case, the portfolio Πm can be interpreted as a portfolio of arithmetic basket options. The valuation of a basket option is typically still not an easy exercise. However, with the appropriate constraints on the neural network structure used for the regression, the replicating portfolio can be reduced to a set of European options. A first approach would be to consider a one-dimensional regression asset, which corresponds to the design depicted in figure (b). This would be appropriate under a one-factor model. A second approach would be to constrain matrix A1 to have only one non-zero entry in each row. This corresponds to the neural network design depicted in figure (c). In the numerical examples included in this work, we will be considering a 1-factor model environment. Generalizations to multi-factor models have been described in Hoencamp et al. (Citation2023). We will consider the following assumptions:

  • Regression asset: select zm(t)=Sm,M(t).

  • Scaling factor: set qm(xt)=1Am,M(t,xt).

Under the assumptions stated above, the (sub)portfolio Πm as defined in (Equation12), can be interpreted as a portfolio of European swaptions, written on the swap rate Sm,M(t).Footnote1 The valuation of Πm(t) for some t<Tm comes down to Πm(t) =MtEQ[Gm(zm(Tm);βj)MTmqm(xTm)|xt] =MtEQ[Am,M(Tm)i=1nw2,imax{w1,iSm,M(Tm)+bi,0}MTj|xt] =i=1nw2,iAm,M(t)Em,M[max{w1,iSm,M(Tm)+bi,0}|xt] where the expectation in the last line is taken under the annuity measure Qm,M associated to the numéraire Am,M(t). The expression above should be recognized as a linear combination of European swaption prices (see equation (Equation2)). The parameter w1,i distinguishes between a payer (w1,i>0) and receiver (w1,i<0) swaption. The parameter w2,i distinguishes between a long (w2,i>0) and a short (w2,i<0) position in the swaption contract.

3.5. Valuation of the replicating portfolio

Under the assumptions of section 3.4, portfolio Πm consists of European swaptions, which means the valuation comes down to pricing each individual option. We apply a coefficient freezing technique, described in Andersen and Piterbarg (Citation2010) and Schrager and Pelsser (Citation2006). With this technique, the swap rate process is approximated as a generalized arithmetic Brownian motion, by freezing the stochastic terms in its diffusion. The option's implied volatility can then be approximated by integrating the ‘frozen’ diffusion coefficient. We briefly summarize the method below, and refer to Andersen and Piterbarg (Citation2010) and Schrager and Pelsser (Citation2006) for details.

Consider the dynamics of the swap rate. By Itô's lemma, it can be established that Sm,M satisfies the SDE dSm,M(t)=()dt+[DxSm,M(t)]η(t)dWt Considering the annuity as numéraire, the swap rate can regarded as the quotient of some tradable assets and the numéraire itself. As a consequence, the swap rate is a Martingale under the annuity measure Qm,M (Jamshidian Citation1997). Hence, for 0tTTm we can write (13) Sm,M(T)=Sm,M(t)+tTΣm,M(u)η(u)dWum,M(13) where Wm,M denotes a d-dimensional Qm,M Brownian motion and Σm,M(t):=[DxSm,M(t)]. Working out the diffusion for i=1,,d yields (14) Σm,M(t)=B(t,TM)P(t,TM)Am,M(t)B(t,Tm)P(t,Tm)Am,M(t)Sm,M(t)l=m+1MτlB(t,Tl)P(t,Tl)Am,M(t)(14) The diffusion coefficient is stochastic, due to the P(t,Ti)Am,M(t) terms. However, these terms again represent the quotient of a tradable asset and the numéraire. It is conjectured in Schrager and Pelsser (Citation2006), that these terms are Martingales with low quadratic variation. A good approximation is therefore achieved by freezing them at their time-t value, resulting in a deterministic diffusion coefficient Σ~m,M(u) for u[t,T]. It follows (15) Sm,M(T)S~m,M(T):=Sm,M(t)+tTΣ~m,M(u)η(u)dWum,M(15) The approximation S~m,M(T) conditioned on Ft is normally distributed. Applying Itô isometry allows us to approximate the implied volatility of an option written on Sm,M, which yields (16) σ~imp(t,T)=1TtEm,M[tTΣ~m,M(u)η(u)dWum,M22|xt](16) (17) =1TttTΣ~m,M(u)η(u)22du(17) Substituting σ~imp(t,T) into Bachelier's formula results is the desired swaption price.

4. Numerical algorithms and mathematical consistency

In this section, we describe the numerical routines for computing sensitivities along the Monte Carlo path. In short, we show how to compute the sensitivities of the replicating portfolio in a time-consistent manner. This involves the differentiation of the replicating instrument values (i.e. greeking) and the differentiation of the portfolio weights.

Our approach to the latter is inspired by the work of Jain et al. (Citation2019). In Jain et al. (Citation2019) a regress-later approach is considered, which utilizes ordinary least-square regression. As such, the weights of the basis-functions can be obtained explicitly as a function of the risk-factor and consistently differentiated. In this paper, only an implicit relation between the risk-factors and the portfolio weights is known through the minimized cost function (equation (Equation10)). Yet, we show that an estimator of the weight sensitivities can still be obtained through an application of the implicit function theorem.

An advantage of our approach is that it should be easier to evaluate sensitivities to implicit risk-factors, which are not directly modeled. In the context of MVA this concerns the sensitivities to the realizations of future market rates and future implied volatilities. With regard to static replication, the replicating portfolio is selected such that the required future Delta and Vega are tractable to compute. In contrast, with, for example, an application of AD on LSM, this may not be straight-forward. It would require the computation pseudo Jacobian-inverses or adding plenty pseudo-interpolation-nodes to the term-structure of zero-rates and instantaneous volatilities, which may be a tedious task.

4.1. Notation and assumtions

Let t[0,T¯) and let Vt denote the time-t risk-neutral value of a Bermudan swaption with strike K and tenor structure T0,T1,TM, conditioned on the event that the option is not yet exercised at t. Next to the Bermudan swaption, we consider a static replicating portfolio Π consisting of European swaptions. We write Π(t)={i=0M1Πi(t)if 0tT0i=mM1Πi(t)if Tm1<tTm0if TM1<t where Πm denotes a subportfolio of swaptions, written on the swap rate Sm,M. The subportfolios are defined as Πm(t):=wmπm(t)=i=1nwm,iπm,i(t)πm,i(t)=Am,M(t)Em,M[(δm,i(Sm,M(Tm)Km,i|)|)+|xt] Where we denote πm(t):=(πm,1(t),,πm,n(t)), wm:=(wm,1,,wm,n) and km:=(Km,1,,Km,n)

To keep computations feasible, we will consider two assumptions:

  • The time-zero zero-rate curve τR(0,τ) is assumed to be an interpolation between nodes {R0,R1,,RK}, with Rk:=R(0,τk). The nodes are assumed to be the result of a calibration bootstrap against market rates, such as swap- or deposit rates.

  • The instantaneous volatility η(t) is assumed to be piece-wise constant between dates {τ0,τ1,,τK}. The volatility level ηk between τk1 and τk, is assumed to be the result of a bootstrap against market implied volatilities σk of European swaptions with maturity τk.

4.2. Sensitivities along the monte carlo path

In this work, we focus on two types of sensitivities, namely Delta and Vega. The time-zero sensitivities are specified as follows:

Delta:

Let Δ:=(Δ0,Δ1,,ΔK), where Δk denotes the sensitivity of V0 to an infinitesimal bump of the zero-rate node Rk. It is defined as Δk:=limϵ0V0(Rk+ϵ)V0(Rk)ϵ The impact of a bump in a zero-rate node, while keeping other nodes fixed, is graphically represented in figure (a).

Vega:

Let V:=(V0,V1,,VK), where Vk denotes the sensitivity of V0 to an infinitesimal bump of implied volatility σk corresponding to the swaption with maturity τk. It is defined as Vk:=limϵ0V0(σk+ϵ)V0(σk)ϵ A bump in σk, while keeping other implied volatilities fixed will impact the level of ηk and ηk+1. This impact is graphically represented in figure (b).

Figure 7. Impact of bumping the market rates (left) and implied volatilities (right) at t-zero (top) and a future simulation date (bottom). All bumps correspond to bucket τk=10. (a) Zero-rate bump at t = 0. (b) Instant. volatility bump t = 0. (c) Zero-rate bump at t = 7 and (d) Instant. volatility bump at t = 7.

Figure 7. Impact of bumping the market rates (left) and implied volatilities (right) at t-zero (top) and a future simulation date (bottom). All bumps correspond to bucket τk=10. (a) Zero-rate bump at t = 0. (b) Instant. volatility bump t = 0. (c) Zero-rate bump at t = 7 and (d) Instant. volatility bump at t = 7.

The sensitivities along the Monte Carlo path are understood as future-time generalizations of the quantities defined above. That is, future Delta Δk(t) is interpreted as the sensitivity of Vt to an infinitesimal bump of the hypothetical zero-rate node Rk(t):=R(t,t+τk), conditional on Ft. This is graphically represented in figure (c). Future Vega Vk(t) is interpreted as the sensitivity of Vt to an infinitesimal bump of the time-t implied volatility σk(t):=σimp(t,t+τk) corresponding to an ATM swaption maturing at t+τk, conditional on Ft. This is graphically represented in figure (d).

4.3. Computing the sensitivity

First we consider the general case and denote θt:=(θ1(t),θ2(t),,θK(t)|) where either θk(t):=Rk(t) or θk(t):=σk(t). Later on we will distinguish between the two cases. Without loss of generality, assume that t(Tm1,Tm]. The aim is to compute DθΠ(t)|θ=θt=i=mM1DθΠi(t)|θ=θt We iteratively compute the sensitivities of each subportfolio, for i=M1,,m. Starting with i = M−1, a static replication of VTM1 is achieved with a single European swaption, i.e. ΠM1:=πM1,1. Thus, setting wM1,1=1 and KM1,1=K. The Delta and Vega simply correspond to the sensitivities of a European swaption, i.e. DθΠM1(t)=(πM1(t)θ1,,πM1(t)θK) Next, consider the case i = m<M−1 and assume the sensitivities DθΠm+1(t), ,DθΠM1(t) are known. With slight abuse of notation, we have according to the chain rule the following relation (18) DθΠm(t)=Πm(t)θoptionsDelta/Vega+Πm(t)wmwmθweightsensitivities+Πm(t)kmkmθstrikesensitivities(18) We detail each term below:

Options Delta/Vega

The first term represents the standard Delta or Vega of the options composing the portfolio, i.e. (19) Πm(t)θ=wmDθπm(t)(19)

Weight sensitivities

The weights wm are optimized, such that the portfolio payoff matches a target function (equation (Equation8)). This target function is sensitive to parameter changes in the zero-rate curve or volatility term-structure. As a consequence we need to consider the weight sensitivities, i.e. (20) Πm(t)wmwmθ=πm(t)Dθwm(20)

Strike sensitivities

For the same reason as for the weights, the sensitivities for the strikes should be considered. However, we conjecture the impact of the strike sensitivities is relatively small, which is supported by our numerical experiments. We will ignore this term for the remainder of this work.

A further derivation of the Delta/Vega sensitivities is the subject of sections 4.4 and 4.5. A further derivation of the weight sensitivities is subject of section 4.6.

4.4. The delta sensitivities

We start by working out the terms of the form Dθπm and consider the case θk(t)=Rk(t):=R(t,t+τk). We perform the derivation for the general setting 0tTTm and aim to compute Dθπm(T)|θ=θt=(πm,1(T)R1(t)πm,1(T)RK(t)πm,n(T)R1(t)πm,n(T)RK(t)) The Dθπm in (Equation19) corresponds to the case t = T. The case t<T will serve as a path-wise sensitivity estimator, which will be required to compute the weight sensitivities in the subsequent section.

We first characterize the forward Delta sensitivity of a zero-coupon bond in the following lemma.

Lemma 4.1

Let 0tT0T1 and assume that the zero-rate curve τR(t,t+τ) is an interpolation between nodes {R0(t),,RK(t)}, where Rk(t):=R(t,t+τk). Then the forward Delta sensitivity is given by P(T0,T1)Rk(t)=αk(τ0,τ1)P(T0,T1) where τi=Tit and αk denotes an Ft-measurable coefficient, dependent on the interpolation scheme of the zero-rate nodes. If the interpolation is linear, the coefficient is given by αk(τ0,τ1):=αk(τ1)αk(τ0),αk(τ):={ττk1τkτk1τif τ(τk1,τk]τk+1ττk+1τkτif τ(τk,τk+1]0otherwise

A proof to this lemma is given in appendix 1. Now, for brevity denote Hm,i(t):=Em,M[(δm,i(Sm,M(Tm)Km,i)|)+|Ft]. Then by an application of the chain rule and the product rule, we can write (21) πm,i(T)Rk(t)=Am,M(T)Hm,i(T)Sm,M(T)BachelierDeltaSm,M(T)Rk(t)Swapratesensitivity+Hm,i(T)Am,M(T)Rk(t)Annuitysensitivity(21) Working out each of the three partial derivatives above, we find

Bachelier Delta

Under the swap rate dynamics approximation given in (Equation15), this term is computed as the standard Bachelier Delta sensitivity, i.e. (22) Hm,i(T)Sm,M(T)=δm,iΦ(δm,id),d=Sm,M(T)Km,iσimp(T,Tm)TmT(22)

Annuity sensitivity

From Lemma 4.1 it directly follows Am,M(T)Rk(t)=j=m+1Mτjαk(Tt,Tjt)P(T,Tj)

Swap rate sensitivity

Applying the chain rule and Lemma 4.1, it follows Sm,M(T)Rk(t)=αk(Tt,Tmt)P(T,Tm)αk(Tt,Tmt)×P(T,TM)Am,M(T)Sm,M(T)j=m+1Mτjαk(Tt,Tjt)P(T,Tj)Am,M(T)

4.5. The vega sensitivities

Now consider the case θk(t)=σk(t):=σimp(t,t+τk). Just as for Delta, we perform the derivation for the general setting 0tTTm and aim to compute Dθπm(T)|θ=θt=(πm,1(T)σ1(t)πm,1(T)σK(t)πm,n(T)σ1(t)πm,n(T)σK(t)) Again, the Dθπm in (Equation19) corresponds to the case t = T. The case t<T will serve as a path-wise sensitivity estimator, which will be required to compute the weight sensitivities in the subsequent section.

We first characterize the forward implied volatility Vega in the following two lemmas.

Lemma 4.2

Let Sm,M satisfy the Gaussian approximation of (Equation15) and denote σ~(t,T):=1TttTΣm,M(u)η(u)22du for 0tTTm. Then σimp(T,Tm)σk(t)=(Tmt)σ~(t,Tm)σ~(t,Tm)σk(t)(Tt)σ~(t,T)σ~(t,T)σk(t)(TmT)σimp(T,Tm)

A proof is given in appendix 2. The sensitivity of σ~(t,T) w.r.t. σk(t) is an Ft-measurable random variable, which is dependent on the calibration procedure of the instantaneous volatility η(t). If η1,,ηK are scalars obtained by a bootstrap procedure, the sensitivities can be further characterized by the following lemma.

Lemma 4.3

Let η(t) be scalar-valued and piece-wise constant. Let Σm,M be as in (Equation14) and let Σk denote the diffusion of the swap rate underlying the option corresponding to σk(t):=σimp(t,t+τk). Then the forward implied volatility Vega is given by σ~(t,T)σk(t)=αk(T)τkσk(t)τσ~(t,T) where τ:=Tt and αk denotes an Ft-measurable coefficient. If by approximation Σm,MΣkΣk+1, the coefficient is given by αk(T)={Tk1TΣm,M2(u)duTk1TkΣk2(u)duif Tk1<TTkTTk+1Σm,M2(u)duTkTk+1Σk+12(u)duif Tk<TTk+10otherwise where we denote Tk:=t+τk.

A proof is given in appendix 3. By an application of the chain rule, we obtain (23) πm,i(T)σk(t)=πm,i(T)σimp(T,Tm)BachelierVegaσimp(T,Tm)σk(t)Imp.vol.sensitivity+πm,i(T)Sm,M(T)BachelierDeltaSm,M(T)σk(t)Swapratesensitivity(23) Considering each of the partial derivatives above, we find

  • Bachelier Vega Under the swap rate approximation given in (Equation15), this term can be computed as the Bachelier Vega sensitivity, i.e. πm,i(T)σimp(T,Tm)=Am,M(T)TmTφ(d),d=Sm,M(T)Km,iσimp(T,Tm)TmT

  • Bachelier Delta Under the swap rate approximation given in (Equation15), this term can be computed as the Bachelier Delta sensitivity. Hence, similar to (Equation22), we have (24) πm,i(T)Sm,M(T)=δm,iAm,M(T)Φ(δm,id),d=Sm,M(T)Km,iσimp(T,Tm)TmT(24)

  • Imp. vol. sensitivity The forward implied volatility sensitivity is characterized in Lemma 4.2. Under the assumptions of Lemma 4.3, the sensitivity can be approximated as σimp(T,Tm)σˆk(t)=(|αk(Tm)αk(T))τkσk(t)(TmT)σimp(T,Tm)

  • Swap rate sensitivity Under the swap rate approximation given in (Equation15), we can write Sm,M(T)Sm,M(t)+σ~(t,T)(WTm,MWtm,M) where σ~(t,T):=1TttTΣm,M(u)η(u)22du. It follows that the forward swap rate sensitivity can be estimated as Sm,M(T)σk(t)=Sm,M(T)Sm,M(t)σ~(t,T)σ~(t,T)σk(t) Under the assumptions of Lemma 4.3, this can be further approximated as Sm,M(T)σk(t)=τkαk(T)Tt(Sm,M(T)Sm,M(t)|)

4.6. The weight sensitivities

The weights and strikes of Πm are obtained through an optimization scheme as described in section 3.1. The objective function for this optimization can equivalently be expressed as Lm(w,k)=EQ[(wπm(Tm)gm(Tm))2] where we denoted gm(t):=fm(xt)qm(xt). The parameters wm and km are determined by (25) (wm,km)=argminw,k{Lm(w,k)|w,kRn}(25) Assuming Lm attains a local minimum in (wm,km), it follows by the first order optimality condition that Lm=0. In particular, we have λm(wm):=(Lmw1,,Lmwn)|w=wm=0 where we can compute λm(w)=EQ[Dw(wπm(Tm)gm(Tm))2]=2EQ[(wπm(Tm)gm(Tm))πm(Tm)] In order to characterize the weight sensitivities, we first consider the Jacobian of λm (i.e. the Hessian of Lm w.r.t. w). The Jacobian can be obtained by again differentiating the expression above, which yields (26) Dwλm=2EQ[Dw(wπm(Tm)gm(Tm))πm(Tm)](26) (27) =2EQ[πm(Tm)πm(Tm)](27) Hence, note that Dwλm is specified by the auto-correlation matrix of the random vector πm. In other words, it depends on the cross-sectional co-variance matrix of the portfolio payoffs. With this expression at hand, we characterize the weight sensitivities in the following proposition

Proposition 4.4

Consider a swaption portfolio Πm written on swap rate Sm,M and let its weights and strikes satisfy equation (Equation25). Furthermore assume that the strikes K1,,Kn are all distinct. Then the derivative of wm w.r.t. θt is well-defined and its Jacobian matrix at (wm,θt) is given by (28) wmθt=(Dwλm)1Dθλm|w=wm,θ=θt(28)

This result is a consequence of the implicit function theorem. A rigorous proof is given in appendix 4.

Remark 1

The first term in (Equation28) is given by the inverse of the matrix in (Equation27). Note that this term is both path-independent and time-independent. Hence, although its computation requires numerical estimation and matrix inversion, it can be used along each MC path, at each time-step without nested re-evaluation.

What remains to be done is computing Dθλm. We therefore compute Dθλm=2EQ[πmDθ(wπm(Tm)gm(Tm))]+2EQ[(wπm(Tm)gm(Tm))Dθπm(Tm)] Note that the second term is zero if subportfolio Πm yields a perfect fit to the target function gm. Therefore, we can expect this term to be small and we will ignore it. Working out the first term, we find Dθλm2EQ[πmwDθπm(Tm)]2EQ[πmDθgm(Tm)] In the first term of the expression above, Dθπm(Tm) represent the Deltas/Vegas of subportfolio Πm. This term needs to be numerically estimated. The path-wise estimator has been derived in sections 4.4 and 4.5. The second term also needs to be numerically estimated. The path-wise estimator involves Dθgm(Tm), where gm(t):=fm(xt)qm(xt)=max{hm(xt)i=m+1M1Πi(t),0}. Working out this term and denoting the event Em:={hm(xTm)>i=m+1M1Πi(Tm)} yields (29) Dθgm(Tm)=1EmDθhm(xTm)1Emi=m+1M1DθΠi(Tm)(29) In the expression above, the term Dθhm(xTm) is equivalent to Dθπm,0(Tm) with Km,0=K. The terms DθΠi(Tm) for i=m+1,,M1 were assumed to be known from previous iterations.

4.7. The algorithms

Below we summarize the algorithms for composing the replicating portfolios (Algorithm 1) and computing the path-wise sensitivities (Algorithm 2). For the execution of the path-wise sensitivity algorithm, it is assumed that the static replication has been performed and subportfolios Π0,,ΠM1 have been derived from the trained parameters β0,,βM1.

5. Numerical experiments

We demonstrate the performance of the algorithms presented in this work by considering several case studies. We select a one-factor Gaussian model with piece-wise constant instantaneous volatility to perform our numerical experiments (see Andersen and Piterbarg Citation2010). That is, we consider state-variable dynamics given by dxt=(ϕ(t)axt)dt+η(t)dWt,x0=0ϕ(t)=0te2a(tu)η(u)durt=f(0,t)+xt where the scalar a denotes a constant mean-reversion rate, Wt is a one-dimensional Brownian motion under Q and f(t,T):=logR(t,T)T denotes the instantaneous forward rate. The time-zero yield curve is assumed to be a linear interpolation between zero-rates with tenors τk as given in table . We consider a flat yield curve with all zero-rates equal to 3%. The instantaneous forward rate is assumed to be piece-wise constant between the tenor dates τk. Each volatility level ηk is bootstrapped against an ATM European payer swaption with maturity τk and implied volatility σk, such that σ~imp(0,τk)=σk. All implied volatility targets σk are set to 50 bps, resulting in the instantaneous volatilities presented in table . The mean-reversion rate is fixed a priori at a = 0.01. The tenors τk have been selected to coincide with the SIMM tenor buckets specified by ISDA (International Swaps and Derivatives Association, Inc Citation2020).

Table 1. Model parameters.

Sampling required for regression data, pricing, sensitivities, exposures and IM profiles is done using Monte Carlo simulation. The risk factor is simulated through an Euler discretization scheme. The discretization uses weekly time-steps, i.e. Δt=1/52.

As case studies, we consider a 1Y×5Y and a 1Y×10Y Bermudan receiver swaption with unit notional. The underlying of the T0×TM Bermudan is assumed to be a fixed-for-floating interest rate swap with fixed rate K and a tenor structure T={T0,,TM} of annual payments. Unless stated otherwise, the fixed rate K is selected to be ATM, i.e. K=KATM:=S0,M(0). The exercise dates of the Bermudan are set to coincide with the fixing dates of the underlying swap, i.e. T0 being the first and TM1 being the last exercise opportunity.

The composition of the static replication portfolio relies on the regression of feed-forward neural networks with a single hidden layer. Below, we list some details concerning the fitting procedure of the neural networks.

  • As an independent regression variable for Gm we select the swap rate Sm,M(Tm). The regression data is normalized before Gm is fitted to the data, to improve the regression robustness (see Hoencamp et al. Citation2023 for details).

  • Each network Gm is selected to have 8 hidden nodes.

  • Each regression is performed with a data set of 2000 independently sampled training points.

  • The neural network parameters are optimized using the AdaMax optimizer, which is a stochastic gradient-descent method (Kingma and Ba Citation2014). The batch-size (i.e. number of training points used per iteration) is set to 32. The learning rate (i.e. the step-size scaling per iteration) is set to 0.01.

  • The parameters of GM1 are randomly initialized from a uniform distribution U(1,1). For m<M−1, the parameters of Gm are initialized with the optimized parameters of Gm+1.

5.1. Pricing

We start by analyzing the pricing accuracy of the static replication. As each neural network has 8 hidden nodes, every regression yields a subportfolio of 8 European swaptions. Hence at t-zero, the 1Y×5Y Bermudan is replicated with 40 swaptions and the 1Y×10Y Bermudan is replicated with 80 swaptions.

5.1.1. Time-zero prices

Table  reports the V0 price estimates implied by the risk-neutral valuation of the static replication portfolio (SR) for different levels of moneyness. As a benchmark, we report price estimates obtained from the least-square method (LSM) introduced by Longstaff and Schwartz (Citation2001). As basis functions for the ordinary least-square regression, we use {1,xt,xt2,xt3}. The LSM estimate is the mean of 25 independent runs of 80 000 MC paths each. In brackets, we report the standard error generated by these 25 runs.

Table 2. V0 estimates Bermudan swaption for different levels of moneyness, presented in basis points of the notional.

From table , we observe a close correspondence between SR and LSM, in particular for ATM strikes. The accuracy slightly deteriorates for far OTM options. This can be attributed to a limited number of ITM paths at future monitor dates, yielding only a few non-zero data-points for the regression. SR is not subject to a standard error as the portfolios are evaluated at t-zero. SR is however subject to numerical errors due to the inaccuracy of the neural network regression. On the normalized data, the fits typically show a mean squared error in the order of 106105.

5.1.2. Exposure profiles

Consecutively, we consider the exposure profile and related CVA quantities of the Bermudan swaptions. Exposure depends on the future distribution of the option price. Figure  shows the expected positive exposure (EPE) simulated with the static replication next to an LSM implied benchmark. The benchmark is generated with the LSM-bundle algorithm, presented in Feng et al. (Citation2016). We briefly summarize this method below.

Figure 8. Expected positive exposure profile: LSM-bundle (orange) vs. static replication (blue). The LSM and static replication standard error windows are represented by the orange and blue shaded area, respectively, based on 25 independent runs of 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 8. Expected positive exposure profile: LSM-bundle (orange) vs. static replication (blue). The LSM and static replication standard error windows are represented by the orange and blue shaded area, respectively, based on 25 independent runs of 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

The LSM-bundle algorithm allows one to estimate the EPE of a Bermudan option when combined with the LSM algorithm of Longstaff and Schwartz (Citation2001). Within LSM, the continuation value Cm(Tm) is estimated by regressing realized (discounted) cashflows at Tm+1 against the risk-factor realizations at Tm on in-the-money MC paths. However, to compute the exposure, an unbiased estimate of the continuation value is required on each MC path on every exposure date (not just monitor dates). In Feng et al. (Citation2016) it is therefore proposed to first perform an LSM sweep. Then in a second sweep, for each exposure date t the MC paths are divided into two bundles. If t(Tm1,Tm], the two bundles are defined as: U1(t):={xt:hm(xt)>0},U2(t):={xt:hm(xt)0} Subsequently for both bundles an ordinary least-square regression is performed against the discounted realized cashflows that were computed in the first LSM sweep. The regression functions provide an unbiased estimate of the exposure for a given realization of the risk-factor in their respective domains. We refer to Feng et al. (Citation2016) for details. Note that for the SR method, no second sweep of regressions is required. Exposures are obtained by simply pricing the replicating portfolio along the MC path.

The EPE profiles in figure  represent the mean of 25 MC simulations of 3000 paths each. The shaded areas represent a window of one standard error for both respective methods. We observe the typical staircase nature that corresponds to the exposure profile of a Bermudan option. Each discontinuity represents the passing of an exercise opportunity, after which the option is exercised for a certain fraction of the scenarios. Similar to the time-zero prices, we observe a good correspondence between the SR estimates and the benchmark. Additionally, it is noted that the standard error for SR is significantly smaller compared to LSM-bundle. This can be attributed to the benefit of the regress-later approach of SR compared to the regress-now approach of LSM. In the latter, the continuation value is estimated by regressing the (noisy) realizations of the cashflow. In the former, the regression is performed against the deterministic option's target value (numerical errors aside). As a result, the MC noise is integrated out more efficiently with SR. In general, this indicates that a significantly smaller number of MC paths would be required to reach a similar level of accuracy.

We further study the convergence of the standard error, by considering the CVA statistic. To do so, we assume a constant LGD = 1 and constant hazard rate of λ(t)=0.01. The CVA is estimated using equation (Equation3). Figure  shows the mean and standard deviation of the CVA estimate generated with 25 independent MC runs, as a function of the number of MC paths per run. We observe that on average the SR standard error is a factor 4 smaller compared to the LSM-bundle benchmark. For an increasing number of paths, the two methods show a consistent convergence.

Figure 9. Convergence CVA for a 1Y×5Y (top) and 1Y×10Y (bottom) Bermudan swaption: LSM-bundle (orange) vs. static replication (blue). The LSM and static replication standard error windows are represented by the orange and blue shaded area, respectively.

Figure 9. Convergence CVA for a 1Y×5Y (top) and 1Y×10Y (bottom) Bermudan swaption: LSM-bundle (orange) vs. static replication (blue). The LSM and static replication standard error windows are represented by the orange and blue shaded area, respectively.

5.2. Sensitivities

We continue by analyzing the accuracy in the sensitivity computation for the static replication. We distinguish between time-zero sensitivities and future sensitivities along the MC path. The Delta and Vega sensitivities in this section are understood in accordance with the definitions provided in section 4.2.

5.2.1. Time-zero sensitivities

Figure  reports the time-zero Delta sensitivities for the 12 tenor buckets. The sensitivities have been scaled to represent the basis point value w.r.t. the zero-rate Rk. The SR results (presented in blue) are calculated in accordance with the Delta routine described in section 4.2. As a benchmark, we consider an LSM bump-and-reval estimate (presented in orange). For Delta, this means a zero-rate corresponding to one of the predefined tenors is bumped with one basis point and the yield curve is re-interpolated accordingly. The same seed is used for the LSM base run as for the bumped LSM run, to facilitate variance reduction.

Figure 10. Bucketed Delta sensitivities: LSM bump-and-reval (orange) vs. static replication (blue). The LSM benchmark is based on 25 independent runs of 80,000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 10. Bucketed Delta sensitivities: LSM bump-and-reval (orange) vs. static replication (blue). The LSM benchmark is based on 25 independent runs of 80,000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure  reports the time-zero Vega sensitivities for the 12 tenor buckets. These sensitivities have been scaled to represent the basis point value w.r.t. implied volatility σk. The bump-and-reval benchmark for Vega is obtained by bumping one of the implied volatilities with one basis point and re-bootstrapping the volatilty term-structure accordingly. Also here the same seed is used for the base and the bumped LSM run.

Figure 11. Bucketed Vega sensitivities: LSM bump-and-reval (orange) vs. static replication (blue). The LSM benchmark is based on 80,000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 11. Bucketed Vega sensitivities: LSM bump-and-reval (orange) vs. static replication (blue). The LSM benchmark is based on 80,000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

The results shown is figures  and are the mean of 25 independent runs of 80,000 paths each. For the LSM bump-and-reval, this means 13 LSM simulations were involved for each run (1 base simulation and 12 bumped simulations). The SR estimate only requires a single simulation, in order to estimate the weight sensitivities of the replicating portfolio. The swaption sensitivities can be readily evaluated. The error bars indicate the standard error of the LSM estimate. The SR standard errors were each 2 to 3 orders of magnitude smaller compared to LSM and are therefore not included in the figures. Overall, we observe a satisfactory correspondence between the SR sensitivities and the LSM benchmark. For the LSM Vega, we observe relatively large standard errors compared to Delta. This can be attributed to the fact that a bumped yield curve translates one-to-one into bumped zero-coupon bond prices. A bumped volatility term-structure only indirectly impacts the risk-factor distribution and is therefore more susceptible to a higher standard error.

Table  reports the SIMM Delta, Vega and Curvature margins computed with the sensitivity estimates of SR and LSM, respectively, for different levels of moneyness. In brackets we report the standard error of the LSM implied quantities. For the Delta and Vega margin estimates, we see the accuracy of the SR method confirmed. The Curvature margin LSM estimates demonstrate a significantly larger standard error in the relative sense. Curvature margin reflects the Gamma sensitivity of the option, which in SIMM is approximated through a Vega-Gamma relationship (International Swaps and Derivatives Association, Inc Citation2016). Second-order sensitivity approximations are notorious for amplifying numerical errors of the first order sensitivities, which we see reflected in the LSM benchmark. The SR estimate does not suffer from such standard errors.

Table 3. IM estimate Bermudan swaption.

The convergence of the margin estimates is demonstrated in figure . Here we present the mean and standard error as a function of the number of MC paths, obtained from 25 independent runs. We observe that the consistency of the SR method is confirmed for an increasing number of MC paths.

Figure 12. Convergence Delta (top), Vega (middle) and Curvature (bottom) IM: LSM bump-and-reval (orange) vs. static replication (blue). The LSM standard error window is represented by the shaded orange area. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 12. Convergence Delta (top), Vega (middle) and Curvature (bottom) IM: LSM bump-and-reval (orange) vs. static replication (blue). The LSM standard error window is represented by the shaded orange area. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 13. Bucketed Delta (left) and Vega (right) profiles for static replication of a European 5Y×5Y swaption. SR sensitivities (solid) are shown next to an exact benchmark (dashed). (a) Delta profiles and (b) Vega profiles.

Figure 13. Bucketed Delta (left) and Vega (right) profiles for static replication of a European 5Y×5Y swaption. SR sensitivities (solid) are shown next to an exact benchmark (dashed). (a) Delta profiles and (b) Vega profiles.

5.2.2. Future sensitivities: European swaption

We finally analyze the expected future sensitivity profiles of a Bermudan swaption. Calculation of expected future sensitivities through the LSM bump-and-reval method would impose a severe computational burden. A nested MC simulation is required for each time-step, on each MC path, for each sensitivity. Benchmarking future sensitivity profiles is hence not feasible with LSM and out of scope for this paper.

To verify the accuracy of the SR method along the MC path, we therefore start by considering a 5Y×5Y European payer swaption. This can be considered a special case of the Bermudan, with only a single exercise opportunity at T0. We proceed to replicate this swaption, while selecting the 15Y swap rate as regression asset. As a result, the 5Y×5Y swaption is replicated with a portfolio of 5Y×15Y swaptions. The same neural network structure is used as for the Bermudans. Subsequently, we analyze the sensitivity profiles of the replicating portfolio, which we benchmark with the analytical sensitivities of the swaption.

The results are displayed in figure . The Vega profiles on the right show a near-perfect match between SR and the benchmark. We do observe some numerical errors in the Delta 10Y, 15Y and 20Y buckets. Around t = 0, the latter two are dominant for the 5Y×15Y swaptions in the replicating portfolio, but zero for the target 5Y×5Y swaption. If the replication is perfect, the sensitivities in this bucket cancel with the weight sensitivities. Here we note that some small errors remain. Yet, overall, we observe a satisfactory agreement between SR and the analytical profiles.

5.2.3. Future sensitivities: Bermudan swaption

Figure  shows the expected future Delta sensitivity profiles of a Bermudan swaption. The profiles are generated by taking the mean of 25 independent MC runs of 3000 paths each. The shaded areas (where visible) represent a single standard error window. As the underlying is a receiver swap, we note that the Delta profiles are mostly negative. The tenor buckets surrounding the time until maturity of the underlying swap are dominant. As time passes, the time to maturity decreases and the dominant sensitivities shift from one bucket to the next, implying the triangular-shaped profiles. We see this for example reflected in the (red-colored) Delta-profile of the 5Y bucket in figure (a). In the first year, the sensitivity grows (in the absolute sense) to its maximum until the maturity of the underlying swap is exactly 5Y away. Then, the sensitivity of this bucket starts to drop and flows into the 3Y bucket. The discontinuous jumps signify the passing of an exercise opportunity. Despite the small number of paths, we observe a satisfactory low standard error.

Figure 14. Bucketed Delta profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 14. Bucketed Delta profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure  shows the corresponding expected Vega profiles and their standard error window. The sensitivities are scaled by the implied volatility in basis points of each corresponding bucket, which is in line with the SIMM routines. The Vega sensitivities are bucketed according to their time to expiry. Hence, we observe triangular patterns resulting from Vegas shifting from one bucket into the next as the time until the future monitor dates decreases. Also here, we observe that the shaded areas representing the standard errors are minor or even invisible in the figure, even though the number of paths is relatively small.

Figure 15. Bucketed Vega profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 15. Bucketed Vega profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure  shows the expected initial margin profiles, obtained from the simulated sensitivity distributions. The shaded areas again denote a single standard error window. The figure shows the separate profiles corresponding the Delta-, Vega- and Curvature margin. The sum of these three yields the total expected initial margin as a function of time. From this plot, we observe that overall the Delta margin is dominant for the computation of IM and MVA. For Delta, we observe the typical staircase profile. Each discontinuity represents the passing of an exercise date and the subsequent decrease in the option's total Delta sensitivity. The volume of Vega scales with the square root of the times until expiry of the underlying options. Hence the smooth decay of the Vega margin over time, with similar jumps after each monitor date. The contribution of the curvature margin to the total IM and MVA is very limited. In terms of shape, the profile shows a similar nature as Delta.

Figure 16. Delta (green), Vega (orange) and Curvature (blue) margin profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 16. Delta (green), Vega (orange) and Curvature (blue) margin profiles for static replication. The standard error windows are represented by the shaded area, based on 3000 MC paths. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

We study the convergence of the margin estimates by considering MVA as a function of the number of MC paths. Figure  shows the Delta-, Vega- and Curvature components of MVA in separate plots. We report the mean and standard error, obtained from 25 independent runs. The MVA estimates have been generated from the future IM distributions, while assuming a constant funding spread of FS(t)=0.01. From the figures, we observe a consistent convergence and see the low standard errors of the sensitivity profiles confirmed. We conclude that generally, 3000 MC paths suffice to guarantee a relative standard error of below 1%.

Figure 17. Convergence Delta (top), Vega (middle) and Curvature (bottom) MVA for static replication. The standard error windows are represented by the shaded areas. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

Figure 17. Convergence Delta (top), Vega (middle) and Curvature (bottom) MVA for static replication. The standard error windows are represented by the shaded areas. (a) 1Y×5Y Bermudan and (b) 1Y×10Y Bermudan.

6. Conclusion

This paper presented a static replication algorithm for Bermudan swaptions under an affine term-structure model. We showed that under the appropriate conditions, one can construct a portfolio of European swaptions that mirrors the value of the exotic instrument until it is either exercised or matured. This replication is static, i.e. despite the early-exercise features of the instrument, the portfolio does not need to be re-balanced throughout the lifetime of the original contract. We continued to derive efficient estimators for the price and sensitivities of the portfolio along the Monte Carlo path. These quantities are essential for the quantification of modern risk metrics such as exposure, CVA, IM and MVA. Exploiting closed-form price and sensitivity approximations for European swaptions allowed us to avoid cumbersome nested simulations that are associated with a naive bump-and-reval approach. Moreover, this enabled us to accurately estimate high dimensional, bucketed Delta- and Vega sensitivities, despite the fact that the risk-factors may be embedded in a low dimensional model. This yielded a richer handle for sensitivity modeling compared to, for example, a parallel shift in the model variables. Through several representative numerical examples, we demonstrated the performance of our approach under a one-factor model, benchmarked to the established least-square Monte Carlo method of Longstaff and Schwartz (Citation2001). Overall, we observed superior convergence for the static replication method with regard to exposures, sensitivities and IM statistics. Thus, we demonstrated static replication to be a suitable alternative for exposure, dynamic IM and MVA quantification.

Disclosure statement

The opinions expressed in this work are solely those of the authors and do not represent in any way those of their current and past employers. No potential conflict of interest was reported by the authors.

Additional information

Funding

This project has received funding from the NWO under the Industrial Doctorates grant.

Notes

1 For simplicity we will ignore the zero-coupon bond in the replicating portfolio (see section 3.2). This is achieved by constraining b2=0. Our numerical experiments indicate a minimal impact on the accuracy of the replication.

References

  • Andersen, L.B.G. and Piterbarg, V.V., Interest Rate Modeling, Volume II: Term Structure Models, 2010 (Atlantic Financial Press: London, UK).
  • Andersson, K. and Oosterlee, C.W., A deep learning approach for computations of exposure profiles for high-dimensional bermudan options. Appl. Math. Comput., 2021, 408, 126332.
  • Antonov, A., Issakov, S. and McClelland, A., Efficient SIMM-MVA calculations for callable exotics. Available at SSRN 3040061, 2017.
  • Bank for International Settlements, Counterparty credit risk definitions and terminology, 2019.
  • Burgard, C. and Kjaer, M., Funding strategies, funding costs. Risk, 2013, 26, 82.
  • Capriotti, L., Fast Greeks by algorithmic differentiation. J. Comput. Finance, 2011, 14, 3.
  • Capriotti, L., Jiang, Y. and Macrina, A., AAD and least-square Monte Carlo: Fast bermudan-style options and xVA Greeks. Algorithmic Finance, 2017, 6, 35–49.
  • Carriere, J.F., Valuation of the early-exercise price for options using simulations and nonparametric regression. Insur. Math. Econ., 1996, 19, 19–30.
  • Caspers, P. and Lichters, R., Initial margin forecast-bermudan swaption methodology and case study. Available at SSRN 3132008, 2018.
  • De Graaf, C.S., Feng, Q., Kandhai, D. and Oosterlee, C.W., Efficient computation of exposure profiles for counterparty credit risk. Int. J. Theor. Appl. Finance, 2014, 17, 1450024.
  • Duffie, D. and Kan, R., A yield-factor model of interest rates. Math. Financ., 1996, 6, 379–406.
  • Feng, Q., Jain, S., Karlsson, P., Kandhai, D. and Oosterlee, C.W., Efficient computation of exposure profiles on real-world and risk-neutral scenarios for bermudan swaptions. Available at SSRN 2790874, 2016.
  • Filipovic, D., Term-Structure Models. A Graduate Course, 2009 (Springer: Berlin, Heidelberg).
  • Fries, C.P., Fast stochastic forward sensitivities in Monte Carlo simulations using stochastic automatic differentiation (with applications to initial margin valuation adjustments). J. Comput. Finance, 2019, 22, 103–125.
  • Giles, M. and Glasserman, P., Smoking adjoints: Fast Monte Carlo Greeks. Risk, 2006, 19, 88–92.
  • Glasserman, P., Monte Carlo Methods in Financial Engineering, Vol. 53, 2004 (Springer: Berlin, Heidelberg).
  • Glasserman, P. and Yu, B., Simulation for American options: Regression now or regression later? In Monte Carlo and Quasi-Monte Carlo Methods 2002: Proceedings of a Conference held at the National University of Singapore, Republic of Singapore, 25–28 November, 2002, pp. 213–226, 2004, (Springer: Berlin, Heidelberg).
  • Glau, K., Pachon, R. and Pötz, C., Speed-up credit exposure calculations for pricing and risk management. Quant. Finance, 2021, 21, 481–499.
  • Green, A., XVA: Credit, Funding and Capital Valuation Adjustments, 2015 (John Wiley & Sons: Hoboken, NJ).
  • Green, A.D. and Kenyon, C., Mva: Initial margin valuation adjustment by replication and regression. Available at SSRN 2432281, 2015.
  • Gregory, J., The XVA Challenge: Counterparty Risk, Funding, Collateral, Capital and Initial Margin, 2020 (John Wiley & Sons: Chichester, UK).
  • Griewank, A. and Walther, A., Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2008 (SIAM: Philadelphia, USA).
  • Harrison, J.M. and Pliska, S.R., Martingales and stochastic integrals in the theory of continuous trading. Processes and Their Applications¡/DIFdel¿Stoch. Process. Their. Appl., 1981, 11, 215–260.
  • Hoencamp, J., Jain, S. and Kandhai, D., A semi-static replication method for bermudan swaptions under an affine multi-factor model. Risks, 2023, 11, 168.
  • International Swaps and Derivatives Association, Inc, Isda SIMM: From principles to model specification, 2016.
  • International Swaps and Derivatives Association, Inc, Isda SIMM: Methodology, version 2.3, 2020.
  • International Swaps and Derivatives Association, Inc, Isda margin survey year-end 2021, 2021.
  • Jain, S. and Oosterlee, C.W., The stochastic grid bundling method: Efficient pricing of bermudan options and their Greeks. Appl. Math. Comput., 2015, 269, 412–431.
  • Jain, S., Leitao, Á. and Oosterlee, C.W., Rolling adjoints: Fast Greeks along monte carlo scenarios for early-exercise options. J. Comput. Sci., 2019, 33, 95–112.
  • Jamshidian, F., Libor and swap market models and measures. Finance Stoch., 1997, 1, 293–330.
  • Jeanblanc, M. and Li, L., Characteristics and constructions of default times. SIAM J. Financ. Math., 2020, 11, 720–749.
  • Joshi, M. and Kwon, O.K., Least squares monte carlo credit value adjustment with small and unidirectional bias. Int. J. Theor. Appl. Finance, 2016, 19, 1650048.
  • Kappen, C., Computing MVA via regression and principal component analysis, SSRN, 2017.
  • Karlsson, P., Jain, S. and Oosterlee, C.W., Counterparty credit exposures for interest rate derivatives using the stochastic grid bundling method. Appl. Math. Finance, 2016, 23, 175–196.
  • Kingma, D.P. and Ba, J., Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.
  • Lakhany, A. and Zhang, A., Efficient ISDA initial margin calculations using least squares monte-carlo, 2021, arXiv preprint arXiv:2110.13296.
  • Lokeshwar, V., Bharadwaj, V. and Jain, S., Explainable neural network for pricing and universal static hedging of contingent claims. Appl. Math. Comput., 2022, 417, 126775.
  • Longstaff, F.A. and Schwartz, E.S., Valuing American options by simulation: A simple least-squares approach. Rev. Financ. Stud., 2001, 14, 113–147.
  • Lyashenko, A. and Mercurio, F., Looking forward to backward-looking rates: A modeling framework for term rates replacing LIBOR. Available at SSRN 3330240, 2019.
  • Musiela, M. and Rutkowski, M., Martingale Methods in Financial Modeling, 2nd ed., 1997 (Springer: Berlin, Heidelberg).
  • Schrager, D.F. and Pelsser, A.A., Pricing swaptions and coupon bond options in affine term structure models. Math. Financ., 2006, 16, 673–694.
  • Shen, Y., Van Der Weide, J.A. and Anderluh, J.H., A benchmark approach of counterparty credit exposure of bermudan option under Lévy process: The Monte Carlo-cos method. Procedia. Comput. Sci., 2013, 18, 1163–1171.
  • Simaitis, S., de Graaf, C., Hari, N. and Kandhai, D., Smile and default: The role of stochastic volatility and interest rates in counterparty credit risk. Quant. Finance, 2016, 16, 1725–1740.
  • Tsitsiklis, J.N. and Van Roy, B., Regression methods for pricing complex American-style options. IEEE Trans. Neural Netw., 2001, 12, 694–703.
  • Zeron, M. and Ruiz, I., Dynamic initial margin via Chebyshev spectral decomposition, Tech. Rep., Working paper (24 August), 2018.
  • Zhu, S.H. and Pykhtin, M., A guide to modeling counterparty credit risk, GARP Risk Review, July/August 2007.

Appendices

Appendix 1.

Proof of Lemma 4.1

Proof.

Consider the process yt:=P(t,T1)P(t,T0) As y represents a tradable asset denominated by the numéraire, it follows that y is a Martingale under QT0. An application of Itô's lemma and a subsequent measure change shows that y satisfies the following SDE dyt=yt(B(t,T0)B(t,T1))η(t)dWtT0 where WT0 denotes a d-dimensional Brownian motion under QT0. Denoting ηˆ(t):=(B(t,T0)B(t,T1))η(t) and solving the SDE, conditioned on Ft yields P(T0,T1):=yT0=P(t,T1)P(t,T0)etT0ηˆ(u)dWuT0 Let τ0=T0t and τ1=T1t. Note that by the definition of the zero-rate, the expression above can be rewritten as P(T0,T1)=exp{τ0R(t,t+τ0)τ1R(t,t+τ1)+tT0ηˆ(u)dWuT0} Therefore, the forward Delta sensitivity can be computed as P(T0,T1)Rk(t)=(τ0R(t,t+τ0)Rk(t)τ1R(t,t+τ1)Rk(t))P(T0,T1) where αk(τi):=τiR(t,t+τi)Rk(t) denotes an Ft-measurable coefficient. Under the assumption that the time-t zero-rate curve τR(t,t+τ) is an interpolation between nodes {R0(t),,RK(t)}, it follows that R(t,t+τ0)Rk(t)={ττk1τkτk1if τ(τk1,τk]τk+1ττk+1τkif τ(τk,τk+1]0otherwise This concludes the proof.

Appendix 2.

Proof of Lemma 4.2

Proof.

Let 0tTTm. Note that we can write σimp(T,Tm)=1TmTTTmΣ~m,M(u)η(u)22du=1TmT(tTmΣ~m,M(u)η(u)22dutTΣ~m,M(u)η(u)22du)=1TmT(Tmt)σ~2(t,Tm)(Tt)σ~2(t,T) Subsequently applying the chain rule to this expression yields σimp(T,Tm)σk(t)=1TmT(Tmt)σ~2(t,Tm)σk(t)(Tt)σ~2(t,T)σk(t)2(Tmt)σ~2(t,Tm)(Tt)σ~2(t,T)=(Tmt)σ~(t,Tm)σ~(t,Tm)σk(t)(Tt)σ~(t,T)σ~(t,T)σk(t)(TmT)σ~imp(T,Tm) This concludes the proof.

Appendix 3.

Proof of Lemma 4.3

Proof.

Recall that it is assumed that the instantaneous volatility in (Equation1) takes values in R and is piece-wise constant between dates {τ0,,τK}. Furthermore it is assumed that the volatility levels η1,,ηK are obtained through a bootstrap procedure, such that the implied volatility approximations σ~imp given in (Equation17) agree with the implied volatilities σk observed in the market. Lastly, note that only ηk and ηk+1 are sensitive to a change in σk if other market-implied volatilities are kept fixed. For compactness of notation, we show the proof for t = 0. The generalization to t>0 follows the same line of thought.

First consider the case τk1T<τk. Note that we can write (A1) τkσk2=0Tk1Σk2(u)η2(u)du+ηk2Tk1TkΣk2(u)duηk2=τkσk20Tk1Σk2(u)η2(u)duTk1TkΣk2(u)du(A1) Furthermore, we have the relation σ~(0,T)=1T(0Tk1Σm,M2(u)η2(u)du+ηk2Tk1TΣm,M2(u)du) It follows by the chain rule σ~(0,T)σk=12σ~(0,T)1Tσk×(0Tk1Σm,M2(u)η2(u)du+ηk2Tk1TΣm,M2(u)du)=12Tσ~(0,T)(Tk1TΣm,M2(u)du)ηk2σk=τkσkTσ~(0,T)Tk1TΣm,M2(u)duTk1TkΣk2(u)du where the last equality follows from substituting ηk2 with (EquationA1).

The case τkT<τk+1 follows a similar line of thought. Note that we can write (A2) τk+1σk+12=0Tk1Σk+12(u)η2(u)du+ηk2Tk1TkΣk+12(u)du+ηk+12TkTk+1Σk+12(u)duηk+12=τk+1σk+120Tk1Σk+12(u)η2(u)duηk2Tk1TkΣk+12(u)duTk1TkΣk+12(u)du(A2) Furthermore, we have the relation σ~(0,T)=1T(0Tk1Σm,M2(u)η2(u)du+ηk2Tk1TkΣm,M2(u)du+ηk+12TkTΣm,M2(u)du) Applying the chain rule to the expression above, while substituting ηk2 and ηk+12 with (EquationA1) and (EquationA2), respectively, we find σ~(0,T)σk=τkσkTσ~(0,T)×(Tk1TkΣm,M2(u)duTk1TkΣk2(u)duTkTΣm,M2(u)duTkTk+1Σk+12(u)duTk1TkΣk+12(u)duTk1TkΣk2(u)du) If by approximation Σm,MΣkΣk+1, the expression above reduces to σ~(0,T)σk=τkσkTσ~(0,T)TTk+1Σm,M2(u)duTkTk+1Σk+12(u)du

Appendix 4.

Proof of Proposition 4.4

Proof.

For compactness, we will write πm:=πm(Tm). First, note that πm,i is continuously differentiable w.r.t. θt for all i=1,,n. It follows that λm is continuously differentiable w.r.t. both θt and wm. Secondly, we show that Dwλm is non-singular. By the nature of the ReLu function, the payoffs πm,1,,πm,n are linearly independent when the strikes are disjoint. That is, there is no vRn such that vπm=0 a.s. Now consider the matrix EQ[πmπm]. As a property of the auto-correlation matrix, it is known to be positive semi-definite, i.e. vRn:vEQ[πmπm]v0

Through an argument by contradiction, we can show the matrix is in fact strictly positive definite. In that regard, assume there exists a vRn such that vEQ[πmπm]v=0. It would follow that 0=vEQ[πmπm]v=EQ[vπmπmv]=EQ[|vπm|2] this is in contradiction with the fact that πm,1,,πm,n are linearly independent. Hence we can conclude that EQ[πmπm] is a real-valued, symmetric matrix, which is strictly positive definite. It follows that all its eigenvalues are real and positive. As the determinant of a symmetric matrix is the product of its eigenvalues, we can conclude that Dwλm has a positive determinant and is non-singular. The result now follows by an application of the implicit function theorem.