Full article: Robust Risk-Aware Option Hedging

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

The objectives of option hedging/trading extend beyond mere protection against downside risks, with a desire to seek gains also driving agent's strategies. In this study, we showcase the potential of robust risk-aware reinforcement learning (RL) in mitigating the risks associated with path-dependent financial derivatives. We accomplish this by leveraging a policy gradient approach that optimizes robust risk-aware performance criteria. We specifically apply this methodology to the hedging of barrier options, and highlight how the optimal hedging strategy undergoes distortions as the agent moves from being risk-averse to risk-seeking. As well as how the agent robustifies their strategy. We further investigate the performance of the hedge when the data generating process (DGP) varies from the training DGP, and demonstrate that the robust strategies outperform the non-robust ones.

KEYWORDS:

1. Introduction

Option hedging is a foundational issue in the field of mathematical finance, tracing back to the groundbreaking work of Black and Scholes (Citation1973) and Merton (Citation1973). While exact replication of options is achievable in frictionless and complete markets, real-world markets entail transaction costs and necessitate discrete-time trading, rendering perfect replication unattainable. Notably, even under continuous trading conditions, delta hedging ceases to be optimal when trading costs are considered, and it is impossible to replicate arbitrary payoffs. Various studies have addressed the influence of market frictions on option hedging, including those by Whalley and Wilmott (Citation1997) and Martinelli (Citation2000).

Contemporary research has increasingly focussed on developing more robust, model-agnostic approaches to option hedging, with a growing interest in machine learning techniques (see Ruf and Wang Citation2020 for an overview). For instance, Cao et al. (Citation2021) and Kolm and Ritter (Citation2019) have proposed hedging strategies that involve solving mean-variance optimization problems using neural networks, Lütkebohmert, Schmidt, and Sester (Citation2022) study pricing under parameter uncertainty, and Gierjatowicz et al. (Citation2020) find robust bounds for prices of derivatives using neural stochastic differential equations. Recently,Footnote¹ Limmer and Horvath (Citation2023) study robustification by selecting sample paths from a bank of models and penalize models by the amount they deviate from a reference model.

As perfect replication is unattainable, agents must bear some risk exposure when buying/selling options. The agent's risk tolerance and appetite dictate the profit and loss (P&L) profile they are willing to accommodate. Rather than aiming to replicate option payoffs, our approach, drawing upon the work of Ilhan et al. (Citation2009) and Buehler et al. (Citation2019), seeks to minimize measures of risk. We propose employing rank dependent expected utility (RDEU) to more comprehensively represent agents' risk preferences, as suggested by Yaari (Citation1987). RDEU enables agents to incorporate the utility associated with outcomes and account for distortions in the probability of those outcomes, thereby facilitating both risk-averse and gain-seeking behaviour concurrently.

Existing literature on option hedging using machine learning largely neglects model misspecification and potential market condition changes. To safeguard agents against such uncertainties, we capitalize on advances in robust reinforcement learning (RL) as outlined by Rahimian and Mehrotra (Citation2019). Specifically, we employ the framework presented by Jaimungal et al. (Citation2022) to robustify RDEU-based strategies and apply them to the option hedging problem.

In this framework, agents assess a strategy not by the RDEU of terminal wealth, but by distorting its distribution within a predefined ambiguity set. The ambiguity set comprises all distributions of terminal wealth situated within a Wasserstein ball surrounding the terminal wealth. Agents subsequently select actions that are optimal under the worst-case scenario within this ambiguity set. Given the absence of additional assumptions, this approach offers protection against various forms of model uncertainty, including those stemming from underlying asset process misspecification or alterations in market frictions. A similar approach for infinitesimally small uncertainty sets was investigated by Bartl et al. (Citation2021) where they consider sensitivity of a generic stochastic optimization problem to model uncertainty – in a perturbative fashion.

To approximate the agents' strategies, we employ neural networks. At each trading time, the neural network receives features representing the state and produces the hedge and we take a policy gradient approach to improve the policy.

We apply our algorithm to hedge barrier options, demonstrating its capacity to generate sound hedges for stochastic volatility models such as Heston, in the presence of market frictions. Importantly, our algorithm is not contingent upon a specific model and can be applied to any price process or derivative. Moreover, we illustrate that our robust models surpass the classic complete-market outcomes provided by Black–Scholes formulas for pricing barrier options.

The remainder of this article is organized as follows. We provide the relevant mathematical background such as RDEU in Section 2 and Wasserstein distances in Section 3. Section 4 formalizes the robust and non-robust option hedging problems. Section 5 provides policy gradient formulae for solving the optimization problems. We present results for hedging barrier call options in Section 6, where we also investigate how robustness changes optimal strategies. Further examples demonstrate that robust hedges outperform non-robust hedges under model misspecification.

2. Risk Measures

This section provides the basic definitions of the risk measures that we study and their key representation in terms of the distortion and quantile functions. We work on a probability space $(Ω, F, P)$ , moreover, let $X$ represent the set of $F$ -measurable, $P$ -integrable, random variables. Let $Z_{1}, Z_{2} \in X$ be interpreted as random costs imposed on the agent.

Definition 2.1

A risk measure is a mapping $ρ : X \to R$ and is said to be

Monotone if $Z_{1} \leq Z_{2}$ implies $ρ (Z_{1}) \leq ρ (Z_{2})$ ;
Translation Invariant if for $m \in R, ρ (Z_{1} + m) = ρ (Z_{1}) + m$ ;
Positive Homogeneous if for $β > 0, ρ (β Z_{1}) = β ρ (Z_{1})$ ;
Subadditive if $ρ (Z_{1} + Z_{2}) \leq ρ (Z_{1}) + ρ (Z_{2})$ ;
Coherent if and only if conditions 1–4 are satisfied

In this work, we focus on measuring risk using RDEU risk measures (Yaari Citation1987). These measures fall outside the class of coherent risk measures (Artzner et al. Citation1999) but also include many special cases such as distortion risk measures and expected utility. Agents value wealth according to a utility function, but also consider probabilities of the events subjectively through a distortion function. Formally, RDEU is defined as follows.

Definition 2.2

Let g be an increasing function with g(0) = 0 and g(1) = 1. Let U be a non-decreasing concave an differentiable almost everywhere utility function. The RDEU of a random variable Z is defined as $R_{g}^{U} [Z] := \int_{- \infty}^{0} (1 - g [P (U (Z) > z)]) d z - \int_{0}^{+ \infty} g [P (U (Z) > z)] d z$

Under suitable conditions, $R_{g}^{U} [Z]$ has a more convenient representation as shown in Theorem 2.1.

Theorem 2.1

Representation of RDEU

If the distortion function g is left-differentiable, the RDEU can be rewritten as $R_{g}^{U} [Z] := - \int_{0}^{1} U (F_{Z}^{- 1} (s)) γ (s) d s$ where γ is given by $γ (u) := \partial_{-} g (x) |_{x = 1 - u}$ , $\partial_{-}$ denotes the left hand derivative, and $F_{Z}^{- 1} (s)$ denotes the quantile function of the random variable Z.

By utilizing the agent's utility function and allowing for distortions in probabilities, the RDEU framework enables us to consider the agent's risk/gain seeking behaviour and ultimately provide a more comprehensive representation of their risk preferences. As option payoffs are not replicable outside of complete market settings, we utilize risk measures to quantify the agent's preferences when hedging.

2.1. Risk Avoiding While Profit Seeking: $α - β$ Risk Measures

In this subsection, we provide a prototypical example of an RDEU risk measure that we utilize in the hedging constructions. We employ them as they allow us to showcase how the investor reacts to avoiding risk while seeking gains with only a few parameters. Henceforth, we call such risk measures simply as $α - β$ risk measures.

Definition 2.3

$α - β$ risk measure Jaimungal et al. Citation2022

Let $γ_{p}^{α, β} (u) = \frac{1}{η} [p I (u \leq α) + (1 - p) I (u \geq β)]$ for $0 < α \leq β < 1$ and $p \in [0, 1]$ . $η := p α + (1 - p) (1 - β)$ is a normalizing constant s.t. $\int_{0}^{1} γ_{p}^{α, β} (u) = 1$ . The $α - β$ risk measure of a r.v. $Z \in X$ is defined as: (1) $R_{p}^{α, β} [Z] := R_{γ_{p}^{α, β}}^{f} [Z],$ (1) where f is the identify function, i.e., $f (x) = x$ .

Figure provides a visual representation of $γ (u)$ for select parameters. Different values of α, β and p result in U-shaped distortion functions, which model agents who are concerned about profits below the α quantile and above the β quantile. Moreover, if $p > \frac{1}{2}$ , the investor emphasizes losses over gains. The opposite is true for $p < \frac{1}{2}$ .

Figure 1. Depictions of $γ (u)$ for $α = 0.2$ and $β = 0.9$ as p varies.

The $α - β$ risk measures contain several notable special cases, such as expected utility and Conditional-Value-at-Risk (CVaR). For example, when p = 1, as shown on the leftmost panel of Figure , the RDEU reduces to CVaR at level α. Furthermore, such risk measures are translation invariant and positive homogeneous. These properties play a role when analyzing the optimal hedging strategies.

3. Wasserstein Distances

In this section, we present a brief overview of Wasserstein distances, which help establish a metric on the set of probability distributions and allow us to define our ambiguity set. For a more detailed coverage of the topic, the reader is referred to, e.g., Santambrogio (Citation2015). In many settings, and in particular for option hedging, the asset price process may be estimated using past data. Once estimated, we can select a strategy and determine its terminal P&L distribution. A natural way to protect the agent from model uncertainty is to consider a family of distributions within a neighbourhood, almost like a confidence interval, of our estimated distribution. Wasserstein distances provide a way to do so. Let $P (X)$ be the space of probability measures on $X \subset R^{d}$ , and define: $P_{p} (X) = {μ \in P (X) | \int_{X} | x |^{p} dμ (x) < + \infty}$ with $p \geq 1$ , i.e., the space of probability measures with bounded pth moment.

Definition 3.1

Given measures $μ, ν \in P_{p} (X)$ , the Wasserstein distance of order p is defined as (2) $d_{p} [μ, ν] = inf_{π \in Π (μ, ν)} {(\int_{X \times X} {| x - y |}^{p} d π (x, y))}^{\frac{1}{p}},$ (2) where $Π (μ, ν)$ denotes the set of transport plans and contains all measures π such that $π (A \times X) = μ (A)$ and $π (X \times B) = ν (B)$ , for all measurable sets $A \subseteq X$ and $B \subseteq X$ .

Intuitively, the Wasserstein distance measures the cheapest way to transport mass between distributions. The constraints on the transport plans ensures that the total mass removed from a measurable set $A \subseteq X$ is equal to $μ (A)$ , and total mass transferred to $B \subseteq X$ is equal to $ν (B)$ .

We focus on the single asset hedging problem which leads to a univariate P&L random variable. In one-dimension, the optimal coupling that attains the inf in (Equation2(2) $d_{p} [μ, ν] = inf_{π \in Π (μ, ν)} {(\int_{X \times X} {| x - y |}^{p} d π (x, y))}^{\frac{1}{p}},$ (2) ) is the comonotonic coupling. This leads to a tractable representation of the Wasserstein distance. Consider two measures $μ, ν \in P (R)$ having cumulative distribution functions (cdf) F and G, respectively. Define the generalized inverse of a function F on $[0, 1]$ as $F^{- 1} (t) = inf {x \in R ∣ F (x) > t}$ , i.e., when F is a cdf then $F^{- 1}$ is the quantile function. The Wasserstein distance is given by (see, e.g., Ambrosio et al. Citation2003, Chap. 2) $d_{p} [μ, ν] = (\int_{0}^{1} {| F^{- 1} (u) - G^{- 1} (u) |}^{p} d u) .$

4. Problem Formulation

In this section, we provide an overview of the option hedging problem and how robust methods can be used to protect against model uncertainty.

Consider an agent who shorts an option with payoff $V_{T}$ at maturity date. The price earned from selling the option at t = 0 is denoted by $V_{0}$ . We assume that the asset does not pay dividends, but this may be easily incorporated into the setup. Let $(S_{t})_{t \in [0, T]}$ denote the underlying asset price process and – with a slight abuse of notation – let $(S_{0}, S_{1}, \dots, S_{N - 1}, S_{N})$ represent the asset price at time points $0 = t_{0} < t_{1} < t_{2} \dots < t_{N - 1} < T$ . At each $t_{i}$ , the agent observes the stock price and chooses the amount $Δ_{i}$ of the asset to hold. For convenience, let $(Δ_{i}^{ϕ})_{i = 0, 1 \dots N - 1}$ represent the series of actions, and φ represent the set of all hedging strategies.

In some examples, we include market frictions in the form of proportional transaction costs. Formally, given a constant $c \in R_{+}$ , if the stock price is $S_{t}$ and q shares are purchased, we impost a cost of $c S_{t} q$ . Then, given a sequence of trades ϕ the agent's terminal wealth is given by (3) $X^{ϕ} = V_{0} + \sum_{i = 0}^{N - 1} Δ_{i}^{ϕ} (S_{i + 1} - S_{i}) - \sum_{i = 0}^{N - 1} c | Δ_{i}^{ϕ} - Δ_{i - 1}^{ϕ} | S_{i} - V_{N},$ (3) where $Δ_{- 1}^{ϕ} := 0$ represents the asset holdings before any trading.

For simplicity, we set interest rates to 0, however, positive or even stochastic interest rates can be easily incorporated. Furthermore, as $α - β$ risk measures are translation invariant, we can safely ignore $V_{0}$ in (Equation3(3) $X^{ϕ} = V_{0} + \sum_{i = 0}^{N - 1} Δ_{i}^{ϕ} (S_{i + 1} - S_{i}) - \sum_{i = 0}^{N - 1} c | Δ_{i}^{ϕ} - Δ_{i - 1}^{ϕ} | S_{i} - V_{N},$ (3) ) when optimizing $X^{ϕ}$ .

4.1. Non-robust Formulation

We first provide the non-robust formulation of the problem. The agent seeks to minimize the risk $R_{g}^{U} [X^{ϕ}]$ of their terminal wealth over the parameter set $ϕ \in φ$ . Assuming the agent is certain that the model is correctly specified (i.e., the distribution of $X^{ϕ}$ is known with certainty for all actions ϕ), then the their optimization problem may be stated as (4) $inf_{ϕ \in φ} R_{g}^{U} [X^{ϕ}],$ (4) where the strategy ϕ is parametrized by a neural network that takes observed states as inputs and provides the strategy as output – e.g., a natural choice of states would be the current time and asset price. Thus, the goal is to adjust the weights of the neural net to select the actions that induce the lowest risk.

4.2. Robust Formulation

We also provide a robust formulation of the problem, as agents often do not have full knowledge of the model that drives the asset prices. Even if the agent uses historical data, they should robustify strategies to avoid obtaining strategies that are tuned to the historical price dynamics and work poorly out of sample. If agents use parametric models, then there is uncertainty in the value of those parameters, and even uncertainty whether they chose the correct parametric class of models. A third example of model uncertainty stems from financial markets not being stationary, hence a model that is well calibrated to some slice of historical data may not explain the dynamics well in the future.

As before, we use neural networks to learn the optimal mapping between observed states to ϕ. To build robustness into the model, we consider all probability distributions in an ambiguity set. Specifically, let $ϑ \subseteq R^{n}$ parameterize the ambiguity set, and $X^{θ}$ be a $R$ -valued random variable parameterized by $θ \in ϑ$ . When the agent considers model uncertainty, $X^{ϕ}$ from the previous section serves as an approximation to the true wealth random variable. Thus, the agent builds uncertainty around this base model and uses an ambiguity set $ϑ_{ϕ} \subset ϑ$ defined below. To protect against uncertainty, the agent picks actions to minimize the risk associated with the worst case distribution in $ϑ_{ϕ}$ . Under this framework, we define the agent's optimization as (5) $inf_{ϕ \in φ} sup_{θ \in ϑ_{ϕ}} R_{g}^{U} [X^{θ}] where ϑ_{ϕ} := {θ \in ϑ ∣ d_{p} [X^{θ}, X^{ϕ}] \leq ϵ} .$ (5) This formulation may be viewed as an adversarial attack. The agent picks actions $ϕ \in φ$ which induces a terminal wealth of $X^{ϕ}$ . The adversary then distorts $X^{ϕ}$ to $X^{θ}$ that corresponds to the worst performance within a Wasserstein ball around $X^{ϕ}$ . To model this relationship, we set $X^{θ} = H_{θ} (X^{ϕ})$ where $H_{θ} (\cdot)$ is an neural network parameterized by θ. If $X^{ϕ}$ and $X^{θ}$ are both continuous r.v., then, there always exists a neural network θ that achieves equality in distribution – which is all that we need as the risk measures we employ are law invariant. Specifically, let $F_{X^{θ}} (x)$ denote the cdf of $X^{θ}$ , then the r.v. $F_{X^{θ}}^{- 1} (F_{X^{ϕ}} (X^{ϕ}))$ has cdf $F_{X^{θ}} (x)$ and, as under the assumption that $X^{θ}$ and $X^{ϕ}$ are continuous r.v., we have that $F_{X^{θ}}^{- 1} (F_{X^{ϕ}} (x))$ is continuous and hence there exists a neural network that approximates it (on a finite domain) arbitrarily well.

This formulation allows for uncertainty on the r.v. $X^{ϕ}$ . Thus, it incorporates potentially both uncertainty in the underlying processes that generate $X^{ϕ}$ , transaction costs, and the strategy itself.

5. Optimization Algorithm

In this section, we derive policy gradients for solving the non-robust and robust problems. The robust formulation reduces to the non-robust problem as $ϵ \to 0$ . However, solving the non-robust version is less computationally intensive, so we present separate algorithms for the two cases.

5.1. Non-robust Problem

We optimize $R_{g}^{U} [X^{ϕ}]$ over the possible actions by using batch gradient descent where the gradient is taken with respect to the parameters of the policy. However, many risk measures such as CVaR yield a derivative with discontinuities. Naïve back-propagation leads to inaccuracies and we provide a different gradient formula.

Theorem 5.1

Let $F_{ϕ} (x)$ and $f_{ϕ} (x)$ be the cdf and pdf of $X^{ϕ}$ , respectively. Under the assumptions in Theorem 2.1, and further assuming that $X^{ϕ}$ is a continuous r.v., we have that $\nabla_{ϕ} R_{g}^{U} [X^{ϕ}] = E [U^{'} (X^{ϕ}) γ (F_{ϕ} (X^{ϕ})) \frac{\nabla_{ϕ} F_{ϕ} (x) ∣_{x = X^{ϕ}}}{f_{ϕ} (X^{ϕ})}] .$

Proof.

As $F_{ϕ} (F_{ϕ}^{- 1} (s)) = s$ , taking gradients on both sides and using the chain rule we have $f_{ϕ} (F_{ϕ}^{- 1} (s)) \nabla_{ϕ} F_{ϕ}^{- 1} (s) + \nabla_{ϕ} F_{ϕ} (x) |_{x = F_{ϕ}^{- 1} (s)} \nabla F_{ϕ}^{- 1} (s) = 0$ and rearranging yields $\nabla_{ϕ} F_{ϕ}^{- 1} (s) = - {\nabla_{ϕ} F_{ϕ} (x) |}_{x = F_{ϕ}^{- 1} (s)} / f_{ϕ} (F_{ϕ}^{- 1} (s)) .$ Using the representation of $R_{g}^{U} [X^{ϕ}]$ in Theorem 2.1, we have that (6) $\begin{aligned} \nabla_{ϕ} R_{g}^{U} [X^{ϕ}] & = - \int_{0}^{1} U^{'} (F_{ϕ}^{- 1} (s)) \nabla_{ϕ} F_{ϕ}^{- 1} (s) γ (s) d s \end{aligned}$ (6) (7) $\begin{aligned} = \int_{0}^{1} U^{'} (F_{ϕ}^{- 1} (s)) γ (s) \frac{\nabla_{ϕ} F_{ϕ} (x) ∣_{x = F_{ϕ}^{- 1} (s)}}{f_{ϕ} (F_{ϕ}^{- 1} (s))} d s . \end{aligned}$ (7) For any $u \sim U (0, 1)$ , we have that $F^{- 1} (u) \overset{d}{=} X^{ϕ}$ , and the theorem follows by interpreting the integral as an expectation over a uniform random variable.

The gradient formula requires knowing $F_{ϕ} (x)$ and $\nabla_{ϕ} F_{ϕ} (x)$ , but in general there is no analytical expression for these quantities. We instead take advantage of the mini-batch environment and use samples to generate a Gaussian kernel density estimator ${\hat{F}}_{ϕ} (x)$ to approximate $F_{ϕ} (x)$ . Specifically, suppose we have a sample of N points of $X^{ϕ}$ given by ${x_{ϕ}^{(1)}, x_{ϕ}^{(2)}, \dots x_{ϕ}^{(N)}}$ , then ${\hat{F}}_{ϕ} (x) = \frac{1}{N} \sum_{i = 1}^{N} Φ (\frac{x - x_{ϕ}^{(i)}}{h})$ where $Φ (\cdot)$ denotes the cdf of a standardized Gaussian and h is a bandwidth selected, see, e.g., Gramacki (Citation2018) for discussions around selecting bandwidths. In our implementations we use Silverman's rule divided by two : $h = 0.53 \hat{σ} N^{- 1 / 5}$ with $\hat{σ}$ being the sample standard deviation. Using the chain rule, we then have $\nabla_{ϕ} F_{ϕ} (x) \approx \nabla_{ϕ} {\hat{F}}_{ϕ} (x) = - \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{h} Φ^{'} (\frac{x - x_{ϕ}^{(i)}}{h}) \nabla_{ϕ} x_{ϕ}^{(i)} .$ Substituting this expression into Theorem 5.1, and estimating expectation with the sample mean from the same mini-batch of data, the gradient can be approximated by $\nabla R_{g}^{U} [X^{ϕ}] \approx - \frac{1}{N} \sum_{i = 1}^{N} [U^{'} (x_{ϕ}^{(i)}) γ ({\hat{F}}_{ϕ} (x_{ϕ}^{(i)})) \frac{\sum_{j = 1}^{N} Φ^{'} ((x_{ϕ}^{(i)} - x_{ϕ}^{(j)}) / h) \nabla_{ϕ} x_{ϕ}^{(j)}}{\sum_{j = 1}^{N} Φ^{'} ((x_{ϕ}^{(i)} - x_{ϕ}^{(j)}) / h)}] .$ The remaining task is to obtain $\nabla_{ϕ} x_{ϕ}^{(j)}$ . These may be computed using back-propagation numerically and there is no need for explicit formulae.

5.2. Robust Problem

The steps for solving the robust problem are derived in Jaimungal et al. (Citation2022). We briefly summarize the main results here. Denote the cdf and pdf of $X^{ϕ}$ by $G_{ϕ}$ and $g_{ϕ}$ , respectively. Similarly, denote the cdf and pdf of $X^{θ}$ by $F_{θ}$ and $f_{θ}$ , respectively. We define $c [X^{θ}, X^{ϕ}] = ((d_{p} [X^{θ}, X^{ϕ}] - ϵ^{p})_{+}$ to denote a penalty for being outside of the desired Wasserstein ball. To enforce the constraints, the augmented Lagrangian approach (which utilizes both a Lagrange multiplier and a quadratic penalty, see, e.g., Birgin and Mario Martínez Citation2014, Chapter 4) may be used as the loss function, and we optimize $L [θ, ϕ] = R_{g}^{U} [X^{θ}] + λc [X^{θ}, X^{ϕ}] + \frac{μ}{2} {(c [X^{θ}, X^{ϕ}])}^{2} .$ Here, µ is a penalty constant that is to be updated as one traverses the loss function along with the Lagrange multiplier λ.

Theorem 5.2

Inner Gradient Formula Jaimungal et al. Citation2022

Under the assumptions of Theorem 2.1, let $X_{c}^{ϕ}$ be a reordering of the realisations of $X^{ϕ}$ so that $(X^{θ}, X^{ϕ})$ are comonotonic. Then the inner gradient is (8) $\begin{aligned} \nabla_{θ} L [θ, ϕ] & = E [{U^{'} (X^{θ}) γ (F_{θ} (X^{θ})) - p Λ {| X^{θ} - X_{c}^{θ} |}^{p - 1} sgn (X^{θ} - X_{c}^{θ})} \\ \frac{\nabla_{θ} F_{θ} (x) ∣_{x = X^{θ}}}{f_{θ} (X^{θ})}] \end{aligned}$ (8) where $Λ = (λ + μc [X^{θ}, X^{ϕ}]) I (d_{p} [X^{θ}, X^{ϕ}] > ϵ)$ is a constant.

Theorem 5.3

Outer Gradient Formula Jaimungal et al. Citation2022

Under the same assumptions as Theorem 5.2, the outer gradient is (9) $\begin{aligned} \nabla_{ϕ} L [θ, ϕ] & = E [U^{'} (X^{θ}) γ (F_{θ} (X^{θ})) \frac{\nabla_{ϕ} F_{θ} (x) ∣_{x = X^{θ}}}{f_{θ} (X^{θ})} \\ - p Λ {| X^{θ} - X_{c}^{θ} |}^{p - 1} sgn (X^{θ} - X_{c}^{θ}) (\frac{\nabla_{θ} F_{θ} (x) ∣_{x = X^{θ}}}{f_{θ} (X^{θ})} + \frac{\nabla_{ϕ} G_{ϕ} (x) ∣_{x = X^{ϕ}}}{g_{ϕ} (X^{ϕ})})] \end{aligned}$ (9)

The proof of these formula follow utilizes Theorem 5.1 coupled with the results from Jaimungal et al. (Citation2022). Similar to the previous section, we use kernel density estimation to approximate $F_{θ} (x)$ and $G_{ϕ} (x)$ and use sample means in place of the expectation.

6. Numerical Experiments

6.1. The Hedged Options

In this section, we investigate the various hedging strategies of barrier call options under the $α - β$ risk measures defined in Section 2.1. In particular, let B represent the barrier, and K represent the strike price. Then the knock-in and knock-out option payoffs are $\begin{aligned} V_{T}^{knock - in} & = {(S_{T} - K)}_{+} I (min_{t < T} {S_{t}} < B), and \\ V_{T}^{knock - out} & = {(S_{T} - K)}_{+} I (min_{t < T} {S_{t}} > B) . \end{aligned}$ In our experiments, the option expires at T = 1, B = 8.5, and K = 10.

6.2. The Market Model

Asset prices S are modelled using the Heston model (10) $d S_{t} = μ S_{t} d t + \sqrt{ν_{t}} S_{t} d W_{t}^{S},$ (10) where ν is the instantaneous variance satisfying the SDE (11) $d ν_{t} = κ (θ - ν_{t}) d t + ξ \sqrt{ν_{t}} d W_{t}^{ν},$ (11) and where $(W_{t}^{S}, W_{t}^{ν})_{t \geq 0}$ are correlated Brownian motions under the physical measure $P$ . The model parameters carry the following interpretations: $S_{0}$ (initial asset price), $v_{0}$ (initial variance), µ (drift), κ (the rate at which $v_{t}$ reverts to the long run average), θ (long run average variance), η (the volatility of volatility) and ρ (the correlation between $W^{S}$ and $W^{ν}$ ).

We stimulate the Heston model using 200 time steps, however, the agent is only allowed to trade once every 4 steps, i.e., at $(t_{i})_{i = 0, \dots, 49}$ where $t_{i} := \frac{4 i}{200}$ . The model parameters in the simulation are as follows $S_{0} = 10, v_{0} = {0.3}^{2}, μ = 0.08, κ = 3, θ = {0.3}^{2}, η = 2, and ρ = - 0.5.$ When comparing the robust and non-robust models, we simulate prices from the Heston model, but where some parameters are modified.

6.3. The Black–Scholes Model

To evaluate the performance and interpret our neural network results, we compare to classical results derived from the Black–Scholes model. Recall that the model assumes asset prices are geometric Brownian motions. In particular, S satisfies the SDE $d S_{t} = μ S_{t} d t + σ S_{t} d W_{t} .$ Strictly speaking, the Black–Scholes hedge is not for the purpose of minimizing a risk measure, nor does its derivation include transaction costs, however, the resulting hedge serves as a useful benchmark for evaluating our strategies.

It is well known that, when interest rates are zero, the price of a European call option at time t in the Black–Scholes model is given by $C_{BS} (S, K) = S Φ (d_{+}) - K Φ (d_{-}),$ where $d_{\pm} = (\log \frac{S}{K} \pm \frac{σ^{2}}{2} (T - t)) / σ \sqrt{T - t}$ . As well, the price of knock-in and knock-out options, before asset prices touch the barrier are provided, respectively, by Westermark (Citation2009) $C_{Knockin} = \frac{S}{B} C_{BS} (\frac{B^{2}}{S}, K) and C_{Knockout} = C_{BS} - C_{Knockin}$ For knock-in calls, once the barrier is breached, the option becomes a regular call and has value $C_{BS} (S, K)$ . For knock-out calls, once the barrier is breached, the option is worthless. For knock-in calls, the Black–Scholes delta hedge is given by $Δ_{Knock - in} (S) = \frac{1}{B} C_{Call} (\frac{B^{2}}{S}, K) - \frac{B}{S} Δ_{Call} (\frac{B^{2}}{S}, K) .$ The Black–Scholes delta hedge for a knock-out call is $Δ_{Knock - out} (S) = Φ (d_{+}) - Δ_{Knock - in} (S)$ . In our experiments, when we hedge using the Black–Scholes hedge, we do so at the same set of discrete hedging times that the RL agent uses. Moreover, we use the σ estimated by stimulating price paths $S_{t}$ under Heston to generate a distribution for $\log (S_{T})$ and computing its standard-deviation divided by $\sqrt{T}$ . Thus, obtaining a Black–Scholes model that matches the variance of the Heston model.

6.4. Hedging Features

We now consider the agent's problem as posed in Section 4. The strategy, $ϕ = (Δ_{i})_{i = 0, 1 \dots N - 1}$ , is parametrized by a fully connected feed forward neural network with 5 hidden layers each with 35 neurons and SiLU activation functions.Footnote² The output layer has a tanh activation function that clips strategies to the range $Δ_{i} \in [- 2, 2]$ to ensure the agent does not over-leverage. For test cases with no transaction costs, the neural net uses the following five features ${t, S_{t}, M_{t}, I [M_{t} > B]} .$ where $M_{t} := min_{u \leq t} {S_{u}}$ . When we incorporate transaction costs, we add one more feature: the previous holdings $Δ_{t - 1}$ .

While $I [M_{t} < B]$ can be determined through $min_{u < t} {S_{u}}$ , each serve a different purpose in the neural network. In most cases, the optimal strategies have a jump discontinuity once the barrier is breached, but, as activation functions are smooth, neural networks have difficulties modelling such discontinuities. Thus, we include the indicator of whether the barrier was breached or not. We include the minimum asset price up to that point in time, as under the $α - β$ risk measures its effect on optimal strategies is a priori unclear. Under the Black–Scholes model, e.g., only time, price and the indicator matters.

For the robust problem, a second neural net distorts $X^{ϕ}$ and it is parametrized by a fully connected feed forward neural network with 1 hidden layer of 10 neurons and SiLU activation functions. It takes in a realisations of $X^{ϕ}$ and outputs realisations of $X^{θ}$ . For all experiments, we use the Wasserstein-1 distance and set the radius of the ball to 0.02.

6.5. CVaR

In this section, let $α = 0.2$ , p = 1. For these parameters, the $α - β$ risk measure reduces to ${CVaR}_{0.2}$ .

6.5.1. No Transaction Costs

We first set transaction costs to zero to illustrate a few interesting effects that are present even in the case without transaction costs.

Figures and shows the optimal strategy learned by the agent. The top panels show the optimal hedge post knock-out (i.e., once the barrier is breached), while the bottom panels show the pre-knock-out (i.e., before the barrier is breached) hedges. While $M_{t}$ is a feature, the strategies for CVaR only depend on $I [M_{t} < B]$ but do not depend on the exact value of $M_{t}$ . In the lower panels, the hedges stop at $M_{t} = 8.5 = B$ corresponding to the option barrier. Figure shows that, for CVaR, the non-robust strategy are qualitatively and quantitatively fairly similar to the Black–Scholes' delta-hedging. Moreover, Figure shows that, for CVaR, the robust and non-robust are similar, but there are some notable differences. The robust agent does hedges more when the option is out-of-the-money and less when the option is in-the-money. By replicating the option payoff less closely, the agent is better able to handle market uncertainties.

Figure 2. A comparison between trained non-robust strategies (blue) and Black–Scholes delta hedging (orange) for the knock-in call option. The top panel shows post-knock in strategies, while the bottom panels shows pre-knock in strategies.

Figure 3. A comparison between robust (red) and non-robust (blue) strategies for the knock-in call option. The top panel shows post-knock in strategies, while the bottom panels shows pre-knock in strategies.

Figure shows the analogous strategies for knock-out call options for the robust and non-robust cases. They are very similar to Black–Scholes hedges (not shown). Naturally, the agent only hedges when the option has not yet been knocked-out, otherwise, once it is knocked-out, the agent holds no position in the robust case, but holds a small positive position in non-robust case. This is because the agent is hoping to gain some positive return from the asset.

Figure 4. A comparison between robust (red) and non-robust (blue) strategies for the knock-out option. The top panel shows post-knock out strategies, while the bottom panels shows pre-knock out strategies.

6.5.2. The Effect of Transaction Costs

Next, we introduce transaction costs and set c = 0.01 in Equation Equation3(3) $X^{ϕ} = V_{0} + \sum_{i = 0}^{N - 1} Δ_{i}^{ϕ} (S_{i + 1} - S_{i}) - \sum_{i = 0}^{N - 1} c | Δ_{i}^{ϕ} - Δ_{i - 1}^{ϕ} | S_{i} - V_{N},$ (3) . In this case, it is challenging to visualize the strategy as a function of features, as the strategy now becomes a function of t, $S_{t}$ , $M_{t}$ and $Δ_{t - 1}$ . Instead, we depict the optimal strategy for a few sample paths. To this end, Figure shows the optimal hedge along two sample paths of the asset process – both with and without the transaction costs. The two strategies co-move. The right most panel shows a histogram of total variation of hedge position across 5000 paths As the figure shows, the case with transaction costs has less total variation, indicating that the agent trades less in this case. Both total variations have spikes near zero because if the barrier is never breached the agent trades very few shares.

Figure 5. Strategies for hedging the down-and-in call option with transaction costs (TC) and with no transaction costs (NTC) built into the model. (a) Optimal $Δ_{t}$ for two sample asset paths and (b) Histogram of total variation of $(Δ_{t})_{t = 0, 1 \dots N - 1}$ across multiple asset paths.

Figure shows a comparison of P&L between the robust strategy and the Black–Scholes option hedging strategy, under uncertainty. All training price paths are generated using the parameters $S_{0} = 10, v_{0} = {0.3}^{2}, μ = 0.08, κ = 3, θ = {0.3}^{2}, η = 2$ , and $ρ = - 0.5$ . For the Black–Scholes models, we ignore the transaction cost and estimate σ using the method described above. Furthermore, we assume that the model was misspecified with $κ = 1$ and $ρ = - 0.1$ in reality. When there is model uncertainty, the robust strategy outperforms in both cases.

Figure 6. Comparison of Black–Scholes option hedging (Bl–Sch) and robust (Rob) P&L with transaction cost when model is misspecified with $κ = 1$ and $ρ = - 0.1$ in reality. CVaR $_{0.2}$ is also shown. (a) Knock-in Option and (b) Knock-out option.

6.5.3. Pricing

The methodology we develop may also be used to provide a robust price for options. Suppose the agent wishes to target a ${CVaR}_{0.2}$ of $- 0.5$ . As CVaR is translation invariant, after learning a strategy, the agent only needs to adjust the initial option price to reach the required target – the optimal strategy will not be altered – the result price is denoted $P^{Rob}$ .

Next, we compare the price induced by the robust strategy to the price induced by the Black–Scholes strategy. We cannot, however, simply use the Black–Scholes option price as it does not target a CVaR, nor does it include transaction costs. Instead, we generate the P&L of the Black–Scholes and adjust the price until the P&L ${CVaR}_{0.2} = - 0.5$ – we denote this price by $P^{BS}$ . As a benchmark comparison, we use the classical Black–Scholes barrier option price – denoted $C^{BS}$ .

Table shows the effect of robust pricing. We train our model with the same parameters as Section 6.5 and assume that in reality $κ = 1$ and $ρ = - 0.1$ (instead of $κ = 3$ and $ρ = - 0.5$ ). The agent, however, has no way of knowing the true σ, and must price using the misspecified model. From our results, targeting ${CVaR}_{0.2} = - 0.5$ is a reasonable target, and the generated prices are close to $C_{BS}$ . As the table shows, the classical Black–Scholes price is the lowest, the price induced by the Black–Scholes hedge (but, accounting for transaction costs, and s.t. CV aR = −0.5) is higher, and finally, as expected, the robust price is the highest.

Table 1. Comparison of different pricing schemes.

Display Table

6.6. $α - β$ Risk Measures

We next consider $α - β$ risk measures with $α = 0.1$ , $β = 0.9$ , and p = 0.7 – see the middle panel of Figure . This corresponds to an agent who cares about gains to losses at a ratio of 3:7.

6.6.1. No Transaction Costs

First we consider no transaction costs. Figure shows the hedging portfolios for a knock-in call in the non-robust case. These strategies are best described as a combination of asset trading and option hedging. When the option is not yet knocked-in, the agent ignores the embedded call option and invests in the asset. In this case, when the asset price is significantly above the barrier, they long the asset, but the strategy has a built in stop-loss. As the minimum asset price drops, they start unwinding their position. If prices drop to around $9, for example, the agent refrains from taking a long position in the asset for the remainder of the session. This strategy allows the agent to take advantage of the positive drift in prices, but also places a cap on the losses incurred from trading the asset. If the barrier is breached, the agent quickly adjusts to option hedging.

Figure 7. Non robust hedging strategies for a down-and-in call option. The graph show total asset holdings at t = 0.5T.

Figure shows the corresponding robust strategy. In this case, the agent becomes more conservative in seeking gains. They begin by buying a small number of shares. If the stock price rises, the agent has made a profit and seeks additional gains by purchasing more shares. Compared with the non-robust case, the agent sets a higher stop loss and generally exits from the long position earlier. Similar to the CVaR cases, when the option is knocked-in, the robust strategy holds less in the stock than the non-robust strategy in an attempt to mitigate model uncertainty.

Figure 8. Robust hedging strategies for a down-and-in call option. The graph show total asset holdings at t = 0.5T.

Figure shows analogous results for the knock-out call. When the option is knocked-out (the left panel), the strategy is to go long the asset, but as in the previous examples, it is paired with a stop-loss strategy. When we robustify the strategy, as shown in Figure , the risk in asset trading when knocked-out is greater than the agent can bear, and they simply hold nothing until the end of the trading horizon. In the region before being knocked-out, however, the agent focuses on option hedging. This is dramatic change is due to a phase transition that occurs in investor's behaviour as they change the balance between what is more important gains or losses. We discuss this phenomenon in more detail in Section 6.7.

Figure 9. Non robust hedging strategies for a down-and-out call option. The graph show total asset holdings at t = 0.5T.

Figure 10. Robust hedging strategies for a down-and-out call option. The graph show total asset holdings at t = 0.5T.

6.6.2. Incorporating Transaction Costs and Robustness

Next, we consider transaction costs of c = 0.01. In Figure , we show the total variation of the position and compare with the no transaction case. Overall, when the agent incorporates transaction costs, they favour strategies with less variance. For example, in the knock-in case, the no-transaction-cost agent purchases close to 2 shares at t = 0. This strategy results in very little transactions when prices rise, however, it performs poorly performing when prices fall because the agent loses from both transaction costs and asset value. Instead, the transaction-cost agent, chooses to buy less shares at t = 0, and adjust shares as needed. Compared to CVaR risk measures, introducing transaction costs has a larger impact on the strategies. For both knock-in and knock-out options, the agent avoids paying the upper tail risk due to the transaction costs.

Figure 11. Comparison of total Variation of the hedge position across different asset price paths in cases with transaction costs (TC) and without transaction costs (NTC). (a) Knock-in option and (b) Knock-out option.

Figure compares the robust and non-robust strategies. Under our framework, uncertainty need not arise from the asset price process. For example, the strategy also protects against changes in transaction costs. Models were trained using the same parameters as before. However, for testing, we increase transaction costs to c = 0.07. In both cases the robust strategy forgoes the upper tail gain in order to minimize losses. Since losses are more valued than gains, the agent is better off in terms of the risk measure.

Figure 12. Comparison of non-robust(N-Rob) and robust (Rob) strategies. $- R_{g}^{U} [X^{θ}]$ is shown as dotted lines in both plots. To help with visualizations, the x-axis uses a symmetric logarithmic scale, with values inside $[- 1, 1]$ displayed on a linear scale. (a) Knock-in Option and (b) Knock-out option.

6.7. Phase Transition due to Agent's Preferences

In this section, we examine how the agent's risk preferences affect the optimal strategy and study the non-robust case (as the general features we observed remain similar when we robustify). In particular, we fix $α = 0.1$ and $β = 0.9$ and vary p to examine how the strategy for the knock-in and knock-out call changes as the agent becomes increasingly risk seeking.

Figure shows the strategies for p = 1, 0.85 and 0.70 along two stock paths. We assume the same training parameters as before and include transaction costs. The strategies for p = 1 and 0.85 are almost identical, and the agent seeks a strategy similar to option hedging. When p decreases to 0.70, however, the agents strategy differs significantly when the embedded call option is not enforced (i.e., in the knock-out case after the barrier breached, while in the knock-in case before the barrier is breached). For example, let us focus on the left sample path for the knock-in option. In this sample path, the asset never went below the barrier, and the p = 0.7 agent kept a long position for the entire path, while the p = 1 and p = 0.85 agents held a very small short position. The more aggressive agent was seeking profits, while the more conservative was initially protecting against the potential of the option getting knocked-in, but then slowly unwound the position as the asset price grew. For the same sample path, but now looking at the knock-out option. The p = 1 and p = 0.85 agents took on a hedging position against the call option, and kept slowly increasing that hedge through time as the asset price grew. While the p = 0.7 agent initially ignored hedging the call option, but as time progressed they transformed their position into a hedge position for the call which appear more and more likely to not get knocked-out.

Figure 13. $Δ_{i}$ along two sample paths for knock-in (above) and knock-out (below) options. Strategies shown correspond to p = 1, 0.85 and 0.70.

For $α - β$ risk measures, the transition between the agent performing option hedging (Section 6.5) to a mixed strategy of trading and hedging (Section 6.6) is not smooth. There seems to be a phase transition: after a critical point $p^{*}$ , the agent changes strategies. We conjecture similar phase transitions persists with risk-reward criteria and is not specific to the $α - β$ risk measures.

To investigate this observation, we use the non-robust model with no transaction costs as a baseline. Recall that the $α - β$ risk measure can be written as $R_{p}^{α β} [X^{ϕ}] = - \frac{1}{η} (p E [X^{ϕ} I (F (X^{ϕ}) < α)] - (1 - p) E [X^{ϕ} I (F (X^{ϕ}) > β)])$ . We refer to terms $E [X^{ϕ} I (F (X^{ϕ}) < α)]$ and $E [X^{ϕ} I (F (X^{ϕ}) > β)]$ as lower and upper tail expectations (LTEs and UTEs), respectively. Figure shows the effect p has on the optimal strategy. The phase transition appears to occur near $p^{*} \approx 0.84$ and $p^{*} \approx 0.81$ for knock-in and knock-out options, respectively. For $p^{*} < p < 1$ , the agent strictly hedges the option. When $p^{*} < p$ , it can be observed that the value of p has a negligible effect on the agent's strategy. Once the value of p falls below $p^{*}$ , however, the agent commences a strategy targeting upper tail gains by engaging in both asset trading and option hedging. This critical level $p^{*}$ depends on the details of the option being sold. For knock-in calls, a lower barrier makes it more likely that the option is never knocked in. It becomes easier for the agent to seek upper tail gains without the negative option payoff, and thus results in a higher $p^{*}$ . The reverse is true for knock-out calls.

Figure 14. Comparison of lower tail expectation (LTE) and upper tail expectation (UTE) for $α = 0.1$ and $β = 0.9$ while varying p. (a) Knock-in Option and (b) Knock-out option.

To provide additional insight into this phenomenon, we examine the optimal actions of the agent under $α - β$ risk measures in a scenario where there is no option exposure and inventory is subject to the constraint of being within the range of $[- M, M]$ . In this case, the agent's terminal wealth $X^{ϕ}$ is given by setting $V (0) = V (T) = 0$ in (Equation3(3) $X^{ϕ} = V_{0} + \sum_{i = 0}^{N - 1} Δ_{i}^{ϕ} (S_{i + 1} - S_{i}) - \sum_{i = 0}^{N - 1} c | Δ_{i}^{ϕ} - Δ_{i - 1}^{ϕ} | S_{i} - V_{N},$ (3) ). If $inf_{ϕ} R_{p}^{α β} [X^{ϕ}] = 0$ , then $Δ_{i} = 0$ , for all $i \in {0, \dots, N - 1}$ , is an optimal solution – irrespective of p. Not trading is an admissible strategy that attains $R_{p}^{α β} [X^{ϕ}] = 0$ . If, however, there exists a strategy $\bar{Δ}$ (with parameters $\bar{ϕ}$ ) such that $R_{p}^{α β} [X^{\bar{ϕ}}] < 0$ , then, as $R_{p}^{α β}$ is positive homogeneous, the strategy $τ Δ_{i}$ (for $τ > 0$ ) always results in lower risk. Therefore, the strategy is to either trade nothing or to leverage fully. The latter case leads to the large increases in the UTE and this is what results in the phase transition.

As we have observed, this phenomenon seems to apply to options hedging as well. For instance, given that knock-in options behave similarly to assets when the barrier is not breached, the agent employs a strategy akin to asset hedging. The size of the jump in LTE depends on the inventory constraints. Particularly, if there are no constraints on inventory, no optimal solution exists when $p < p^{*}$ . Furthermore, $p^{*}$ depends on the remaining model assumptions. For example, transaction costs decrease $p^{*}$ because transaction costs decrease the potential gains. Robustification also decreases $p^{*}$ . This is due to the agent's higher risk aversion, as they aim to optimize the worst case scenario of $X^{ϕ}$ rather than $X^{ϕ}$ itself. The result of this effect is demonstrated by the dramatically different behaviours seen in Figures and .

7. Conclusion

Here, we extend the results of robust risk-aware optimization in Jaimungal et al. (Citation2022) to exotic barrier options. To contextualize the strategies, we choose to focus on barrier knock-in and knock-out call options. The methodology, however, is applicable to other options and even over the counter derivatives. This approach appears to produce results that are explainable and in line with intuition. Furthermore, robustification provides protection against uncertainty that can stem from many factors including misspecification in asset prices or changes in transaction dynamics. One issue worth noting is that while the strategies use dynamic decision making, the RDEU is not a time consistent risk measure, and the resulting optimizers should be viewed as precomit strategies. Although they are precomit strategies, they are not static in time or state. Nonetheless, it would be interesting to compare how the strategies for option hedging change when considering time-consistent dynamic risk measures using the methods developed in Coache and Jaimungal (Citation2021), Coache, Jaimungal, and Cartea (Citation2022) and Cheng and Jaimungal (Citation2022).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partially supported by the University of Toronto Data Sciences Institute.

Notes

1 In particular, Limmer and Horvath (Citation2023) appeared online after the current work was submitted for review.

2 The SiLU activation function is given by

f (x) = x / (1 + e^{- x})

References

Ambrosio, Luigi, Deckelnick, Klaus, Dziuk, Gerhard, Mimura, Masayasu, Solonnikov, Vsevolod A., Halil Mete Soner, and Ambrosio, Luigi. 2003. Lecture Notes on Optimal Transport Problems. Berlin/Heidelberg: Springer.
Google Scholar
Artzner, Philippe, Delbaen, Freddy, Eber, Jean-Marc, and Heath, David. 1999. “Coherent Measures of Risk.” Mathematical Finance 9 (3): 203–228. https://doi.org/10.1111/mafi.1999.9.issue-3.
Web of Science ®Google Scholar
Bartl, Daniel, Drapeau, Samuel, Obłój, Jan, and Wiesel, Johannes. 2021. “Sensitivity Analysis of Wasserstein Distributionally Robust Optimization Problems.” Proceedings of the Royal Society A 477 (2256): 20210176. https://doi.org/10.1098/rspa.2021.0176.
Google Scholar
Birgin, Ernesto G., and Martínez, José Mario. 2014. Practical Augmented Lagrangian Methods for Constrained Optimization. Philadelphia: SIAM.
Google Scholar
Black, Fishcer, and Scholes, Myron. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81 (3): 637–654. https://doi.org/10.1086/260062.
Web of Science ®Google Scholar
Buehler, H., Gonon, L., Teichmann, J., and Wood, B.. 2019. “Deep Hedging.” Quantitative Finance 19 (8): 1271–1291. https://doi.org/10.1080/14697688.2019.1571683.
Web of Science ®Google Scholar
Cao, Jay, Chen, Jacky, Hull, John, and Poulos, Zissis. 2021. “Deep Hedging of Derivatives Using Reinforcement Learning.” Journal of Financial Data Science 3 (1): 10–27. https://doi.org/10.3905/jfds.2020.1.052.
Google Scholar
Cheng, Ziteng, and Jaimungal, Sebastian. 2022. “Distributional dynamic risk measures in Markov decision processes.” Preprint, arXiv:2203.09612.
Google Scholar
Coache, Anthony, and Jaimungal, Sebastian. 2021. “Reinforcement Learning with Dynamic Convex Risk Measures.” Mathematical Finance, Forthcoming.
Google Scholar
Coache, Anthony, Jaimungal, Sebastian, and Cartea, Álvaro. 2022. “Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning.” SIAM Journal on Financial Mathematics, Forthcoming.
Google Scholar
Gierjatowicz, Patryk, Sabate-Vidales, Marc, Šiška, David, Szpruch, Lukasz, and Žurič, Žan. 2020. “Robust Pricing and Hedging Via Neural SDEs.” Preprint, arXiv:2007.04154.
Google Scholar
Gramacki, Artur. 2018. Nonparametric Kernel Density Estimation and Its Computational Aspects. Vol. 37. Cham, Switzerland: Springer.
Google Scholar
Ilhan, Aytaç, Jonsson, Mattias, and Sircar, Ronnie. 2009. “Optimal Static-Dynamic Hedges for Exotic Options Under Convex Risk Measures.” Stochastic Processes and Their Applications 119 (10): 3608–3632. https://doi.org/10.1016/j.spa.2009.06.009.
Web of Science ®Google Scholar
Jaimungal, Sebastian, Pesenti, Silvana M., Sheng Wang, Ye, and Tatsat, Hariom. 2022. “Robust Risk-Aware Reinforcement Learning.” SIAM Journal on Financial Mathematics 13 (1): 213–226. https://doi.org/10.1137/21M144640X.
Web of Science ®Google Scholar
Kolm, Petter N., and Ritter, Gordon. 2019. “Dynamic Replication and Hedging: A Reinforcement Learning Approach.” The Journal of Financial Data Science 1 (1): 159–171.
Google Scholar
Limmer, Yannick, and Horvath, Blanka. 2023. Robust Hedging GANs. Available at SSRN 4489029.
Google Scholar
Lütkebohmert, Eva, Schmidt, Thorsten, and Sester, Julian. 2022. “Robust Deep Hedging.” Quantitative Finance 22 (8): 1465–1480. https://doi.org/10.1080/14697688.2022.2056073.
Web of Science ®Google Scholar
Martinelli, Lionel. 2000. “Efficient Option Replication in the Presence of Transaction Costs.” Review of Derivatives Research 4 (2): 107–131. https://doi.org/10.1023/A:1009632624999.
Google Scholar
Merton, Robert. 1973. “Theory of Rational Option Pricing.” The Bell Journal of Economics and Management Science 4 (1): 141–183. https://doi.org/10.2307/3003143.
Google Scholar
Rahimian, Hamed, and Mehrotra, Sanjay. 2019. “Distributionally Robust Optimization: A Review.” Preprint, arXiv:1908.05659.
Google Scholar
Ruf, Johannes, and Wang, Weiguan. 2020. “Neural Networks for Option Pricing and Hedging: A Literature Review.” Journal of Computational Finance 24 (1): 1–46.
Web of Science ®Google Scholar
Santambrogio, Filippo. 2015. Optimal Transport for Applied Mathematicians. Progress in Nonlinear Differential Equations and Their Applications. Springer International Publishing.
Google Scholar
Westermark, Niklas. 2009. Barrier Option Pricing. Degree Project in Mathematics, First Level.
Google Scholar
Whalley, A. E., and Wilmott, P.. 1997. “An Asymptotic Analysis of an Optimal Hedging Model for Option Pricing with Transaction Costs.” Mathematical Finance 7 (3): 307–324. https://doi.org/10.1111/mafi.1997.7.issue-3.
Web of Science ®Google Scholar
Yaari, Menahem. 1987. “The Dual Theory of Choice Under Risk.” Econometrica 55 (1): 95. https://doi.org/10.2307/1911158.
Web of Science ®Google Scholar