
Deep impulse control: application to interest rate intervention

Pages 221-232 | Received 06 Sep 2023, Accepted 04 Jan 2024, Published online: 06 Feb 2024

Abstract

We propose a deep learning framework for impulse control problems involving multivariate stochastic processes, which can be controllable or uncontrollable. We use this framework to estimate central bank interventions on the (controllable) interest rate to stabilize the (uncontrollable) inflation rate, where the two rates are correlated and cointegrated. This method helps small banks and insurance companies with high exposure to Treasury securities predict and stress-test their potential losses from central bank interventions. We also study the mathematical properties of the proposed framework.

1. Introduction

Monetary policy is one of the most important ways in which the government can affect the speed and direction of economic growth through central banking (Friedman Citation2000). For example, the United States Congress directs the Federal Reserve (Fed) to pursue several economic goals, including maximum employment, stable prices, and moderate long-term interest rates. The Fed uses monetary policy, consisting of actions and communications, to achieve these goals.Footnote1 Central banks have multiple ways of controlling economic activity, and recent research shows that one of the direct instruments available to the central bank is the choice between a short-term interest rate and a reserve quantity (Friedman Citation2000). In this paper, we focus on interest rate interventions, as they are closely linked to the 2023 small bank failures in the United States, such as Silicon Valley Bank (SVB). In an interview with Douglas Diamond,Footnote2 winner of the 2022 Nobel Prize for his research on bank runs, the reporter writes,

Douglas Diamond argues that the Fed's choice to signal long-term low interest rates, and then suddenly reverse course by raising interest rates in response to inflation, is a major reason for the collapse at Silicon Valley Bank….

Diamond pointed out that even in the Fed's 2022 stress tests, banks were not tested at treasury yield rates above 2%. Although SVB was not subject to a stress test, it likely would have passed under those parameters.

However, due to rapid inflation beginning in the middle of 2021, the Fed began raising interest rates quickly in 2022. Today, the effective Fed funds rate is 4.57%. This sudden reversal, which according to Diamond was not well-telegraphed by the Fed, is the reason that the market value of SVB's securities began to plummet.

This motivates us to use contemporary technology to predict interest rate interventions to control inflation and set a realistic stress test target for financial institutions.

There is a rich body of literature on interest rate dynamics in an open market, such as instantaneous interest rate models (Vasicek Citation1977, Hull and White Citation1990, Black and Karasinski Citation1991, Cox et al. Citation2005) and corresponding empirical studies (Chan et al. Citation1992, Chapman and Pearson Citation2000). Guttentag (Citation1969) highlights the importance of central bank intervention in addition to the effect of market forces on interest rates. In 2022, the Fed launched seven large-scale interventions on the federal funds rate, causing substantial changes in the interest rate market after each intervention.Footnote1 Some studies explore central bank interventions on interest rates, such as interest rate targeting behavior (Rudebusch Citation1995), the effect of the Fed's target rate on interest rate dynamics (Balduzzi et al. Citation1997), and its effect on the yield curve (Piazzesi Citation2005). We specifically discuss studies of optimal interest rate interventions (Cadenillas and Zapatero Citation1999Citation2000, Feng and Muthuraman Citation2010, Mitchell et al. Citation2014). All of these studies formulate optimal interventions as impulse control problems, and the optimal interventions derived are based on tractable models and interest rate targets. Like these studies, we consider optimal impulse control. Unlike them, however, our task is to develop a deep learning approach to estimate optimal interest rate interventions to control inflation under a general class of models. In this way, our proposed framework can accommodate a wider range of stochastic models and a more flexible economic target, such as the inflation rate target.

In this paper, we consider the situation in which the central bank is able to intervene on several interest rates, such as short-term and long-term rates, to affect a targeted economic variable (e.g. the inflation rate) that is correlated and cointegrated with the interest rates (Booth and Ciner Citation2001). The cost function of the central bank is thus related to the targeted value of the economic variable and the cost of intervention. The central bank then aims to find a control policy that minimizes the cost of deviation from the target level and the cost of intervention. In reality, the central bank determines the best time to announce a change in fund rates. Decisions about the timing and extent of intervention constitute the impulse control policy (Constantinides and Richard Citation1978, Sulem Citation1986). Under a Brownian filtration, there are computational methods for impulse control problems (Harrison et al. Citation1983, Feng and Muthuraman Citation2010). Some studies examine impulse control of exchange and interest rates with Ornstein-Uhlenbeck (OU) processes (Cadenillas and Zapatero Citation1999Citation2000, Mitchell et al. Citation2014). However, solutions for high-dimensional impulse control problems face tremendous challenges in terms of analytical tractability. Conventional approaches struggle to formulate quasi-variational inequalities for high-dimensional impulse control problems, and the inherently uncontrollable nature of the inflation rate further exacerbates the complexity of the problem. As a consequence, prior studies often (if not always) assume a pre-specified target interest rate and derive analytical solutions with one-dimensional impulse control techniques. In reality, the central bank intervenes on the interest rate without knowing a target interest rate; it only knows the target inflation rate. Whereas previous impulse control research concentrates on infinite-horizon problems for mathematical tractability, the central bank may want to complete the task within a finite horizon. Machine learning techniques, especially deep learning, offer computational feasibility for high-dimensional quantitative finance problems (Tsang and Wong Citation2020, Lo and Singh Citation2023, Mikkilä and Kanniainen Citation2023, Na and Wan Citation2023, Yin and Wong Citation2023). Our deep learning framework applies to a more general model setup, accommodates higher dimensional problems, and handles finite-horizon problems. Some of our numerical comparisons are made against the literature to demonstrate the accuracy of our computation.

Our major contribution to the literature is the development of a deep learning framework for impulse control problems, which builds on the deep optimal stopping framework of Becker et al. (Citation2019) and Jia et al. (Citation2023) with additional layers for learning intervention actions. When we apply our framework to real data, our approach predicts Fed funds rate hikes consistent with the Fed's real interventions in terms of aggregate increase, although our predictions are more volatile than the real interventions. We suggest that this discrepancy is associated with the Fed's intention to avoid volatile adjustments; the aggregate increase in fund rates is nevertheless similar when the inflation target is reached. Therefore, our framework also contributes to the practical stress test target of interest rate products. In addition, applications of impulse control go beyond central bank intervention problems, such as inventory control (Cadenillas et al. Citation2010) and reinsurance strategy (Yan et al. Citation2022b).

The remainder of this paper is organized as follows. Section 2 presents the problem formulation and the cost function. Section 3 introduces our machine learning framework, including the neural network (NN) architecture, the approximation method, and the training procedure. The numerical study in section 4 examines, in a one-dimensional setup, the accuracy of our proposed method and its effectiveness in preventing the class vanishing problem, in which the neural networks are easily trapped in local optima. We then perform a real case study of taming inflation by intervening on interest rates. We also demonstrate the use of our proposed framework when the central bank applies impulse control to two interest rates. Section 5 discusses the relationship with deep reinforcement learning and the delay effect of interest rate intervention. Finally, section 6 concludes the paper.

2. Problem formulation

Let $(\Omega,\mathcal{F},\mathbb{P})$ denote a filtered probability space and $T>0$ a deterministic fixed future time. $\mathbb{F}=\{\mathcal{F}_t\}_{0\le t\le T}$ is the usual filtration. $\mathbb{Z}^+$ denotes the set of positive integers and $\mathbb{R}^+$ the set of positive real numbers.

Following Mitchell et al. (Citation2014), we postulate that the interest rate follows an Itô process. There are several differences between our model and previous research (Cadenillas et al. Citation2010, Mitchell et al. Citation2014). First, $R_t \in \mathbb{R}^{d_1}$ is a multi-dimensional rather than one-dimensional stochastic process with a general form on which impulse control can be applied. Second, we add a multi-dimensional stochastic process, $I_t \in \mathbb{R}^{d_2}$, which is cointegrated with $R_t$. However, impulse control cannot be applied to $I_t$. For example, when $I_t$ indicates the inflation rate for which the central bank has a target range, the central bank can only intervene on the interest rate to monitor the inflation rate. Third, to mimic real-world settings, we assume that impulse control can only be applied at a set of pre-fixed time points $\{t_1,t_2,\ldots,t_N\}\equiv\mathcal{T}$ with a fixed action set $C$. This is a reasonable assumption for controlling the interest rate because the central bank only changes its monetary policy at regular meetings, and interest rate changes are usually drawn from a fixed set of actions rather than from $\mathbb{R}$. Fourth, we consider a fixed termination time $T>0$ rather than $\infty$ because the members of the Federal Open Market Committee change every year. Therefore, it is reasonable to assume that the objective function will change accordingly.

Under the physical measure, the uncontrolled stochastic process is defined as
$$ dI_t = \mu_1(R_t, I_t)\,dt + \sigma_1(R_t, I_t)\,dW^1_t, \qquad dR_t = \mu_2(R_t, I_t)\,dt + \sigma_2(R_t, I_t)\,dW^2_t. \tag{1} $$
This general formulation can easily incorporate several commonly used stochastic processes, including the Cox–Ingersoll–Ross process (Cox et al. Citation2005) and the OU process (Vasicek Citation1977).
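As a concrete illustration of how paths of (1) can be generated for the training procedure described later, the following minimal Python sketch applies an Euler-Maruyama scheme to the uncontrolled system. The function and argument names (simulate_uncontrolled, mu1, sigma1, mu2, sigma2) are ours, not part of the original specification, and the diffusion is assumed to act component-wise for brevity.

```python
import numpy as np

def simulate_uncontrolled(mu1, sigma1, mu2, sigma2, i0, r0, T, n_steps, rng):
    """Euler-Maruyama sketch of the uncontrolled system (1).

    i0 (length d2) and r0 (length d1) are initial values; mu/sigma are callables
    of (R, I) returning arrays of matching length (component-wise diffusion
    assumed for simplicity)."""
    dt = T / n_steps
    I = np.empty((n_steps + 1, len(i0))); I[0] = i0
    R = np.empty((n_steps + 1, len(r0))); R[0] = r0
    for k in range(n_steps):
        dW1 = rng.normal(scale=np.sqrt(dt), size=len(i0))
        dW2 = rng.normal(scale=np.sqrt(dt), size=len(r0))
        I[k + 1] = I[k] + mu1(R[k], I[k]) * dt + sigma1(R[k], I[k]) * dW1
        R[k + 1] = R[k] + mu2(R[k], I[k]) * dt + sigma2(R[k], I[k]) * dW2
    return I, R
```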

The impulse control applied to $R_t$ is a sequence of immediate (upward or downward) changes in $R_t$ by a certain nonzero amount. We define impulse control $v$ as a sequence of pairs $(\tau_i,\xi_i)_{i=1}^{z}$, where $z$ is the number of impulse controls applied, $0\le\tau_1<\tau_2<\cdots<\tau_z\le T$ are non-decreasing stopping times belonging to $\mathcal{T}$, and $\xi_i\in C$ is the corresponding control amount at $\tau_i$. Given a specific impulse control $v=((\tau_1,\xi_1),(\tau_2,\xi_2),\ldots,(\tau_z,\xi_z))$, the stochastic processes in (1) become
$$ I_t = i_0 + \int_0^t \mu_1(R_s,I_s)\,ds + \int_0^t \sigma_1(R_s,I_s)\,dW^1_s, \qquad R_t = r_0 + \int_0^t \mu_2(R_s,I_s)\,ds + \int_0^t \sigma_2(R_s,I_s)\,dW^2_s + \sum_{i=1}^{z} I_{\{\tau_i < t\}}\,\xi_i. \tag{2} $$
After modeling the stochastic process, we define the cost and objective functions for the impulse control problem. We assume that the central bank can quantify its preferences for $(R_t,I_t)$ and its aversion to intervening. This indicates that the central bank can evaluate its preferences for $(R_t,I_t)$ through a running cost function and its aversion through an action cost function. The running cost and action cost functions measure the central bank's preferences and aversion rather than the real money it pays, which is similar to the settings in Lohmann (Citation1992) and Cadenillas and Zapatero (Citation1999). We adopt the general function $\phi(t,R_t,I_t)$ to describe the running cost rate at time $t$ with $(R_t,I_t)$. The action cost of intervention is represented by the function $G(\xi)$ for each intervention of size $\xi$. In this paper, we consider a finite time impulse control problem with maturity $T$ and a discount factor $\beta$ for future costs. By combining these two costs, the total cost to the central bank of impulse control $v$ with initial value $(r_0,i_0)$ is
$$ J_{r_0,i_0}(v) = \mathbb{E}\left[ \int_0^T e^{-\beta t}\phi(t,R_t,I_t)\,dt + \sum_{i=1}^{z} e^{-\beta\tau_i} G(\xi_i) \,\middle|\, R_0=r_0,\, I_0=i_0 \right]. \tag{3} $$
The ultimate goal of the central bank is to find an optimal policy to minimize the objective function $J$. Assuming that $\phi, G > 0$, we are interested in impulse controls $v$ such that
$$ J_{r_0,i_0}(v) < \infty. \tag{4} $$
The collection of all such impulse controls is defined as the admissible control set $\mathcal{A}$. The value function $V$ is then given by
$$ V(r_0,i_0) = \min_{v\in\mathcal{A}} J_{r_0,i_0}(v). \tag{5} $$
The general forms of $\phi$ and $G$ are consistent with previous research on the impulse control of interest rates (Mitchell et al. Citation2014). The key assumption in our method is that the stochastic processes $(R_t,I_t)$ are Markovian. In many applications, this assumption is not restrictive because Markovian processes can be realized by including past information. There are also restrictions on $\phi$ so that the solutions are non-trivial. Our method does not rely on these restrictions, and interested readers may refer to the following studies (Constantinides and Richard Citation1978, Feng and Muthuraman Citation2010, Mitchell et al. Citation2014).
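To make the objective (3) concrete, the short sketch below evaluates the discretized cost of one simulated path under a given impulse control; phi, G, and the argument names are placeholders consistent with the notation above, and the time integral is approximated on the simulation grid.

```python
import numpy as np

def path_cost(phi, G, times, R_path, I_path, interventions, beta):
    """Discretized total cost (3) along one simulated path.

    times: grid t_0 < ... < t_K; R_path, I_path: controlled path values on the
    grid; interventions: list of (tau_i, xi_i) pairs already applied to R_path."""
    running = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        # discounted running cost, evaluated at the right end of each interval
        running += np.exp(-beta * times[k + 1]) * phi(times[k + 1], R_path[k + 1], I_path[k + 1]) * dt
    # discounted action costs of the applied interventions
    action = sum(np.exp(-beta * tau) * G(xi) for tau, xi in interventions)
    return running + action
```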

3. Deep impulse control framework

Studies of impulse control are generally designed for one-dimensional cases, while the cointegrated process $I_t$ has yet to be considered (Cadenillas et al. Citation2010, Mitchell et al. Citation2014). Traditional methods encounter difficulties for higher dimensional problems when impulse control can be applied to the multi-dimensional interest rate vector $R_t$, and the incorporation of the vector $I_t$ makes the problem even more difficult. For example, the central bank can control both short-term and long-term rates to control inflation. To handle the general impulse control problem defined in the previous section, we develop a novel NN to identify the optimal impulse control policy, which is inspired by Becker et al. (Citation2019).

3.1. Express impulse control using neural networks

For impulse control $v$, two decisions must be made at each time point $t\in\mathcal{T}$: whether an intervention is needed at time $t$ and, if so, its magnitude. We consider a larger set of actions $\bar{C} \equiv C \cup \{0\}$ to combine these two decisions. The non-intervention decision is regarded as an impulse control of magnitude 0. In this approach, the number of impulse controls $z$ corresponds to the cardinality of $\mathcal{T}$, denoted $N$ in the rest of the paper.

For $n=1,2,\ldots,N$, the optimal impulse control at time $t_n$ should generally be based on the whole path of $(I,R)$ from time 0 to time $t_n$. However, $(R_{t_n},I_{t_n})$ is sufficient to make the decision at time $t_n$ because of the Markov property of the process. Let us assume that the optimal impulse control decision at time $t_n$ is $f_n(R_{t_n},I_{t_n})$ for a measurable function $f_n:\mathbb{R}^{d_1+d_2}\to\{e_1,e_2,\ldots,e_M\}$, where $e_i$ is the one-hot vector taking a value of 1 in the $i$th coordinate and 0 otherwise, and $M$ is the cardinality of $\bar{C}$. The optimal control policy can be represented by a sequence of $f_n(R_{t_n},I_{t_n})$ for $n=1,2,\ldots,N$. The idea is to approximate $f_n$ through an NN $f^{\theta_n}$ whose input is the pair $(R_{t_n},I_{t_n})$ and whose output is a one-hot vector. We then obtain the control policy $v_{\theta}\equiv\{f^{\theta_1},f^{\theta_2},\ldots,f^{\theta_N}\}$.
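A minimal PyTorch sketch of one such decision network is given below: it maps the Markov state $(R_{t_n}, I_{t_n})$ to a probability vector over the $M$ actions in $\bar{C}$, from which the one-hot decision is recovered by an argmax as described in section 3.3. The class name and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecisionNet(nn.Module):
    """Maps the state (R_{t_n}, I_{t_n}) to probabilities over the M actions in C-bar."""
    def __init__(self, d1, d2, n_actions, hidden=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d1 + d2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, r, i):
        x = torch.cat([r, i], dim=-1)
        # soft output F^{theta_n}; the hard one-hot decision f^{theta_n}
        # is obtained later by taking the argmax (section 3.3)
        return torch.softmax(self.body(x), dim=-1)
```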

3.2. Neural network approximation

The numerical method approximates $f_n$ with an NN $f^{\theta_n}$ for $n\in\{1,2,\ldots,N\}$ through iterative backward training. Before describing the approximation procedure, we define some notation. Consider an auxiliary problem
$$ V_n = \min_{v\in\mathcal{A}_n} J(v), \tag{6} $$
where $\mathcal{A}_n$ is the set of all admissible controls in which $\xi_i = 0$ for $i=1,2,\ldots,n-1$. We define the expected future cost conditional on $(R_{t_n},I_{t_n})$ as
$$ J(t_n, R_{t_n}, I_{t_n}, v) = \mathbb{E}\left[ \int_{t_n}^{T} e^{-\beta t}\phi(t,R_t,I_t)\,dt + \sum_{i=n}^{N} e^{-\beta\tau_i} G(\xi_i) \,\middle|\, R_{t_n}, I_{t_n} \right]. \tag{7} $$
Let $\xi_m$ denote the magnitude of the $m$-th possible intervention on $R$, i.e. the $m$-th element of $\bar{C}$. For a fixed $n\in\{1,2,\ldots,N\}$, a given control in $\mathcal{A}_n$ is defined by a sequence of measurable functions $f_n, f_{n+1},\ldots,f_N$. The following proposition supports the iterative backward training method.

Proposition 3.1

For a specific $n\in\{0,1,\ldots,N-1\}$, let $v_{n+1}$ be an impulse control policy generated by measurable functions $f_{n+1},\ldots,f_N$. Then, there exists a measurable function $f_n$ such that the impulse control policy $v_n$ given by $f_n, f_{n+1},\ldots,f_N$ satisfies
$$ J(v_n) - V_n \le \sum_{m=1}^{M}\mathbb{E}\Big[ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) - \min_{v\in\mathcal{A}_{n+1}} J(t_n, R_{t_n}+\xi_m, I_{t_n}, v)\Big], $$
where $V_n$ and $V_{n+1}$ are defined as in (6).

Proof.

Let $\epsilon_m = \mathbb{E}\big[ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) - \min_{v\in\mathcal{A}_{n+1}} J(t_n, R_{t_n}+\xi_m, I_{t_n}, v)\big]$, and let $v_n\in\mathcal{A}_n$ be the impulse control policy to be constructed. Then, we have
$$ V_{n+1} - J(v_{n+1}) = \min_{v\in\mathcal{A}_{n+1}} J(v) - J(v_{n+1}) = \min_{v\in\mathcal{A}_{n+1}} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v_{n+1})], $$
where the cost from time 0 to time $t_n$ is the same and is therefore subsumed. Let
$$ D_m = \big\{ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n} G(\xi_m) \text{ is minimal over all } \xi\in\bar{C} \big\}, \qquad E_m = \{\xi_n = \xi_m\}, $$
and define $f_n \equiv (I_{D_1}, I_{D_2},\ldots,I_{D_M})$, with $I$ being the indicator function. Then
$$ \begin{aligned} V_n - J(v_n) &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v_n)] \\ &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{D_m}\Big] \\ &\ge \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\Big] \\ &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \sum_{m=1}^{M} \mathbb{E}\big[\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\big] \\ &\ge \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \sum_{m=1}^{M}\Big( \mathbb{E}\big[ \min_{v\in\mathcal{A}_{n+1}}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\big] + \epsilon_m \Big). \end{aligned} $$
As $E_m$ is arbitrary, the inequality also holds in the optimal case, and hence $V_n - J(v_n) \ge -\sum_{m=1}^{M}\epsilon_m$, i.e. $J(v_n) - V_n \le \sum_{m=1}^{M}\epsilon_m$.

Proposition 3.2

Let $n\in\{1,2,\ldots,N\}$ and let $v\in\mathcal{A}$ be an impulse control. For any depth of the NN with $D\ge 2$ and any $\epsilon>0$, there exist positive integers $q_1,q_2,\ldots,q_{D-1}$ such that
$$ \inf_{\theta\in\mathbb{R}^q} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f^{\theta}_m(R_{t_n}, I_{t_n})\Big] \le \inf_{f\in\mathcal{D}} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f_m(R_{t_n}, I_{t_n})\Big] + \epsilon, $$
where $\mathcal{D}$ is the set of all measurable one-hot functions.

Proof.

For a fixed $\epsilon>0$, the integrability condition ensures that there exists a measurable function $\tilde{f}:\mathbb{R}^{d_1+d_2}\to\{e_1,\ldots,e_M\}$ such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big)\tilde{f}_m(R_{t_n}, I_{t_n})\Big] \le \inf_{f\in\mathcal{D}} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f_m(R_{t_n}, I_{t_n})\Big] + \frac{\epsilon}{3}, $$
and there exist Borel sets $A_m$ such that $\tilde{f}_m = 1_{A_m}$. As the output of $\tilde{f}$ is a one-hot vector, the sets $A_m$ are disjoint and $\cup_{m=1}^{M} A_m = \mathbb{R}^{d_1+d_2}$. Based on the integrability condition and (4),
$$ B \mapsto \mathbb{E}\big[\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) 1_{B}(R_{t_n}, I_{t_n})\big] $$
defines a finite Borel measure on $\mathbb{R}^{d_1+d_2}$, so each $A_m$ can be approximated from inside by a compact set $K_m\subseteq A_m$. Let $\rho_K:\mathbb{R}^{d_1+d_2}\to[0,\infty)$ be given by $\rho_K(x)=\inf_{y\in K}\|x-y\|_2$. Then
$$ k_{a,m}(x) = \max\{1 - a\,\rho_{K_m}(x),\, -1\}, \quad a\in\mathbb{N}, $$
defines a sequence of continuous functions $k_{a,m}:\mathbb{R}^{d_1+d_2}\to[-1,1]$ that converges pointwise to $1_{K_m} - 1_{K_m^{C}}$ for each $m$. By the dominated convergence theorem, there exists $a_m\in\mathbb{N}$ such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{k_{a_m,m}(R_{t_n},I_{t_n})\ge 0\}}\Big] \le \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{K_m}(R_{t_n},I_{t_n})\Big] + \frac{\epsilon}{3}. $$
We require the $a_m$ to be such that at least one $k_{a_m,m}(R_{t_n},I_{t_n})\ge 0$ for every $(R_{t_n},I_{t_n})$. This can be done because the union of the $A_m$ is $\mathbb{R}^{d_1+d_2}$. By Leshno's theorem, $k_{a,m}$ can be approximated uniformly on compact sets by functions of the form
$$ \sum_{i=1}^{r}(v_i\cdot x + c_i)^{+} - \sum_{i=1}^{s}(\omega_i\cdot x + d_i)^{+}. \tag{8} $$
Hence, there exists a function $h_m:\mathbb{R}^{d_1+d_2}\to\mathbb{R}$ of the form (8) such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{h_m(R_{t_n},I_{t_n})\ge 0\}}\Big] \le \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{k_{a_m,m}(R_{t_n},I_{t_n})\ge 0\}}\Big] + \frac{\epsilon}{3}. $$
Each $h_m$ can be expressed as an NN, and we can then combine the $h_m$ functions to form a larger prediction network.

Based on the above two propositions, we can construct an NN to approximate the optimal policy. We introduce the NN architecture in the next section.

3.3. Neural network architecture

Based on Proposition 3.1, it is theoretically possible to find a fully connected deep neural network (DNN) $f^{\theta_n}$ to approximate $f_n$. However, the output of $f^{\theta_n}$ lies in $\{e_1,e_2,\ldots,e_M\}$, to which gradient descent cannot be applied. Therefore, we include an NN $F^{\theta_n}$ as a transition step for optimization purposes. $F^{\theta_n}$ is continuous and almost everywhere differentiable, and our objective is to minimize the following function:
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) F^{\theta_n}_m(R_{t_n}, I_{t_n})\Big]. $$
After determining $F^{\theta_n}$, the function $f^{\theta_n}$ is defined as $f^{\theta_n} = p_{\max}(F^{\theta_n})$, where $p_{\max}:(0,1)^M\to\{e_1,e_2,\ldots,e_M\}$ is the function that returns the position of the maximum value of the input. The softmax and $p_{\max}$ functions restrict the output of the NN to $\{e_1,e_2,\ldots,e_M\}$, resulting in a smaller value than the indicator functions of Proposition 3.2. When there is more than one maximum value in $F^{\theta_n}$, we assume by convention that $p_{\max}$ takes the position of the first maximum value.
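As a hedged sketch of this transition step, the helpers below evaluate the probability-weighted objective on simulated costs and recover the hard one-hot decision; future_costs is an assumed tensor stacking, for each path, the simulated discounted cost-to-go of each candidate action.

```python
import torch

def soft_objective(future_costs, probs):
    """Differentiable surrogate at time t_n: future_costs[:, m] is the simulated
    cost when action xi_m is applied at t_n (action cost included), and
    probs = F^{theta_n}(R_{t_n}, I_{t_n})."""
    return (future_costs * probs).sum(dim=-1).mean()

def pmax(probs):
    """Hard decision f^{theta_n}: one-hot vector at the (first) maximal probability."""
    return torch.nn.functional.one_hot(probs.argmax(dim=-1), probs.shape[-1])
```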

During the training process, we find that a general fully connected NN has a high probability of falling into local minima. Furthermore, the iterative backward training method worsens the situation because the accuracy of $F^{\theta_n}$ depends on accurate estimates of $f^{\theta_{n+1}},\ldots,f^{\theta_N}$.

This phenomenon is due to the complex nature of our task. Our problem is neither a traditional regression problem nor a traditional classification problem. It differs from classical regression problems because the outputs of the NN are one-hot vectors. It also differs from classification problems because the objective function is not related to the misclassification error and there is no ground truth available for the training process. Our task is a classification task with a regression-type objective function, and this combination results in a high probability of being trapped in local minima during gradient descent. To the best of our knowledge, there is no research describing this phenomenon, which we refer to as 'class vanishing' in the rest of the paper. When class vanishing occurs, the output of the DNN is a proper subset of $\{e_1,e_2,\ldots,e_M\}$, which means that the DNN solves a simpler problem with a smaller set of actions than the original set. We further illustrate the class vanishing problem in the numerical study section for the one-dimensional case, as a benchmark is available for this case.

To solve the class vanishing problem, we propose a new NN architecture that proves useful in the one-dimensional case. Because of the lack of benchmarks for high-dimensional cases, we leave it to future research to test the effectiveness of this NN architecture in high-dimensional situations.

$F^{\theta_n}$ takes the following form:
$$ F^{\theta_n} = \psi\big(F^{\theta_n,1}, F^{\theta_n,2},\ldots, F^{\theta_n,M}\big), \qquad F^{\theta_n,k} = a^{\theta_n}_{D_k,k}\circ\varphi_{q_{D_k-1,k}}\circ a^{\theta_n}_{D_k-1,k}\circ\cdots\circ\varphi_{q_{1,k}}\circ a^{\theta_n}_{1,k}, \quad k=1,\ldots,M, \tag{9} $$
where $D_k$ is the depth of the NN $F^{\theta_n,k}$ and $q_{1,k}, q_{2,k},\ldots,q_{D_k-1,k}$ are positive integers indicating the numbers of nodes in the hidden layers of $F^{\theta_n,k}$. $\psi:\mathbb{R}^M\to(0,1)^M$ is the standard softmax function defined as $\psi(x)_i = e^{x_i}/\sum_{j=1}^{M} e^{x_j}$ for $i=1,2,\ldots,M$ and $x\in\mathbb{R}^M$. $a^{\theta_n}_{1,k}:\mathbb{R}^{d_1+d_2}\to\mathbb{R}^{q_{1,k}}$, ..., $a^{\theta_n}_{D_k-1,k}:\mathbb{R}^{q_{D_k-2,k}}\to\mathbb{R}^{q_{D_k-1,k}}$ and $a^{\theta_n}_{D_k,k}:\mathbb{R}^{q_{D_k-1,k}}\to\mathbb{R}$ are affine functions. $\varphi_a:\mathbb{R}^a\to\mathbb{R}^a$ are ReLU activation functions with $\varphi(x_1,x_2,\ldots,x_a)=(x_1^+,x_2^+,\ldots,x_a^+)$. The NN $F^{\theta_n}$ consists of $M$ different sub-networks, and each sub-network is responsible for predicting only one impulse control choice in $\bar{C}$. The outputs of the small NNs are combined through a softmax function, which is commonly used for multi-class classification. An overview of the NN architecture is shown in figure 1. The idea here is very similar to ensemble learning techniques such as XGBoost or random forests. However, our aim is to prevent the occurrence of the class vanishing problem rather than to make a bias–variance tradeoff. We refer to this type of NN as the ensemble NN in the rest of the paper. The ensemble NN can be regarded as a kind of regularization similar to the DropConnect method proposed in Wan et al. (Citation2013). The difference is that DropConnect randomly sets weights to 0, whereas we do so deterministically.
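A minimal PyTorch sketch of the ensemble architecture (9) is given below: one small sub-network per action, whose scalar scores are combined through a softmax. The class name, depth, and widths are illustrative assumptions; they mirror, but are not taken from, the settings used in the numerical studies.

```python
import torch
import torch.nn as nn

class EnsembleDecisionNet(nn.Module):
    """Ensemble architecture of (9): one small sub-network per action,
    combined by a softmax, as a guard against class vanishing."""
    def __init__(self, d_in, n_actions, hidden=8, depth=3):
        super().__init__()
        def make_branch():
            layers, width = [], d_in
            for _ in range(depth - 1):
                layers += [nn.Linear(width, hidden), nn.ReLU()]
                width = hidden
            layers += [nn.Linear(width, 1)]   # scalar score for one action
            return nn.Sequential(*layers)
        self.branches = nn.ModuleList([make_branch() for _ in range(n_actions)])

    def forward(self, x):
        scores = torch.cat([b(x) for b in self.branches], dim=-1)  # (batch, M)
        return torch.softmax(scores, dim=-1)                       # F^{theta_n}
```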

Figure 1. Neural network architecture of the ensemble network. Each small NN predicts one action, and the results are normalized by the softmax function.


The training of $F^{\theta_n}$ depends on the sample of $(R_{t_n},I_{t_n})$, and $F^{\theta_n}$ may not minimize $\epsilon_m$ for every $m$ in Proposition 3.1. We can train different $F^{\theta_n}_m$ functions for different distributions of $(R_{t_n},I_{t_n})$ to minimize $\epsilon_m$. In our numerical study below, one $F^{\theta_n}$ function is enough to obtain a satisfactory control policy.

3.4. Parameter optimization

We determine the depth $D_k$ and the numbers of nodes in the hidden layers $q_{1,k},\ldots,q_{D_k-1,k}$ for all $k$ and train the NN described in section 3.3. To obtain the parameters $\theta_n$ numerically, we simulate $H$ paths of $(R_t,I_t)$ and minimize the sample average, which can be regarded as an approximation of the expectation of the cost function.

Consider a given $n\in\{1,\ldots,N\}$, and assume that the parameters $\theta_{n+1},\ldots,\theta_N$ have been determined and generate an impulse control policy $v_{n+1}$. We denote the cost calculated along the $h$th path from time $t_n$ to time $T$ by $l^h_n$, defined as
$$ l^h_n = \int_{t_n}^{T} e^{-\beta t}\phi(t, r^h_t, i^h_t)\,dt + \sum_{i=n}^{N} e^{-\beta\tau_i} G(\xi_i). $$
In the discrete case, $l^h_n$ is estimated by
$$ l^h_n = \sum_{i=n}^{N}\Big[ e^{-\beta t_{i+1}}\phi(t_{i+1}, r^h_{t_{i+1}}, i^h_{t_{i+1}})(t_{i+1}-t_i) + e^{-\beta t_i} G(\xi_i)\Big], $$
where $t_{N+1}\equiv T$. In this paper, the integral is approximated by evaluating $\phi$ at the end point of each interval. We choose the end point because it is the most difficult to predict. The input for $f^{\theta_n}$ is $(R_{t_n}, I_{t_n})$ and the resulting process values are $(R_{t_n}+\xi_n, I_{t_n})$, which contain no randomness. It is easier for the NN to find the optimal control policy if we include $(R_{t_n}+\xi_n, I_{t_n})$ in the cost calculation.
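A minimal sketch of the discrete estimate of $l^h_n$ is given below, assuming the decisions along the path have already been recorded; the function and argument names are ours.

```python
import numpy as np

def cost_to_go(n, dec_times, T, r_path, i_path, xi, phi, G, beta):
    """Discrete estimate of l_n^h. dec_times = [t_1, ..., t_N]; r_path and i_path
    hold the controlled path at [t_1, ..., t_N, T]; xi[j] is the action applied
    at decision time dec_times[j]."""
    grid = list(dec_times) + [T]            # t_{N+1} := T
    total = 0.0
    for j in range(n - 1, len(dec_times)):  # paper's index i = n, ..., N
        dt = grid[j + 1] - grid[j]
        # running cost evaluated at the end point of each interval
        total += np.exp(-beta * grid[j + 1]) * phi(grid[j + 1], r_path[j + 1], i_path[j + 1]) * dt
        # discounted action cost at the decision time
        total += np.exp(-beta * grid[j]) * G(xi[j])
    return total
```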

Suppose that we apply impulse control according to $F^{\theta_n}$ at time $t_n$ and make decisions according to $f^{\theta_{n+1}},\ldots,f^{\theta_N}$ afterwards; then the cost for the $h$th simulated path is given by
$$ \Gamma^h_n(\theta_n) = \sum_{m=1}^{M}\Big[\big( e^{-\beta t_{n+1}}\phi(t_{n+1}, r^{h,m}_{t_{n+1}}, i^{h,m}_{t_{n+1}}) + e^{-\beta t_n} G(\xi_m) + l^{h,m}_{n+1}\big)\, F^{\theta_n}_m(r^h_{t_n}, i^h_{t_n})\Big], $$
where $(r^{h,m}, i^{h,m})$ is the simulated path with $\xi_m$ applied at time $t_n$ and $l^{h,m}_{n+1}$ is the corresponding cost from time $t_{n+1}$ to time $T$.

For a large $H$, the sample mean $\frac{1}{H}\sum_{h=1}^{H}\Gamma^h_n(\theta_n)$ approximates the expectation
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) F^{\theta_n}_m(R_{t_n}, I_{t_n})\Big], $$
so it is used to find $\theta_n$ through gradient descent.

We divide the training process into two phases, the pre-training phase and the fine-training phase, to prevent class vanishing. In the pre-training phase, the objective function is
$$ \frac{1}{H}\sum_{h=1}^{H}\Big[\Gamma^h_n(\theta_n) - \lambda\sum_{m=1}^{M}\log\big(F^{\theta_n}_m(r^h_{t_n}, i^h_{t_n}) + \varepsilon\big)\Big]. $$
The first part of this expression is the expectation that we want to minimize and the second part is a penalty term for the occurrence of small probabilities in $F^{\theta_n}$. $\lambda$ is a pre-defined parameter that balances these two objectives and $\varepsilon$ is a small number that prevents log explosion during training. By adding the penalty term, we avoid the class vanishing problem but may introduce an additional bias into our predictions. After the pre-training phase, we remove the penalty term and continue to train the model in the fine-training phase to remove this additional bias.
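A hedged sketch of one training phase at a fixed decision time $t_n$ is given below; net stands for the ensemble network, states for the simulated $(r^h_{t_n}, i^h_{t_n})$, and future_costs[h, m] for the simulated cost entering $\Gamma^h_n$ when action $\xi_m$ is applied on path $h$ (all names are assumptions). Setting lam > 0 reproduces the pre-training penalty; lam = 0 gives the fine-training phase.

```python
import torch

def training_phase(net, optimizer, states, future_costs, steps, lam=0.0, eps=1e-8):
    """Gradient-descent phase at time t_n; the penalty discourages near-zero
    action probabilities (class vanishing) during pre-training."""
    for _ in range(steps):
        probs = net(states)                               # F^{theta_n}
        loss = (future_costs * probs).sum(dim=-1).mean()  # sample mean of Gamma_n^h
        if lam > 0:
            loss = loss - lam * torch.log(probs + eps).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return net
```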

4. Numerical study

In this section, we examine our method in three scenarios. First, we do not consider the uncontrollable stochastic process $I$ and test our method in the one-dimensional case. Because impulse control in the one-dimensional case is well studied, we can use the results of Cadenillas et al. (Citation2010) as a benchmark to examine our method. Second, we illustrate our method using data on real inflation and effective federal funds rates (EFFR) in the United States. We compare our results with the Fed's decisions on the EFFR in 2022. Finally, we examine our model with a multi-dimensional $R$ and a one-dimensional $I$ through a simulation study.

4.1. One-dimensional benchmark

The one-dimensional stochastic impulse control problem is studied in Cadenillas et al. (Citation2010), and we follow their settings for our simulation study. Assume that $R_t$ follows an OU process given by
$$ dR_t = a(b - R_t)\,dt + \sigma\,dW_t $$
when there is no impulse control, where $a = 0.2$ is the speed of reversion, $b = 2$ is the long-term mean level, $\sigma = 1.2$ is the instantaneous volatility of the model, and $W_t$ is a one-dimensional Brownian motion. The discount rate $\beta$ is set to 0.06.

There are several differences between Cadenillas et al. (Citation2010) and our study, leading to a deviation between our optimal policy and theirs. In Cadenillas et al. (Citation2010), $T$ is set to $\infty$, $\mathcal{T}$ is the set of non-negative real numbers, and $C = \mathbb{R}$. For our numerical study, $T$ is set to 5, $\mathcal{T} = \{0, 1, 2, 3, 4\}$, and $C = \{-6, -3, 0, 3, 6\}$. We use the same running cost function and action cost function as in Cadenillas et al. (Citation2010), defined as
$$ \phi(r) = (r - 2)^2, \qquad G(\xi) = \begin{cases} 5 + 2\xi & \text{if } \xi > 0,\\ 5 & \text{if } \xi = 0,\\ 5 - 2\xi & \text{if } \xi < 0. \end{cases} $$
Based on these two cost functions, the action costs for $C$ would be $\{17, 11, 5, 11, 17\}$. However, 0 in $C$ represents the choice of no impulse control rather than an impulse control of magnitude 0. Therefore, we replace 5 with 0, and the action costs are $\{17, 11, 0, 11, 17\}$ in our paper.
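For reference, these benchmark cost functions (with the fixed cost set to 0 for the no-intervention action, as described above) can be written as a short sketch:

```python
def phi(r):
    """Running cost rate: squared deviation from the target level 2."""
    return (r - 2.0) ** 2

def G(xi):
    """Action cost: fixed cost 5 plus proportional cost 2|xi|;
    the no-intervention choice xi = 0 costs nothing."""
    return 0.0 if xi == 0 else 5.0 + 2.0 * abs(xi)

# action costs over C = {-6, -3, 0, 3, 6}  ->  [17.0, 11.0, 0.0, 11.0, 17.0]
print([G(xi) for xi in (-6, -3, 0, 3, 6)])
```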

We train backwardly from $f^{\theta_4}$ with the NN architecture defined in section 3.3. The depth is $D_k = 3$ and $q_{1,k} = q_{2,k} = 8$ for all $k$. The $R_t$ process defined above can be easily simulated, and the initial value is sampled from a uniform distribution U(11,15) to allow the NN to learn different scenarios. For each $F^{\theta_n}$, we simulate 8,192 sample paths for the training process, and we conduct 1,500 training steps in the pre-training phase with $\lambda = 0.3$ and 3,000 training steps in the fine-training phase.

$F^{\theta_4}$ and $F^{\theta_2}$ are shown in figure 2 for different values of $r_{t_4}$ and $r_{t_2}$. $F^{\theta_0},\ldots,F^{\theta_2}$ are all similar, so we show $F^{\theta_2}$ here. We also show $F^{\theta_4}$ because it differs from the other functions. The reason may be that this is the last decision point and the NN is more conservative in applying impulse control.


Figure 2. $F^{\theta}$ for different values at time $t_2^-$ and time $t_4^-$. The X axis is the value at the corresponding time point and the Y axis is the probability $F^{\theta}$ assigns to each action. The NN at $t_4^-$ is more conservative in applying interventions than the representative NN at $t_2^-$.

We also show a representative class vanishing problem for $F^{\theta_4}$ in figure 3. Here, $F^{\theta_4}$ is a fully connected NN with a depth of 3 and 40 nodes in each hidden layer.Footnote3 The training process is the same with the same parameter settings, and we can see from figure 3 that the three classes $-3$, 3, and 6 are missing. We increase the number of nodes in the hidden layers to 50, but the class vanishing problem still occurs.


Figure 3. $F^{\theta}$ for different values at time $t_4^-$ with 40 and 50 hidden nodes. The X axis and Y axis are defined as in figure 2. The top panel shows the NN with 40 nodes and the bottom panel shows the NN with 50 nodes. Both NNs encounter the class vanishing problem.

4.2. Fed's interest rate illustration

In this section, we illustrate our method with real data from the United States. We collect EFFR and Consumer Price Index (CPI) data from U. B. of Labor Statistics (Citation2023) and F.R.B. of New York (Citation2023). We define the month-over-month (MoM) inflation rate $I_t$ as $I_t = (\mathrm{CPI}_{t+1} - \mathrm{CPI}_t)/\mathrm{CPI}_t$. Assume that the EFFR ($R_{1,t}$) and monthly inflation ($I_t$) follow the stochastic processes
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = (\alpha_0 + \alpha_1 R_{1,t})\,dt + \sigma_2\,dW^2_t, \tag{10} $$
when no impulse control is applied. $R_{1,t}$ follows an OU process with $a_1$ the speed of reversion and $b_1$ the long-term mean level. $I_t$ follows a stochastic process that is cointegrated with $R_{1,t}$. $\sigma_1$ and $\sigma_2$ are constant instantaneous volatilities, and $W^1_t$ and $W^2_t$ are independent one-dimensional Brownian motions. We estimate the parameters using EFFR and CPI data from August 1, 2019 to November 2.Footnote4 As the Fed launched several interventions on the EFFR during this period, we remove the corresponding days from the estimation sample. The estimated parameters are $a_1=0.2246$, $b_1=0.0555$, $\alpha_0=0.5220$, $\alpha_1=0.8213$, $\sigma_1=0.2826$, and $\sigma_2=1.1619$.
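The MoM inflation series used as the uncontrollable process can be computed from the raw CPI series as in the short sketch below (a pandas-based illustration; the function name is ours):

```python
import pandas as pd

def mom_inflation(cpi: pd.Series) -> pd.Series:
    """Month-over-month inflation: I_t = (CPI_{t+1} - CPI_t) / CPI_t."""
    return cpi.shift(-1).sub(cpi).div(cpi).dropna()
```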

After determining the parameters of the stochastic processes, we train our NN for a one-year interest rate impulse control policy for 2022. The maturity $T$ is set to 1. The Fed holds eight regular meetings each year to determine monetary policy, and the time interval between meetings is usually one and a half months. Therefore, we set $\mathcal{T}$ to $\{0, 0.125,\ldots, 0.875\}$. The Fed can only intervene on the EFFR, so the action set $C$ is set to $\{-0.25, 0, 0.25, 0.5, 0.75, 1\}$, which contains all of the actions applied by the Fed to the EFFR in 2022. We also add $-0.25$ to the action set to test whether the NN chooses the right direction of action. Inflation was extremely high in 2022, so we assume that the main goal of the Fed was to control the inflation rate. Because the Fed aims for 2% inflation over the long run, we set our MoM inflation rate target to 0.2%. The running cost function and the action cost function are defined as
$$ \phi(r, i) = |i - 0.2\%|, \qquad G(\xi) = \begin{cases} 0.0025 + 0.06\,\xi & \text{if } \xi > 0,\\ 0 & \text{if } \xi = 0,\\ 0.0025 - 0.06\,\xi & \text{if } \xi < 0, \end{cases} $$
and the discount factor is $\beta = 0.06$. The depth $D_k$ is set to 3, and the number of hidden nodes is $q_{1,k} = q_{2,k} = 10$ for all $k$. We simulate 30,000 sample paths to train the NN. The number of training steps in the pre-training phase is set to 2,000 with $\lambda = 0.0001$, and the number of training steps in the fine-training phase is set to 1,500. After training the NN, there is one last issue to address before comparing the output of the NN with the actual interventions of the Fed in 2022: in reality, there is a delay in the publication of the CPI, so we use the predicted inflation rate as the input. We compare our prediction results with the Fed's actions from January to September 2022, and the results are shown in table 1. Here, we consider two methods for determining the magnitude of the Fed's actions. First, we follow the general rule of choosing the action with the highest probability (I) estimated by the NN. Second, we construct a method that is similar to the dot plot in the Fed's Summary of Economic Projections: we calculate $\sum_{m=1}^{M}\xi_m F^{\theta_n}_m$ for the decision at time $t_n$ and choose the action in $C$ that is closest to it (II).
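The two reporting rules can be sketched as follows, given the trained network output $F^{\theta_n}$ at a decision time: rule (I) takes the most probable action, and rule (II) takes the action in $C$ closest to the probability-weighted average action (the dot-plot-style summary). The code is an illustrative reading of these rules, not the authors' implementation.

```python
import numpy as np

ACTIONS = np.array([-0.25, 0.0, 0.25, 0.5, 0.75, 1.0])  # action set C for the EFFR

def decision_rule_I(probs):
    """Rule (I): action with the highest estimated probability."""
    return ACTIONS[np.argmax(probs)]

def decision_rule_II(probs):
    """Rule (II): action in C closest to the probability-weighted mean action."""
    mean_action = np.dot(ACTIONS, probs)
    return ACTIONS[np.argmin(np.abs(ACTIONS - mean_action))]
```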

Table 1. The Fed's actions and our NN results.

Table 1 shows that our impulse control policy is more volatile than the Fed's policy. We speculate that this may be related to the Fed's intention to smooth its intervention levels and its focus on the year-on-year (YoY) inflation rate rather than the MoM inflation rate. As the YoY inflation rate is less volatile than its MoM counterpart, the induced intervention magnitudes are expected to be less volatile too. Despite the difference in intervention volatility, the overall level of interest rate intervention by the Fed is similar to our NN predictions. The predicted aggregate intervention level is useful for setting stress-testing targets for fixed income securities. For instance, around July 26, 2022, the MoM inflation rate was predicted to be 0.2%, and our NN model suggests stopping the interest rate intervention. By then, the Fed should have increased the EFFR by 3.25%, but it had only been increased by 1.5% in reality; the Fed therefore still needed an increase of at least 0.75%. A similar phenomenon occurs on September 20. Our model thus offers an early warning signal to financial institutions of potential future interventions. This signal sets a more realistic stress test target for interest rate products.

4.3. Two-dimensional illustration

We demonstrate the application of our model to short-term and long-term interest rate interventions for monitoring inflation. In other words, the central bank can apply impulse control to both the EFFR and the long-term interest rate. Similar to the previous section, we define the stochastic processes of the EFFR $R_{1,t}$, the long-term interest rate $R_{2,t}$, and the MoM inflation rate $I_t$ without intervention as
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dR_{2,t} = a_2(b_2 - R_{2,t})\,dt + \sigma_3\,dW^1_t, \qquad dI_t = (\alpha_0 + \alpha_1 R_{1,t} + \alpha_2 R_{2,t})\,dt + \sigma_2\,dW^2_t, $$
where $R_{1,t}$ and $I_t$ are as in the previous section. We assume that $R_{2,t}$ follows an OU process with reversion speed $a_2 = 0.25$, long-term mean level $b_2 = 0.2$, $\sigma_3 = 0.35$, and $\alpha_2 = 0.5$. We define the action set $C$ as $\{-0.25, 0, 0.25\}\times\{-0.4, 0, 0.4\}$, i.e. the nine pairs $\xi = (\xi_1,\xi_2)$ of simultaneous adjustments to the two rates. The central bank monitors the monthly inflation rate by controlling the EFFR and the long-term interest rate, so we use the same running cost function as in the previous section, but the action cost function is replaced by
$$ G(\xi) = 0.02\,|\xi_1| + 0.0025\, I(\xi_1\neq 0) + 0.01\,|\xi_2| + 0.005\, I(\xi_2\neq 0). $$
The discount factor is $\beta = 0.06$.
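Under the reading of the action cost adopted above (a proportional plus a fixed component for each controlled rate), a small sketch of $G$ for the two-dimensional action $\xi = (\xi_1, \xi_2)$ is:

```python
def G(xi):
    """Two-dimensional action cost as reconstructed above (sketch):
    proportional plus fixed cost for each of the two controlled rates."""
    xi1, xi2 = xi
    cost = 0.02 * abs(xi1) + (0.0025 if xi1 != 0 else 0.0)
    cost += 0.01 * abs(xi2) + (0.005 if xi2 != 0 else 0.0)
    return cost
```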

The depth $D_k$ of the NN is set to 3 and the number of hidden nodes is $q_{1,k} = q_{2,k} = 10$ for all $k$. We simulate 30,000 paths to train the NN backwardly. After training the NN, we simulate another set of 30,000 paths to illustrate its performance. Because of the lack of a benchmark, we calculate the sample mean of the overall cost when no impulse control is applied and compare it with the overall cost when we intervene based on the NN. The cost drops from 1.299 to 0.980, indicating that our NN is useful for minimizing the cost. However, it is technically more difficult for the Fed to intervene simultaneously on short-term and long-term rates in reality. Therefore, we do not report further details here, but we note this potential alternative use of our framework.Footnote5

5. Discussion and areas for future study

5.1. Relationship with deep reinforcement learning

The primary goal of deep reinforcement learning (DRL) is to optimize a policy over discrete actions that maximizes the expected cumulative reward. This objective can be aligned with that in (3).Footnote6 This offers a possibility to link our problem to a discrete-time, discrete-space reinforcement learning problem. We believe that using DRL techniques such as deep Q-learning (Mnih et al. Citation2013) to solve the interest rate intervention problem is a promising future direction. An advantage of our method is that we train the DNN backward to obtain the optimal impulse control policy, for which we can find an upper bound on the errors. DRL requires exploration, such as an $\epsilon$-greedy strategy, to learn the environment. Further investigation is needed to derive error bounds (or regret bounds) for DRL on the same problem and to compare the empirical results of the two approaches. Incorporating our ensemble networks into DRL and then comparing their empirical performance is another potential future direction.

5.2. Delay effect of interest rate intervention

Although the implementation of impulse control results in an immediate change in the interest rate, there exists an inherent time delay in the inflation response. In the context of inflation, any adjustments are anticipated to manifest as a gradual shift rather than an abrupt leap. Our proposed model (10) partially captures this intrinsic characteristic by incorporating the impact of the EFFR on the drift term of the inflation rate. Thus, it takes time for the inflation rate to revert to the intervened equilibrium level through the cointegration dynamics. However, it is still interesting to examine and incorporate alternative delayed mechanisms. An example is the delayed stochastic system in Yan et al. (Citation2022a) and Yan and Wong (Citation2022):
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = \Big(\alpha_0 + \int_{t-\Delta}^{t}\alpha(s) R_{1,s}\,ds\Big)dt + \sigma_2\,dW^2_t, $$
with a specific time lag $\Delta$. The discrete version becomes
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = \Big(\alpha_0 + \sum_{s\in[t-\Delta, t]}\alpha(s) R_{1,s}\Big)dt + \sigma_2\,dW^2_t. $$
The DNN training is the same except for a different input to the DNN. This delayed mechanism changes the Markovian nature of the original problem and complicates the training of the DNN. We leave it to future research.

6. Conclusion

We generalize the impulse control problem and propose a novel deep-learning framework to solve it. The generalized impulse control problem is formulated with a finite time horizon, finite decision points, and a finite set of actions, which is more aligned with realistic situations than previous research. In addition, we consider both controllable and uncontrollable processes to replicate the real-world scenarios encountered by the central bank. Our framework can handle high-dimensional cases with cointegrated processes that cannot be directly controlled by impulse control. We also propose a new NN architecture to solve the class vanishing problem, which occurs because of the complex nature of our task. The ensemble NN can be easily extended to other areas where the problem is a classification task with a regression-type objective function. The accuracy of our method is examined in one-dimensional cases. Our numerical results show reasonable congruence between the predictions of our NN model and the Fed's interventions on the EFFR in 2022. We suggest that our deep impulse control framework is useful for financial institutions and regulatory agencies to develop stress-test interest rate scenarios for risk management purposes.

Acknowledgements

We are grateful for comments by participants of the 26th International Congress on Insurance: Mathematics and Economics at Heriot-Watt University, Recent Advances on Quantitative Finance 2023 at Hong Kong Polytechnic University, and The 3rd Yushan Conference at National Yang Ming Chiao Tung University.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

H.Y. Wong acknowledges the support by HKRGC under grant no. GRF-14308422.

Notes

1 Source: the monetary policy web page of the Board of Governors of the Federal Reserve System. https://www.federalreserve.gov/monetarypolicy.htm.

3 Our NN architecture has three layers at each decision time point, so the whole NN collecting all decision points has three times as many layers as there are decision points. When the whole NN is trained at once, it is essentially a DNN training (Becker et al. Citation2021). In addition, our framework is an extension of deep optimal stopping (Becker et al. Citation2019), which also uses a three-layer NN at each time point.

4 This is a distinctive period for examining our DNN because the last interest rate hikes to control inflation date back to 1979-1982, when interest rates were fully controlled by the Fed to remain within a specified interval. In other words, the interest rates then were not sufficiently random to empirically estimate the interaction between interest rates and the inflation rate.

5 Interested readers could e-mail [email protected] for detailed result.

6 We thank an anonymous referee for pointing out the connection. As our paper aims to stimulate the use of contemporary machine learning methods to help set stress-testing targets, this suggestion further strengthens our purpose.

References

  • Balduzzi, P., Bertola, G. and Foresi, S., A model of target changes and the term structure of interest rates. J. Monet. Econ., 1997, 39, 223–249.
  • Becker, S., Cheridito, P. and Jentzen, A., Deep optimal stopping. J. Mach. Learn. Res., 2019, 20, 2712–2736.
  • Becker, S., Cheridito, P., Jentzen, A. and Welti, T., Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math., 2021, 32, 470–514.
  • Black, F. and Karasinski, P., Bond and option pricing when short rates are lognormal. Financ. Anal. J., 1991, 47, 52–59.
  • Booth, G.G. and Ciner, C., The relationship between nominal interest rates and inflation: International evidence. J. Multinatl. Financ. Manag., 2001, 11, 269–280.
  • Cadenillas, A. and Zapatero, F., Optimal central bank intervention in the foreign exchange market. J. Econ. Theory, 1999, 87, 218–242.
  • Cadenillas, A. and Zapatero, F., Classical and impulse stochastic control of the exchange rate using interest rates and reserves. Math. Finance, 2000, 10, 141–156.
  • Cadenillas, A., Lakner, P. and Pinedo, M., Optimal control of a mean-reverting inventory. Oper. Res., 2010, 58, 1697–1710.
  • Chan, K.C., Karolyi, G.A., Longstaff, F.A. and Sanders, A.B., An empirical comparison of alternative models of the short-term interest rate. J. Finance, 1992, 47, 1209–1227.
  • Chapman, D.A. and Pearson, N.D., Is the short rate drift actually nonlinear? J. Finance, 2000, 55, 355–388.
  • Constantinides, G.M. and Richard, S.F., Existence of optimal simple policies for discounted-cost inventory and cash management in continuous time. Oper. Res., 1978, 26, 620–636.
  • Cox, J.C., Ingersoll Jr, J.E. and Ross, S.A., A theory of the term structure of interest rates. In Theory of Valuation, pp. 129–164, 2005 (World Scientific).
  • F.R.B. of New York, Effective federal funds rate [EFFR], retrieved from FRED, Federal Reserve Bank of St. Louis, 2023. Available online at: https://fred.stlouisfed.org/series/EFFR (accessed 26 February 2023).
  • Feng, H. and Muthuraman, K., A computational method for stochastic impulse control problems. Math. Oper. Res., 2010, 35, 830–850.
  • Friedman, B.M., Monetary policy, 2000.
  • Guttentag, J.M., Defensive and dynamic open market operations, discounting, and the federal reserve system's crisis-prevention responsibilities. J. Finance, 1969, 24, 249–262.
  • Harrison, J.M., Sellke, T.M. and Taylor, A.J., Impulse control of Brownian motion. Math. Oper. Res., 1983, 8, 454–466.
  • Hull, J. and White, A., Pricing interest-rate-derivative securities. Rev. Financ. Stud., 1990, 3, 573–592.
  • Jia, B., Wang, L. and Wong, H.Y., Machine learning of surrender: Optimality and humanity. J. Risk Insur., 2023. https://doi.org/10.1111/jori.12428
  • Lo, A.W. and Singh, M., Deep-learning models for forecasting financial risk premia and their interpretations. Quant. Finance, 2023, 23, 917–929.
  • Lohmann, S., Optimal commitment in monetary policy: Credibility versus flexibility. Am. Econ. Rev., 1992, 82, 273–286.
  • Mikkilä, O. and Kanniainen, J., Empirical deep hedging. Quant. Finance, 2023, 23, 111–122.
  • Mitchell, D., Feng, H. and Muthuraman, K., Impulse control of interest rates. Oper. Res., 2014, 62, 602–615.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., Playing Atari with deep reinforcement learning, 2013, arXiv preprint arXiv:1312.5602.
  • Na, A.S. and Wan, J.W., Efficient pricing and hedging of high-dimensional American options using deep recurrent networks. Quant. Finance, 2023, 23, 631–651.
  • Piazzesi, M., Bond yields and the federal reserve. J. Polit. Econ., 2005, 113, 311–344.
  • Rudebusch, G.D., Federal reserve interest rate targeting, rational expectations, and the term structure. J. Monet. Econ., 1995, 35, 245–274.
  • Sulem, A., A solvable one-dimensional model of a diffusion inventory system. Math. Oper. Res., 1986, 11, 125–133.
  • Tsang, K.H. and Wong, H.Y., Deep-learning solution to portfolio selection with serially dependent returns. SIAM J. Financ. Math., 2020, 11, 593–619.
  • U. B. of Labor Statistics, U.S. bureau of labor statistics, consumer price index for all urban consumers: All items in U.S. city average [cpiaucsl], retrieved from FRED, Federal Reserve Bank of St. Louis, 2023. Available online at: https://fred.stlouisfed.org/series/CPIAUCSL (accessed 26 February 2023).
  • Vasicek, O., An equilibrium characterization of the term structure. J. Financ. Econ., 1977, 5, 177–188.
  • Wan, L., Zeiler, M., Zhang, S., Le Cun, Y. and Fergus, R., Regularization of neural networks using dropconnect. In International Conference on Machine Learning, pp. 1058–1066, 2013 (PMLR).
  • Yan, T. and Wong, H.Y., Equilibrium pairs-trading under delayed cointegration. Automatica, 2022, 144, 110498.
  • Yan, T., Chiu, M.C. and Wong, H.Y., Pairs trading under delayed cointegration. Quant. Finance, 2022a, 22, 1627–1648.
  • Yan, T., Park, K. and Wong, H.Y., Irreversible reinsurance: A singular control approach. Insur. Math. Econ., 2022b, 107, 326–348.
  • Yin, J. and Wong, H.Y., Deep lob trading: Half a second please!. Expert Syst. Appl., 2023, 213, 118899.