
Deep impulse control: application to interest rate intervention

Pages 221-232 | Received 06 Sep 2023, Accepted 04 Jan 2024, Published online: 06 Feb 2024

Abstract

We propose a deep learning framework for impulse control problems involving multivariate stochastic processes, which can be controllable or uncontrollable. We use this framework to estimate central bank interventions on the (controllable) interest rate to stabilize the (uncontrollable) inflation rate, where the two rates are correlated and cointegrated. This method helps small banks and insurance companies with high exposure to Treasury securities predict and stress-test their potential losses from central bank interventions. We also study the mathematical properties of the proposed framework.

1. Introduction

Monetary policy is one of the most important ways in which the government can affect the speed and direction of economic growth through central banking (Friedman Citation2000). For example, the United States Congress directs the Federal Reserve (Fed) to pursue several economic goals, including maximum employment, stable prices, and moderate long-term interest rates. The Fed uses monetary policy, consisting of actions and communications, to achieve these goals.Footnote1 Central banks have multiple ways of controlling economic activity, and recent research shows that one of the direct instruments available to the central bank is the choice between a short-term interest rate and a reserve quantity (Friedman Citation2000). In this paper, we focus on interest rate interventions, as they are closely linked to the 2023 small bank failures in the United States, such as Silicon Valley Bank (SVB). In an interview with Douglas Diamond,Footnote2 winner of the 2022 Nobel Prize for his research on bank runs, the reporter writes,

Douglas Diamond argues that the Fed's choice to signal long-term low interest rates, and then suddenly reverse course by raising interest rates in response to inflation, is a major reason for the collapse at Silicon Valley Bank….

Diamond pointed out that even in the Fed's 2022 stress tests, banks were not tested at treasury yield rates above 2%. Although SVB was not subject to a stress test, it likely would have passed under those parameters.

However, due to rapid inflation beginning in the middle of 2021, the Fed began raising interest rates quickly in 2022. Today, the effective Fed funds rate is 4.57%. This sudden reversal, which according to Diamond was not well-telegraphed by the Fed, is the reason that the market value of SVB's securities began to plummet.

This motivates us to use contemporary technology to predict interest rate interventions to control inflation and set a realistic stress test target for financial institutions.

There is a rich body of literature on interest rate dynamics in an open market, such as instantaneous interest rate models (Vasicek Citation1977, Hull and White Citation1990, Black and Karasinski Citation1991, Cox et al. Citation2005) and corresponding empirical studies (Chan et al. Citation1992, Chapman and Pearson Citation2000). Guttentag (Citation1969) highlights the importance of central bank intervention in addition to the effect of market forces on interest rates. In 2022, the Fed launched seven large-scale interventions on the federal funds rate, causing substantial changes in the interest rate market after each intervention.Footnote1 Some studies explore central bank interventions on interest rates, such as interest rate targeting behavior (Rudebusch Citation1995), the effect of the Fed's target rate on interest rate dynamics (Balduzzi et al. Citation1997), and its effect on the yield curve (Piazzesi Citation2005). We specifically discuss studies of optimal interest rate interventions (Cadenillas and Zapatero Citation1999Citation2000, Feng and Muthuraman Citation2010, Mitchell et al. Citation2014). All of these studies formulate optimal interventions as impulse control problems, and the optimal interventions derived are based on tractable models and interest rate targets. Like these studies, we consider optimal impulse control. Unlike them, however, our task is to develop a deep learning approach to estimate optimal interest rate interventions to control inflation under a general class of models. In this way, our proposed framework can accommodate a wider range of stochastic models and a more flexible economic target, such as the inflation rate target.

In this paper, we consider the situation in which the central bank is able to intervene on several interest rates, such as short-term and long-term rates, to affect a targeted economic variable (e.g. the inflation rate) that is correlated and cointegrated with the interest rates (Booth and Ciner Citation2001). The cost function of the central bank is thus related to the targeted value of the economic variable and the cost of intervention. The central bank then aims to find a control policy that minimizes the cost of deviation from the target level and the cost of intervention. In reality, the central bank determines the best time to announce a change in fund rates. Decisions about the timing and extent of intervention constitute the impulse control policy (Constantinides and Richard Citation1978, Sulem Citation1986). Under a Brownian filtration, there are computational methods for impulse control problems (Harrison et al. Citation1983, Feng and Muthuraman Citation2010). Some studies examine impulse control of exchange and interest rates with Ornstein-Uhlenbeck (OU) processes (Cadenillas and Zapatero Citation1999Citation2000, Mitchell et al. Citation2014). However, solutions for high-dimensional impulse control problems face tremendous challenges in terms of analytical tractability. Conventional approaches struggle to formulate quasi-variational inequalities for high-dimensional impulse control problems, and the inherently uncontrollable nature of the inflation rate further exacerbates the complexity of the problem. As a consequence, prior studies often (if not always) assume a pre-specified target interest rate and derive analytical solutions with one-dimensional impulse control techniques. In reality, the central bank intervenes on the interest rate without knowing a target interest rate; it only knows the target inflation rate. Whereas previous impulse control research concentrates on infinite-horizon problems for mathematical tractability, the central bank may want to complete the task within a finite horizon. Machine learning techniques, especially deep learning, offer computational feasibility for high-dimensional quantitative finance problems (Tsang and Wong Citation2020, Lo and Singh Citation2023, Mikkilä and Kanniainen Citation2023, Na and Wan Citation2023, Yin and Wong Citation2023). Our deep learning framework applies to a more general model setup, accommodates higher dimensional problems, and handles finite-horizon problems. Some of our numerical comparisons are made against the literature to demonstrate the accuracy of our computation.

Our major contribution to the literature is the development of a deep learning framework for impulse control problems, which builds on the deep optimal stopping framework of Becker et al. (Citation2019) and Jia et al. (Citation2023) with additional layers for learning intervention actions. When we apply our framework to real data, our approach predicts Fed funds rate hikes consistent with the Fed's real interventions in terms of aggregate increase, although our predictions are more volatile than the real interventions. We suggest that this discrepancy is associated with the Fed's intention to avoid volatile adjustments; the aggregate increase in fund rates is nevertheless similar when the inflation target is reached. Therefore, our framework also contributes to the practical stress test target of interest rate products. In addition, applications of impulse control go beyond central bank intervention problems, such as inventory control (Cadenillas et al. Citation2010) and reinsurance strategy (Yan et al. Citation2022b).

The remainder of this paper is organized as follows. Section 2 presents the problem formulation and the cost function. Section 3 introduces our machine learning framework, including the neural network (NN) architecture, the approximation method, and the training procedure. The numerical study in section 4 examines, in a one-dimensional setup, the accuracy of our proposed method and its effectiveness in preventing the class vanishing problem, in which the neural networks are easily trapped in local optima. We then perform a real case study of taming inflation by intervening on interest rates. We also demonstrate the use of our proposed framework when the central bank applies impulse control to two interest rates. Section 5 discusses the relationship with deep reinforcement learning and the delay effect of interest rate intervention. Finally, section 6 concludes the paper.

2. Problem formulation

Let $(\Omega,\mathcal{F},\mathbb{P})$ denote a filtered probability space and $T>0$ a deterministic fixed future time. $\mathbb{F}=\{\mathcal{F}_t\}_{0\le t\le T}$ is the usual filtration. $\mathbb{Z}^+$ denotes the set of positive integers and $\mathbb{R}^+$ the set of positive real numbers.

Following Mitchell et al. (Citation2014), we postulate that the interest rate follows an Itô process. There are several differences between our model and previous research (Cadenillas et al. Citation2010, Mitchell et al. Citation2014). First, $R_t \in \mathbb{R}^{d_1}$ is a multi-dimensional rather than one-dimensional stochastic process with a general form on which impulse control can be applied. Second, we add a multi-dimensional stochastic process, $I_t \in \mathbb{R}^{d_2}$, which is cointegrated with $R_t$. However, impulse control cannot be applied to $I_t$. For example, when $I_t$ indicates the inflation rate for which the central bank has a target range, the central bank can only intervene on the interest rate to monitor the inflation rate. Third, to mimic real-world settings, we assume that impulse control can only be applied at a set of pre-fixed time points $\{t_1,t_2,\ldots,t_N\}\equiv\mathcal{T}$ with a fixed action set $C$. This is a reasonable assumption for controlling the interest rate because the central bank only changes its monetary policy at regular meetings, and interest rate changes are usually drawn from a fixed set of actions rather than from $\mathbb{R}$. Fourth, we consider a fixed termination time $T>0$ rather than $\infty$ because the members of the Federal Open Market Committee change every year. Therefore, it is reasonable to assume that the objective function will change accordingly.

Under the physical measure, the uncontrolled stochastic process is defined as
$$ dI_t = \mu_1(R_t, I_t)\,dt + \sigma_1(R_t, I_t)\,dW^1_t, \qquad dR_t = \mu_2(R_t, I_t)\,dt + \sigma_2(R_t, I_t)\,dW^2_t. \tag{1} $$
This general formulation can easily incorporate several commonly used stochastic processes, including the Cox–Ingersoll–Ross process (Cox et al. Citation2005) and the OU process (Vasicek Citation1977).
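As a concrete illustration of how paths of (1) can be generated for the training procedure described later, the following minimal Python sketch applies an Euler-Maruyama scheme to the uncontrolled system. The function and argument names (simulate_uncontrolled, mu1, sigma1, mu2, sigma2) are ours, not part of the original specification, and the diffusion is assumed to act component-wise for brevity.

```python
import numpy as np

def simulate_uncontrolled(mu1, sigma1, mu2, sigma2, i0, r0, T, n_steps, rng):
    """Euler-Maruyama sketch of the uncontrolled system (1).

    i0 (length d2) and r0 (length d1) are initial values; mu/sigma are callables
    of (R, I) returning arrays of matching length (component-wise diffusion
    assumed for simplicity)."""
    dt = T / n_steps
    I = np.empty((n_steps + 1, len(i0))); I[0] = i0
    R = np.empty((n_steps + 1, len(r0))); R[0] = r0
    for k in range(n_steps):
        dW1 = rng.normal(scale=np.sqrt(dt), size=len(i0))
        dW2 = rng.normal(scale=np.sqrt(dt), size=len(r0))
        I[k + 1] = I[k] + mu1(R[k], I[k]) * dt + sigma1(R[k], I[k]) * dW1
        R[k + 1] = R[k] + mu2(R[k], I[k]) * dt + sigma2(R[k], I[k]) * dW2
    return I, R
```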

The impulse control applied to $R_t$ is a sequence of immediate (upward or downward) changes in $R_t$ by a certain nonzero amount. We define impulse control $v$ as a sequence of pairs $(\tau_i,\xi_i)_{i=1}^{z}$, where $z$ is the number of impulse controls applied, $0\le\tau_1<\tau_2<\cdots<\tau_z\le T$ are non-decreasing stopping times belonging to $\mathcal{T}$, and $\xi_i\in C$ is the corresponding control amount at $\tau_i$. Given a specific impulse control $v=((\tau_1,\xi_1),(\tau_2,\xi_2),\ldots,(\tau_z,\xi_z))$, the stochastic processes in (1) become
$$ I_t = i_0 + \int_0^t \mu_1(R_s,I_s)\,ds + \int_0^t \sigma_1(R_s,I_s)\,dW^1_s, \qquad R_t = r_0 + \int_0^t \mu_2(R_s,I_s)\,ds + \int_0^t \sigma_2(R_s,I_s)\,dW^2_s + \sum_{i=1}^{z} I_{\{\tau_i < t\}}\,\xi_i. \tag{2} $$
After modeling the stochastic process, we define the cost and objective functions for the impulse control problem. We assume that the central bank can quantify its preferences for $(R_t,I_t)$ and its aversion to intervening. This indicates that the central bank can evaluate its preferences for $(R_t,I_t)$ through a running cost function and its aversion through an action cost function. The running cost and action cost functions measure the central bank's preferences and aversion rather than the real money it pays, which is similar to the settings in Lohmann (Citation1992) and Cadenillas and Zapatero (Citation1999). We adopt the general function $\phi(t,R_t,I_t)$ to describe the running cost rate at time $t$ with $(R_t,I_t)$. The action cost of intervention is represented by the function $G(\xi)$ for each intervention of size $\xi$. In this paper, we consider a finite time impulse control problem with maturity $T$ and a discount factor $\beta$ for future costs. By combining these two costs, the total cost to the central bank of impulse control $v$ with initial value $(r_0,i_0)$ is
$$ J_{r_0,i_0}(v) = \mathbb{E}\left[ \int_0^T e^{-\beta t}\phi(t,R_t,I_t)\,dt + \sum_{i=1}^{z} e^{-\beta\tau_i} G(\xi_i) \,\middle|\, R_0=r_0,\, I_0=i_0 \right]. \tag{3} $$
The ultimate goal of the central bank is to find an optimal policy to minimize the objective function $J$. Assuming that $\phi, G > 0$, we are interested in impulse controls $v$ such that
$$ J_{r_0,i_0}(v) < \infty. \tag{4} $$
The collection of all such impulse controls is defined as the admissible control set $\mathcal{A}$. The value function $V$ is then given by
$$ V(r_0,i_0) = \min_{v\in\mathcal{A}} J_{r_0,i_0}(v). \tag{5} $$
The general forms of $\phi$ and $G$ are consistent with previous research on the impulse control of interest rates (Mitchell et al. Citation2014). The key assumption in our method is that the stochastic processes $(R_t,I_t)$ are Markovian. In many applications, this assumption is not restrictive because Markovian processes can be realized by including past information. There are also restrictions on $\phi$ so that the solutions are non-trivial. Our method does not rely on these restrictions, and interested readers may refer to the following studies (Constantinides and Richard Citation1978, Feng and Muthuraman Citation2010, Mitchell et al. Citation2014).
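To make the objective (3) concrete, the short sketch below evaluates the discretized cost of one simulated path under a given impulse control; phi, G, and the argument names are placeholders consistent with the notation above, and the time integral is approximated on the simulation grid.

```python
import numpy as np

def path_cost(phi, G, times, R_path, I_path, interventions, beta):
    """Discretized total cost (3) along one simulated path.

    times: grid t_0 < ... < t_K; R_path, I_path: controlled path values on the
    grid; interventions: list of (tau_i, xi_i) pairs already applied to R_path."""
    running = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        # discounted running cost, evaluated at the right end of each interval
        running += np.exp(-beta * times[k + 1]) * phi(times[k + 1], R_path[k + 1], I_path[k + 1]) * dt
    # discounted action costs of the applied interventions
    action = sum(np.exp(-beta * tau) * G(xi) for tau, xi in interventions)
    return running + action
```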

3. Deep impulse control framework

Studies of impulse control are generally designed for one-dimensional cases, while the cointegrated process $I_t$ has yet to be considered (Cadenillas et al. Citation2010, Mitchell et al. Citation2014). Traditional methods encounter difficulties for higher dimensional problems when impulse control can be applied to the multi-dimensional interest rate vector $R_t$, and the incorporation of the vector $I_t$ makes the problem even more difficult. For example, the central bank can control both short-term and long-term rates to control inflation. To handle the general impulse control problem defined in the previous section, we develop a novel NN to identify the optimal impulse control policy, which is inspired by Becker et al. (Citation2019).

3.1. Express impulse control using neural networks

For impulse control $v$, two decisions must be made at each time point $t\in\mathcal{T}$: whether an intervention is needed at time $t$ and, if so, its magnitude. We consider a larger set of actions $\bar{C} \equiv C \cup \{0\}$ to combine these two decisions. The non-intervention decision is regarded as an impulse control of magnitude 0. In this approach, the number of impulse controls $z$ corresponds to the cardinality of $\mathcal{T}$, denoted $N$ in the rest of the paper.

For $n=1,2,\ldots,N$, the optimal impulse control at time $t_n$ should generally be based on the whole path of $(I,R)$ from time 0 to time $t_n$. However, $(R_{t_n},I_{t_n})$ is sufficient to make the decision at time $t_n$ because of the Markov property of the process. Let us assume that the optimal impulse control decision at time $t_n$ is $f_n(R_{t_n},I_{t_n})$ for a measurable function $f_n:\mathbb{R}^{d_1+d_2}\to\{e_1,e_2,\ldots,e_M\}$, where $e_i$ is the one-hot vector taking a value of 1 in the $i$th coordinate and 0 otherwise, and $M$ is the cardinality of $\bar{C}$. The optimal control policy can be represented by a sequence of $f_n(R_{t_n},I_{t_n})$ for $n=1,2,\ldots,N$. The idea is to approximate $f_n$ through an NN $f^{\theta_n}$ whose input is the pair $(R_{t_n},I_{t_n})$ and whose output is a one-hot vector. We then obtain the control policy $v_{\theta}\equiv\{f^{\theta_1},f^{\theta_2},\ldots,f^{\theta_N}\}$.
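A minimal PyTorch sketch of one such decision network is given below: it maps the Markov state $(R_{t_n}, I_{t_n})$ to a probability vector over the $M$ actions in $\bar{C}$, from which the one-hot decision is recovered by an argmax as described in section 3.3. The class name and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecisionNet(nn.Module):
    """Maps the state (R_{t_n}, I_{t_n}) to probabilities over the M actions in C-bar."""
    def __init__(self, d1, d2, n_actions, hidden=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d1 + d2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, r, i):
        x = torch.cat([r, i], dim=-1)
        # soft output F^{theta_n}; the hard one-hot decision f^{theta_n}
        # is obtained later by taking the argmax (section 3.3)
        return torch.softmax(self.body(x), dim=-1)
```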

3.2. Neural network approximation

The numerical method approximates $f_n$ with an NN $f^{\theta_n}$ for $n\in\{1,2,\ldots,N\}$ through iterative backward training. Before describing the approximation procedure, we define some notation. Consider an auxiliary problem
$$ V_n = \min_{v\in\mathcal{A}_n} J(v), \tag{6} $$
where $\mathcal{A}_n$ is the set of all admissible controls in which $\xi_i = 0$ for $i=1,2,\ldots,n-1$. We define the expected future cost conditional on $(R_{t_n},I_{t_n})$ as
$$ J(t_n, R_{t_n}, I_{t_n}, v) = \mathbb{E}\left[ \int_{t_n}^{T} e^{-\beta t}\phi(t,R_t,I_t)\,dt + \sum_{i=n}^{N} e^{-\beta\tau_i} G(\xi_i) \,\middle|\, R_{t_n}, I_{t_n} \right]. \tag{7} $$
Let $\xi_m$ denote the magnitude of the $m$-th possible intervention on $R$, i.e. the $m$-th element of $\bar{C}$. For a fixed $n\in\{1,2,\ldots,N\}$, a given control in $\mathcal{A}_n$ is defined by a sequence of measurable functions $f_n, f_{n+1},\ldots,f_N$. The following proposition supports the iterative backward training method.

Proposition 3.1

For a specific $n\in\{0,1,\ldots,N-1\}$, let $v_{n+1}$ be an impulse control policy generated by measurable functions $f_{n+1},\ldots,f_N$. Then, there exists a measurable function $f_n$ such that the impulse control policy $v_n$ given by $f_n, f_{n+1},\ldots,f_N$ satisfies
$$ J(v_n) - V_n \le \sum_{m=1}^{M}\mathbb{E}\Big[ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) - \min_{v\in\mathcal{A}_{n+1}} J(t_n, R_{t_n}+\xi_m, I_{t_n}, v)\Big], $$
where $V_n$ and $V_{n+1}$ are defined as in (6).

Proof.

Let $\epsilon_m = \mathbb{E}\big[ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) - \min_{v\in\mathcal{A}_{n+1}} J(t_n, R_{t_n}+\xi_m, I_{t_n}, v)\big]$, and let $v_n\in\mathcal{A}_n$ be the impulse control policy to be constructed. Then, we have
$$ V_{n+1} - J(v_{n+1}) = \min_{v\in\mathcal{A}_{n+1}} J(v) - J(v_{n+1}) = \min_{v\in\mathcal{A}_{n+1}} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v_{n+1})], $$
where the cost from time 0 to time $t_n$ is the same and is therefore subsumed. Let
$$ D_m = \big\{ J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n} G(\xi_m) \text{ is minimal over all } \xi\in\bar{C} \big\}, \qquad E_m = \{\xi_n = \xi_m\}, $$
and define $f_n \equiv (I_{D_1}, I_{D_2},\ldots,I_{D_M})$, with $I$ being the indicator function. Then
$$ \begin{aligned} V_n - J(v_n) &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v_n)] \\ &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{D_m}\Big] \\ &\ge \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\Big] \\ &= \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \sum_{m=1}^{M} \mathbb{E}\big[\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\big] \\ &\ge \min_{v\in\mathcal{A}_n} \mathbb{E}[J(t_n, R_{t_n}, I_{t_n}, v)] - \sum_{m=1}^{M}\Big( \mathbb{E}\big[ \min_{v\in\mathcal{A}_{n+1}}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{E_m}\big] + \epsilon_m \Big). \end{aligned} $$
As $E_m$ is arbitrary, the inequality also holds in the optimal case, and hence $V_n - J(v_n) \ge -\sum_{m=1}^{M}\epsilon_m$, i.e. $J(v_n) - V_n \le \sum_{m=1}^{M}\epsilon_m$.

Proposition 3.2

Let $n\in\{1,2,\ldots,N\}$ and let $v\in\mathcal{A}$ be an impulse control. For any depth of the NN with $D\ge 2$ and any $\epsilon>0$, there exist positive integers $q_1,q_2,\ldots,q_{D-1}$ such that
$$ \inf_{\theta\in\mathbb{R}^q} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f^{\theta}_m(R_{t_n}, I_{t_n})\Big] \le \inf_{f\in\mathcal{D}} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f_m(R_{t_n}, I_{t_n})\Big] + \epsilon, $$
where $\mathcal{D}$ is the set of all measurable one-hot functions.

Proof.

For a fixed $\epsilon>0$, the integrability condition ensures that there exists a measurable function $\tilde{f}:\mathbb{R}^{d_1+d_2}\to\{e_1,\ldots,e_M\}$ such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big)\tilde{f}_m(R_{t_n}, I_{t_n})\Big] \le \inf_{f\in\mathcal{D}} \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) f_m(R_{t_n}, I_{t_n})\Big] + \frac{\epsilon}{3}, $$
and there exist Borel sets $A_m$ such that $\tilde{f}_m = 1_{A_m}$. As the output of $\tilde{f}$ is a one-hot vector, the sets $A_m$ are disjoint and $\cup_{m=1}^{M} A_m = \mathbb{R}^{d_1+d_2}$. Based on the integrability condition and (4),
$$ B \mapsto \mathbb{E}\big[\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) 1_{B}(R_{t_n}, I_{t_n})\big] $$
defines a finite Borel measure on $\mathbb{R}^{d_1+d_2}$, so each $A_m$ can be approximated from inside by a compact set $K_m\subseteq A_m$. Let $\rho_K:\mathbb{R}^{d_1+d_2}\to[0,\infty)$ be given by $\rho_K(x)=\inf_{y\in K}\|x-y\|_2$. Then
$$ k_{a,m}(x) = \max\{1 - a\,\rho_{K_m}(x),\, -1\}, \quad a\in\mathbb{N}, $$
defines a sequence of continuous functions $k_{a,m}:\mathbb{R}^{d_1+d_2}\to[-1,1]$ that converges pointwise to $1_{K_m} - 1_{K_m^{C}}$ for each $m$. By the dominated convergence theorem, there exists $a_m\in\mathbb{N}$ such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{k_{a_m,m}(R_{t_n},I_{t_n})\ge 0\}}\Big] \le \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{K_m}(R_{t_n},I_{t_n})\Big] + \frac{\epsilon}{3}. $$
We require the $a_m$ to be such that at least one $k_{a_m,m}(R_{t_n},I_{t_n})\ge 0$ for every $(R_{t_n},I_{t_n})$. This can be done because the union of the $A_m$ is $\mathbb{R}^{d_1+d_2}$. By Leshno's theorem, $k_{a,m}$ can be approximated uniformly on compact sets by functions of the form
$$ \sum_{i=1}^{r}(v_i\cdot x + c_i)^{+} - \sum_{i=1}^{s}(\omega_i\cdot x + d_i)^{+}. \tag{8} $$
Hence, there exists a function $h_m:\mathbb{R}^{d_1+d_2}\to\mathbb{R}$ of the form (8) such that
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{h_m(R_{t_n},I_{t_n})\ge 0\}}\Big] \le \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v) + e^{-\beta t_n}G(\xi_m)\big) I_{\{k_{a_m,m}(R_{t_n},I_{t_n})\ge 0\}}\Big] + \frac{\epsilon}{3}. $$
Each $h_m$ can be expressed as an NN, and we can then combine the $h_m$ functions to form a larger prediction network.

Based on the above two propositions, we can construct an NN to approximate the optimal policy. We introduce the NN architecture in the next section.

3.3. Neural network architecture

Based on Proposition 3.1, it is theoretically possible to find a fully connected deep neural network (DNN) $f^{\theta_n}$ to approximate $f_n$. However, the output of $f^{\theta_n}$ lies in $\{e_1,e_2,\ldots,e_M\}$, to which gradient descent cannot be applied. Therefore, we include an NN $F^{\theta_n}$ as a transition step for optimization purposes. $F^{\theta_n}$ is continuous and almost everywhere differentiable, and our objective is to minimize the following function:
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) F^{\theta_n}_m(R_{t_n}, I_{t_n})\Big]. $$
After determining $F^{\theta_n}$, the function $f^{\theta_n}$ is defined as $f^{\theta_n} = p_{\max}(F^{\theta_n})$, where $p_{\max}:(0,1)^M\to\{e_1,e_2,\ldots,e_M\}$ is the function that returns the position of the maximum value of the input. The softmax and $p_{\max}$ functions restrict the output of the NN to $\{e_1,e_2,\ldots,e_M\}$, resulting in a smaller value than the indicator functions of Proposition 3.2. When there is more than one maximum value in $F^{\theta_n}$, we assume by convention that $p_{\max}$ takes the position of the first maximum value.
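As a hedged sketch of this transition step, the helpers below evaluate the probability-weighted objective on simulated costs and recover the hard one-hot decision; future_costs is an assumed tensor stacking, for each path, the simulated discounted cost-to-go of each candidate action.

```python
import torch

def soft_objective(future_costs, probs):
    """Differentiable surrogate at time t_n: future_costs[:, m] is the simulated
    cost when action xi_m is applied at t_n (action cost included), and
    probs = F^{theta_n}(R_{t_n}, I_{t_n})."""
    return (future_costs * probs).sum(dim=-1).mean()

def pmax(probs):
    """Hard decision f^{theta_n}: one-hot vector at the (first) maximal probability."""
    return torch.nn.functional.one_hot(probs.argmax(dim=-1), probs.shape[-1])
```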

During the training process, we find that a general fully connected NN has a high probability of falling into local minima. Furthermore, the iterative backward training method worsens the situation because the accuracy of $F^{\theta_n}$ depends on accurate estimates of $f^{\theta_{n+1}},\ldots,f^{\theta_N}$.

This phenomenon is due to the complex nature of our task. Our problem is neither a traditional regression problem nor a traditional classification problem. It differs from classical regression problems because the outputs of the NN are one-hot vectors. It also differs from classification problems because the objective function is not related to the misclassification error and there is no ground truth available for the training process. Our task is a classification task with a regression-type objective function, and this combination results in a high probability of being trapped in local minima during gradient descent. To the best of our knowledge, there is no research describing this phenomenon, which we refer to as 'class vanishing' in the rest of the paper. When class vanishing occurs, the output of the DNN is a proper subset of $\{e_1,e_2,\ldots,e_M\}$, which means that the DNN solves a simpler problem with a smaller set of actions than the original set. We further illustrate the class vanishing problem in the numerical study section for the one-dimensional case, as a benchmark is available for this case.

To solve the class vanishing problem, we propose a new NN architecture that proves useful in the one-dimensional case. Because of the lack of benchmarks for high-dimensional cases, we leave it to future research to test the effectiveness of this NN architecture in high-dimensional situations.

$F^{\theta_n}$ takes the following form:
$$ F^{\theta_n} = \psi\big(F^{\theta_n,1}, F^{\theta_n,2},\ldots, F^{\theta_n,M}\big), \qquad F^{\theta_n,k} = a^{\theta_n}_{D_k,k}\circ\varphi_{q_{D_k-1,k}}\circ a^{\theta_n}_{D_k-1,k}\circ\cdots\circ\varphi_{q_{1,k}}\circ a^{\theta_n}_{1,k}, \quad k=1,\ldots,M, \tag{9} $$
where $D_k$ is the depth of the NN $F^{\theta_n,k}$ and $q_{1,k}, q_{2,k},\ldots,q_{D_k-1,k}$ are positive integers indicating the numbers of nodes in the hidden layers of $F^{\theta_n,k}$. $\psi:\mathbb{R}^M\to(0,1)^M$ is the standard softmax function defined as $\psi(x)_i = e^{x_i}/\sum_{j=1}^{M} e^{x_j}$ for $i=1,2,\ldots,M$ and $x\in\mathbb{R}^M$. $a^{\theta_n}_{1,k}:\mathbb{R}^{d_1+d_2}\to\mathbb{R}^{q_{1,k}}$, ..., $a^{\theta_n}_{D_k-1,k}:\mathbb{R}^{q_{D_k-2,k}}\to\mathbb{R}^{q_{D_k-1,k}}$ and $a^{\theta_n}_{D_k,k}:\mathbb{R}^{q_{D_k-1,k}}\to\mathbb{R}$ are affine functions. $\varphi_a:\mathbb{R}^a\to\mathbb{R}^a$ are ReLU activation functions with $\varphi(x_1,x_2,\ldots,x_a)=(x_1^+,x_2^+,\ldots,x_a^+)$. The NN $F^{\theta_n}$ consists of $M$ different sub-networks, and each sub-network is responsible for predicting only one impulse control choice in $\bar{C}$. The outputs of the small NNs are combined through a softmax function, which is commonly used for multi-class classification. An overview of the NN architecture is shown in figure 1. The idea here is very similar to ensemble learning techniques such as XGBoost or random forests. However, our aim is to prevent the occurrence of the class vanishing problem rather than to make a bias–variance tradeoff. We refer to this type of NN as the ensemble NN in the rest of the paper. The ensemble NN can be regarded as a kind of regularization similar to the DropConnect method proposed in Wan et al. (Citation2013). The difference is that DropConnect randomly sets weights to 0, whereas we do so deterministically.
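A minimal PyTorch sketch of the ensemble architecture (9) is given below: one small sub-network per action, whose scalar scores are combined through a softmax. The class name, depth, and widths are illustrative assumptions; they mirror, but are not taken from, the settings used in the numerical studies.

```python
import torch
import torch.nn as nn

class EnsembleDecisionNet(nn.Module):
    """Ensemble architecture of (9): one small sub-network per action,
    combined by a softmax, as a guard against class vanishing."""
    def __init__(self, d_in, n_actions, hidden=8, depth=3):
        super().__init__()
        def make_branch():
            layers, width = [], d_in
            for _ in range(depth - 1):
                layers += [nn.Linear(width, hidden), nn.ReLU()]
                width = hidden
            layers += [nn.Linear(width, 1)]   # scalar score for one action
            return nn.Sequential(*layers)
        self.branches = nn.ModuleList([make_branch() for _ in range(n_actions)])

    def forward(self, x):
        scores = torch.cat([b(x) for b in self.branches], dim=-1)  # (batch, M)
        return torch.softmax(scores, dim=-1)                       # F^{theta_n}
```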

Figure 1. Neural network architecture of the ensemble network. Each small NN predicts one action, and the results are normalized by the softmax function.


The training of $F^{\theta_n}$ depends on the sample of $(R_{t_n},I_{t_n})$, and $F^{\theta_n}$ may not minimize $\epsilon_m$ for every $m$ in Proposition 3.1. We can train different $F^{\theta_n}_m$ functions for different distributions of $(R_{t_n},I_{t_n})$ to minimize $\epsilon_m$. In our numerical study below, one $F^{\theta_n}$ function is enough to obtain a satisfactory control policy.

3.4. Parameter optimization

We determine the depth $D_k$ and the numbers of nodes in the hidden layers $q_{1,k},\ldots,q_{D_k-1,k}$ for all $k$ and train the NN described in section 3.3. To obtain the parameters $\theta_n$ numerically, we simulate $H$ paths of $(R_t,I_t)$ and minimize the sample average, which can be regarded as an approximation of the expectation of the cost function.

Consider a given $n\in\{1,\ldots,N\}$, and assume that the parameters $\theta_{n+1},\ldots,\theta_N$ have been determined and generate an impulse control policy $v_{n+1}$. We denote the cost calculated along the $h$th path from time $t_n$ to time $T$ by $l^h_n$, defined as
$$ l^h_n = \int_{t_n}^{T} e^{-\beta t}\phi(t, r^h_t, i^h_t)\,dt + \sum_{i=n}^{N} e^{-\beta\tau_i} G(\xi_i). $$
In the discrete case, $l^h_n$ is estimated by
$$ l^h_n = \sum_{i=n}^{N}\Big[ e^{-\beta t_{i+1}}\phi(t_{i+1}, r^h_{t_{i+1}}, i^h_{t_{i+1}})(t_{i+1}-t_i) + e^{-\beta t_i} G(\xi_i)\Big], $$
where $t_{N+1}\equiv T$. In this paper, the integral is approximated by evaluating $\phi$ at the end point of each interval. We choose the end point because it is the most difficult to predict. The input for $f^{\theta_n}$ is $(R_{t_n}, I_{t_n})$ and the resulting process values are $(R_{t_n}+\xi_n, I_{t_n})$, which contain no randomness. It is easier for the NN to find the optimal control policy if we include $(R_{t_n}+\xi_n, I_{t_n})$ in the cost calculation.
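A minimal sketch of the discrete estimate of $l^h_n$ is given below, assuming the decisions along the path have already been recorded; the function and argument names are ours.

```python
import numpy as np

def cost_to_go(n, dec_times, T, r_path, i_path, xi, phi, G, beta):
    """Discrete estimate of l_n^h. dec_times = [t_1, ..., t_N]; r_path and i_path
    hold the controlled path at [t_1, ..., t_N, T]; xi[j] is the action applied
    at decision time dec_times[j]."""
    grid = list(dec_times) + [T]            # t_{N+1} := T
    total = 0.0
    for j in range(n - 1, len(dec_times)):  # paper's index i = n, ..., N
        dt = grid[j + 1] - grid[j]
        # running cost evaluated at the end point of each interval
        total += np.exp(-beta * grid[j + 1]) * phi(grid[j + 1], r_path[j + 1], i_path[j + 1]) * dt
        # discounted action cost at the decision time
        total += np.exp(-beta * grid[j]) * G(xi[j])
    return total
```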

Suppose that we apply impulse control according to $F^{\theta_n}$ at time $t_n$ and make decisions according to $f^{\theta_{n+1}},\ldots,f^{\theta_N}$ afterwards; then the cost for the $h$th simulated path is given by
$$ \Gamma^h_n(\theta_n) = \sum_{m=1}^{M}\Big[\big( e^{-\beta t_{n+1}}\phi(t_{n+1}, r^{h,m}_{t_{n+1}}, i^{h,m}_{t_{n+1}}) + e^{-\beta t_n} G(\xi_m) + l^{h,m}_{n+1}\big)\, F^{\theta_n}_m(r^h_{t_n}, i^h_{t_n})\Big], $$
where $(r^{h,m}, i^{h,m})$ is the simulated path with $\xi_m$ applied at time $t_n$ and $l^{h,m}_{n+1}$ is the corresponding cost from time $t_{n+1}$ to time $T$.

For a large $H$, the sample mean $\frac{1}{H}\sum_{h=1}^{H}\Gamma^h_n(\theta_n)$ approximates the expectation
$$ \mathbb{E}\Big[\sum_{m=1}^{M}\big(J(t_n, R_{t_n}+\xi_m, I_{t_n}, v_{n+1}) + e^{-\beta t_n}G(\xi_m)\big) F^{\theta_n}_m(R_{t_n}, I_{t_n})\Big], $$
so it is used to find $\theta_n$ through gradient descent.

We divide the training process into two phases, the pre-training phase and the fine-training phase, to prevent class vanishing. In the pre-training phase, the objective function is
$$ \frac{1}{H}\sum_{h=1}^{H}\Big[\Gamma^h_n(\theta_n) - \lambda\sum_{m=1}^{M}\log\big(F^{\theta_n}_m(r^h_{t_n}, i^h_{t_n}) + \varepsilon\big)\Big]. $$
The first part of this expression is the expectation that we want to minimize and the second part is a penalty term for the occurrence of small probabilities in $F^{\theta_n}$. $\lambda$ is a pre-defined parameter that balances these two objectives and $\varepsilon$ is a small number that prevents log explosion during training. By adding the penalty term, we avoid the class vanishing problem but may introduce an additional bias into our predictions. After the pre-training phase, we remove the penalty term and continue to train the model in the fine-training phase to remove this additional bias.
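A hedged sketch of one training phase at a fixed decision time $t_n$ is given below; net stands for the ensemble network, states for the simulated $(r^h_{t_n}, i^h_{t_n})$, and future_costs[h, m] for the simulated cost entering $\Gamma^h_n$ when action $\xi_m$ is applied on path $h$ (all names are assumptions). Setting lam > 0 reproduces the pre-training penalty; lam = 0 gives the fine-training phase.

```python
import torch

def training_phase(net, optimizer, states, future_costs, steps, lam=0.0, eps=1e-8):
    """Gradient-descent phase at time t_n; the penalty discourages near-zero
    action probabilities (class vanishing) during pre-training."""
    for _ in range(steps):
        probs = net(states)                               # F^{theta_n}
        loss = (future_costs * probs).sum(dim=-1).mean()  # sample mean of Gamma_n^h
        if lam > 0:
            loss = loss - lam * torch.log(probs + eps).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return net
```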

4. Numerical study

In this section, we examine our method in three scenarios. First, we do not consider the uncontrollable stochastic process $I$ and test our method in the one-dimensional case. Because impulse control in the one-dimensional case is well studied, we can use the results of Cadenillas et al. (Citation2010) as a benchmark to examine our method. Second, we illustrate our method using data on real inflation and effective federal funds rates (EFFR) in the United States. We compare our results with the Fed's decisions on the EFFR in 2022. Finally, we examine our model with a multi-dimensional $R$ and a one-dimensional $I$ through a simulation study.

4.1. One-dimensional benchmark

The one-dimensional stochastic impulse control problem is studied in Cadenillas et al. (Citation2010), and we follow their settings for our simulation study. Assume that $R_t$ follows an OU process given by
$$ dR_t = a(b - R_t)\,dt + \sigma\,dW_t $$
when there is no impulse control, where $a = 0.2$ is the speed of reversion, $b = 2$ is the long-term mean level, $\sigma = 1.2$ is the instantaneous volatility of the model, and $W_t$ is a one-dimensional Brownian motion. The discount rate $\beta$ is set to 0.06.

There are several differences between Cadenillas et al. (Citation2010) and our study, leading to a deviation between our optimal policy and theirs. In Cadenillas et al. (Citation2010), $T$ is set to $\infty$, $\mathcal{T}$ is the set of non-negative real numbers, and $C = \mathbb{R}$. For our numerical study, $T$ is set to 5, $\mathcal{T} = \{0, 1, 2, 3, 4\}$, and $C = \{-6, -3, 0, 3, 6\}$. We use the same running cost function and action cost function as in Cadenillas et al. (Citation2010), defined as
$$ \phi(r) = (r - 2)^2, \qquad G(\xi) = \begin{cases} 5 + 2\xi & \text{if } \xi > 0,\\ 5 & \text{if } \xi = 0,\\ 5 - 2\xi & \text{if } \xi < 0. \end{cases} $$
Based on these two cost functions, the action costs for $C$ would be $\{17, 11, 5, 11, 17\}$. However, 0 in $C$ represents the choice of no impulse control rather than an impulse control of magnitude 0. Therefore, we replace 5 with 0, and the action costs are $\{17, 11, 0, 11, 17\}$ in our paper.
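For reference, these benchmark cost functions (with the fixed cost set to 0 for the no-intervention action, as described above) can be written as a short sketch:

```python
def phi(r):
    """Running cost rate: squared deviation from the target level 2."""
    return (r - 2.0) ** 2

def G(xi):
    """Action cost: fixed cost 5 plus proportional cost 2|xi|;
    the no-intervention choice xi = 0 costs nothing."""
    return 0.0 if xi == 0 else 5.0 + 2.0 * abs(xi)

# action costs over C = {-6, -3, 0, 3, 6}  ->  [17.0, 11.0, 0.0, 11.0, 17.0]
print([G(xi) for xi in (-6, -3, 0, 3, 6)])
```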

We train backwardly from $f^{\theta_4}$ with the NN architecture defined in section 3.3. The depth is $D_k = 3$ and $q_{1,k} = q_{2,k} = 8$ for all $k$. The $R_t$ process defined above can be easily simulated, and the initial value is sampled from a uniform distribution U(11,15) to allow the NN to learn different scenarios. For each $F^{\theta_n}$, we simulate 8,192 sample paths for the training process, and we conduct 1,500 training steps in the pre-training phase with $\lambda = 0.3$ and 3,000 training steps in the fine-training phase.

$F^{\theta_4}$ and $F^{\theta_2}$ are shown in figure 2 for different values of $r_{t_4}$ and $r_{t_2}$. $F^{\theta_0},\ldots,F^{\theta_2}$ are all similar, so we show $F^{\theta_2}$ here. We also show $F^{\theta_4}$ because it differs from the other functions. The reason may be that this is the last decision point and the NN is more conservative in applying impulse control.


Figure 2. $F^{\theta}$ for different values at time $t_2^-$ and time $t_4^-$. The X axis is the value at the corresponding time point and the Y axis is the probability $F^{\theta}$ assigns to each action. The NN at $t_4^-$ is more conservative in applying interventions than the representative NN at $t_2^-$.

We also show a representative class vanishing problem for $F^{\theta_4}$ in figure 3. Here, $F^{\theta_4}$ is a fully connected NN with a depth of 3 and 40 nodes in each hidden layer.Footnote3 The training process is the same with the same parameter settings, and we can see from figure 3 that the three classes $-3$, 3, and 6 are missing. We increase the number of nodes in the hidden layers to 50, but the class vanishing problem still occurs.


Figure 3. $F^{\theta}$ for different values at time $t_4^-$ with 40 and 50 hidden nodes. The X axis and Y axis are defined as in figure 2. The top panel shows the NN with 40 nodes and the bottom panel shows the NN with 50 nodes. Both NNs encounter the class vanishing problem.

4.2. Fed's interest rate illustration

In this section, we illustrate our method with real data from the United States. We collect EFFR and Consumer Price Index (CPI) data from U. B. of Labor Statistics (Citation2023) and F.R.B. of New York (Citation2023). We define the month-over-month (MoM) inflation rate $I_t$ as $I_t = (\mathrm{CPI}_{t+1} - \mathrm{CPI}_t)/\mathrm{CPI}_t$. Assume that the EFFR ($R_{1,t}$) and monthly inflation ($I_t$) follow the stochastic processes
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = (\alpha_0 + \alpha_1 R_{1,t})\,dt + \sigma_2\,dW^2_t, \tag{10} $$
when no impulse control is applied. $R_{1,t}$ follows an OU process with $a_1$ the speed of reversion and $b_1$ the long-term mean level. $I_t$ follows a stochastic process that is cointegrated with $R_{1,t}$. $\sigma_1$ and $\sigma_2$ are constant instantaneous volatilities, and $W^1_t$ and $W^2_t$ are independent one-dimensional Brownian motions. We estimate the parameters using EFFR and CPI data from August 1, 2019 to November 2.Footnote4 As the Fed launched several interventions on the EFFR during this period, we remove the corresponding days from the estimation sample. The estimated parameters are $a_1=0.2246$, $b_1=0.0555$, $\alpha_0=0.5220$, $\alpha_1=0.8213$, $\sigma_1=0.2826$, and $\sigma_2=1.1619$.
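The MoM inflation series used as the uncontrollable process can be computed from the raw CPI series as in the short sketch below (a pandas-based illustration; the function name is ours):

```python
import pandas as pd

def mom_inflation(cpi: pd.Series) -> pd.Series:
    """Month-over-month inflation: I_t = (CPI_{t+1} - CPI_t) / CPI_t."""
    return cpi.shift(-1).sub(cpi).div(cpi).dropna()
```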

After determining the parameters of the stochastic processes, we train our NN for a one-year interest rate impulse control policy for 2022. The maturity $T$ is set to 1. The Fed holds eight regular meetings each year to determine monetary policy, and the time interval between meetings is usually one and a half months. Therefore, we set $\mathcal{T}$ to $\{0, 0.125,\ldots, 0.875\}$. The Fed can only intervene on the EFFR, so the action set $C$ is set to $\{-0.25, 0, 0.25, 0.5, 0.75, 1\}$, which contains all of the actions applied by the Fed to the EFFR in 2022. We also add $-0.25$ to the action set to test whether the NN chooses the right direction of action. Inflation was extremely high in 2022, so we assume that the main goal of the Fed was to control the inflation rate. Because the Fed aims for 2% inflation over the long run, we set our MoM inflation rate target to 0.2%. The running cost function and the action cost function are defined as
$$ \phi(r, i) = |i - 0.2\%|, \qquad G(\xi) = \begin{cases} 0.0025 + 0.06\,\xi & \text{if } \xi > 0,\\ 0 & \text{if } \xi = 0,\\ 0.0025 - 0.06\,\xi & \text{if } \xi < 0, \end{cases} $$
and the discount factor is $\beta = 0.06$. The depth $D_k$ is set to 3, and the number of hidden nodes is $q_{1,k} = q_{2,k} = 10$ for all $k$. We simulate 30,000 sample paths to train the NN. The number of training steps in the pre-training phase is set to 2,000 with $\lambda = 0.0001$, and the number of training steps in the fine-training phase is set to 1,500. After training the NN, there is one last issue to address before comparing the output of the NN with the actual interventions of the Fed in 2022: in reality, there is a delay in the publication of the CPI, so we use the predicted inflation rate as the input. We compare our prediction results with the Fed's actions from January to September 2022, and the results are shown in table 1. Here, we consider two methods for determining the magnitude of the Fed's actions. First, we follow the general rule of choosing the action with the highest probability (I) estimated by the NN. Second, we construct a method that is similar to the dot plot in the Fed's Summary of Economic Projections: we calculate $\sum_{m=1}^{M}\xi_m F^{\theta_n}_m$ for the decision at time $t_n$ and choose the action in $C$ that is closest to it (II).
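The two reporting rules can be sketched as follows, given the trained network output $F^{\theta_n}$ at a decision time: rule (I) takes the most probable action, and rule (II) takes the action in $C$ closest to the probability-weighted average action (the dot-plot-style summary). The code is an illustrative reading of these rules, not the authors' implementation.

```python
import numpy as np

ACTIONS = np.array([-0.25, 0.0, 0.25, 0.5, 0.75, 1.0])  # action set C for the EFFR

def decision_rule_I(probs):
    """Rule (I): action with the highest estimated probability."""
    return ACTIONS[np.argmax(probs)]

def decision_rule_II(probs):
    """Rule (II): action in C closest to the probability-weighted mean action."""
    mean_action = np.dot(ACTIONS, probs)
    return ACTIONS[np.argmin(np.abs(ACTIONS - mean_action))]
```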

Table 1. The Fed's actions and our NN results.

Table 1 shows that our impulse control policy is more volatile than the Fed's policy. We speculate that this may be related to the Fed's intention to smooth its intervention levels and its focus on the year-on-year (YoY) inflation rate rather than the MoM inflation rate. As the YoY inflation rate is less volatile than its MoM counterpart, the induced intervention magnitudes are expected to be less volatile too. Despite the difference in intervention volatility, the overall level of interest rate intervention by the Fed is similar to our NN predictions. The predicted aggregate intervention level is useful for setting stress-testing targets for fixed income securities. For instance, around July 26, 2022, the MoM inflation rate was predicted to be 0.2%, and our NN model suggests stopping the interest rate intervention. By then, the Fed should have increased the EFFR by 3.25%, but it had only been increased by 1.5% in reality; the Fed therefore still needed an increase of at least 0.75%. A similar phenomenon occurs on September 20. Our model thus offers an early warning signal to financial institutions of potential future interventions. This signal sets a more realistic stress test target for interest rate products.

4.3. Two-dimensional illustration

We demonstrate the application of our model to short-term and long-term interest rate interventions for monitoring inflation. In other words, the central bank can apply impulse control to both the EFFR and the long-term interest rate. Similar to the previous section, we define the stochastic processes of the EFFR $R_{1,t}$, the long-term interest rate $R_{2,t}$, and the MoM inflation rate $I_t$ without intervention as
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dR_{2,t} = a_2(b_2 - R_{2,t})\,dt + \sigma_3\,dW^1_t, \qquad dI_t = (\alpha_0 + \alpha_1 R_{1,t} + \alpha_2 R_{2,t})\,dt + \sigma_2\,dW^2_t, $$
where $R_{1,t}$ and $I_t$ are as in the previous section. We assume that $R_{2,t}$ follows an OU process with reversion speed $a_2 = 0.25$, long-term mean level $b_2 = 0.2$, $\sigma_3 = 0.35$, and $\alpha_2 = 0.5$. We define the action set $C$ as $\{-0.25, 0, 0.25\}\times\{-0.4, 0, 0.4\}$, i.e. the nine pairs $\xi = (\xi_1,\xi_2)$ of simultaneous adjustments to the two rates. The central bank monitors the monthly inflation rate by controlling the EFFR and the long-term interest rate, so we use the same running cost function as in the previous section, but the action cost function is replaced by
$$ G(\xi) = 0.02\,|\xi_1| + 0.0025\, I(\xi_1\neq 0) + 0.01\,|\xi_2| + 0.005\, I(\xi_2\neq 0). $$
The discount factor is $\beta = 0.06$.
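Under the reading of the action cost adopted above (a proportional plus a fixed component for each controlled rate), a small sketch of $G$ for the two-dimensional action $\xi = (\xi_1, \xi_2)$ is:

```python
def G(xi):
    """Two-dimensional action cost as reconstructed above (sketch):
    proportional plus fixed cost for each of the two controlled rates."""
    xi1, xi2 = xi
    cost = 0.02 * abs(xi1) + (0.0025 if xi1 != 0 else 0.0)
    cost += 0.01 * abs(xi2) + (0.005 if xi2 != 0 else 0.0)
    return cost
```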

The depth $D_k$ of the NN is set to 3 and the number of hidden nodes is $q_{1,k} = q_{2,k} = 10$ for all $k$. We simulate 30,000 paths to train the NN backwardly. After training the NN, we simulate another set of 30,000 paths to illustrate its performance. Because of the lack of a benchmark, we calculate the sample mean of the overall cost when no impulse control is applied and compare it with the overall cost when we intervene based on the NN. The cost drops from 1.299 to 0.980, indicating that our NN is useful for minimizing the cost. However, it is technically more difficult for the Fed to intervene simultaneously on short-term and long-term rates in reality. Therefore, we do not report further details here, but we note this potential alternative use of our framework.Footnote5

5. Discussion and areas for future study

5.1. Relationship with deep reinforcement learning

The primary goal of deep reinforcement learning (DRL) is to optimize a policy over discrete actions that maximizes the expected cumulative reward. This objective can be aligned with that in (3).Footnote6 This offers a possibility to link our problem to a discrete-time, discrete-space reinforcement learning problem. We believe that using DRL techniques such as deep Q-learning (Mnih et al. Citation2013) to solve the interest rate intervention problem is a promising future direction. An advantage of our method is that we train the DNN backward to obtain the optimal impulse control policy, for which we can find an upper bound on the errors. DRL requires exploration, such as an $\epsilon$-greedy strategy, to learn the environment. Further investigation is needed to derive error bounds (or regret bounds) for DRL on the same problem and to compare the empirical results of the two approaches. Incorporating our ensemble networks into DRL and then comparing their empirical performance is another potential future direction.

5.2. Delay effect of interest rate intervention

Although the implementation of impulse control results in an immediate change in the interest rate, there exists an inherent time delay in the inflation response. In the context of inflation, any adjustments are anticipated to manifest as a gradual shift rather than an abrupt leap. Our proposed model (10) partially captures this intrinsic characteristic by incorporating the impact of the EFFR on the drift term of the inflation rate. Thus, it takes time for the inflation rate to revert to the intervened equilibrium level through the cointegration dynamics. However, it is still interesting to examine and incorporate alternative delayed mechanisms. An example is the delayed stochastic system in Yan et al. (Citation2022a) and Yan and Wong (Citation2022):
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = \Big(\alpha_0 + \int_{t-\Delta}^{t}\alpha(s) R_{1,s}\,ds\Big)dt + \sigma_2\,dW^2_t, $$
with a specific time lag $\Delta$. The discrete version becomes
$$ dR_{1,t} = a_1(b_1 - R_{1,t})\,dt + \sigma_1\,dW^1_t, \qquad dI_t = \Big(\alpha_0 + \sum_{s\in[t-\Delta, t]}\alpha(s) R_{1,s}\Big)dt + \sigma_2\,dW^2_t. $$
The DNN training is the same except for a different input to the DNN. This delayed mechanism changes the Markovian nature of the original problem and complicates the training of the DNN. We leave it to future research.

6. Conclusion

We generalize the impulse control problem and propose a novel deep-learning framework to solve it. The generalized impulse control problem is formulated with a finite time horizon, finite decision points, and a finite set of actions, which is more aligned with realistic situations than previous research. In addition, we consider both controllable and uncontrollable processes to replicate the real-world scenarios encountered by the central bank. Our framework can handle high-dimensional cases with cointegrated processes that cannot be directly controlled by impulse control. We also propose a new NN architecture to solve the class vanishing problem, which occurs because of the complex nature of our task. The ensemble NN can be easily extended to other areas where the problem is a classification task with a regression-type objective function. The accuracy of our method is examined in one-dimensional cases. Our numerical results show reasonable congruence between the predictions of our NN model and the Fed's interventions on the EFFR in 2022. We suggest that our deep impulse control framework is useful for financial institutions and regulatory agencies to develop stress-test interest rate scenarios for risk management purposes.

Acknowledgements

We are grateful for comments by participants of the 26th International Congress on Insurance: Mathematics and Economics at Heriot-Watt University, Recent Advances on Quantitative Finance 2023 at Hong Kong Polytechnic University, and The 3rd Yushan Conference at National Yang Ming Chiao Tung University.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

H.Y. Wong acknowledges the support by HKRGC under grant no. GRF-14308422.

Notes

1 Source: the monetary policy web page of the Board of Governors of the Federal Reserve System. https://www.federalreserve.gov/monetarypolicy.htm.

3 Our NN architecture has three layers at each decision time point, so the whole NN collecting all decision points has three times as many layers as there are decision points. When the whole NN is trained at once, it is essentially a DNN training (Becker et al. Citation2021). In addition, our framework is an extension of deep optimal stopping (Becker et al. Citation2019), which also uses a three-layer NN at each time point.

4 This is a distinctive period for examining our DNN because the last interest rate hikes to control inflation date back to 1979-1982, when interest rates were fully controlled by the Fed to remain within a specified interval. In other words, the interest rates then were not sufficiently random to empirically estimate the interaction between interest rates and the inflation rate.

5 Interested readers could e-mail [email protected] for detailed result.

6 We thank an anonymous referee for pointing out the connection. As our paper aims to stimulate the use of contemporary machine learning methods to help set stress-testing targets, this suggestion further strengthens our purpose.

References

  • Balduzzi, P., Bertola, G. and Foresi, S., A model of target changes and the term structure of interest rates. J. Monet. Econ., 1997, 39, 223–249.
  • Becker, S., Cheridito, P. and Jentzen, A., Deep optimal stopping. J. Mach. Learn. Res., 2019, 20, 2712–2736.
  • Becker, S., Cheridito, P., Jentzen, A. and Welti, T., Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math., 2021, 32, 470–514.
  • Black, F. and Karasinski, P., Bond and option pricing when short rates are lognormal. Financ. Anal. J., 1991, 47, 52–59.
  • Booth, G.G. and Ciner, C., The relationship between nominal interest rates and inflation: International evidence. J. Multinatl. Financ. Manag., 2001, 11, 269–280.
  • Cadenillas, A. and Zapatero, F., Optimal central bank intervention in the foreign exchange market. J. Econ. Theory, 1999, 87, 218–242.
  • Cadenillas, A. and Zapatero, F., Classical and impulse stochastic control of the exchange rate using interest rates and reserves. Math. Finance, 2000, 10, 141–156.
  • Cadenillas, A., Lakner, P. and Pinedo, M., Optimal control of a mean-reverting inventory. Oper. Res., 2010, 58, 1697–1710.
  • Chan, K.C., Karolyi, G.A., Longstaff, F.A. and Sanders, A.B., An empirical comparison of alternative models of the short-term interest rate. J. Finance, 1992, 47, 1209–1227.
  • Chapman, D.A. and Pearson, N.D., Is the short rate drift actually nonlinear? J. Finance, 2000, 55, 355–388.
  • Constantinides, G.M. and Richard, S.F., Existence of optimal simple policies for discounted-cost inventory and cash management in continuous time. Oper. Res., 1978, 26, 620–636.
  • Cox, J.C., Ingersoll Jr, J.E. and Ross, S.A., A theory of the term structure of interest rates. In Theory of Valuation, pp. 129–164, 2005 (World Scientific).
  • F.R.B. of New York, Effective federal funds rate [EFFR], retrieved from FRED, Federal Reserve Bank of St. Louis, 2023. Available online at: https://fred.stlouisfed.org/series/EFFR (accessed 26 February 2023).
  • Feng, H. and Muthuraman, K., A computational method for stochastic impulse control problems. Math. Oper. Res., 2010, 35, 830–850.
  • Friedman, B.M., Monetary policy, 2000.
  • Guttentag, J.M., Defensive and dynamic open market operations, discounting, and the federal reserve system's crisis-prevention responsibilities. J. Finance, 1969, 24, 249–262.
  • Harrison, J.M., Sellke, T.M. and Taylor, A.J., Impulse control of Brownian motion. Math. Oper. Res., 1983, 8, 454–466.
  • Hull, J. and White, A., Pricing interest-rate-derivative securities. Rev. Financ. Stud., 1990, 3, 573–592.
  • Jia, B., Wang, L. and Wong, H.Y., Machine learning of surrender: Optimality and humanity. J. Risk Insur., 2023. https://doi.org/10.1111/jori.12428
  • Lo, A.W. and Singh, M., Deep-learning models for forecasting financial risk premia and their interpretations. Quant. Finance, 2023, 23, 917–929.
  • Lohmann, S., Optimal commitment in monetary policy: Credibility versus flexibility. Am. Econ. Rev., 1992, 82, 273–286.
  • Mikkilä, O. and Kanniainen, J., Empirical deep hedging. Quant. Finance, 2023, 23, 111–122.
  • Mitchell, D., Feng, H. and Muthuraman, K., Impulse control of interest rates. Oper. Res., 2014, 62, 602–615.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., Playing Atari with deep reinforcement learning, 2013, arXiv preprint arXiv:1312.5602.
  • Na, A.S. and Wan, J.W., Efficient pricing and hedging of high-dimensional American options using deep recurrent networks. Quant. Finance, 2023, 23, 631–651.
  • Piazzesi, M., Bond yields and the federal reserve. J. Polit. Econ., 2005, 113, 311–344.
  • Rudebusch, G.D., Federal reserve interest rate targeting, rational expectations, and the term structure. J. Monet. Econ., 1995, 35, 245–274.
  • Sulem, A., A solvable one-dimensional model of a diffusion inventory system. Math. Oper. Res., 1986, 11, 125–133.
  • Tsang, K.H. and Wong, H.Y., Deep-learning solution to portfolio selection with serially dependent returns. SIAM J. Financ. Math., 2020, 11, 593–619.
  • U. B. of Labor Statistics, U.S. bureau of labor statistics, consumer price index for all urban consumers: All items in U.S. city average [cpiaucsl], retrieved from FRED, Federal Reserve Bank of St. Louis, 2023. Available online at: https://fred.stlouisfed.org/series/CPIAUCSL (accessed 26 February 2023).
  • Vasicek, O., An equilibrium characterization of the term structure. J. Financ. Econ., 1977, 5, 177–188.
  • Wan, L., Zeiler, M., Zhang, S., Le Cun, Y. and Fergus, R., Regularization of neural networks using dropconnect. In International Conference on Machine Learning, pp. 1058–1066, 2013 (PMLR).
  • Yan, T. and Wong, H.Y., Equilibrium pairs-trading under delayed cointegration. Automatica, 2022, 144, 110498.
  • Yan, T., Chiu, M.C. and Wong, H.Y., Pairs trading under delayed cointegration. Quant. Finance, 2022a, 22, 1627–1648.
  • Yan, T., Park, K. and Wong, H.Y., Irreversible reinsurance: A singular control approach. Insur. Math. Econ., 2022b, 107, 326–348.
  • Yin, J. and Wong, H.Y., Deep lob trading: Half a second please!. Expert Syst. Appl., 2023, 213, 118899.