Control charts for high-dimensional time series with estimated in-control parameters


Abstract

In this article, we study the effect of misspecification caused by fitting the target process in the Phase I analysis of the monitoring procedure on the behavior of several types of multivariate exponentially weighted moving average (MEWMA) control charts in the high-dimensional setting. In particular, the classical MEWMA control charts, whose control statistics are based on the exact and asymptotic Mahalanobis distance, are considered together with the novel approaches where the Euclidean distance and the diagonalized Euclidean distance are employed in the construction of control statistics. The high-dimensional distributions of the control statistics are deduced at each time. These results are later used to assess the performance of the considered control charts under misspecification. Both theoretical and empirical findings lead to the conclusion that the control charts based on the Euclidean distance and the diagonalized Euclidean distance are robust to misspecification for moderate dimensions of the data-generating model, whereas they tend to overestimate the in-control average run lengths (ARLs) in the case of larger dimensions. On the other hand, the control schemes based on the Mahalanobis distance are considerably affected by the estimation of the parameters of the target process, and their application results in drastically smaller values of the ARLs, especially when the dimension of the data-generating model is large.


1. INTRODUCTION

Statistical process control (SPC) plays a special role in the monitoring of production processes. SPC methods are also widely used in other fields, such as engineering, economics, medicine, chemistry, biology, and finance (see, e.g., Frisén 1992; Schipper and Schmid 2001; Sonesson and Bock 2003; Andersson, Bock, and Frisén 2004; Schmid and Tzotchev 2004; Lawson and Kleinman 2005; O. Bodnar 2007, 2009b; Messaoud, Weihs, and Hering 2008; Golosnoy et al. 2011; O. Bodnar and Schmid 2017).

In the setup of the monitoring procedure, the relationship between the observed and the target process has to be specified. Let $\{Y_t\}$ denote the $p$-dimensional target process and let $\{X_t\}$ be the $p$-dimensional observed process. The target process is a process that fulfills the quality requirements, whereas the observed process is the actual process. In the following, the relationship between the target and the observed process is described by the change point model

$$X_t = \begin{cases} Y_t & \text{for } t < \tau, \\ Y_t + a & \text{for } t \geq \tau, \end{cases} \qquad t \in \mathbb{Z}, \tag{1.1}$$

where $a \neq 0$ and $\tau \in \mathbb{N} \cup \{\infty\}$. If $\tau = \infty$, the observed process is called an in-control process; otherwise, it is called an out-of-control process. The symbols $E_\infty(\cdot)$, $\mathrm{Var}_\infty(\cdot)$, and $\mathrm{Cov}_\infty(\cdot)$ denote the mean, variance, and covariance matrix, respectively, computed under the assumption that the process is in control. We assume that the target process $\{Y_t\}$ is weakly stationary with mean vector $\mu$ and autocovariance matrix $\Gamma(h)$ at lag $h$.
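
For readers who want to experiment numerically, the change point model (1.1) can be sketched in Python with NumPy as follows. The function name `observed_process` and the array layout (one row per time point, with time starting at $t = 1$) are illustrative assumptions, not part of the article.

```python
import numpy as np

def observed_process(Y: np.ndarray, a: np.ndarray, tau: float) -> np.ndarray:
    """Apply the change point model (1.1) to a target path.

    Y   : array of shape (n, p); row t-1 holds Y_t for t = 1, ..., n
    a   : shift vector of length p (a != 0)
    tau : change point; use np.inf for an in-control process
    """
    Y = np.asarray(Y, dtype=float)
    t = np.arange(1, Y.shape[0] + 1)          # time indices 1, ..., n
    shifted = (t >= tau)[:, None]             # True from t = tau onwards
    return Y + shifted * np.asarray(a, dtype=float)

# Usage: a 2-dimensional target path of zeros, shift a = (1, -1) from tau = 3
X = observed_process(np.zeros((4, 2)), np.array([1.0, -1.0]), tau=3)
```

With `tau=np.inf` the comparison `t >= tau` is never true, so the observed path equals the target path, which is the in-control case.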

Control charts are the most widely used tool of SPC (see Montgomery 2020). While the first control charts were designed for detecting changes in the location of univariate independent observations (cf. Shewhart 1926; Page 1954; Roberts 1959), they were extended to time series by Alwan and Roberts (1988), Schmid (1995, 1997a), Schmid and Schöne (1997), and Knoth and Schmid (2002), among others. Another line of research led to the surveillance of the parameters in multivariate models.

The first multivariate control chart was proposed by Hotelling (1947), who introduced a control scheme based on the Mahalanobis distance to monitor the mean vector of independent observations from a multivariate normal distribution. This approach was later extended by Crosier (1988), Pignatiello and Runger (1990), Lowry et al. (1992), and Ngai and Zhang (2001), who proposed several multivariate control charts based on the multivariate exponentially weighted moving average (MEWMA) recursion and the cumulative sum approach. Multivariate control charts for monitoring the parameters of multivariate time series have recently become an active topic of research. Control charts for the mean behavior were discussed in Theodossiou (1993), Kramer and Schmid (1997), O. Bodnar and Schmid (2007, 2011), and O. Bodnar (2009a), and control charts for monitoring the covariance matrix were introduced in Śliwa and Schmid (2005) and O. Bodnar and Schmid (2017).

Due to the rapid development of computer technology, monitoring the parameters of complex high-dimensional processes has become possible and has attracted many researchers to this challenging field. In the high-dimensional setting, it is assumed that the dimension of the data-generating model grows at the same rate as the sample size as the latter tends to infinity (see, e.g., Bai and Silverstein 2010; T. Bodnar, Dette, and Parolya 2019). K. Wang and Jiang (2009) considered a variable selection–based multivariate SPC procedure in the high-dimensional setting, and a high-dimensional control chart for profile monitoring was suggested by Chen and Nembhard (2011); this control scheme is based on the adaptive Neyman test statistic for the coefficients of the discrete Fourier transform of profiles. Li et al. (2014) suggested a control chart that starts monitoring with the second observation regardless of the dimension and reduces the average run length (ARL) in detecting early shifts in high-dimensional measurements. Z. Wang, Li, and Zhou (2017) constructed a hybrid control chart for independent multivariate Poisson data, and R. Bodnar, Bodnar, and Schmid (2023) introduced several MEWMA-type control charts for high-dimensional time series in which the Euclidean distance and the diagonalized Euclidean distance are employed in the construction of the control statistics instead of the Mahalanobis distance.

All of the above-mentioned control charts were designed under the assumption that the parameters of the target process $\{Y_t\}$ are known. However, this assumption is very restrictive in many practical situations (see, e.g., Kramer and Schmid 2000; Albers and Kallenberg 2004; Jensen et al. 2006; Saleh et al. 2015; Jardim, Chakraborti, and Epprecht 2020; Sarmiento et al. 2022). In practice, the target process has to be fitted in the Phase I analysis of the monitoring procedure, and the control statistics for Phase II are constructed by replacing the unknown true parameters with their estimators. This approach leads to misspecified control charts, and the effect of the misspecification has to be studied before the monitoring scheme is applied in practice. In this article, we contribute to the literature by developing new theoretical results that allow the effect of misspecification to be assessed in the high-dimensional setting. In particular, we show that the MEWMA control charts based on the Euclidean distance and the diagonalized Euclidean distance are quite robust to the misspecification caused by estimating the parameters of the target process in Phase I, whereas the control schemes based on the Mahalanobis distance can be strongly affected by misspecification in high dimensions.

The derived results can be applied in several fields. One direction of possible applications is economics and finance, with a special emphasis on optimal portfolio theory. Several control charts for monitoring the structure of optimal portfolios were suggested in O. Bodnar and Schmid (2007), Golosnoy and Schmid (2007), O. Bodnar (2009b), and Golosnoy et al. (2011), among others. Whereas these procedures were developed for small dimensions of the data-generating model, the approaches introduced here extend the existing methods to the high-dimensional case. Moreover, because the parameters of the data-generating model in Phase I are not known in financial applications, the obtained findings potentially provide new methods for monitoring the structure of optimal portfolios in which the Euclidean distance and the diagonalized Euclidean distance are employed in the construction of the test statistics. Another line of possible applications is environmental science, where the parameters of high-dimensional spatial processes may need to be monitored (see, e.g., Otto and Schmid 2023).

The rest of the article is organized as follows. In Section 2, the MEWMA recursion is discussed and its basic properties are presented. Section 3 introduces the MEWMA control charts for high-dimensional time series under model misspecification, and the distributional properties of the considered statistics are derived in Section 4. The results of the simulation study are given in Section 5, and final remarks are summarized in Section 6.

2. CONTROL CHARTS BASED ON MEWMA RECURSIONS

The MEWMA recursion is defined by

$$Z_t = (I - R) Z_{t-1} + R X_t, \qquad t \geq 1, \tag{2.1}$$

with $Z_0 = \mu$. In the following, we set $R = \operatorname{diag}(r_1, \dots, r_p)$, where the smoothing parameters $r_i \in (0, 1]$, $i = 1, \dots, p$, are known and deterministic (see, e.g., Qiu 2013; Montgomery 2020).
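
A minimal NumPy sketch of recursion (2.1) is given below, assuming the diagonal $R$ used in the article, so that every matrix product reduces to an elementwise product. The function `mewma` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def mewma(X: np.ndarray, r: np.ndarray, z0: np.ndarray) -> np.ndarray:
    """MEWMA recursion (2.1), Z_t = (I - R) Z_{t-1} + R X_t, with R = diag(r).

    X  : array of shape (n, p); row t-1 holds X_t
    r  : smoothing parameters r_1, ..., r_p, each in (0, 1]
    z0 : starting value Z_0 (the article sets Z_0 = mu)
    Returns an (n, p) array whose row t-1 holds Z_t.
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    if np.any(r <= 0) or np.any(r > 1):
        raise ValueError("each r_i must lie in (0, 1]")
    Z = np.empty_like(X)
    z = np.asarray(z0, dtype=float)
    for t in range(X.shape[0]):
        # R is diagonal, so (I - R) z and R x are elementwise products
        z = (1.0 - r) * z + r * X[t]
        Z[t] = z
    return Z

# Usage: noise-free path with mu = 0 and a sustained shift a from tau = 2
mu, a, tau = np.zeros(2), np.array([1.0, 2.0]), 2
X = np.vstack([mu, mu + a, mu + a, mu + a])
Z = mewma(X, np.array([0.5, 0.2]), mu)
```

For this noise-free path, $Z_t - \mu$ equals $(I - (I - R)^{t-\tau+1})a$ for $t \geq \tau$, the deterministic shift term that enters the mean of $Z_t$. With all $r_i = 1$ there is no smoothing and $Z_t = X_t$.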

Following Kramer and Schmid (1997), it holds for $t \in \mathbb{N}$ and fixed $p$ that

$$E(Z_t) = \mu + a_{t-\tau} \xrightarrow[t \to \infty]{} \mu + a\, I_{\mathbb{N}}(\tau), \qquad a_{t-\tau} = \left(I - (I - R)^{t-\tau+1}\right) a\, I_{\{0, 1, \dots\}}(t - \tau),$$

and

$$\Sigma_t = \mathrm{Cov}(Z_t) = R \sum_{i,j=0}^{t-1} (I - R)^i\, \Gamma(j - i)\, (I - R)^j R \xrightarrow[t \to \infty]{} \Sigma_l = R \sum_{i,j=0}^{\infty} (I - R)^i\, \Gamma(j - i)\, (I - R)^j R,$$

provided that $\{\Gamma(v)\}$ is absolutely summable, that is, $\sum_{v=0}^{\infty} \|\Gamma(v)\| < \infty$, where $\|\cdot\|$ denotes the Euclidean norm (see, e.g., Kramer and Schmid 1997). Furthermore, to ensure that $\Sigma_t$ is positive definite, it is assumed that the covariance matrix of $(Y_1', \dots, Y_t')'$, the block matrix whose $(j, i)$th block equals $\Gamma(j - i)$, is positive definite for every $t$. Alternatively, the positive definiteness of $\Sigma_t$ can be ensured by assuming that the process $\{Y_t\}$ has a positive definite spectral density matrix $f(\lambda)$, where $\Gamma(h) = \int_{-\pi}^{\pi} e^{ih\lambda} f(\lambda)\, d\lambda$. Indeed, for any vector $u$,

$$u' \Sigma_t u = u' R \sum_{v,j=0}^{t-1} (I - R)^v \left( \int_{-\pi}^{\pi} e^{i(j-v)\lambda} f(\lambda)\, d\lambda \right) (I - R)^j R u = \int_{-\pi}^{\pi} \left( \sum_{v=0}^{t-1} (I - R)^v e^{iv\lambda} R u \right)^{*} f(\lambda) \left( \sum_{j=0}^{t-1} (I - R)^j e^{ij\lambda} R u \right) d\lambda \geq 0,$$

where $^{*}$ denotes the conjugate transpose, with equality if and only if

$$\sum_{j=0}^{t-1} (I - R)^j e^{ij\lambda} R u = \left(I - (I - R) e^{i\lambda}\right)^{-1} \left(I - (I - R)^t e^{it\lambda}\right) R u = 0,$$

which is equivalent to $u = 0$ because $R$ is diagonal with entries $r_i \in (0, 1]$. The spectral density matrix $f(\lambda)$ is positive definite for many stationary multivariate time series, for example, for a stationary vector autoregressive moving average (VARMA) process. Similar results in the univariate case can be found in Schmid (1997b).
Whereas the mean vector of $Z_t$ depends on whether the observed process is in control or out of control, its covariance matrix is the same in both states. In the following, we always assume that $\operatorname{rk}(\Sigma_t) = p = \operatorname{rk}(\Sigma_l)$.
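
The finite-sum expression for $\Sigma_t$ can be evaluated directly once $\Gamma(h)$ is available as a function of the lag. The sketch below is an illustration under stated assumptions: `sigma_t` is a hypothetical helper, $R$ is diagonal as in Section 2, and the cost grows quadratically in $t$, so it is meant for checking formulas rather than for production use.

```python
import numpy as np

def sigma_t(gamma, r, t):
    """Sigma_t = R sum_{i,j=0}^{t-1} (I - R)^i Gamma(j - i) (I - R)^j R for R = diag(r).

    gamma : callable returning the p x p matrix Gamma(h) for any integer lag h
    r     : diagonal of R, entries in (0, 1]
    t     : time point (number of terms per summation index)
    """
    r = np.asarray(r, dtype=float)
    q = 1.0 - r                                   # diagonal of I - R
    S = np.zeros((r.size, r.size))
    for i in range(t):
        for j in range(t):
            # diag(q^i) Gamma(j - i) diag(q^j), written with broadcasting
            S += (q**i)[:, None] * gamma(j - i) * (q**j)[None, :]
    return r[:, None] * S * r[None, :]            # R S R

# Usage: independent observations, Gamma(h) = 0 for h != 0 (illustrative values)
G0 = np.array([[2.0, 0.5], [0.5, 1.0]])
gamma_iid = lambda h: G0 if h == 0 else np.zeros((2, 2))
r = np.array([0.3, 0.6])
S7 = sigma_t(gamma_iid, r, 7)
```

When $\{\Gamma(v)\}$ is absolutely summable, calling `sigma_t` with a large `t` gives a numerical approximation of $\Sigma_l$; checking that the result barely changes when `t` is doubled is a simple heuristic convergence check. Passing estimated lags instead of $\Gamma(h)$ yields the plug-in matrix used later in the article.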

The first control chart based on the MEWMA recursion was suggested by Lowry et al. (1992) for independent multivariate observations, and Kramer and Schmid (1997) extended this chart to monitor changes in the mean vector of a multivariate time series. This control chart is based on the Mahalanobis distance $(Z_t - \mu)' \Sigma_t^{-1} (Z_t - \mu)$. Because for this scheme the covariance matrix has to be determined at each time point, practitioners prefer to work with the statistic $(Z_t - \mu)' \Sigma_l^{-1} (Z_t - \mu)$, where the exact covariance matrix of $Z_t$ is replaced by its limit as $t$ tends to infinity.

Recently, R. Bodnar, Bodnar, and Schmid (2023) proposed two further versions of control statistics based on the MEWMA recursion. The first one uses the Euclidean distance $(Z_t - \mu)'(Z_t - \mu)$ to compute the control statistic at each time $t$. Let $\Sigma_{d;t}$ be the diagonal matrix consisting of the diagonal elements of $\Sigma_t$. The second control statistic proposed in R. Bodnar, Bodnar, and Schmid (2023) is based on the diagonalized Euclidean distance $(Z_t - \mu)' \Sigma_{d;t}^{-1} (Z_t - \mu)$.
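
The four quadratic forms discussed above can be computed side by side as in the following sketch; the function `quadratic_forms` and its dictionary keys are hypothetical names. It uses `numpy.linalg.solve` instead of an explicit matrix inverse for the Mahalanobis forms, while the diagonalized Euclidean form needs only the diagonal of $\Sigma_t$.

```python
import numpy as np

def quadratic_forms(z, mu, sigma_t, sigma_l):
    """Distances of Z_t from mu used by the MEWMA charts of Section 2.

    Returns the exact Mahalanobis distance (Sigma_t), the asymptotic
    Mahalanobis distance (Sigma_l), the Euclidean distance, and the
    diagonalized Euclidean distance (Sigma_{d;t} = diagonal of Sigma_t).
    """
    d = np.asarray(z, dtype=float) - np.asarray(mu, dtype=float)
    return {
        # solve() avoids forming the explicit inverse of the covariance matrix
        "mahalanobis_exact": float(d @ np.linalg.solve(sigma_t, d)),
        "mahalanobis_limit": float(d @ np.linalg.solve(sigma_l, d)),
        "euclidean": float(d @ d),
        "diag_euclidean": float(np.sum(d**2 / np.diag(sigma_t))),
    }

# Usage with illustrative values: Sigma_t diagonal, Sigma_l = 2 Sigma_t
S = np.array([[2.0, 0.0], [0.0, 4.0]])
forms = quadratic_forms(np.array([1.0, 2.0]), np.zeros(2), S, 2 * S)
```

For a diagonal $\Sigma_t$, as in this usage example, the diagonalized Euclidean distance coincides with the exact Mahalanobis distance; the two differ once $\Sigma_t$ has nonzero off-diagonal entries.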

Using these quantities, R. Bodnar, Bodnar, and Schmid (2023) constructed control statistics by normalizing them, distinguishing between several possibilities. The quadratic forms are centered by subtracting the exact in-control mean, the asymptotic in-control mean as $t$ goes to infinity, or the asymptotic in-control mean as $p$ goes to infinity. Further, they are normalized by dividing by the square root of the exact variance, the asymptotic variance as $t$ goes to infinity, or the asymptotic variance as $p$ goes to infinity. Altogether, 12 control schemes were considered and compared with each other.

3. MEWMA CONTROL CHARTS WITH ESTIMATED PARAMETERS

The control charts in the previous section depend on certain parameters, for example, $\mu$ and $\Gamma(h)$, $h \geq 0$, which were assumed to be known in previous studies. In practice, however, these parameters have to be estimated, usually from a prerun in engineering or from a historical sample in, for example, finance. In the Phase I analysis, the unknown parameters of the target process are estimated from such a sample, and these estimators are then used in the Phase II analysis, the monitoring phase.

In the following, we assume that a sample of the underlying target process (that is, with $\tau = \infty$) is available to estimate the parameters in Phase I. This sample is assumed to be independent of the observation vectors in Phase II, which are used to construct the MEWMA recursion and the corresponding control charts for monitoring the mean behavior of the underlying high-dimensional time series.

To run a control chart, the expectation and the autocovariance matrices of the underlying stationary process are estimated from the Phase I sample by some estimation procedure, and the resulting estimators of $\mu$ and $\Gamma(h)$, $h \geq 0$, are used instead of their population counterparts in the definition of the control statistics. In particular, the estimators of $\mu$ and $\Gamma(h)$, denoted by $\mu^*$ and $\Gamma(h)^*$, are treated as deterministic in Phase II. Of course, $\Gamma(h)$ cannot be estimated for all values of $h$ for an arbitrary stationary process. However, it is usually assumed that the target process $\{Y_t\}$ follows a vector autoregressive (VAR) or VARMA process; the autocovariance matrices can then be estimated for all $h$ from the estimators of the coefficient matrices and of the covariance matrix of the white noise process.

If $\{Y_t\}$ follows a VAR(1) process given by

$$Y_t = \mu + \Phi (Y_{t-1} - \mu) + \varepsilon_t, \tag{3.1}$$

where the $\varepsilon_t$ are independent and normally distributed with $E(\varepsilon_t) = 0$ and $\mathrm{Cov}(\varepsilon_t) = \Sigma$, then $E(Y_t) = \mu = E_\infty(X_t)$ and (see, e.g., Brockwell and Davis 1991; Reinsel 1993; Lütkepohl 2005)

$$\Gamma(h) = \Phi^h \Gamma(0) \quad \text{and} \quad \Gamma(-h) = \Gamma(h)' \quad \text{for } h = 1, 2, \dots, \tag{3.2}$$

where $\Gamma(0)$ is the solution of the matrix equation

$$\Gamma(0) = \Phi \Gamma(0) \Phi' + \Sigma. \tag{3.3}$$
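
Equation (3.3) is a discrete Lyapunov equation. Using the identity $\operatorname{vec}(\Phi \Gamma(0) \Phi') = (\Phi \otimes \Phi)\operatorname{vec}(\Gamma(0))$ for the column-stacking vec operator, it can be solved as $\operatorname{vec}(\Gamma(0)) = (I - \Phi \otimes \Phi)^{-1}\operatorname{vec}(\Sigma)$ whenever all eigenvalues of $\Phi$ lie inside the unit circle. The sketch below (hypothetical helper `var1_autocov`) implements this together with (3.2); the explicit $p^2 \times p^2$ system is only practical for moderate $p$.

```python
import numpy as np

def var1_autocov(phi: np.ndarray, sigma: np.ndarray, h: int) -> np.ndarray:
    """Gamma(h) of a stationary VAR(1) process, cf. (3.2) and (3.3).

    Gamma(0) solves Gamma(0) = Phi Gamma(0) Phi' + Sigma, computed as
    vec(Gamma(0)) = (I - Phi kron Phi)^{-1} vec(Sigma) with column-stacking vec.
    """
    p = phi.shape[0]
    if np.max(np.abs(np.linalg.eigvals(phi))) >= 1:
        raise ValueError("Phi must have all eigenvalues inside the unit circle")
    vec_sigma = sigma.reshape(-1, order="F")                  # column-stacking vec
    vec_g0 = np.linalg.solve(np.eye(p * p) - np.kron(phi, phi), vec_sigma)
    g0 = vec_g0.reshape(p, p, order="F")
    if h >= 0:
        return np.linalg.matrix_power(phi, h) @ g0            # Gamma(h) = Phi^h Gamma(0)
    return (np.linalg.matrix_power(phi, -h) @ g0).T           # Gamma(-h) = Gamma(h)'

# Usage with illustrative parameters
phi = np.array([[0.5, 0.1], [0.0, 0.3]])
sig = np.array([[1.0, 0.2], [0.2, 0.5]])
g0 = var1_autocov(phi, sig, 0)
```

For larger $p$, SciPy's `scipy.linalg.solve_discrete_lyapunov(phi, sig)` solves the same equation without building the Kronecker product.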

Various procedures for estimating the autocovariance matrices $\Gamma(h)$, $h \geq 0$, the coefficient matrix $\Phi$, and the covariance matrix $\Sigma$ of the white noise process have been proposed in the statistical literature (e.g., Brockwell and Davis 1991; Hamilton 1994). In the following, three estimation methods are considered.

Let $X_{1:N_0} = (X_1, \dots, X_{N_0})$ denote the $p \times N_0$ observation matrix consisting of the $N_0$ observation vectors taken in Phase I. Throughout, we assume that $N_0 > p$. This assumption ensures that the sample covariance matrix is positive definite with probability 1 when an independent sample is taken from a normal distribution (see, e.g., theorem 3.1.4 in Muirhead 1982). In the first and second approaches, the mean vector $\mu$ is estimated by its sample counterpart (Brockwell and Davis 1991)

$$\bar{X} = \frac{1}{N_0} \sum_{t=1}^{N_0} X_t.$$

The first method is based on the nonparametric estimation of the autocovariance matrices $\Gamma(h)$ given by

$$\hat{\Gamma}_{non}(h) = \begin{cases} \dfrac{1}{N_0} \displaystyle\sum_{t=1}^{N_0 - h} (X_{t+h} - \bar{X})(X_t - \bar{X})' & \text{for } 0 \leq h \leq N_0 - 1, \\[2ex] \dfrac{1}{N_0} \displaystyle\sum_{t=-h+1}^{N_0} (X_{t+h} - \bar{X})(X_t - \bar{X})' & \text{for } -N_0 + 1 \leq h < 0. \end{cases}$$
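
A direct NumPy translation of $\hat{\Gamma}_{non}(h)$ is sketched below, assuming the $p \times N_0$ layout of $X_{1:N_0}$ (one column per time point); `gamma_non` is a hypothetical name. Note the divisor $N_0$ for every lag, as in the definition above.

```python
import numpy as np

def gamma_non(X: np.ndarray, h: int) -> np.ndarray:
    """Nonparametric autocovariance estimator hat Gamma_non(h) with divisor N_0.

    X : p x N_0 observation matrix X_{1:N_0} (one column per time point)
    """
    p, n0 = X.shape
    if abs(h) > n0 - 1:
        raise ValueError("need |h| <= N_0 - 1")
    Xc = X - X.mean(axis=1, keepdims=True)             # X_t - bar X
    if h >= 0:
        # sum_{t=1}^{N_0-h} (X_{t+h} - bar X)(X_t - bar X)'
        return Xc[:, h:] @ Xc[:, : n0 - h].T / n0
    # sum_{t=-h+1}^{N_0} (X_{t+h} - bar X)(X_t - bar X)'
    return Xc[:, : n0 + h] @ Xc[:, -h:].T / n0

# Usage on arbitrary illustrative data (p = 3, N_0 = 50)
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
g2 = gamma_non(X, 2)
```

For $h < 0$ the estimator satisfies $\hat{\Gamma}_{non}(h) = \hat{\Gamma}_{non}(-h)'$, and for $h = 0$ it coincides with the sample covariance matrix computed with divisor $N_0$.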

The second approach uses the assumption of the VAR(1) model given in (3.1). In this case, $\Gamma(0)$ and $\Gamma(1)$ are estimated nonparametrically by $\hat{\Gamma}_{non}(0)$ and $\hat{\Gamma}_{non}(1)$, respectively, which are then used to estimate $\Phi$ and $\Sigma$ by

$$\hat{\Phi}_V = \hat{\Gamma}_{non}(1)\, \hat{\Gamma}_{non}(0)^{-1} \quad \text{and} \quad \hat{\Sigma}_V = \hat{\Gamma}_{non}(0) - \hat{\Phi}_V \hat{\Gamma}_{non}(0) \hat{\Phi}_V'.$$

Finally, $\Gamma(h)$ and $\Gamma(-h)$ for $h \geq 1$ are estimated by $\hat{\Gamma}_V(h) = \hat{\Phi}_V^h\, \hat{\Gamma}_{non}(0)$ and $\hat{\Gamma}_V(-h) = \hat{\Gamma}_V(h)'$.
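
The second approach can be sketched as two small functions that take the nonparametric estimates $\hat{\Gamma}_{non}(0)$ and $\hat{\Gamma}_{non}(1)$ as inputs (function names hypothetical). The coefficient estimate is obtained by solving a linear system rather than inverting $\hat{\Gamma}_{non}(0)$ explicitly.

```python
import numpy as np

def var1_from_autocov(g0: np.ndarray, g1: np.ndarray):
    """hat Phi_V = Gamma(1) Gamma(0)^{-1} and hat Sigma_V = Gamma(0) - Phi_V Gamma(0) Phi_V'."""
    # Phi_V Gamma(0) = Gamma(1)  <=>  Gamma(0)' Phi_V' = Gamma(1)'
    phi = np.linalg.solve(g0.T, g1.T).T
    sigma = g0 - phi @ g0 @ phi.T
    return phi, sigma

def gamma_v(phi: np.ndarray, g0: np.ndarray, h: int) -> np.ndarray:
    """hat Gamma_V(h) = hat Phi_V^h hat Gamma_non(0) and hat Gamma_V(-h) = hat Gamma_V(h)'."""
    g = np.linalg.matrix_power(phi, abs(h)) @ g0
    return g if h >= 0 else g.T
```

Given `g0` and `g1` from the nonparametric estimator, `phi_v, sigma_v = var1_from_autocov(g0, g1)` yields the plug-in VAR(1) parameters, and `gamma_v(phi_v, g0, h)` extends the autocovariance estimates to any lag.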

The third approach is based on the maximum likelihood estimation of $\nu = (I - \Phi)\mu$, $\Phi$, and $\Sigma$, where $I$ denotes the identity matrix. Following section 11.1 of Hamilton (1994), the maximum likelihood estimator of $\Pi' = [\nu, \Phi]$ is given by

$$\hat{\Pi}_{ML}' = [\hat{\nu}_{ML}, \hat{\Phi}_{ML}] = \left[\sum_{t=1}^{N_0} X_t V_t'\right] \left[\sum_{t=1}^{N_0} V_t V_t'\right]^{-1} \quad \text{with} \quad V_t = \begin{pmatrix} 1 \\ X_{t-1} \end{pmatrix}.$$

Furthermore, the maximum likelihood estimator of $\Sigma$ is given by

$$\hat{\Sigma}_{ML} = \frac{1}{N_0} \sum_{t=1}^{N_0} \hat{\varepsilon}_t \hat{\varepsilon}_t' \quad \text{with} \quad \hat{\varepsilon}_t = X_t - \hat{\Pi}_{ML}' V_t.$$

Finally, using (3.2) and (3.3), we obtain the maximum likelihood estimators of $\Gamma(h)$ given by

$$\operatorname{vec}(\hat{\Gamma}_{ML}(0)) = (I - \hat{\Phi}_{ML} \otimes \hat{\Phi}_{ML})^{-1} \operatorname{vec}(\hat{\Sigma}_{ML}), \quad \hat{\Gamma}_{ML}(h) = \hat{\Phi}_{ML}^h\, \hat{\Gamma}_{ML}(0), \quad \hat{\Gamma}_{ML}(-h) = \hat{\Gamma}_{ML}(h)',$$

where $\operatorname{vec}(\cdot)$ denotes the vec operator and $\otimes$ stands for the Kronecker product (see, e.g., Harville 1997).
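
The conditional Gaussian maximum likelihood estimator of a VAR(1) coincides with least squares on the regressors $V_t = (1, X_{t-1}')'$. The sketch below (hypothetical helper `var1_ml`) assumes that the first Phase I observation serves as the presample value, so its sums run over $t = 2, \dots, N_0$ and the divisor of $\hat{\Sigma}_{ML}$ is the number of residuals, $N_0 - 1$. An estimate of $\mu$ follows from $\nu = (I - \Phi)\mu$, and plugging $\hat{\Phi}_{ML}$ and $\hat{\Sigma}_{ML}$ into (3.2) and (3.3) gives $\hat{\Gamma}_{ML}(h)$.

```python
import numpy as np

def var1_ml(X: np.ndarray):
    """Conditional ML (equivalently, least squares) estimates for a VAR(1).

    X : p x N_0 observation matrix; column 0 is used only as the presample
        value X_{t-1} for t = 2 (an assumption of this sketch).
    Returns (nu_hat, phi_hat, sigma_hat).
    """
    p, n0 = X.shape
    V = np.vstack([np.ones(n0 - 1), X[:, :-1]])       # columns V_t = (1, X_{t-1}')'
    Y = X[:, 1:]                                      # responses X_t
    # Pi' = [sum X_t V_t'] [sum V_t V_t']^{-1}, solved without an explicit inverse
    pi_t = np.linalg.solve(V @ V.T, V @ Y.T).T        # p x (p + 1)
    nu_hat, phi_hat = pi_t[:, 0], pi_t[:, 1:]
    resid = Y - pi_t @ V                              # hat epsilon_t
    sigma_hat = resid @ resid.T / (n0 - 1)            # divisor: number of residuals
    return nu_hat, phi_hat, sigma_hat
```

After `nu, phi, sig = var1_ml(X)`, the mean estimate is `np.linalg.solve(np.eye(p) - phi, nu)`.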

In what follows, the symbol $*$ denotes the estimators of the population quantities that are used in the construction of the control statistics based on the MEWMA recursion. This leads to $\mu^*$, $\Sigma_t^* = R \sum_{i,j=0}^{t-1} (I - R)^i\, \Gamma(j - i)^*\, (I - R)^j R$, and $\Sigma_{d;t}^*$ for the diagonal matrix consisting of the diagonal elements of $\Sigma_t^*$. By analogy, $\Sigma_l^*$ denotes the limit of $\Sigma_t^*$ as $t \to \infty$, which is well defined if $\sum_{v=0}^{\infty} \|\Gamma(v)^*\| < \infty$. Further, $\Sigma_{d;l}^*$ denotes the limit of $\Sigma_{d;t}^*$ as $t$ tends to infinity.

Replacing $\mu$, $\Sigma_t$, $\Sigma_l$, and $\Sigma_{d;t}$ by the misspecified values $\mu^*$, $\Sigma_t^*$, $\Sigma_l^*$, and $\Sigma_{d;t}^*$ in the quadratic forms discussed in Section 2, we obtain the misspecified Mahalanobis quantities $(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)$ and $(Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)$ and the misspecified quantities based on the Euclidean and the diagonalized Euclidean distance, $(Z_t - \mu^*)'(Z_t - \mu^*)$ and $(Z_t - \mu^*)' \Sigma_{d;t}^{*-1} (Z_t - \mu^*)$.

R. Bodnar, Bodnar, and Schmid (2023) introduced several control charts based on the Euclidean distance but did not take the effect of misspecification into account. In this article, we analyze the behavior of these control charts under misspecification. Following R. Bodnar, Bodnar, and Schmid (2023) and their notation, we consider the MEWMA control charts given by

$$\begin{aligned}
T_{1,t}^* &= \frac{(Z_t - \mu^*)'(Z_t - \mu^*) - \operatorname{tr}(\Sigma_t^*)}{\sqrt{2 \operatorname{tr}(\Sigma_t^{*2})}}, \\
T_{2,t}^* &= \frac{(Z_t - \mu^*)'(Z_t - \mu^*) - \operatorname{tr}(\Sigma_l^*)}{\sqrt{2 \operatorname{tr}(\Sigma_t^{*2})}}, \\
T_{3,t}^* &= \frac{(Z_t - \mu^*)'(Z_t - \mu^*) - \operatorname{tr}(\Sigma_t^*)}{\sqrt{2 \operatorname{tr}(\Sigma_l^{*2})}}, \\
T_{4,t}^* &= \frac{(Z_t - \mu^*)'(Z_t - \mu^*) - \operatorname{tr}(\Sigma_l^*)}{\sqrt{2 \operatorname{tr}(\Sigma_l^{*2})}}, \\
T_{6,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_{d;t}^{*-1} (Z_t - \mu^*) - \operatorname{tr}(\Sigma_{d;t}^{*-1} \Sigma_t^*)}{\sqrt{2 \operatorname{tr}((\Sigma_{d;t}^{*-1} \Sigma_t^*)^2)}}, \\
T_{7,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_{d;t}^{*-1} (Z_t - \mu^*) - \operatorname{tr}(\Sigma_{d;l}^{*-1} \Sigma_l^*)}{\sqrt{2 \operatorname{tr}((\Sigma_{d;t}^{*-1} \Sigma_t^*)^2)}}, \\
T_{8,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_{d;t}^{*-1} (Z_t - \mu^*) - \operatorname{tr}(\Sigma_{d;t}^{*-1} \Sigma_t^*)}{\sqrt{2 \operatorname{tr}((\Sigma_{d;l}^{*-1} \Sigma_l^*)^2)}}, \\
T_{9,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_{d;t}^{*-1} (Z_t - \mu^*) - \operatorname{tr}(\Sigma_{d;l}^{*-1} \Sigma_l^*)}{\sqrt{2 \operatorname{tr}((\Sigma_{d;l}^{*-1} \Sigma_l^*)^2)}}, \\
T_{Mah,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*) - p}{\sqrt{2p}}, \\
T_{MahInf,t}^* &= \frac{(Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*) - \operatorname{tr}(\Sigma_l^{*-1} \Sigma_t^*)}{\sqrt{2 \operatorname{tr}((\Sigma_l^{*-1} \Sigma_t^*)^2)}}.
\end{aligned}$$
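
The plug-in statistics above differ only in which centering and scaling terms they use, so they can be computed together. The sketch below is an illustration under stated assumptions: `control_statistics` is a hypothetical helper, $\Sigma_t^*$ and $\Sigma_l^*$ are supplied by the user (for instance, computed from Phase I estimates), and control limits are not part of the sketch.

```python
import numpy as np

def control_statistics(z, mu_s, sig_t, sig_l):
    """Normalized MEWMA statistics T*_{k,t} (k = 1..4, 6..9, Mah, MahInf).

    z     : current MEWMA value Z_t
    mu_s  : Phase I estimate mu^*
    sig_t : Sigma_t^*, plug-in covariance matrix of Z_t at time t
    sig_l : Sigma_l^*, plug-in limit of Sigma_t^* as t -> infinity
    """
    d = np.asarray(z, float) - np.asarray(mu_s, float)
    p = d.size
    eu = d @ d                                   # Euclidean quadratic form
    deu = np.sum(d**2 / np.diag(sig_t))          # diagonalized Euclidean form
    mah = d @ np.linalg.solve(sig_t, d)          # Mahalanobis form with Sigma_t^*
    mah_l = d @ np.linalg.solve(sig_l, d)        # Mahalanobis form with Sigma_l^*

    def tr_sq(a):                                # tr(A^2)
        return np.trace(a @ a)

    c_t = sig_t / np.diag(sig_t)[:, None]        # Sigma_{d;t}^{*-1} Sigma_t^*
    c_l = sig_l / np.diag(sig_l)[:, None]        # Sigma_{d;l}^{*-1} Sigma_l^*
    m_l = np.linalg.solve(sig_l, sig_t)          # Sigma_l^{*-1} Sigma_t^*
    # With positive diagonals, tr(c_t) = tr(c_l) = p, so as written
    # T6 and T7 (and likewise T8 and T9) coincide numerically.
    s_t, s_l = np.sqrt(2 * tr_sq(sig_t)), np.sqrt(2 * tr_sq(sig_l))
    sd_t, sd_l = np.sqrt(2 * tr_sq(c_t)), np.sqrt(2 * tr_sq(c_l))
    return {
        "T1": (eu - np.trace(sig_t)) / s_t,
        "T2": (eu - np.trace(sig_l)) / s_t,
        "T3": (eu - np.trace(sig_t)) / s_l,
        "T4": (eu - np.trace(sig_l)) / s_l,
        "T6": (deu - np.trace(c_t)) / sd_t,
        "T7": (deu - np.trace(c_l)) / sd_t,
        "T8": (deu - np.trace(c_t)) / sd_l,
        "T9": (deu - np.trace(c_l)) / sd_l,
        "Mah": (mah - p) / np.sqrt(2 * p),
        "MahInf": (mah_l - np.trace(m_l)) / np.sqrt(2 * tr_sq(m_l)),
    }

# Usage with illustrative inputs: Z_t equal to mu^* and Sigma_l^* = Sigma_t^*
S = np.array([[2.0, 0.5], [0.5, 1.0]])
out = control_statistics(np.zeros(2), np.zeros(2), S, S)
```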

These statistics were studied in R. Bodnar, Bodnar, and Schmid (2023); here, however, we additionally take misspecification into account. The statistics $T_{5,t}$ and $T_{10,t}$ of R. Bodnar, Bodnar, and Schmid (2023) are not considered in this article, because they use the limits of the first and second moments as $p$ tends to infinity, which are difficult to determine in most cases.

4. BEHAVIOR OF THE CONTROL STATISTICS UNDER MISSPECIFICATION

In this section, we analyze the distributional properties of the control statistics defined under misspecification in detail. In particular, the exact distributions of the quadratic forms present in the definition of the control statistics are derived and their asymptotic approximations are provided in two cases, when t tends to infinity and when p tends to infinity. These results shed light on the effect of misspecification on the performance of the considered MEWMA control charts, especially in the high-dimensional case.

The following notation will be used throughout the article:

$$a_{t-\tau} = \left(I - (I - R)^{t-\tau+1}\right) a\, I_{\{0, 1, 2, \dots\}}(t - \tau). \tag{4.1}$$

Theorems 4.1 and 4.2 present the results for the control schemes based on the Mahalanobis distance, and Theorems 4.3 and 4.4 provide the corresponding results for the control charts based on the Euclidean distance and the diagonalized Euclidean distance, respectively.

Theorem 4.1.

Let $\{Y_t\}$ be a stationary Gaussian process with $E(Y_t) = \mu$ and $\mathrm{Cov}(Y_{t+h}, Y_t) = \Gamma(h)$. Let $\tau$ be fixed.

(a) Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_t^*) = p$. Let $U_t$ be an orthogonal matrix such that $U_t' \Sigma_t^{1/2} \Sigma_t^{*-1} \Sigma_t^{1/2} U_t = \operatorname{diag}(\lambda_{Mah,1,t}, \dots, \lambda_{Mah,p,t})$ and let $\delta_{Mah,t} = U_t' \Sigma_t^{-1/2} (\mu + a_{t-\tau} - \mu^*) = (\delta_{Mah,i,t})_{i=1,\dots,p}$. Further, suppose that $\zeta_1, \dots, \zeta_p$ are independent standard normally distributed random variables.
Then

$$(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*) \overset{d}{=} \sum_{i=1}^p \lambda_{Mah,i,t} (\zeta_i + \delta_{Mah,i,t})^2.$$

Moreover,

$$E\big((Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)\big) = \operatorname{tr}(\Sigma_t^{*-1} \Sigma_t) + (\mu + a_{t-\tau} - \mu^*)' \Sigma_t^{*-1} (\mu + a_{t-\tau} - \mu^*)$$

and

$$\mathrm{Var}\big((Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)\big) = 2 \operatorname{tr}\big((\Sigma_t^{*-1} \Sigma_t)^2\big) + 4 (\mu + a_{t-\tau} - \mu^*)' \Sigma_t^{*-1} \Sigma_t \Sigma_t^{*-1} (\mu + a_{t-\tau} - \mu^*).$$
(b) Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_l) = \operatorname{rk}(\Sigma_t^*) = \operatorname{rk}(\Sigma_l^*) = p$ and that $\{\Gamma(v)\}$ and $\{\Gamma(v)^*\}$ are absolutely summable. Let $p$ be fixed, let $U_l$ be an orthogonal matrix such that $U_l' \Sigma_l^{1/2} \Sigma_l^{*-1} \Sigma_l^{1/2} U_l = \operatorname{diag}(\lambda_{Mah,1}, \dots, \lambda_{Mah,p})$, and let $\delta_{Mah} = U_l' \Sigma_l^{-1/2} (\mu + a I_{\mathbb{N}}(\tau) - \mu^*) = (\delta_{Mah,i})_{i=1,\dots,p}$. If, further,

$$\lim_{t \to \infty} U_t = U_l \quad \text{and} \quad \lim_{t \to \infty} \Sigma_t^{1/2} = \Sigma_l^{1/2}, \tag{4.2}$$

then the asymptotic distribution of $(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)$ as $t$ tends to infinity is equal to the distribution of $\sum_{i=1}^p \lambda_{Mah,i} (\zeta_i + \delta_{Mah,i})^2$ with $\zeta_i$ as in part (a).
(c) Let $t$ be fixed. Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_t^*) = p$ and that

$$\lim_{p \to \infty} \frac{\max_{1 \leq i \leq p} \lambda_{Mah,i,t}^2 (1 + 2\delta_{Mah,i,t}^2)}{\sum_{i=1}^p \lambda_{Mah,i,t}^2 (1 + 2\delta_{Mah,i,t}^2)} = 0. \tag{4.3}$$

Then

$$\frac{(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*) - E\big((Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)\big)}{\sqrt{\mathrm{Var}\big((Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)\big)}} \xrightarrow[p \to \infty]{d} N(0, 1).$$

Proof.

It holds that $Z_t - \mu \sim N_p(a_{t-\tau}, \Sigma_t)$ and thus $\Sigma_t^{-1/2}(Z_t - \mu^*) \sim N_p\big(\Sigma_t^{-1/2}(\mu + a_{t-\tau} - \mu^*), I\big)$. Consequently,

$$(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*) = \big(\Sigma_t^{-1/2}(Z_t - \mu^*)\big)' \Sigma_t^{1/2} \Sigma_t^{*-1} \Sigma_t^{1/2} \big(\Sigma_t^{-1/2}(Z_t - \mu^*)\big).$$

Thus, part (a) follows immediately from chapter 3.1a, corollary 3.2b.1, and theorem 3.2b.2 of Mathai and Provost (1992).

To prove part (b), we use part (a) and the facts that $\Sigma_t \to \Sigma_l$ and $\Sigma_t^* \to \Sigma_l^*$ as $t \to \infty$. Further, the eigenvalues of a matrix are continuous functions of its elements (cf. theorem 9.6 in Lax 2007). Consequently, $\lambda_{Mah,i,t} \to \lambda_{Mah,i}$ as $t \to \infty$. Because of (4.2) and $a_{t-\tau} \to a I_{\mathbb{N}}(\tau)$, we also have $\delta_{Mah,t} \to \delta_{Mah}$, and part (b) follows.

To prove part (c), we apply lemma 7.1 of R. Bodnar, Bodnar, and Schmid (2023). □
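
Theorem 4.1(a) can be checked numerically: since $\lambda(\zeta + \delta)^2$ has mean $\lambda(1 + \delta^2)$ and variance $\lambda^2(2 + 4\delta^2)$, the weighted sum in the representation should reproduce the trace formulas for the mean and variance. The sketch below computes $\lambda_{Mah,i,t}$ and $\delta_{Mah,i,t}$ from given matrices; the helper names are hypothetical and the inputs are arbitrary positive definite matrices chosen for illustration.

```python
import numpy as np

def sym_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(w)) @ v.T

def mahalanobis_representation(sig_t, sig_star, shift):
    """lambda_{Mah,i,t} and delta_{Mah,i,t} of Theorem 4.1(a).

    sig_t    : Sigma_t, the true covariance matrix of Z_t
    sig_star : Sigma_t^*, the plug-in matrix used in the statistic
    shift    : mu + a_{t-tau} - mu^*, the mean of Z_t - mu^*
    """
    root = sym_sqrt(sig_t)
    m = root @ np.linalg.solve(sig_star, root)     # Sigma_t^{1/2} Sigma_t^{*-1} Sigma_t^{1/2}
    lam, u = np.linalg.eigh((m + m.T) / 2)         # symmetrize against rounding errors
    delta = u.T @ np.linalg.solve(root, shift)     # U_t' Sigma_t^{-1/2} shift
    return lam, delta

def representation_moments(lam, delta):
    """Mean and variance of sum_i lam_i (zeta_i + delta_i)^2."""
    return np.sum(lam * (1 + delta**2)), np.sum(lam**2 * (2 + 4 * delta**2))

# Usage with arbitrary illustrative inputs (p = 4)
rng = np.random.default_rng(3)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
sig_t, sig_star = A @ A.T + 4 * np.eye(4), B @ B.T + 4 * np.eye(4)
shift = rng.standard_normal(4)
mean, var = representation_moments(*mahalanobis_representation(sig_t, sig_star, shift))
```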

Condition (4.2) is needed to prove part (b). In principle, we need the eigenvectors of a symmetric matrix to be continuous functions of the elements of the matrix. This is not fulfilled in general, and therefore we have to assume (4.2). A detailed discussion of this problem is given in, for example, chapter 9 of Lax (2007). A sufficient condition for (4.2) to hold is that all eigenvalues of $\Sigma_t^{1/2} \Sigma_t^{*-1} \Sigma_t^{1/2}$ are simple. Condition (4.3) is a technical one and is needed to apply a central limit theorem for a sequence of nonidentically distributed random variables. In particular, it ensures that there is no dominating summand (with a considerably larger variance) in the sum of random variables, whose number of terms grows with $p$. Namely, the quadratic form $(Z_t - \mu^*)' \Sigma_t^{*-1} (Z_t - \mu^*)$ can be represented as a weighted sum of independent random variables, each of which has a noncentral $\chi^2$ distribution with one degree of freedom. The variance of the $i$th summand $\lambda_{Mah,i,t}(\zeta_i + \delta_{Mah,i,t})^2$ equals $2\lambda_{Mah,i,t}^2 (1 + 2\delta_{Mah,i,t}^2)$, so the denominator in (4.3) equals one half of the variance of the quadratic form, and its summands are one half of the variance contributions of the individual random variables in the stochastic representation above.
As such, condition (4.3) requires that no summand in this variance decomposition be dominating.
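
The ratio in (4.3) is easy to evaluate for given weights and noncentralities. The small sketch below (hypothetical helper `max_share_ratio`) contrasts two typical situations: equal weights, where the ratio equals $1/p$ and (4.3) holds, and a single weight growing like $\sqrt{p}$, where the ratio equals $p/(2p - 1)$, tends to $1/2$, and (4.3) fails.

```python
import numpy as np

def max_share_ratio(lam, delta):
    """Left-hand side of condition (4.3) for fixed p (before taking the limit).

    Each w_i = lam_i^2 (1 + 2 delta_i^2) is one half of the variance of the
    summand lam_i (zeta_i + delta_i)^2, so the ratio is the largest share of
    a single summand in the variance of the quadratic form.
    """
    lam, delta = np.asarray(lam, float), np.asarray(delta, float)
    w = lam**2 * (1 + 2 * delta**2)
    return w.max() / w.sum()

for p in (10, 100, 1000):
    equal = max_share_ratio(np.ones(p), np.zeros(p))   # equals 1/p, tends to 0
    spike_lam = np.ones(p)
    spike_lam[0] = np.sqrt(p)                          # one weight grows like sqrt(p)
    spike = max_share_ratio(spike_lam, np.zeros(p))    # equals p / (2p - 1), tends to 1/2
    print(p, equal, spike)
```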

Now we study the statistic based on the limit covariance matrix.

Theorem 4.2.

Let $\{Y_t\}$ be a stationary Gaussian process with $E(Y_t) = \mu$ and $\mathrm{Cov}(Y_{t+h}, Y_t) = \Gamma(h)$. Let $\tau$ be fixed.

(a) Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_l^*) = p$. Let $U_{l;t}$ be an orthogonal matrix such that $U_{l;t}' \Sigma_t^{1/2} \Sigma_l^{*-1} \Sigma_t^{1/2} U_{l;t} = \operatorname{diag}(\lambda_{MahInf,1,t}, \dots, \lambda_{MahInf,p,t})$ and let $\delta_{MahInf,t} = U_{l;t}' \Sigma_t^{-1/2} (\mu + a_{t-\tau} - \mu^*) = (\delta_{MahInf,i,t})_{i=1,\dots,p}$. Further, suppose that $\zeta_1, \dots, \zeta_p$ are independent standard normally distributed random variables.
Then

$$(Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*) \overset{d}{=} \sum_{i=1}^p \lambda_{MahInf,i,t} (\zeta_i + \delta_{MahInf,i,t})^2.$$

Moreover,

$$E\big((Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)\big) = \operatorname{tr}(\Sigma_l^{*-1} \Sigma_t) + (\mu + a_{t-\tau} - \mu^*)' \Sigma_l^{*-1} (\mu + a_{t-\tau} - \mu^*)$$

and

$$\mathrm{Var}\big((Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)\big) = 2 \operatorname{tr}\big((\Sigma_l^{*-1} \Sigma_t)^2\big) + 4 (\mu + a_{t-\tau} - \mu^*)' \Sigma_l^{*-1} \Sigma_t \Sigma_l^{*-1} (\mu + a_{t-\tau} - \mu^*).$$
(b) Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_l) = \operatorname{rk}(\Sigma_t^*) = \operatorname{rk}(\Sigma_l^*) = p$ and that $\{\Gamma(v)\}$ and $\{\Gamma(v)^*\}$ are absolutely summable. Let $p$ be fixed and let $U_l$ be the orthogonal matrix defined in Theorem 4.1(b). If, further,

$$\lim_{t \to \infty} U_{l;t} = U_l \quad \text{and} \quad \lim_{t \to \infty} \Sigma_t^{1/2} = \Sigma_l^{1/2}, \tag{4.4}$$

then the asymptotic distribution of $(Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)$ as $t$ tends to infinity is equal to the distribution of $\sum_{i=1}^p \lambda_{Mah,i} (\zeta_i + \delta_{Mah,i})^2$ with $\zeta_i$ as in part (a).
(c) Let $t$ be fixed. Suppose that $\operatorname{rk}(\Sigma_t) = \operatorname{rk}(\Sigma_l^*) = p$ and that

$$\lim_{p \to \infty} \frac{\max_{1 \leq i \leq p} \lambda_{MahInf,i,t}^2 (1 + 2\delta_{MahInf,i,t}^2)}{\sum_{i=1}^p \lambda_{MahInf,i,t}^2 (1 + 2\delta_{MahInf,i,t}^2)} = 0.$$

Then

$$\frac{(Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*) - E\big((Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)\big)}{\sqrt{\mathrm{Var}\big((Z_t - \mu^*)' \Sigma_l^{*-1} (Z_t - \mu^*)\big)}} \xrightarrow[p \to \infty]{d} N(0, 1).$$

Of course, the limit distribution in Theorem 4.2(b) is the same as that in Theorem 4.1(b).

Next we analyze the statistics based on the Euclidean distance and the diagonalized Euclidean distance.

Theorem 4.3.

Let $\{Y_t\}$ be a stationary Gaussian process with $E(Y_t) = \mu$ and $\mathrm{Cov}(Y_{t+h}, Y_t) = \Gamma(h)$. Let $\tau$ be fixed.

Suppose that $rk (Σ_{t}) = p$ . Let ${\tilde{U}}_{t}$ be an orthogonal matrix such that ${\tilde{U}}_{t}^{'} Σ_{t} {\tilde{U}}_{t} = diag (λ_{Eu, 1, t}, \dots, λ_{Eu, p, t})$ and let $δ_{Eu, t} = {\tilde{U}}_{t}^{'} Σ_{t}^{- 1 / 2} (μ + a_{t - τ} - μ^{*}) = {(δ_{Eu, i, t})}_{i = 1, \dots, p}$ . Further, suppose that ζ₁, …, ζ_p are independent and standard normally distributed random variables.
Then $(Z_{t} - μ^{*})' (Z_{t} - μ^{*}) \overset{d}{=} \sum_{i = 1}^{p} λ_{Eu, i, t} {(ζ_{i} + δ_{Eu, i, t})}^{2} .$ Moreover, $E ((Z_{t} - μ^{*})' (Z_{t} - μ^{*})) = tr (Σ_{t}) + (μ + a_{t - τ} - μ^{*})' (μ + a_{t - τ} - μ^{*})$ and $Var ((Z_{t} - μ^{*})' (Z_{t} - μ^{*})) = 2 tr (Σ_{t}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t} (μ + a_{t - τ} - μ^{*}) .$ These are the moments of Theorem 4.4(a) with the weight matrix $Σ_{d; t}^{* - 1}$ replaced by the identity matrix.
(b) Suppose that $rk (Σ_{t}) = rk (Σ_{l}) = p$ and that $\{Γ (v)\}$ is absolutely summable. Let p be fixed, let ${\tilde{U}}_{l}$ be an orthogonal matrix such that ${\tilde{U}}_{l}^{'} Σ_{l} {\tilde{U}}_{l} = diag (λ_{Eu, 1}, \dots, λ_{Eu, p})$, and let $δ_{Eu} = {\tilde{U}}_{l}^{'} Σ_{l}^{- 1 / 2} (μ + a I_{N} (τ) - μ^{*}) = {(δ_{Eu, i})}_{i = 1, \dots, p} .$ If, further, $$\lim_{t \to \infty} {\tilde{U}}_{t} = {\tilde{U}}_{l} \quad \text{and} \quad \lim_{t \to \infty} Σ_{t}^{1 / 2} = Σ_{l}^{1 / 2}, \tag{4.5}$$ then the asymptotic distribution of $(Z_{t} - μ^{*})' (Z_{t} - μ^{*})$ as t tends to infinity is equal to the distribution of $\sum_{i = 1}^{p} λ_{Eu, i} {(ζ_{i} + δ_{Eu, i})}^{2}$ with ζ_i as in part (a).
(c) Let t be fixed. Suppose that $rk (Σ_{t}) = p$ and that $\lim_{p \to \infty} \frac{\max_{1 \leq i \leq p} λ_{Eu, i, t}^{2} (1 + 2 δ_{Eu, i, t}^{2})}{\sum_{i = 1}^{p} λ_{Eu, i, t}^{2} (1 + 2 δ_{Eu, i, t}^{2})} = 0.$ Then $\frac{(Z_{t} - μ^{*})' (Z_{t} - μ^{*}) - E ((Z_{t} - μ^{*})' (Z_{t} - μ^{*}))}{\sqrt{Var ((Z_{t} - μ^{*})' (Z_{t} - μ^{*}))}} \to_{p \to \infty}^{d} N (0, 1) .$
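The eigenvalue representation in part (a) of Theorems 4.2 to 4.4 and the resulting moment formulas can be checked numerically. The sketch below uses a hypothetical helper `quad_form_moments` (not from this article) for a general symmetric weight matrix `A`: it diagonalizes $Σ^{1/2} A Σ^{1/2}$, forms δ, and returns the mean $\sum_i λ_i (1 + δ_i^2)$ and the variance $\sum_i λ_i^2 (2 + 4 δ_i^2)$ of the weighted noncentral $χ^2$ sum. Here `m` plays the role of $μ + a_{t - τ} - μ^{*}$, and choosing `A` as the identity, the inverse diagonal of a covariance matrix, or an inverse covariance matrix mimics the Euclidean, diagonalized Euclidean, and Mahalanobis-type statistics, respectively.

```python
import numpy as np


def quad_form_moments(A, Sigma, m):
    """Eigen-representation and moments of Q = X' A X with X ~ N_p(m, Sigma).

    Q has the same distribution as sum_i lam_i * (zeta_i + delta_i)**2, where lam are
    the eigenvalues of Sigma^{1/2} A Sigma^{1/2} (eigenvectors U) and
    delta = U' Sigma^{-1/2} m.  A must be symmetric and Sigma positive definite.
    """
    # Symmetric square root of Sigma and its inverse from the spectral decomposition.
    w, V = np.linalg.eigh(Sigma)
    root = V @ np.diag(np.sqrt(w)) @ V.T
    root_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    lam, U = np.linalg.eigh(root @ A @ root)
    delta = U.T @ root_inv @ m

    # E[(zeta + delta)^2] = 1 + delta^2 and Var[(zeta + delta)^2] = 2 + 4 delta^2.
    mean = np.sum(lam * (1.0 + delta**2))
    var = np.sum(lam**2 * (2.0 + 4.0 * delta**2))
    return lam, delta, mean, var


# Compare with tr(A Sigma) + m'Am and 2 tr((A Sigma)^2) + 4 m'A Sigma A m.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Sigma = B @ B.T + 5.0 * np.eye(5)
m = rng.standard_normal(5)
for A in (np.eye(5), np.diag(1.0 / np.diag(Sigma)), np.linalg.inv(Sigma)):
    _, _, mean, var = quad_form_moments(A, Sigma, m)
    print(np.isclose(mean, np.trace(A @ Sigma) + m @ A @ m),
          np.isclose(var, 2 * np.trace(A @ Sigma @ A @ Sigma) + 4 * m @ A @ Sigma @ A @ m))
```

For `A` equal to the identity, the check reduces to $tr (Σ) + m'm$ for the mean and $2 tr (Σ^{2}) + 4 m' Σ m$ for the variance.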

Theorem 4.4.

Let $\{Y_{t}\}$ be a stationary Gaussian process with $E (Y_{t}) = μ$ and $Cov (Y_{t + h}, Y_{t}) = Γ (h) .$ Let τ be fixed.

(a) Suppose that $rk (Σ_{t}) = rk (Σ_{t}^{*}) = p$. Let $U_{d; t}$ be an orthogonal matrix such that $U_{d; t}^{'} Σ_{t}^{1 / 2} Σ_{d; t}^{* - 1} Σ_{t}^{1 / 2} U_{d; t} = diag (λ_{Eu, d; 1, t}, \dots, λ_{Eu, d; p, t})$ and let $δ_{Eu, d; t} = U_{d; t}^{'} Σ_{t}^{- 1 / 2} (μ + a_{t - τ} - μ^{*}) = {(δ_{Eu, d; i, t})}_{i = 1, \dots, p}$. Further, suppose that ζ₁, …, ζ_p are independent and standard normally distributed random variables.
Then $(Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*}) \overset{d}{=} \sum_{i = 1}^{p} λ_{Eu, d; i, t} {(ζ_{i} + δ_{Eu, d; i, t})}^{2} .$ Moreover, $E ((Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*})) = tr (Σ_{d; t}^{* - 1} Σ_{t}) + (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*})$ and $Var ((Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*})) = 2 tr ({(Σ_{d; t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} Σ_{t} Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*}) .$
(b) Suppose that $rk (Σ_{t}) = rk (Σ_{l}) = rk (Σ_{t}^{*}) = rk (Σ_{l}^{*}) = p$ and that $\{Γ (v)\}$ and $\{Γ {(v)}^{*}\}$ are absolutely summable. Let p be fixed, let $U_{d}$ be an orthogonal matrix such that $U_{d}^{'} Σ_{l}^{1 / 2} Σ_{d; l}^{* - 1} Σ_{l}^{1 / 2} U_{d} = diag (λ_{Eu, d; 1}, \dots, λ_{Eu, d; p})$, and let $δ_{Eu, d} = U_{d}^{'} Σ_{l}^{- 1 / 2} (μ + a I_{N} (τ) - μ^{*}) = {(δ_{Eu, d; i})}_{i = 1, \dots, p} .$
If, further, $$\lim_{t \to \infty} U_{d; t} = U_{d} \quad \text{and} \quad \lim_{t \to \infty} Σ_{t}^{1 / 2} = Σ_{l}^{1 / 2}, \tag{4.6}$$ then the asymptotic distribution of $(Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*})$ as t tends to infinity is equal to the distribution of $\sum_{i = 1}^{p} λ_{Eu, d; i} {(ζ_{i} + δ_{Eu, d; i})}^{2}$ with ζ_i as in part (a).
(c) Let t be fixed. Suppose that $rk (Σ_{t}) = rk (Σ_{t}^{*}) = p$ and that $\lim_{p \to \infty} \frac{\max_{1 \leq i \leq p} λ_{Eu, d; i, t}^{2} (1 + 2 δ_{Eu, d; i, t}^{2})}{\sum_{i = 1}^{p} λ_{Eu, d; i, t}^{2} (1 + 2 δ_{Eu, d; i, t}^{2})} = 0.$ Then $\frac{(Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*}) - E ((Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*}))}{\sqrt{Var ((Z_{t} - μ^{*})' Σ_{d; t}^{* - 1} (Z_{t} - μ^{*}))}} \to_{p \to \infty}^{d} N (0, 1) .$ Note that in practice the in-control mean is frequently known, so that it need not be estimated. This case is obtained by setting $μ^{*} = μ$ in the above formulas.
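To see how the negligibility conditions in part (c) of Theorems 4.3 and 4.4 behave as the dimension grows, the hypothetical helper `clt_ratio` below evaluates $\max_i λ_i^2 (1 + 2 δ_i^2) / \sum_i λ_i^2 (1 + 2 δ_i^2)$. As an illustrative assumption (not a result of the article), we take δ = 0 (no shift, $μ^{*} = μ$) and use as $Σ_{t}$ the correlation matrix with entries $α^{|i - j|}$, α = 0.5, from the design in Section 5.1. Its eigenvalues stay within $[1/3, 3]$, so the ratio decreases roughly like $1/p$.

```python
import numpy as np


def kms_correlation(p, alpha):
    """Correlation matrix with (i, j) entry alpha**|i - j|."""
    idx = np.arange(p)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


def clt_ratio(lam, delta):
    """max_i lam_i^2 (1 + 2 delta_i^2) / sum_i lam_i^2 (1 + 2 delta_i^2)."""
    weights = lam**2 * (1.0 + 2.0 * delta**2)
    return weights.max() / weights.sum()


for p in (20, 50, 500):
    lam = np.linalg.eigvalsh(kms_correlation(p, 0.5))  # eigenvalues of Sigma_t (Euclidean case)
    ratio = clt_ratio(lam, np.zeros(p))                 # delta = 0: in control, known mean
    print(p, round(float(ratio), 4))
```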

Using these theorems, we obtain, under the stated conditions and as $t \to \infty$ with p fixed, $\begin{array}{l} P (T_{1, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, i} {(ζ_{i} + δ_{Eu, i})}^{2} \leq tr (Σ_{t}^{*}) + x \sqrt{2 tr (Σ_{t}^{* 2})}), \\ P (T_{2, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, i} {(ζ_{i} + δ_{Eu, i})}^{2} \leq tr (Σ_{l}^{*}) + x \sqrt{2 tr (Σ_{t}^{* 2})}), \\ P (T_{3, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, i} {(ζ_{i} + δ_{Eu, i})}^{2} \leq tr (Σ_{t}^{*}) + x \sqrt{2 tr (Σ_{l}^{* 2})}), \\ P (T_{4, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, i} {(ζ_{i} + δ_{Eu, i})}^{2} \leq tr (Σ_{l}^{*}) + x \sqrt{2 tr (Σ_{l}^{* 2})}), \\ P (T_{6, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, d; i} {(ζ_{i} + δ_{Eu, d; i})}^{2} \leq tr (Σ_{d; t}^{* - 1} Σ_{t}^{*}) + x \sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t}^{*})}^{2})}), \\ P (T_{7, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, d; i} {(ζ_{i} + δ_{Eu, d; i})}^{2} \leq tr (Σ_{d; l}^{* - 1} Σ_{l}^{*}) + x \sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t}^{*})}^{2})}), \\ P (T_{8, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, d; i} {(ζ_{i} + δ_{Eu, d; i})}^{2} \leq tr (Σ_{d; t}^{* - 1} Σ_{t}^{*}) + x \sqrt{2 tr ({(Σ_{d; l}^{* - 1} Σ_{l}^{*})}^{2})}), \\ P (T_{9, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Eu, d; i} {(ζ_{i} + δ_{Eu, d; i})}^{2} \leq tr (Σ_{d; l}^{* - 1} Σ_{l}^{*}) + x \sqrt{2 tr ({(Σ_{d; l}^{* - 1} Σ_{l}^{*})}^{2})}), \\ P (T_{Mah, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Mah, i} {(ζ_{i} + δ_{Mah, i})}^{2} \leq p + x \sqrt{2 p}), \\ P (T_{MahInf, t}^{*} \leq x) \approx P (\sum_{i = 1}^{p} λ_{Mah, i} {(ζ_{i} + δ_{Mah, i})}^{2} \leq tr (Σ_{l}^{* - 1} Σ_{t}^{*}) + x \sqrt{2 tr ({(Σ_{l}^{* - 1} Σ_{t}^{*})}^{2})}) . \end{array}$
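The fixed-p approximations above require the distribution function of a weighted sum of noncentral $χ^2_1$ variables, which has no simple closed form. A minimal Monte Carlo sketch follows; the helper name `weighted_chisq_cdf` and the simulation size are our own choices. For instance, taking $c = tr (Σ_{l}^{*}) + x \sqrt{2 tr (Σ_{l}^{* 2})}$ with $λ_i = λ_{Eu, i}$ and $δ_i = δ_{Eu, i}$ gives the approximation for $T_{4, t}^{*}$.

```python
import numpy as np


def weighted_chisq_cdf(c, lam, delta, n_sim=100_000, seed=1):
    """Monte Carlo estimate of P(sum_i lam_i (zeta_i + delta_i)^2 <= c), zeta_i iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    delta = np.asarray(delta, dtype=float)
    zeta = rng.standard_normal((n_sim, lam.size))
    q = ((zeta + delta) ** 2) @ lam      # one draw of the weighted sum per row
    return float(np.mean(q <= c))


# Sanity check: with lam = (1, 1) and delta = 0 the sum is chi^2_2, with cdf 1 - exp(-c/2).
print(weighted_chisq_cdf(2.0, [1.0, 1.0], [0.0, 0.0]), 1 - np.exp(-1.0))
```

Because the same seed is reused, estimates for different thresholds `c` are based on identical draws, which keeps the estimated distribution function monotone in `c`.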

For fixed t and $p \to \infty$, the control statistics are asymptotically normally distributed in the high-dimensional setting under the conditions given in Section 4. Namely, we get that $\begin{array}{l} P (T_{1, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{t}^{*}) - tr (Σ_{t}) - (μ + a_{t - τ} - μ^{*})' (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr (Σ_{t}^{* 2})}}{\sqrt{2 tr (Σ_{t}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{2, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{l}^{*}) - tr (Σ_{t}) - (μ + a_{t - τ} - μ^{*})' (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr (Σ_{t}^{* 2})}}{\sqrt{2 tr (Σ_{t}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{3, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{t}^{*}) - tr (Σ_{t}) - (μ + a_{t - τ} - μ^{*})' (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr (Σ_{l}^{* 2})}}{\sqrt{2 tr (Σ_{t}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{4, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{l}^{*}) - tr (Σ_{t}) - (μ + a_{t - τ} - μ^{*})' (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr (Σ_{l}^{* 2})}}{\sqrt{2 tr (Σ_{t}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{6, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{d; t}^{* - 1} (Σ_{t}^{*} - Σ_{t})) - (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t}^{*})}^{2})}}{\sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} Σ_{t} Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{7, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{d; l}^{* - 1} Σ_{l}^{*} - Σ_{d; t}^{* - 1} Σ_{t}) - (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t}^{*})}^{2})}}{\sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} Σ_{t} Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{8, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{d; t}^{* - 1} (Σ_{t}^{*} - Σ_{t})) - (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr ({(Σ_{d; l}^{* - 1} Σ_{l}^{*})}^{2})}}{\sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} Σ_{t} Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{9, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{d; l}^{* - 1} Σ_{l}^{*} - Σ_{d; t}^{* - 1} Σ_{t}) - (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr ({(Σ_{d; l}^{* - 1} Σ_{l}^{*})}^{2})}}{\sqrt{2 tr ({(Σ_{d; t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{d; t}^{* - 1} Σ_{t} Σ_{d; t}^{* - 1} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{Mah, t}^{*} \leq x) \approx Φ (\frac{p - tr (Σ_{t}^{* - 1} Σ_{t}) - (μ + a_{t - τ} - μ^{*})' Σ_{t}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 p}}{\sqrt{2 tr ({(Σ_{t}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{t}^{* - 1} Σ_{t} Σ_{t}^{* - 1} (μ + a_{t - τ} - μ^{*})}}), \\ P (T_{MahInf, t}^{*} \leq x) \approx Φ (\frac{tr (Σ_{l}^{* - 1} (Σ_{t}^{*} - Σ_{t})) - (μ + a_{t - τ} - μ^{*})' Σ_{l}^{* - 1} (μ + a_{t - τ} - μ^{*}) + x \sqrt{2 tr ({(Σ_{l}^{* - 1} Σ_{t}^{*})}^{2})}}{\sqrt{2 tr ({(Σ_{l}^{* - 1} Σ_{t})}^{2}) + 4 (μ + a_{t - τ} - μ^{*})' Σ_{l}^{* - 1} Σ_{t} Σ_{l}^{* - 1} (μ + a_{t - τ} - μ^{*})}}) . \end{array}$
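The high-dimensional approximations share one pattern: with weight matrix A, centring constant $tr (A \hat{Σ})$ and scale $\sqrt{2 tr ({(A \hat{Σ})}^{2})}$ built from an estimated covariance matrix $\hat{Σ}$, one evaluates Φ at $(tr (A \hat{Σ}) + x \cdot scale - E(Q)) / \sqrt{Var(Q)}$, where $E(Q)$ and $Var(Q)$ are the moments of the quadratic form Q under the true $Σ_{t}$. The hypothetical helper below covers the cases in which the same estimated matrix is used for centring and scaling (e.g., $T_{6, t}^{*}$ with $A = Σ_{d; t}^{* - 1}$ and $\hat{Σ} = Σ_{t}^{*}$, or $T_{MahInf, t}^{*}$ with $A = Σ_{l}^{* - 1}$ and $\hat{Σ} = Σ_{t}^{*}$); the mixed cases only swap the matrix used in one of the two constants. The diagonal covariance matrix and the 10% overestimation in the usage example are arbitrary illustrative assumptions.

```python
import math

import numpy as np


def normal_approx_cdf(x, A, sigma_true, sigma_hat, m):
    """Normal approximation of P(T* <= x) for T* = (Q - tr(A S)) / sqrt(2 tr((A S)^2)).

    Q = (Z_t - mu*)' A (Z_t - mu*) with Z_t - mu* ~ N_p(m, sigma_true); S = sigma_hat is
    the estimated covariance matrix used for centring and scaling.
    """
    as_true = A @ sigma_true
    as_hat = A @ sigma_hat
    mean_q = np.trace(as_true) + m @ A @ m
    var_q = 2.0 * np.trace(as_true @ as_true) + 4.0 * m @ A @ sigma_true @ A @ m
    center = np.trace(as_hat)
    scale = math.sqrt(2.0 * np.trace(as_hat @ as_hat))
    z = (center + x * scale - mean_q) / math.sqrt(var_q)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p = 200
sigma = np.diag(np.linspace(0.5, 2.0, p))
m = np.zeros(p)                                   # in control, known mean
sigma_hat = 1.1 * sigma                           # hypothetical overestimated covariance
A_hat = np.diag(1.0 / np.diag(sigma_hat))         # diagonal weight as in T_6
print(normal_approx_cdf(1.0, np.diag(1.0 / np.diag(sigma)), sigma, sigma, m))  # Phi(1)
print(normal_approx_cdf(1.0, A_hat, sigma, sigma_hat, m))                      # larger value
```

Without misspecification the helper returns Φ(x); in the example, overestimated variances push the approximate distribution function upward, which corresponds to fewer alarms.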

We conclude this section by comparing the above asymptotic distributions obtained under misspecification with the corresponding high-dimensional distribution obtained without misspecification, that is, when $μ^{*}$ and $Γ {(h)}^{*}$ coincide with the true population values $μ$ and $Γ (h)$. These results provide initial intuition about the effect of misspecification; a more detailed analysis based on a Monte Carlo study is presented in Section 5.

In the setup of the simulation study, we use the same data-generating model as described in Section 5.1. Moreover, to investigate the effects of the sample size, we set $N_{0} \in \{100, 250, 500\}$ when $p \in \{20, 50\}$ and $N_{0} \in \{2000, 5000, 10000\}$ when $p \in \{500, 1000\} .$ The results for the four variants of the MEWMA control chart based on the Euclidean distance and the diagonalized Euclidean distance are depicted in Figures 1–4, and the findings obtained for the MEWMA approaches based on the exact and asymptotic Mahalanobis distance are summarized in Figure 5.

In the figures, we observe that the misspecification has only a minor effect on the in-control performance of the MEWMA control charts based on the Euclidean distance and the diagonalized Euclidean distance when the dimension of the data-generating model is moderate (see Figures 1 and 2). All four curves in each plot are located very close to the density of the standard normal distribution, which is the limiting distribution of all considered control statistics when no misspecification is present, that is, when the parameters of the in-control process are known. Minor departures are present only when the Phase I sample size is equal to 100. Interestingly, when p = 20, the densities are slightly shifted to the right, whereas the asymptotic distributions under misspecification appear to have smaller variances when p = 50.

Figure 1. Probabilities $P (T_{1, t}^{*} \leq x), P (T_{2, t}^{*} \leq x), P (T_{3, t}^{*} \leq x), P (T_{4, t}^{*} \leq x), P (T_{6, t}^{*} \leq x), P (T_{7, t}^{*} \leq x), P (T_{8, t}^{*} \leq x),$ and $P (T_{9, t}^{*} \leq x)$ as functions of x for t = 5, p = 20, and $N_{0} \in \{100, 250, 500\} .$ The red plot corresponds to the density of the distribution in the case without misspecification; that is, the standard normal distribution.

Figure 2. Probabilities $P (T_{1, t}^{*} \leq x), P (T_{2, t}^{*} \leq x), P (T_{3, t}^{*} \leq x), P (T_{4, t}^{*} \leq x), P (T_{6, t}^{*} \leq x), P (T_{7, t}^{*} \leq x), P (T_{8, t}^{*} \leq x),$ and $P (T_{9, t}^{*} \leq x)$ as functions of x for t = 5, p = 50, and $N_{0} \in \{100, 250, 500\} .$ The red plot corresponds to the density of the distribution in the case without misspecification; that is, the standard normal distribution.

Figures 3 and 4 present the results for large dimensions of the data-generating model. In this case, the empirical densities are shifted to the left, indicating a more conservative behavior of the MEWMA control charts based on the Euclidean distance and the diagonalized Euclidean distance. Consequently, misspecification is expected to result in larger in-control ARLs for these control schemes.

Figure 3. Probabilities $P (T_{1, t}^{*} \leq x), P (T_{2, t}^{*} \leq x), P (T_{3, t}^{*} \leq x), P (T_{4, t}^{*} \leq x), P (T_{6, t}^{*} \leq x), P (T_{7, t}^{*} \leq x), P (T_{8, t}^{*} \leq x),$ and $P (T_{9, t}^{*} \leq x)$ as functions of x for t = 5, p = 500, and $N_{0} \in \{2000, 5000, 10000\} .$ The red plot corresponds to the density of the distribution in the case without misspecification; that is, the standard normal distribution.

Figure 4. Probabilities $P (T_{1, t}^{*} \leq x), P (T_{2, t}^{*} \leq x), P (T_{3, t}^{*} \leq x), P (T_{4, t}^{*} \leq x), P (T_{6, t}^{*} \leq x), P (T_{7, t}^{*} \leq x), P (T_{8, t}^{*} \leq x),$ and $P (T_{9, t}^{*} \leq x)$ as functions of x for t = 5, p = 1,000, and $N_{0} \in \{2000, 5000, 10000\} .$ The red plot corresponds to the density of the distribution in the case without misspecification; that is, the standard normal distribution.

The situation is completely different in Figure 5, where the results for the control charts based on the Mahalanobis distance are provided. The impact of misspecification is considerable for this type of MEWMA control scheme, even for moderate dimensions of the data-generating model. The blue curves in the plots, which correspond to $N_{0} = 100,$ are drastically shifted to the right. Moreover, increasing the sample size does not clearly lead to the desired behavior of the asymptotic distributions. Even for $N_{0} = 500$ and p = 50, the asymptotic densities deviate significantly from the distribution corresponding to the case without misspecification. For larger dimensions of the data-generating model, the empirical densities are completely shifted to the right. This behavior would result in high probabilities of false alarms when the MEWMA control schemes based on the Mahalanobis distance are used. Such an effect can be explained by the large amount of noise present in the nondiagonal elements of the inverses of the estimated covariance matrices $Σ_{t}^{*}$ and $Σ_{l}^{*} .$ In contrast to the control schemes based on the Mahalanobis distance, the statistics of the MEWMA control charts based on the Euclidean distance and the diagonalized Euclidean distance do not use the nondiagonal elements of $Σ_{t}^{*}$ and $Σ_{l}^{*}$ in their definitions and, as such, appear to be robust against misspecification.
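The explanation in terms of noise in the inverse can be illustrated with a textbook fact, under a simplifying assumption that differs from the VAR(1) setting of this article: for n independent $N_p(0, Σ)$ observations and $S = X'X/n$, the matrix nS is Wishart with n degrees of freedom, so $E(S^{-1}) = n Σ^{-1} / (n - p - 1)$ and $E\, tr (S^{-1} Σ) = np/(n - p - 1)$, whereas the diagonal weighting only involves $E(σ_{ii}/s_{ii}) = n/(n - 2)$. The hypothetical helper `trace_inflation` compares both traces by simulation; when p/n is not small, the mean of the Mahalanobis-type quadratic form is far above its centring constant p.

```python
import numpy as np


def trace_inflation(sigma, n, n_rep=200, seed=2):
    """Average tr(S^{-1} Sigma) and tr(diag(S)^{-1} Sigma) over simulated samples.

    S = X'X / n is the covariance estimator with known zero mean, computed from
    n iid N_p(0, Sigma) rows (an iid simplification of the VAR(1) Phase I data).
    """
    rng = np.random.default_rng(seed)
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    full, diag = [], []
    for _ in range(n_rep):
        x = rng.standard_normal((n, p)) @ chol.T           # rows ~ N_p(0, sigma)
        s = x.T @ x / n
        full.append(np.trace(np.linalg.solve(s, sigma)))   # Mahalanobis-type weighting
        diag.append(np.sum(np.diag(sigma) / np.diag(s)))   # diagonalized weighting
    return float(np.mean(full)), float(np.mean(diag))


p, n = 50, 100
idx = np.arange(p)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
full, diag = trace_inflation(sigma, n)
print(full, n * p / (n - p - 1))   # roughly twice the centring constant p
print(diag, n * p / (n - 2))       # close to p
```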

Figure 5. Probabilities $P (T_{Mah, t}^{*} \leq x)$ (left column) and $P (T_{MahInf, t}^{*} \leq x)$ (right column) as functions of x for t = 5. We set $p \in \{20, 50\}$ with $N_{0} \in \{100, 250, 500\}$ and $p \in \{500, 1000\}$ with $N_{0} \in \{2000, 5000, 10000\} .$ The red plot corresponds to the density of the distribution in the case without misspecification; that is, the standard normal distribution.

5. COMPARISON STUDY

The above results characterize the behavior of the considered MEWMA charts if the parameters of the process are misspecified. Thus, we can compare the charts and see how the charts react to deviations from the true parameters. This point will be illustrated in Section 5.2.

In practice, the process parameters are estimated within the Phase I analysis, usually from a preliminary run or from historical data. Several approaches have been proposed in the literature, which we discussed in Section 3. If we apply the control statistics $T_{i, t} = T_{i, t} (θ)$ proposed above, then the parameter θ has to be estimated from the prerun, which gives ${\hat{T}}_{i, t} = T_{i, t} (\hat{θ}) .$ Here we assume that the estimators obtained in the Phase I analysis are independent of the control statistics used in the Phase II analysis. By the law of total probability, we get $$P (T_{i, t} (\hat{θ}) \leq x) = \int P (T_{i, t} (\hat{θ}) \leq x | \hat{θ} = θ^{*}) f_{\hat{θ}} (θ^{*}) \, d θ^{*} = \int P (T_{i, t} (θ^{*}) \leq x) f_{\hat{θ}} (θ^{*}) \, d θ^{*} , \tag{4.7}$$ where the second equality follows from this independence.

We have exclusively analyzed $P (T_{i, t} (θ^{*}) \leq x)$ in the previous sections. To study Equation (4.7), we need the distribution of the estimator $\hat{θ} .$ This distribution is unknown; we only know that such estimators are usually asymptotically normally distributed. This fact could be used to obtain an approximation to Equation (4.7), but then the problem is to evaluate the resulting integral. Here we choose another procedure and evaluate Equation (4.7) by simulation. Our procedure and the results are provided in Section 5.2.
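Evaluating Equation (4.7) by simulation amounts to averaging the conditional probability $P (T_{i, t} (θ^{*}) \leq x)$ over draws $θ^{*}$ of the Phase I estimator. A toy sketch with a univariate stand-in that is our own assumption, not one of the MEWMA charts of this article: Phase II observations are iid $N(0, σ^2)$, $θ = σ^2$ is estimated from $N_0$ in-control observations with known mean, and $T(θ^{*}) = Y^2/θ^{*}$, so the conditional probability $P(χ^2_1 \leq x θ^{*}/σ^2)$ is available in closed form. Because $N_0 \hat{θ}/σ^2 \sim χ^2_{N_0}$, the estimator is drawn directly from a gamma distribution instead of simulating the prerun; the helper names are hypothetical.

```python
import math
import random


def chi2_1_cdf(y):
    """P(chi^2_1 <= y) = P(|N(0, 1)| <= sqrt(y))."""
    return math.erf(math.sqrt(max(y, 0.0) / 2.0))


def prob_with_estimated_variance(x, sigma2, n0, n_draws=20_000, seed=3):
    """Monte Carlo version of (4.7) for the toy chart T(theta*) = Y^2 / theta*."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Phase I estimator: n0 * theta_hat / sigma2 ~ chi^2_{n0} = Gamma(n0 / 2, scale 2).
        theta_hat = sigma2 * rng.gammavariate(n0 / 2.0, 2.0) / n0
        # Conditional probability given theta_hat = theta*, exact for this toy chart.
        total += chi2_1_cdf(x * theta_hat / sigma2)
    return total / n_draws


x = 3.841  # approximately the 95% quantile of chi^2_1
print(chi2_1_cdf(x))                                  # known parameter
print(prob_with_estimated_variance(x, 1.0, n0=10))    # small prerun
print(prob_with_estimated_variance(x, 1.0, n0=10_000))
```

With a short prerun, the averaged probability falls noticeably below its known-parameter value, whereas a long prerun essentially recovers it.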

5.1. Design of the Comparison Study

In this section, we describe the design of our simulation study. We choose the in-control process to be a VAR(1) process; that is, $Y_{t} = Φ Y_{t - 1} + ε_{t}$ with an independent white noise process $\{ε_{t}\}$ and

$Φ = φ I$ with $φ = 0.5,$
$ε_{t} \sim N_{p} (0, Σ)$ with $Σ = DAD,$

where $D = diag (d_{1}, \dots, d_{p})$ is a diagonal matrix consisting of the standard deviations $d_{1}, \dots, d_{p}$ and

$$A = \begin{pmatrix} 1 & α & α^{2} & \dots & α^{p - 1} \\ α & 1 & α & \dots & α^{p - 2} \\ α^{2} & α & 1 & \dots & α^{p - 3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ α^{p - 1} & α^{p - 2} & α^{p - 3} & \dots & 1 \end{pmatrix}$$

is a correlation matrix with $α = 0.5 .$ The values of $d_{1}, \dots, d_{p}$ are drawn randomly from the uniform distribution on the interval $[0.5, 2] .$
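The data-generating model can be set up in a few lines. The sketch below (helper names, seeds, and the burn-in length are our own choices) builds $Σ = DAD$ and simulates the VAR(1) process with $Φ = φ I$. Because $Φ = φ I$, the stationary covariance solves $Γ (0) = φ^{2} Γ (0) + Σ$, that is, $Γ (0) = Σ / (1 - φ^{2})$, which serves as a check.

```python
import numpy as np


def design_sigma(p, alpha=0.5, low=0.5, high=2.0, seed=4):
    """Sigma = D A D with d_i ~ U[low, high] and A_ij = alpha**|i - j|."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(low, high, size=p)
    idx = np.arange(p)
    a = alpha ** np.abs(idx[:, None] - idx[None, :])
    return np.diag(d) @ a @ np.diag(d)


def simulate_var1(sigma, phi=0.5, n=500, burn_in=200, seed=5):
    """n observations of Y_t = phi * Y_{t-1} + eps_t, eps_t ~ N_p(0, sigma), started at 0."""
    rng = np.random.default_rng(seed)
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    y = np.zeros(p)
    out = np.empty((n, p))
    for t in range(burn_in + n):
        y = phi * y + chol @ rng.standard_normal(p)
        if t >= burn_in:          # discard the burn-in so the series is close to stationary
            out[t - burn_in] = y
    return out


sigma = design_sigma(20)
sample = simulate_var1(sigma, n=500)        # e.g., a Phase I prerun of size 500
gamma0 = sigma / (1 - 0.5**2)               # stationary covariance Gamma(0)
```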

Additionally, we set $R = r I$ with $r \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\}$ and set the dimension p of both the observed process and the target process equal to 20. Note that we determine the matrix $Σ$ once at the beginning and keep it fixed for all of our simulations. Furthermore, we always assume in this article that $μ = 0$ is known, and thus we do not estimate $μ;$ that is, $μ = μ^{*} = 0 .$
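In this design both Φ and R are multiples of the identity, which makes the covariance of the MEWMA statistic particularly simple. Assuming the usual recursion $Z_{t} = (I - R) Z_{t - 1} + R Y_{t}$ with $Z_{0} = μ = 0$ (the recursion is not restated in this section, so this is an assumption) and a stationary VAR(1) process with $Γ (h) = φ^{|h|} Σ / (1 - φ^{2})$, we get $Σ_{t} = Cov (Z_{t}) = c_{t} Σ$ with $c_{t} = r^{2} (1 - φ^{2})^{-1} \sum_{i, j = 0}^{t - 1} (1 - r)^{i + j} φ^{|i - j|}$. Summing the geometric series gives the limit $c_{\infty} = r^{2} (1 + (1 - r) φ) / [(1 - φ^{2}) (1 - (1 - r)^{2}) (1 - (1 - r) φ)]$. The hypothetical helpers below evaluate both expressions.

```python
def mewma_cov_scale(t, r, phi):
    """c_t with Cov(Z_t) = c_t * Sigma, for R = r I, Phi = phi I, Z_0 = 0 (assumed recursion)."""
    lam = 1.0 - r
    total = 0.0
    for i in range(t):
        for j in range(t):
            total += lam ** (i + j) * phi ** abs(i - j)
    return r * r * total / (1.0 - phi * phi)


def mewma_cov_scale_limit(r, phi):
    """Limit of c_t as t -> infinity (geometric series in i and |i - j|)."""
    lam = 1.0 - r
    return r * r * (1.0 + lam * phi) / ((1.0 - phi * phi) * (1.0 - lam * lam) * (1.0 - lam * phi))


for r in (0.1, 0.5, 1.0):
    print(r, mewma_cov_scale(5, r, 0.5), mewma_cov_scale_limit(r, 0.5))
```

For r = 1 the statistic reduces to $Z_{t} = Y_{t}$, and both functions return $1 / (1 - φ^{2})$, the scale of $Γ (0)$.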

5.2. Behavior of the Control Statistics under Misspecification

The control procedures are compared using the ARL as a performance criterion (see, e.g., Montgomery 2020), which is the most popular performance measure in SPC. The run length is the first time point at which the control chart gives a signal, and the ARL is the expected run length, assuming that the change has occurred at the first position; that is, τ = 1.
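The run-length definition can be made concrete with a small helper. It is a sketch that assumes, as for the distance-based charts considered here, that a signal is given as soon as the control statistic exceeds an upper control limit; the function name is hypothetical.

```python
def run_length(statistics, limit):
    """First time point t = 1, 2, ... with statistics[t - 1] > limit, or None if no signal."""
    for t, value in enumerate(statistics, start=1):
        if value > limit:
            return t
    return None


print(run_length([0.4, 1.7, 3.2, 5.0], limit=3.0))  # signal at t = 3
```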

Because the control statistics are serially dependent with a complicated correlation structure, no explicit expressions for the ARL are known. Even for univariate time series, explicit expressions are known only for special cases (see, e.g., Schmid 1995). For an independent sequence of normally distributed variables, the ARL can be obtained by solving an integral equation (e.g., Knoth 2021). This is why we use simulations to estimate the ARL.

In a first step, we generate a preliminary sample of size 500 from the above VAR(1) process and use it to estimate the parameters of the VAR(1) process with each of the three estimators presented in Section 3. The preliminary sample is assumed to be in control. Then, in the Phase II analysis, we generate a realization of the Phase II process. Here we restrict ourselves to the case where the Phase II process is in control as well. The Phase I and Phase II processes are assumed to be independent of each other. Now we apply one of the above-discussed control charts, fix a control limit, and determine the run length for the given data. This procedure is repeated 1,000 times, and the average of the simulated run lengths is used as an estimator of the true ARL. Consequently, we get an estimator of the in-control ARL for a given control limit.
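The replication scheme just described can be sketched for a deliberately simple stand-in chart (our assumption, not one of the MEWMA charts): a Shewhart chart for iid N(0, 1) data that signals when $|Y_t| / \hat{σ}$ exceeds the limit, with $\hat{σ}$ estimated in each replication from an independent in-control prerun of size `n0` (known mean). With the true σ, the in-control ARL equals $1 / P(|N(0, 1)| > c)$, which provides a check on the estimator. The helper names are hypothetical.

```python
import math
import random


def one_run_length(rng, limit, n0=500, known_sigma=False, max_len=10**7):
    """One replication: Phase I estimate of sigma (true sigma = 1), then an independent Phase II run."""
    if known_sigma:
        sigma_hat = 1.0
    else:
        sigma_hat = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n0)) / n0)
    for t in range(1, max_len + 1):
        if abs(rng.gauss(0.0, 1.0)) / sigma_hat > limit:
            return t
    return max_len  # truncated run (practically never reached for moderate limits)


def estimate_arl(limit, n_rep=1000, seed=0, **kwargs):
    """Average of simulated run lengths as an estimator of the in-control ARL."""
    rng = random.Random(seed)
    return sum(one_run_length(rng, limit, **kwargs) for _ in range(n_rep)) / n_rep


exact = 1.0 / math.erfc(2.0 / math.sqrt(2.0))   # ARL with known sigma for limit 2
print(exact, estimate_arl(2.0, known_sigma=True), estimate_arl(2.0, n0=500))
```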

To compare the control charts, they have to be calibrated. This means that the control limit of each chart is determined in such a way that the corresponding in-control ARL is equal to a fixed value ξ. In this article, we choose ξ = 200, which means that, on average, a wrong decision is made after 200 observations; that is, it is concluded that the process is out of control, although in reality it is in control. The choice of ξ depends on the data frequency. In engineering, the value 500 is frequently chosen, whereas in finance smaller values are taken, for example, ξ = 60. A discussion on the choice of ξ can be found in Severin and Schmid (1999).

Next, the control limit of each chart has to be determined. To this end, the regula falsi method is applied to the estimated in-control ARL; that is, the estimated in-control ARL is calculated for various values of the control limit based on 10⁴ independent observations. This procedure leads to estimators of the control limits. R. Bodnar, Bodnar, and Schmid (2023) noticed that the control charts $T_{6, t}$ and $T_{7, t}$ behave in the same way as $T_{8, t}$ and $T_{9, t},$ respectively. Thus, in the following we consider only $T_{6, t}$ and $T_{7, t} .$ Moreover, we omit the chart $T_{2, t}$ because even for the true process this control scheme performs very poorly in both the in-control and out-of-control states.
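A sketch of the calibration step with the regula falsi method follows. In the study, the in-control ARL is itself a simulation estimate; to keep the example deterministic, we use as a stand-in assumption the closed-form ARL $1 / P(|N(0, 1)| > c)$ of a Shewhart chart for iid standard normal data, and we search for the root of $\log ARL(c) - \log ξ$ on the log scale, which is our own choice rather than the article's.

```python
import math


def regula_falsi(f, a, b, tol=1e-10, max_iter=1000):
    """Root of f in [a, b] by the regula falsi (false position) method."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        c = b - fb * (b - a) / (fb - fa)   # zero of the secant through (a, fa) and (b, fb)
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:                    # keep the sub-interval that brackets the root
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c


def shewhart_arl(c):
    """In-control ARL of the rule |Y_t| > c for iid N(0, 1) data (stand-in for a simulated ARL)."""
    return 1.0 / math.erfc(c / math.sqrt(2.0))


xi = 200.0
limit = regula_falsi(lambda c: math.log(shewhart_arl(c)) - math.log(xi), 1.0, 5.0)
print(limit, shewhart_arl(limit))   # limit near 2.807, ARL near 200
```

With a simulated ARL in place of `shewhart_arl`, the resulting control limit is only as accurate as the underlying ARL estimate.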

Tables 1 and 2 depict the in-control ARLs computed for the misspecified $T_{1, t}^{*}, T_{3, t}^{*}, T_{4, t}^{*}, T_{6, t}^{*}, T_{7, t}^{*}, T_{Mah, t}^{*},$ and $T_{MahInf, t}^{*}$ MEWMA control charts for various values of $r \in \{0.1, 0.2, \dots, 1.0\}$ when the in-control process is the 20-dimensional VAR(1) model (Table 1) and the 100-dimensional VAR(1) model (Table 2), as defined in Section 5.1. We observe that the estimation method has only a minor impact on the estimated ARLs, except for the two MEWMA control schemes based on the Mahalanobis distance when r is small. In general, the control statistics defined by using the Euclidean norm and the diagonalized Euclidean norm behave quite robustly toward parameter misspecification. In all of these cases, the computed ARLs depart from the target ARL by not more than 15% when p = 20. For the larger dimension of the data-generating model, the control charts based on the Euclidean norm and the diagonalized Euclidean norm become more conservative, with the deviation of the empirical ARLs from the target value of 200 bounded by around 65%. The results obtained for the control schemes based on the Mahalanobis distance are completely different. The ARLs are considerably smaller than the target value of 200, especially when r is small. If p = 100, then the computed ARLs for the MEWMA control charts based on the Mahalanobis distance are very small, indicating high probabilities of a false alarm due to the misspecification. These results are in line with our previous findings depicted in Figure 5, where it was noted that the high-dimensional asymptotic distributions of the control statistics based on the Mahalanobis distance are drastically shifted to the right, which results in small values of the estimated ARLs.
Because the control statistics, which are based on the Euclidean distance and the diagonalized Euclidean distance, do not depend on the inverse of $Σ_{t}^{*}$ and $Σ_{l}^{*},$ these control charts are robust to large estimation errors that are present when the inverse of a covariance matrix is estimated. Finally, a similar performance of the MEWMA control charts based on the Euclidean distance and the diagonalized Euclidean distance is displayed in , which is again in line with the results depicted in and .

Table 1. ARLs of the $T_{1, t}^{*}, T_{3, t}^{*}, T_{4, t}^{*}, T_{6, t}^{*}, T_{7, t}^{*}, T_{Mah, t}^{*},$ and $T_{MahInf, t}^{*}$ MEWMA control charts for $r \in \{0.1, 0.2, \dots, 1.0\}$ when the in-control process is the 20-dimensional VAR(1) process.


Table 2. ARLs of the $T_{1, t}^{*}, T_{3, t}^{*}, T_{4, t}^{*}, T_{6, t}^{*}, T_{7, t}^{*}, T_{Mah, t}^{*},$ and $T_{MahInf, t}^{*}$ MEWMA control charts for $r \in \{0.1, 0.2, \dots, 1.0\}$ when the in-control process is the 100-dimensional VAR(1) process.


6. CONCLUSION

The commonly used multivariate control charts are derived under the assumption that the parameters of the target process are known before the control procedure starts. However, this assumption is not fulfilled in many situations of practical interest, for example, in economics and finance. In such situations, a detailed analysis is performed in Phase I of the surveillance procedure with the aim of fitting the target process by estimating its parameters. The quality of the estimator is expected to have a considerable impact on the control procedure, especially when the data-generating model is of large dimension.

In this article, we analyze the in-control properties of the MEWMA control charts whose control statistics are based on the Mahalanobis distance, the Euclidean distance, and the diagonalized Euclidean distance. Misspecified MEWMA control schemes are proposed in which the unknown parameters of the in-control process are replaced by their estimators. The distributional properties of the control statistics of the misspecified control charts are investigated, and their high-dimensional asymptotic distributions are derived. The established theoretical results are used to study the effect of the misspecification, and the finite-sample behavior of the control statistics is assessed via an extensive simulation study. Based on our findings, we conclude that the MEWMA control schemes based on the Mahalanobis distance can be considerably affected by misspecification, whereas the control charts whose test statistics are defined by using the Euclidean distance and the diagonalized Euclidean distance are quite robust to the misspecification caused by estimating the parameters of the target process in the Phase I analysis of the monitoring procedure.

DISCLOSURE

The authors have no conflicts of interest to report.

ACKNOWLEDGEMENT

The authors thank the Editor, the Associate Editor, and two anonymous reviewers for their constructive comments that improved the quality of this article.

Additional information

Funding

The first and third authors acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Project No. 428472210.

REFERENCES

Albers, W., and W. C. Kallenberg. 2004. “Are Estimated Control Charts in Control?” Statistics 38 (1): 67–79. https://doi.org/10.1080/02669760310001619369
Alwan, L. C., and H. V. Roberts. 1988. “Time-Series Modeling for Statistical Process Control.” Journal of Business & Economic Statistics 6 (1): 87–95.
Andersson, E., D. Bock, and M. Frisén. 2004. “Detection of Turning Points in Business Cycles.” Journal of Business Cycle Measurement and Analysis 2004 (1): 93–108. https://doi.org/10.1787/jbcma-v2004-art6-en
Bai, Z., and J. W. Silverstein. 2010. Spectral Analysis of Large Dimensional Random Matrices, Vol. 20. New York: Springer.
Bodnar, O. 2007. “Sequential Procedures for Monitoring Covariances of Asset Returns.” In Advances in Risk Management, edited by G. N. Gregoriou, 241–64. New York: Palgrave.
Bodnar, O. 2009a. “Application of the Generalized Likelihood Ratio Test for Detecting Changes in the Mean of Multivariate Garch Processes.” Communications in Statistics - Simulation and Computation 38 (5): 919–938. https://doi.org/10.1080/03610910802691861
Bodnar, O. 2009b. “Sequential Surveillance of the Tangency Portfolio Weights.” International Journal of Theoretical and Applied Finance 12 (06): 797–810. https://doi.org/10.1142/S0219024909005464
Bodnar, O., and W. Schmid. 2007. “Surveillance of the Mean Behavior of Multivariate Time Series.” Statistica Neerlandica 61 (4): 383–406. https://doi.org/10.1111/j.1467-9574.2007.00365.x
Bodnar, O., and W. Schmid. 2011. “CUSUM Charts for Monitoring the Mean of a Multivariate Gaussian Process.” Journal of Statistical Planning and Inference 141 (6): 2055–2070. https://doi.org/10.1016/j.jspi.2010.12.020
Bodnar, O., and W. Schmid. 2017. “CUSUM Control Schemes for Monitoring the Covariance Matrix of Multivariate Time Series.” Statistics 51 (4): 722–744. https://doi.org/10.1080/02331888.2016.1268616
Bodnar, R., T. Bodnar, and W. Schmid. 2023. “Sequential Monitoring of High-Dimensional Time Series.” Scandinavian Journal of Statistics 50 (3): 962–992. https://doi.org/10.1111/sjos.12607
Bodnar, T., H. Dette, and N. Parolya. 2019. “Testing for Independence of Large Dimensional Vectors.” The Annals of Statistics 47 (5): 2977–3008. https://doi.org/10.1214/18-AOS1771
Brockwell, P. J., and R. A. Davis. 1991. Time Series: Theory and Methods. New York: Springer Science & Business Media.
Chen, S., and H. B. Nembhard. 2011. “A High-Dimensional Control Chart for Profile Monitoring.” Quality and Reliability Engineering International 27 (4): 451–464. https://doi.org/10.1002/qre.1140
Crosier, R. 1988. “Multivariate Generalizations of Cumulative Sum Quality-Control Schemes.” Technometrics 30 (3): 291–303. https://doi.org/10.1080/00401706.1988.10488402
Frisén, M. 1992. “Evaluations of Methods for Statistical Surveillance.” Statistics in Medicine 11 (11): 1489–1502. https://doi.org/10.1002/sim.4780111107
Golosnoy, V., and W. Schmid. 2007. “EWMA Control Charts for Monitoring Optimal Portfolio Weights.” Sequential Analysis 26 (2): 195–224. https://doi.org/10.1080/07474940701247099
Golosnoy, W., I. Okhrin, S. Ragulin, and W. Schmid. 2011. “On the Application of SPC in Finance.” Frontiers in Statistical Quality Control 9: 119–32.
Hamilton, J. D. 1994. Time Series Analysis. New Jersey: Princeton University Press.
Harville, D. A. 1997. Matrix Algebra from a Statistician’s Perspective. New York: Springer.
Hotelling, H. 1947. “Multivariate Quality Control—Illustrated by the Air Testing of Sample Bombsights.” In Techniques of Statistical Analysis, edited by C. Eisenhart, M. W. Hastay, and W. Wallis, 111–184. New York: McGraw Hill.
Jardim, F. S., S. Chakraborti, and E. K. Epprecht. 2020. “Two Perspectives for Designing a Phase II Control Chart with Estimated Parameters: The Case of the Shewhart X̄ Chart.” Journal of Quality Technology 52 (2): 198–217. https://doi.org/10.1080/00224065.2019.1571345
Jensen, W. A., L. A. Jones-Farmer, C. W. Champ, and W. H. Woodall. 2006. “Effects of Parameter Estimation on Control Chart Properties: A Literature Review.” Journal of Quality Technology 38 (4): 349–364. https://doi.org/10.1080/00224065.2006.11918623
Knoth, S. 2021. “Steady-State Average Run Length(s): Methodology, Formulas, and Numerics.” Sequential Analysis 40 (3): 405–426. https://doi.org/10.1080/07474946.2021.1940501
Knoth, S., and W. Schmid. 2002. “Monitoring the Mean and the Variance of a Stationary Process.” Statistica Neerlandica 56 (1): 77–100. https://doi.org/10.1111/1467-9574.03000
Kramer, H., and W. Schmid. 1997. “EWMA Charts for Multivariate Time Series.” Sequential Analysis 16 (2): 131–154. https://doi.org/10.1080/07474949708836378
Kramer, H., and W. Schmid. 2000. “The Influence of Parameter Estimation on the ARL of Shewhart Type Charts for Time Series.” Statistical Papers 41 (2): 173–196. https://doi.org/10.1007/BF02926102
Lawson, A., and K. Kleinman. 2005. Spatial & Syndromic Surveillance. New York: Wiley.
Lax, P. D. 2007. Linear Algebra and Its Applications, Vol. 78. New Jersey: John Wiley & Sons.
Li, Y., Y. Liu, C. Zou, and W. Jiang. 2014. “A Self-Starting Control Chart for High-Dimensional Short-Run Processes.” International Journal of Production Research 52 (2): 445–461. https://doi.org/10.1080/00207543.2013.832001
Lowry, C., W. Woodall, C. Champ, and S. Rigdon. 1992. “A Multivariate Exponentially Weighted Moving Average Control Chart.” Technometrics 34 (1): 46–53. https://doi.org/10.2307/1269551
Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis. Berlin: Springer Science & Business Media.
Mathai, A. M., and S. B. Provost. 1992. Quadratic Forms in Random Variables: Theory and Applications. New York: Dekker.
Messaoud, A., C. Weihs, and F. Hering. 2008. “Detection of Chatter Vibration in a Drilling Process Using Multivariate Control Charts.” Computational Statistics & Data Analysis 52 (6): 3208–3219. https://doi.org/10.1016/j.csda.2007.09.029
Montgomery, D. C. 2020. Introduction to Statistical Quality Control. Hoboken: John Wiley & Sons.
Muirhead, R. J. 1982. Aspects of Multivariate Statistical Theory. New York: Wiley.
Ngai, H., and J. Zhang. 2001. “Multivariate Cumulative Sum Control Charts Based on Projection Pursuit.” Statistica Sinica 11: 747–766.
Otto, P., and W. Schmid. 2023. “A General Framework for Spatial GARCH Models.” Statistical Papers 64 (5): 1721–1747. https://doi.org/10.1007/s00362-022-01357-1
Page, E. S. 1954. “Continuous Inspection Schemes.” Biometrika 41 (1–2): 100–115. https://doi.org/10.1093/biomet/41.1-2.100
Pignatiello, J., and G. Runger. 1990. “Comparisons of Multivariate CUSUM Charts.” Journal of Quality Technology 22 (3): 173–186. https://doi.org/10.1080/00224065.1990.11979237
Qiu, P. 2013. Introduction to Statistical Process Control. Boca Raton, FL: CRC Press.
Reinsel, G. 1993. Multivariate Time Series Analysis. New York: John Wiley & Sons.
Roberts, S. 1959. “Control Chart Tests Based on Geometric Moving Averages.” Technometrics 1 (3): 239–250. https://doi.org/10.1080/00401706.1959.10489860
Saleh, N. A., M. A. Mahmoud, L. A. Jones-Farmer, I. Zwetsloot, and W. H. Woodall. 2015. “Another Look at the EWMA Control Chart with Estimated Parameters.” Journal of Quality Technology 47 (4): 363–382. https://doi.org/10.1080/00224065.2015.11918140
Sarmiento, M. G., F. S. Jardim, S. Chakraborti, and E. K. Epprecht. 2022. “Design of Variance Control Charts with Estimated Parameters: A Head to Head Comparison between Two Perspectives.” Journal of Quality Technology 54 (3): 249–268. https://doi.org/10.1080/00224065.2020.1834892
Schipper, S., and W. Schmid. 2001. “Sequential Methods for Detecting Changes in the Variance of Economic Time Series.” Sequential Analysis 20 (4): 235–262. https://doi.org/10.1081/SQA-100107647
Schmid, W. 1995. “On the Run Length of a Shewhart Chart for Correlated Data.” Statistical Papers 36 (1): 111–130. https://doi.org/10.1007/BF02926025
Schmid, W. 1997a. “CUSUM Control Schemes for Gaussian Processes.” Statistical Papers 38 (2): 191–217. https://doi.org/10.1007/BF02925223
Schmid, W. 1997b. “On EWMA Charts for Time Series.” In Frontiers in Statistical Quality Control, edited by H.-J. Lenz and P.-T. Wilrich, 115–137. Berlin, Heidelberg: Springer.
Schmid, W., and A. Schöne. 1997. “Some Properties of the EWMA Control Chart in the Presence of Autocorrelation.” The Annals of Statistics 25 (3): 1277–1283. https://doi.org/10.1214/aos/1069362748
Schmid, W., and D. Tzotchev. 2004. “Statistical Surveillance of the Parameters of a One-Factor Cox-Ingersoll-Ross Model.” Sequential Analysis 23 (3): 379–412. https://doi.org/10.1081/SQA-200027052
Severin, T., and W. Schmid. 1999. “Monitoring Changes in GARCH Models.” Allgemeines Statistisches Archiv (AStA) 83: 281–307.
Shewhart, W. A. 1926. “Quality Control Charts.” Bell System Technical Journal 5 (4): 593–603. https://doi.org/10.1002/j.1538-7305.1926.tb00125.x
Śliwa, P., and W. Schmid. 2005. “Monitoring the Cross-Covariances of a Multivariate Time Series.” Metrika 61 (1): 89–115. https://doi.org/10.1007/s001840400326
Sonesson, C., and D. Bock. 2003. “A Review and Discussion of Prospective Statistical Surveillance in Public Health.” Journal of the Royal Statistical Society Series A: Statistics in Society 166 (1): 5–21. https://doi.org/10.1111/1467-985X.00256
Theodossiou, P. T. 1993. “Predicting Shifts in the Mean of a Multivariate Time Series Process: An Application in Predicting Business Failures.” Journal of the American Statistical Association 88 (422): 441–449. https://doi.org/10.1080/01621459.1993.10476294
Wang, K., and W. Jiang. 2009. “High-Dimensional Process Monitoring and Fault Isolation via Variable Selection.” Journal of Quality Technology 41 (3): 247–258. https://doi.org/10.1080/00224065.2009.11917780
Wang, Z., Y. Li, and X. Zhou. 2017. “A Statistical Control Chart for Monitoring High-Dimensional Poisson Data Streams.” Quality and Reliability Engineering International 33 (2): 307–321. https://doi.org/10.1002/qre.2005

Table 1. ARLs of the $T_{1, t}^{*}, T_{3, t}^{*}, T_{4, t}^{*}, T_{6, t}^{*}, T_{7, t}^{*}, T_{Mah, t}^{*}$, and $T_{MahInf, t}^{*}$ MEWMA control charts for $r \in \{0.1, 0.2, \dots, 1.0\}$ when the in-control process is the 20-dimensional VAR(1) process.

Table 2. ARLs of the $T_{1, t}^{*}, T_{3, t}^{*}, T_{4, t}^{*}, T_{6, t}^{*}, T_{7, t}^{*}, T_{Mah, t}^{*}$, and $T_{MahInf, t}^{*}$ MEWMA control charts for $r \in \{0.1, 0.2, \dots, 1.0\}$ when the in-control process is the 100-dimensional VAR(1) process.
