Full article: Tests for Large-Dimensional Shape Matrices via Tyler’s M Estimators

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

Tyler’s M estimator, as a robust alternative to the sample covariance matrix, has been widely applied in robust statistics. However, classical theory on Tyler’s M estimator is mainly developed in the low-dimensional regime for elliptical populations. It remains largely unknown when the parameter of dimension p grows proportionally to the sample size n for general populations. By using the eigenvalues of Tyler’s M estimator, this article develops tests for the identity and equality of shape matrices in a large-dimensional framework where the dimension-to-sample size ratio p/n has a limit in (0, 1). The proposed tests can be applied to a broad class of multivariate distributions including the family of elliptical distributions (see model (2.1) for details). To analyze both the null and alternative distributions of the proposed tests, we provide a unified theory on the spectrum of a large-dimensional Tyler’s M estimator when the underlying population is general. Simulation results demonstrate good performance and robustness of our tests. An empirical analysis of the Fama-French 49 industrial portfolios is carried out to demonstrate the shape of the portfolios varying. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

KEYWORDS:

1 Introduction

Tests for shape (or scatter) matrix for multivariate nonnormal data play the same role as tests for covariance matrix in multivariate normal distributions. This is one of the central parts in multivariate analysis (Anderson Citation1984). Many applications including, but not limited to, signal processing, wireless communication system, biology, and financial engineering may impose certain structures on shape or covariance matrix. Thus, test of shape or covariance matrix structure is of primary interest in modern multivariate analysis. Motivated by an empirical analysis of financial economic data in Section 3.3, this article aims to develop tests for shape matrix of large-dimensional data without imposing normality assumption.

As a natural extension of multivariate normal distribution, elliptical symmetric distributions (Fang, Kotz, and Ng Citation1990) are commonly used in the literature of nonparametric multivariate analysis. The shape or scatter matrix of elliptical distribution is proportional to the covariance matrix if its second moment exists and is finite. This motivates researchers to develop statistical inference procedures for the shape matrix rather than the covariance matrix of elliptical distributions. In many real-world applications such as the empirical analysis in Section 3.3, data may exhibit heavy-tailed nature or potentially contaminated by outliers. As it is well known, sample covariance matrix based inference on the shape is sensitive to outliers and heavy tails, and therefore it will lose its efficiency and become less reliable when the underlying populations are elliptically distributed. To cope with this issue, many robust alternatives are used in the literature, including the M-estimators (Huber Citation1964), the minimum volume ellipsoid and minimum determinant estimators (Rousseeuw Citation1985) and Tyler’s M estimator (Tyler Citation1987), etc. Hallin and Paindaeine (Citation2006) and Hallin, Oja, and Paindaeine (Citation2006) proposed a class of rank-based procedures for testing sphericity of elliptical distribution (equivalently testing the shape matrix being an identity matrix), and showed their test for shape is semiparametrically efficient. All aforementioned works assume that the data are finite and fixed-dimensional. The theoretical analyses in these works may not be applicable for large-dimensional data, which are common in many real-world applications nowadays. Thus, it is of great importance in developing new theory that is applicable for establishing asymptotic properties of statistical inference procedures for large-dimensional shape matrix.

Tests for large-dimensional covariance matrix are closely related to tests on shape matrix and have received increasing attention in the literature. Tests for the identity and sphericity of covariance matrix have been studied in the literature Ledoit and Wolf Citation2002; Bai et al. Citation2009; Chen, Zhang, and Zhong Citation2010. Tests for the equality and proportionality of covariance matrices in the two-sample problems have been studied in Schott (Citation2007), Li and Chen (Citation2012), and Zou et al. (Citation2021). A common feature of these conventional tests is that their statistics are constructed from sample covariance matrix or sample correlation matrix, which are both well-behaved for those Gaussian-type and light-tailed populations, but are sensitive to outliers and heavy-tailed distributions. Meanwhile, random matrix theory for sample covariance or correlation matrices have been well developed for Gaussian-type and light-tailed populations, but is less developed for heavy-tailed populations. One of primary goals of this article is to develop new theory for large-dimensional shape matrix estimate with heavy-tailed populations, and then apply the new theory for developing tests on shape matrices.

In this article, we develop tests for shape matrix in a large-dimensional regime where the parameter of dimension p grows to infinity proportionally to the sample size n such that $p / n \to c \in (0, 1)$ . We focus on two fundamental problems in statistical inference on the shape matrix: test whether the shape matrix of a population is identity and test whether the shape matrices of two populations are equal. The proposed test statistics are based on the eigenvalues of Tyler’s M estimator since among all aforementioned robust estimators, Tyler’s M estimator enjoys certain superiorities: it is affine equivariant and, under ellipticity assumption, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). The proposed procedures are applicable to a broad class of multivariate distributions defined in (2.1). In order to analyze both the null and alternative distributions of the proposed statistics, we develop new theory on the limiting spectrum of a large-dimensional Tyler’s M estimator. Specifically, under the population model (2.1), we establish both the convergence of the empirical spectral distribution (ESD) and the central limit theorem (CLT) for linear spectral statistics (LSSs) of a large-dimensional Tyler’s M estimator. Such theoretical results are brand new and are of independent interest in random matrix theory, with potential applications in developing new approaches for other related robust inferences in large dimensions. Let’s briefly summarize the major contributions of this work below.

From methodological point of view, the proposed tests have several appealing properties. First, they are all distribution-free. We show that their limiting null distributions depend only on the dimension p and sample size n of a dataset. Second, Tyler’s M estimator is an affine equivariant estimator of the shape matrix in the sense that when we apply the affine transformation $x \mapsto Ax ≜ z$ to the data, their corresponding Tyler’s M estimators, denoted as $C_{n} (x)$ and $C_{n} (z)$ , satisfy $C_{n} (z) = A C_{n} (x) A'$ for any deterministic full-rank square matrix $A$ . Thus, Tyler’s M estimator behaves similarly to the sample covariance matrix under linear transformations. In particular, for tests of the two-sample shape matrix problem, this affine equivariance property of Tyler’s M estimator can eliminate the impact from the unknown shape matrix under the null hypothesis. This ensures that the proposed tests are universal to the underlying populations. At the same time, it allows us to investigate some nontrivial cases, including those where the shape matrix has an unbounded spectral norm, etc. These findings are also conformed by both theoretical analysis and Monte Carlo studies.

From technical aspects, we develop new strategies and techniques to study the limiting spectrum of a large-dimensional Tyler’s M estimator from general populations. As far as we know, existing theories on Tyler’s M estimator are mainly developed in the low-dimensional regime for elliptical populations, but remain largely unknown when the dimension p grows to infinity and the underlying population deviates away from the elliptical ones. In addition, Tyler’s M estimator is defined to be the solution to a fixed point equation and has no explicit expression. Therefore, standard approaches and tools in random matrix theory given in Bai and Silverstein (Citation2010) cannot be directly imitated here. To solve this problem, we propose a new theoretical development strategy by comparing the resolvent of Tyler’s M estimator to that of sample spatial-sign covariance matrix (Locantore et al. Citation1999) for the first time, which is eventually achieved by employing a series of new techniques, mainly the method of moments. Note that our techniques can be extended similarly to the study of other implicitly defined M-estimators, such as the regularized Tyler’s M estimator (Kammoun et al. Citation2016), where $lim p / n$ is then allowed to be larger than 1. However, this is beyond the scope of this article.

The rest of the article is organized as follows. Section 2 introduces our model assumptions, and develops tests for shape matrices in both the one-sample and two-sample cases. In Section 3, we conduct simulation results, and illustrate the proposed methodology via an empirical analysis of a real dataset. Section 4 is dedicated to the development of new theory regarding the limiting spectrum of a large-dimensional Tyler’s M-estimator. These are the core technical tools used to derive both the null and alternative distributions of our test statistics. All the proofs are provided in the supplementary material of this article.

2 New Test Procedures for Large-Dimensional Shape Matrix

2.1 Model of the Population

Consider a random vector $x \in R^{p}$ admitting the following stochastic representation, (2.1) $x \overset{d}{=} w Σ^{\frac{1}{2}} z,$ (2.1) where w > 0 is a scalar random variable, $Σ$ is a p × p deterministic positive definite matrix normalized by $tr (Σ) = p$ (normalization of $Σ$ guarantees the model identifiability between w and $Σ$ ), referred to as the shape matrix or scatter matrix of the population, $z = (z_{1}, \dots, z_{p})' \in R^{p}$ is a random vector with iid continuous coordinates $(z_{i})$ having bounded densities and satisfying (2.2) $E (z_{i}) = 0, E (z_{i}^{2}) = 1, E (z_{i}^{4}) ≜ τ, E (z_{i}^{6}) < \infty .$ (2.2)

Remark 2.1.

It is worth noting that w and $z$ in (2.1) are not necessarily independent. Therefore, our model is quite general and covers several important classes of multivariate distributions.

When the random variable w degenerates to a constant, model (2.1) becomes the well studied independent components model (ICM), which provides the most standard setup for independent component analysis (ICA). See Ilmonen and Paindaeine (Citation2011) for detailed discussion about ICM and ICA. The ICM is also a conventional structure assumption in the study of random matrix theory (Bai and Silverstein Citation2010). As demonstrated by the real data example in Section 3.3, model (2.1) can be more general than ICM in practice.
Model (2.1) includes the elliptical distribution family with zero location vector as a special case. To see this, since if $x$ follows an elliptical distribution with zero location vector, it has the stochastic representation $x \overset{d}{=} v Σ^{\frac{1}{2}} u,$ where v > 0 is a scalar random variable and u is a random vector uniformly distributed on the unit sphere in $R^{p}$ (Fang, Kotz, and Ng Citation1990). Further, let $u \overset{d}{=} z / | | z | |$ and $w = v / | | z | |$ , where $z$ is set to be a standard Gaussian random vector (τ = 3). Thus, we have $x \overset{d}{=} v Σ^{\frac{1}{2}} u \overset{d}{=} w Σ^{\frac{1}{2}} z,$ which is indeed a special case of our model (2.1).
By using the same argument as that in (b), model (2.1) includes multivariate $l_{1}$ -norm symmetric distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of $z$ be iid according to the exponential distribution with mean 1.
With a similar argument to that in (b), model (2.1) includes a subclass of multivariate Liouville distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of $z$ be iid according to a Gamma distribution.

Detailed definitions and properties of elliptical distributions, $l_{1}$ -norm symmetric distributions and multivariate Liouville distributions are referred to Fang, Kotz, and Ng (Citation1990).

2.2 Testing the Identity of a Shape Matrix

Let $x_{1}, \dots, x_{n} \in R^{p}$ be a sequence of iid observations from the population $x$ defined in (2.1). Consider testing the following hypothesis for the shape matrix $Σ$ , (2.3) $H_{0} : Σ = I_{p} versus H_{1} : Σ \neq I_{p}$ (2.3) in the large-dimensional asymptotic framework where $p \to \infty, n \to \infty, c_{n} ≜ p / n \to c \in (0, 1) .$

Our procedures are based on the eigenvalues of Tyler’s M estimator (Tyler Citation1987) of the shape $Σ$ , which is defined to be the matrix $C_{n}$ satisfying (2.4) $C_{n} = \frac{p}{n} \sum_{j = 1}^{n} \frac{x_{j} x_{j}^{'}}{x_{j}^{'} C_{n}^{- 1} x_{j}}, tr C_{n} = p .$ (2.4)

Tyler’s M estimator is affine equivariant and under ellipticity assumption and compared with existing robust estimators, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). Kent and Tyler (Citation1988) showed that if any linear subspace in $R^{p}$ of dimension d ( $1 \leq d \leq p - 1$ ) contains less than nd/p observations, then there exists a unique solution to the (2.4). In particular, this condition holds with probability one for the random sample ${x_{j}, j = 1, \dots, n}$ from model (2.1).

We consider the following two statistics for testing (2.3), $T_{1, tr} = tr (C_{n}^{2}) and T_{1, log} = log det (C_{n}) .$

The trace-based statistic $T_{1, tr}$ is motivated by the tests for covariance matrix from John (Citation1971, Citation1972), which measures the squared Frobenius distance between the Tyler’s M estimator $C_{n}$ and the shape matrix $Σ$ under the null. And the log-based statistic $T_{1, log}$ is an analogue of likelihood ratio statistic (Anderson Citation1984), which is based on the entropy loss $L (C_{n}, Σ) ≜ tr C_{n} Σ^{- 1} - log det (C_{n} Σ^{- 1}) - p$ between $C_{n}$ and $Σ$ under the null (James and Stein Citation1961). In the definition of these test statistics, we use the normalization $tr C_{n} = p$ , see (2.4). The null distributions of the two statistics are presented in the following theorem.

Theorem 2.1.

In the large-dimensional asymptotic framework, we have under H₀, $\begin{matrix} \frac{T_{1, tr} - p (1 + c_{n}) + c (c - 3) / (1 - c)}{2 c} \overset{D}{\to} N (0, 1), \\ \frac{T_{1, log} + p - (p - n) log (1 - c_{n}) - 2^{- 1} log (1 - c) + c / (1 - c)}{\sqrt{- 2 log (1 - c) - 2 c}} \\ \overset{D}{\to} N (0, 1) . \end{matrix}$

We prove this theorem by using the new theory on the CLT for LSSs of Tyler’s M estimator, see Theorem 4.2 in Section 4. In particular, the limiting null distributions of the two statistics are only relevant to the population dimension p and sample size n, independent of the fourth moment τ of $(z_{i})$ and also the true underlying distribution of the population $x$ .

To analyze the power of the proposed tests, we consider a local alternative such that $Σ$ is diagonal or τ = 3, and compare our Tyler’s M based test $T_{1, tr}$ with John’s test (John Citation1972), defined as ${\tilde{T}}_{1, tr} ≜ tr (S_{n}^{2}) / {[tr (S_{n}) / p]}^{2},$ where $S_{n} = n^{- 1} \sum_{j = 1}^{n} x_{j} x_{j}^{'} = \frac{1}{n} \sum_{j = 1}^{n} w_{j}^{2} Σ^{\frac{1}{2}} z_{j} z_{j}^{'} Σ^{\frac{1}{2}}$ is the sample covariance matrix. Under the local alternative, the asymptotical distributions of both $T_{1, tr}$ and ${\tilde{T}}_{1, tr}$ are derived in Theorems 2.2 and 2.3, respectively. Using these two theorems, we further derive the asymptotic power functions of these two tests in Proposition 2.1.

Theorem 2.2.

Suppose that $0 < lim inf_{p} λ_{min} (Σ) \leq lim sup_{p} λ_{max} (Σ) < \infty$ , and as $p \to \infty$ , $β_{k, p} ≜ p^{- 1} tr Σ^{k} \to β_{k}, k = 2, 3, 4.$ Then, under either of the two conditions: (i) $Σ$ is diagonal or (ii) τ = 3, we have $T_{1, tr} - tr (Σ^{2}) - p c_{n} \overset{D}{\to} N (μ, σ^{2}),$ where the limiting mean and limiting variance are $μ = 2 c (c - 2) {(c - 1)}^{- 1} - c β_{2} + c (τ - 1) (β_{2} - 1),$ $σ^{2} = 4 c^{2} β_{2}^{2} + 4 c (τ - 1) (β_{2}^{3} - 2 β_{2} β_{3} + β_{4}) .$

Theorem 2.3.

In addition to the assumptions of Theorem 2.2, suppose that w is independent of $z$ and has finite eighth moments, that is, $β_{wk} ≜ E (w^{k}) < \infty, k \in {1, \dots, 8} .$ Then, we have (2.5) $\begin{matrix} \frac{{\tilde{T}}_{1, tr} - tr Σ^{2} - c_{n} p β_{w 2}^{- 2} β_{w 4} - n^{- 1} tr Σ^{2} [(τ - 1) β_{w 2}^{- 2} β_{w 4} - 1]}{σ_{nJ}} \\ \overset{D}{\to} N (0, 1), \end{matrix}$ (2.5) where the scaling parameter σ_nJ is defined by $σ_{nJ}^{2} = {\begin{matrix} n c_{n}^{4} [4 β_{w 4} (β_{w 4}^{2} - β_{w 2} β_{w 6}) β_{w 2}^{- 2} + β_{w 8} - β_{w 4}^{2}] β_{w 2}^{- 4}, \\ var (w^{2}) > 0, \\ 4 c^{2} β_{2}^{2} + 4 c (τ - 1) (β_{2}^{3} - 2 β_{2} β_{3} + β_{4}), \\ var (w^{2}) = 0. \end{matrix}$

Proposition 2.1.

Suppose that the assumptions of Theorem 2.3 hold. Then, the asymptotic power functions of $T_{1, tr}$ and ${\tilde{T}}_{1, tr}$ are given by (2.6) $β_{T_{1, tr}} = Φ (\frac{- 2 c z_{α} + (tr Σ^{2} - p) [1 + n^{- 1} (τ - 2)]}{\sqrt{4 c^{2} β_{2}^{2} + 4 c (τ - 1) (β_{2}^{3} - 2 β_{2} β_{3} + β_{4})}}),$ (2.6) (2.7) $β_{{\tilde{T}}_{1, tr}} = Φ (- z_{α} \frac{σ_{nJ 0}}{σ_{nJ}} + \frac{(tr Σ^{2} - p) [1 + n^{- 1} (τ - 1) β_{w 2}^{- 2} β_{w 4} - n^{- 1}]}{σ_{nJ}}),$ (2.7) where $Φ$ is the standard normal distribution function, $z_{α} = Φ^{- 1} (1 - α)$ is the upper α quantile and (2.8) $σ_{nJ 0}^{2} = {\begin{matrix} n c_{n}^{4} [4 β_{w 4} (β_{w 4}^{2} - β_{w 2} β_{w 6}) β_{w 2}^{- 2} + β_{w 8} - β_{w 4}^{2}] β_{w 2}^{- 4}, \\ var (w^{2}) > 0, \\ 4 c^{2}, \\ var (w^{2}) = 0. \end{matrix}$ (2.8)

Moreover, the asymptotic relative efficiency (ARE) of $T_{1, tr}$ with respect to ${\tilde{T}}_{1, tr}$ is $\begin{matrix} ARE (T_{1, tr}, {\tilde{T}}_{1, tr}) ≐ \frac{σ_{nJ}}{\sqrt{4 c^{2} β_{2}^{2} + 4 c (τ - 1) (β_{2}^{3} - 2 β_{2} β_{3} + β_{4})}} \\ \to {\begin{matrix} \infty, & var (w^{2}) > 0, \\ 1, & var (w^{2}) = 0. \end{matrix} \end{matrix}$

Remark 2.2.

Theorem 2.3 unveils a phase transition phenomenon for ${\tilde{T}}_{1, tr}$ such that its convergence rate is of the order $O (\sqrt{n})$ for the case of nondegenerate w² such as scale mixture models and multivariate t-distributions, while is of the order O(1) for the degenerate case where w² is a constant, say, the independent components model.

Remark 2.3.

For the independent components model, Proposition 2.1 demonstrates an equivalent role of Tyler’s M estimator to the sample covariance matrix in terms of the asymptotic power functions (the two are exactly the same). However, in contrast to the test $T_{1, tr}$ , we note that by setting $w \equiv σ$ and $Σ = I_{p}$ in (2.5), the null distribution of the sample covariance matrix based test ${\tilde{T}}_{1, tr}$ still depends on the unknown fourth moment parameter τ of the atom random variable z_i, which is not distribution free. Thus, an extra estimation of this parameter should be carried out for the test procedure.

Remark 2.4.

For populations with nondegenerate w², Proposition 2.1 reveals an overwhelming superiority of Tyler’s M estimator to the sample covariance matrix. In particular, for those local alternatives where $tr Σ^{2} - p = o (\sqrt{p})$ , Proposition 2.1 shows that the power of the Tyler’s M based test $T_{1, tr}$ tends to 1 while the sample covariance matrix based test ${\tilde{T}}_{1, tr}$ has no power. An intuitive explanation for this is that the noise generated by the variable w is stronger than the signal $tr Σ^{2} - p$ . Therefore, removing the irrelevant variable w from the shape estimation is the main achievement of using Tyler’s M estimator.

2.3 Testing the Equality of Two Shape Matrices

Let $x_{1}$ and $x_{2}$ be two p-dimensional populations with stochastic representations $x_{1} \overset{d}{=} w_{1} Σ_{1}^{\frac{1}{2}} z_{1}$ and $x_{2} \overset{d}{=} w_{2} Σ_{2}^{\frac{1}{2}} z_{2},$ respectively, where both $z_{1} = (z_{1 j})$ and $z_{2} = (z_{2 j})$ satisfy the moment conditions in (2.2) with $E (z_{ij}^{4}) = τ_{i}$ for i = 1, 2. In this section, we consider the following hypothesis for the shape matrices $Σ_{1}$ and $Σ_{2}$ , (2.9) $H_{0} : Σ_{1} = Σ_{2} = Σ versus H_{1} : Σ_{1} \neq Σ_{2},$ (2.9) where $Σ$ is unknown, normalized by $tr (Σ) = p$ . Assume $Σ_{1}$ is invertible and denote by D_p the squared Frobenius distance between the matrix $Σ_{1}^{- 1} Σ_{2} / [tr (Σ_{1}^{- 1} Σ_{2}) / p]$ and $I_{p}$ , (2.10) $D_{p} ≜ \frac{1}{p} tr {[\frac{Σ_{1}^{- 1} Σ_{2}}{tr (Σ_{1}^{- 1} Σ_{2}) / p} - I_{p}]}^{2} = \frac{p tr {(Σ_{1}^{- 1} Σ_{2})}^{2}}{\overset{2}{tr} (Σ_{1}^{- 1} Σ_{2})} - 1,$ (2.10) which equals zero if and only if H₀ is true.

Let ${x_{1 j}, j = 1, \dots, n_{1}}$ and ${x_{2 j}, j = 1, \dots, n_{2}}$ be two independent sets of observations from the two populations $x_{1}$ and $x_{2}$ , respectively. Their sample sizes n₁, n₂ and population dimension p are assumed to be large such that $p \to \infty, n_{1} \to \infty, n_{2} \to \infty, c_{n_{1}} ≜ p / n_{1} \to c_{1} \in (0, 1), c_{n_{2}} ≜ p / n_{2} \to c_{2} \in (0, 1)$ . Denote by $M_{1 n_{1}}$ and $M_{2 n_{2}}$ Tyler’s M estimators of the two shapes. That is, they are the unique solutions to the fixed point equations $M_{i n_{i}} = p n_{i}^{- 1} \sum_{j = 1}^{n_{i}} {(x'_{ij} M_{i n_{i}}^{- 1} x_{ij})}^{- 1} x_{ij} x'_{ij}$ , $tr M_{i n_{i}} = p$ , i = 1, 2. Then, plugging $M_{1 n_{1}}$ and $M_{2 n_{2}}$ into (2.10) yields a naive estimate of D_p, (2.11) ${\hat{D}}_{p} ≜ \frac{p tr {(M_{1 n_{1}}^{- 1} M_{2 n_{2}})}^{2}}{\overset{2}{tr} (M_{1 n_{1}}^{- 1} M_{2 n_{2}})} - 1.$ (2.11)

However, in the large-dimensional asymptotic framework, such ${\hat{D}}_{p}$ is not a consistent estimator of D_p as demonstrated in the first conclusion of Theorem 2.5. It has a bias $c_{n_{1}} / (1 - c_{n_{1}}) + c_{n_{2}}$ which is not negligible asymptotically. By subtracting this bias from ${\hat{D}}_{p}$ , we finally define the following consistent estimator of D_p for testing (2.9), $\begin{matrix} T_{2, tr} ≜ {\hat{D}}_{p} - \frac{c_{n_{1}}}{1 - c_{n_{1}}} - c_{n_{2}} \\ = \frac{p tr {(M_{1 n_{1}}^{- 1} M_{2 n_{2}})}^{2}}{\overset{2}{tr} (M_{1 n_{1}}^{- 1} M_{2 n_{2}})} - \frac{1}{1 - c_{n_{1}}} - c_{n_{2}}, \end{matrix}$ and a large value of such an estimate constitutes an evidence in favor of H₁. The limiting null distribution of $T_{2, tr}$ is given in the following theorem, which is also derived from our general theory presented in Section 4.

Theorem 2.4.

In the large-dimensional asymptotic framework, we have under H₀, $p T_{2, tr} \overset{D}{\to} N (μ, σ^{2}),$ where the limiting mean and limiting variance are $\begin{matrix} μ = \frac{- 3 c_{1} + c_{1}^{2} - 3 c_{2} + 8 c_{1} c_{2} - 3 c_{1}^{2} c_{2} + c_{2}^{2} - c_{1} c_{2}^{2}}{(c_{2} - 1) {(c_{1} - 1)}^{2}}, \\ σ^{2} = \frac{4 c_{1}^{2} + 8 c_{1}^{3} + 8 c_{1} c_{2} - 8 c_{1}^{3} c_{2} + 4 c_{2}^{2} - 8 c_{1} c_{2}^{2} + 4 c_{1}^{2} c_{2}^{2}}{{(c_{1} - 1)}^{4}} . \end{matrix}$

Remark 2.5.

The matrix $M_{1 n_{1}}^{- 1} M_{2 n_{2}}$ in the test statistic $T_{2, tr}$ is the F-ratio of the two Tyler’s M estimators. By their affine equivariance, this test can eliminate the impact from the unknown shape matrix $Σ$ under the null hypothesis. To see this, we denote $C_{1 n_{1}}$ and $C_{2 n_{2}}$ as two Tyler’s M estimators from the atom random vectors ${z_{1 j}, j = 1, \dots, n_{1}}$ and ${z_{2 j}, j = 1, \dots, n_{2}}$ , respectively. That is, they satisfy $C_{i n_{i}} = p n_{i}^{- 1} \sum_{j = 1}^{n_{i}} {(z'_{ij} C_{i n_{i}}^{- 1} z_{ij})}^{- 1} z_{ij} z'_{ij}$ , $tr C_{i n_{i}} = p$ , i = 1, 2. From the affine equivariance, we have the relation between $M_{i n_{i}}$ and $C_{i n_{i}}$ as (2.12) $M_{i n_{i}} = \frac{Σ_{i}^{\frac{1}{2}} C_{i n_{i}} Σ_{i}^{\frac{1}{2}}}{tr (Σ_{i}^{\frac{1}{2}} C_{i n_{i}} Σ_{i}^{\frac{1}{2}}) / p}, i = 1, 2.$ (2.12)

Note that in (2.12), we normalize the matrix $Σ_{i}^{\frac{1}{2}} C_{i n_{i}} Σ_{i}^{\frac{1}{2}}$ so that the trace of the term on the right hand side equals p, which meets the condition $tr M_{i n_{i}} = p$ . Then, under $H_{0} : Σ_{1} = Σ_{2} = Σ$ , we have from (2.11) and (2.12), $\begin{matrix} {\hat{D}}_{p} = \frac{p tr {(Σ_{1}^{- \frac{1}{2}} C_{1 n_{1}}^{- 1} Σ_{1}^{- \frac{1}{2}} Σ_{2}^{\frac{1}{2}} C_{2 n_{2}} Σ_{2}^{\frac{1}{2}})}^{2}}{\overset{2}{tr} (Σ_{1}^{- \frac{1}{2}} C_{1 n_{1}}^{- 1} Σ_{1}^{- \frac{1}{2}} Σ_{2}^{\frac{1}{2}} C_{2 n_{2}} Σ_{2}^{\frac{1}{2}})} - 1 \\ = \frac{p tr {(C_{1 n_{1}}^{- 1} C_{2 n_{2}})}^{2}}{\overset{2}{tr} (C_{1 n_{1}}^{- 1} C_{2 n_{2}})} - 1, \end{matrix}$ which is indeed independent of the common shape matrix $Σ$ under H₀. Therefore, the conclusion of Theorem 2.4 holds true only if the shape matrix $Σ$ is invertible and no other constraints have to be imposed. This point allows us to investigate some nontrivial cases, for example, the one that $Σ$ has an unbounded spectral norm, etc. This case is also studied by our simulations.

Next, we present the convergence of $T_{2, tr}$ for general $Σ_{1}$ and $Σ_{2}$ , from which its power performance can be evaluated theoretically.

Theorem 2.5.

Suppose that $0 < lim inf_{p} λ_{min} (Σ_{1}^{- 1} Σ_{2}) \leq lim sup_{p} λ_{max} (Σ_{1}^{- 1} Σ_{2}) < \infty$ and as $p \to \infty$ , (2.13) $\frac{1}{p} tr {(\frac{Σ_{1}^{- 1} Σ_{2}}{tr (Σ_{1}^{- 1} Σ_{2}) / p})}^{k} \to γ_{k}, k = 2, 3, 4.$ (2.13)

The statistic $T_{2, tr}$ is a strongly consistent estimator of D_p, that is, $T_{2, tr} - D_{p} \to 0$ , almost surely.
If the moment conditions in (2.2) hold for the two populations with $τ_{1} = τ_{2} = 3$ , then we have $p T_{2, tr} - p D_{p} \overset{D}{\to} N (μ, σ^{2}) .$

The limiting mean and limiting variance are $\begin{matrix} μ = \frac{2 c_{1}}{{(c_{1} - 1)}^{2}} - \frac{c_{1} c_{2} + c_{1} γ_{2}}{c_{1} - 1} - \frac{2 c_{2}}{c_{2} - 1} + c_{2} γ_{2}, \\ σ^{2} = 4 c_{2} (\frac{c_{1}^{2} (6 γ_{2} + c_{2})}{{(c_{1} - 1)}^{2}} - \frac{2 c_{1} (γ_{2} c_{2} - γ_{2}^{2} + 2 γ_{3})}{c_{1} - 1} \\ + 2 γ_{4} + γ_{2} (2 γ_{2}^{2} + c_{2} γ_{2} - 4 γ_{3}) - \frac{4 c_{1}^{3}}{{(c_{1} - 1)}^{3}}) \\ + \frac{12 c_{1}^{4}}{{(c_{1} - 1)}^{4}} - \frac{16 c_{1}^{3} γ_{2}}{{(c_{1} - 1)}^{3}} + \frac{4 c_{1}^{2} (4 γ_{3} - 3 γ_{2}^{2})}{{(c_{1} - 1)}^{2}} \\ - \frac{8 c_{1} (γ_{2}^{3} - 2 γ_{3} γ_{2} + γ_{4})}{c_{1} - 1} . \end{matrix}$

Remark 2.6.

The first conclusion of Theorem 2.5 demonstrates the consistency of $T_{2, tr}$ toward the distance D_p and the second conclusion provides the non-null distribution of $T_{2, tr}$ for a local alternative with $τ_{1} = τ_{2} = 3$ . In particular, the test can achieve full power if the quantity pD_p grows to infinity.

2.4 Adjustments When the True Location is Unknown

In this section, we discuss adjustments to the tests developed in Sections 2.2 and 2.3 when the true location vector $μ$ of the population $x$ is unknown, that is, (2.14) $x \overset{d}{=} μ + w Σ^{\frac{1}{2}} z, tr (Σ) = p .$ (2.14)

In addition to the assumptions of the model (2.1), we further assume that the random variable w is independent of $z$ and they satisfy (2.15) $E (w^{- r}) < \infty, E (w^{r}) < \infty, E (z_{i}^{γ}) < \infty$ (2.15) for some r > 4 and $γ > 8$ .

For the one-sample case, given a sequence of iid observations $x_{1}, \dots, x_{n}$ following the same distribution as $x$ in (2.14), Tyler’s M estimator, denoted by ${\bar{C}}_{n}$ , is defined to be the solution to the equation ${\bar{C}}_{n} = p n^{- 1} \sum_{j = 1}^{n} {((x_{j} - \hat{μ})' {\bar{C}}_{n}^{- 1} (x_{j} - \hat{μ}))}^{- 1} (x_{j} - \hat{μ}) (x_{j} - \hat{μ})', tr ({\bar{C}}_{n}) = p$ , where $\hat{μ}$ is an estimate of the true location $μ$ . For robustness, we take into account the location estimate $\hat{μ}$ of the form $\hat{μ} ≜ \sum_{j = 1}^{n} α_{j} x_{j} = μ + \sum_{j = 1}^{n} α_{j} w_{j} Σ^{\frac{1}{2}} z_{j}$ , where $α_{j} = {(\sum_{j = 1}^{n} | | x_{j} - \bar{x} | |^{- 2})}^{- 1} | | x_{j} - \bar{x} | |^{- 2}$ and $\bar{x} = n^{- 1} \sum_{i = 1}^{n} x_{i}$ is the sample mean. Thus, the two test statistics for the hypothesis (2.3) are ${\bar{T}}_{1, tr} = tr ({\bar{C}}_{n}^{2}) and {\bar{T}}_{1, log} = log det ({\bar{C}}_{n}) .$

For the two-sample case, let ${\hat{μ}}_{i} ≜ \sum_{j = 1}^{n_{i}} α_{ij} x_{ij} = μ_{i} + \sum_{j = 1}^{n_{i}} α_{ij} w_{ij} Σ_{i}^{\frac{1}{2}} z_{ij}$ be the robust estimates of the unknown locations $μ_{i}$ , i = 1, 2, respectively, where $α_{ij} = {(\sum_{j = 1}^{n_{i}} | | x_{ij} - {\bar{x}}_{i} | |^{- 2})}^{- 1} | | x_{ij} - {\bar{x}}_{i} | |^{- 2}$ and ${\bar{x}}_{i} = \sum_{j = 1}^{n_{i}} x_{ij} / n_{i}$ being the sample means. Similarly, the two Tyler’s M estimators ${\bar{M}}_{1 n_{1}}$ and ${\bar{M}}_{2 n_{2}}$ are the solutions to the equations ${\bar{M}}_{i n_{i}} = p n_{i}^{- 1} \sum_{j = 1}^{n_{i}} {((x_{ij} - {\hat{μ}}_{i 2})' {\bar{M}}_{i n_{i}}^{- 1} (x_{ij} - {\hat{μ}}_{i 2}))}^{- 1} (x_{ij} - {\hat{μ}}_{i 2}) (x_{ij} - {\hat{μ}}_{i 2})'$ , $tr {\bar{M}}_{i n_{i}} = p$ . Our adjusted test statistic for the equality hypothesis in (2.9) is denoted by (2.16) ${\bar{T}}_{2, tr} = \frac{p tr {({\bar{M}}_{1 n_{1}}^{- 1} {\bar{M}}_{2 n_{2}})}^{2}}{\overset{2}{tr} ({\bar{M}}_{1 n_{1}}^{- 1} {\bar{M}}_{2 n_{2}})} - \frac{n_{1} - 1}{n_{1} - p - 1} - \frac{p}{n_{2} - 1} .$ (2.16)

The asymptotic null distributions of ${{\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\bar{T}}_{2, tr}}$ are listed below, which are direct conclusions of our newly developed theory on the CLT for LSSs of mean-adjusted Tyler’s M estimator detailed in Theorem 4.3.

Theorem 2.6.

In the large-dimensional asymptotic framework and under the null hypothesis in (2.3), we have $\begin{matrix} {\bar{T}}_{1, tr} - p (1 + \frac{p}{n - 1}) \overset{D}{\to} N (\frac{3 c - c^{2}}{1 - c}, 4 c^{2}), \\ {\bar{T}}_{1, log} + p - (p - n + 1) log (1 - \frac{p}{n - 1}) \\ \overset{D}{\to} N (\frac{1}{2} log (1 - c) - \frac{c}{1 - c}, - 2 log (1 - c) - 2 c) . \end{matrix}$

Theorem 2.7.

In the large-dimensional asymptotic framework and under the null hypothesis in (2.9), we have $p {\bar{T}}_{2, tr} \overset{D}{\to} N (μ, σ^{2})$ , where the limiting mean and limiting variance are the same as those given in Theorem 2.4.

Comparing Theorems 2.6 and 2.7 with Theorems 2.1 and 2.4, one can see that the three adjusted statistics ${{\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\bar{T}}_{2, tr}}$ share the same null distributions with ${T_{1, tr}, T_{1, log}, T_{2, tr}}$ except that we have replaced the sample sizes $(n, n_{1}, n_{2})$ with $(n - 1, n_{1} - 1, n_{2} - 1)$ in the centering terms. We will discuss the theoretical foundation of such substitution in Section 4.4.

Remark 2.7.

Note that in Kent and Tyler (Citation1991), the authors studied the existence and uniqueness of simultaneous M-estimates of multivariate location and scatter. These estimates are defined to minimize the objective function $L (μ, Σ) = \sum_{i = 1}^{n} ρ {{(x_{i} - μ)}^{'} Σ^{- 1} (x_{i} - μ)} + \frac{1}{2} n log | Σ |,$ and thus have the following form (2.17) ${\begin{matrix} {\hat{μ}}_{M} = \sum_{i = 1}^{n} v_{i} x_{i} / \sum_{i = 1}^{n} v_{i}, \\ {\hat{Σ}}_{M} = \frac{1}{n} \sum_{i = 1}^{n} v_{i} (x_{i} - {\hat{μ}}_{M}) {(x_{i} - {\hat{μ}}_{M})}^{'}, \end{matrix}$ (2.17) where $v_{i} = u (s_{i}), u (s) = 2 ρ^{'} (s)$ and $s_{i} = {(x_{i} - {\hat{μ}}_{M})}^{'} {\hat{Σ}}_{M}^{- 1} (x_{i} - {\hat{μ}}_{M})$ . By taking $ρ (s) = \frac{1}{2} log s$ , the solution to (2.17) corresponds to Tyler’s M estimator ${\hat{Σ}}_{M}$ and another estimate of location ${\hat{μ}}_{M}$ . However, this choice of $ρ (s)$ does not satisfy the Condition M outlined in Kent and Tyler (Citation1991) for the existence and uniqueness of the solution to (2.17). Although there is potential for further exploration in this direction for simultaneous estimation when the true location is unknown, such theoretical analysis is currently beyond the scope of this article.

Nevertheless, to understand the simultaneous estimates better for future investigations, we have empirically implemented an iterative algorithm to obtain a solution $({\hat{μ}}_{M}, {\hat{Σ}}_{M})$ to (2.17), temporarily setting aside the issue of existence and uniqueness of the solution. Detailed simulations are presented at the end of the supplementary material, and the simulation results seem to imply that such ${\hat{μ}}_{M}$ is asymptotically equivalent to our explicitly defined location estimate $\hat{μ}$ , in terms of the $l_{2}$ -norm.

3 Simulations and Real Data Analysis

In this section, we conduct simulation studies to compare the performance of the proposed tests with existing ones, and illustrate the proposed methodology via a real data example.

3.1 Simulations for Testing (2.3)

We numerically examine the performance of the proposed tests $T_{1, tr}, {\bar{T}}_{1, tr}, T_{1, log}$ , and ${\bar{T}}_{1, log}$ . For comparison, we also conduct John’s test ${\tilde{T}}_{1, tr}$ from Theorem 2.3 and the likelihood ratio test ${\tilde{T}}_{1, log}$ from Bai et al. (Citation2009), both of which are based on the sample covariance matrix. Simulation data are generated from the following two models.

Model (I). The underlying population is multivariate normal $x \sim N (0, Σ)$ where $Σ = I_{p} + Σ_{0}$ with $Σ_{0} = (θ^{| i - j |})$ . We set θ = 0 under the null hypothesis and $θ = 0.2$ under the alternative.

Model (II). The underlying population is a scale mixture, that is, $x = 0.5 N (0, Σ) + 0.5 N (0, 4 Σ)$ . The matrix $Σ$ is the same as the one in Model (I).

The population dimension p is set to be p = 100, 200, 400 and the dimension-to-sample size ratio is c_n = 0.25 and 0.5. The nominal level is fixed at 5%. The test ${\tilde{T}}_{1, tr}$ depends on the mixing parameters of Model (II), which are treated as known in our simulations. All statistics are averaged from 10,000 independent replications.

Empirical sizes of the six tests are summarized in . The results show that the four Tyler’s M based tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}$ , and the test ${\tilde{T}}_{1, tr}$ all have reasonable sizes under both models. Meanwhile, Tyler’s M based tests have slight size inflation when p and n are small. As the dimensions grow, their sizes reach the nominal level of 0.05. For the test ${\tilde{T}}_{1, log}$ , it has accurate sizes under Model (I) but suffers from serious size distortions under Model (II). This is reasonable since the test ${\tilde{T}}_{1, log}$ is only justified for the independent components model.

Table 1 Empirical sizes in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

Display Table

depicts the empirical powers of the six tests. The results of ${\tilde{T}}_{1, log}$ under the mixture model are excluded due to its size inflation. Under Model (I), the powers of these tests are comparable. That is, the tests based on Tyler’s M estimator keep the same efficiency as those based on the sample covariance matrix, and all achieve full power as the dimensions grow. When the underlying population shifts from the normal distribution to the scale mixture, that is, Model (II), the four Tyler’s M based tests remain well-behaved, but the test ${\tilde{T}}_{1, tr}$ shows a significant loss of power, as predicted by Theorem 2.3. For instance, when $(c_{n}, n) = (0.5, 400)$ , the power of ${\tilde{T}}_{1, tr}$ declines from 99.08 to 50.40. This demonstrates that Tyler’s M based tests can gain satisfactory robustness against heavy-tailed and tail-dependent data.

Table 2 Empirical powers in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

Display Table

3.2 Simulations for Testing (2.9)

This section evaluates the performance of $T_{2, tr}$ and ${\bar{T}}_{2, tr}$ . The population dimension is set to be p = 100, 200, 400 and the dimension-to-sample size ratios are $(c_{n 1}, c_{n 2}) = (0.25, 0.5)$ and $(0.25, 0.25)$ . Our first experiment is to examine the robustness of the proposed tests by comparing them with the one constructed from sample covariance matrix (Liu et al. Citation2014), referred to as ${\tilde{T}}_{2, tr}$ . The following two models are considered.

Model (III). The two populations are multivariate normal such that $x_{1} \sim N (0, I_{p})$ and $x_{2} \sim N (0, Γ)$ , where $Γ = diag (γ_{1}, \dots, γ_{p})$ is diagonal with $γ_{j} = 1 - θ + 2 θ (j - 1) / (p - 1)$ . We set θ = 0 ( $Γ = I_{p}$ ) under the null hypothesis and $θ = 0.3$ under the alternative.
Model (IV). The first population is standard normal $x_{1} \sim N (0, I_{p})$ and the second is a contaminated normal formulated by a scale mixture $x_{2} \sim (1 - ε) N (0, Γ) + ε N (0, 4 I_{p}),$ where $ε \in (0, 1)$ represents the proportion of contamination and the matrix $Γ$ is defined as in Model (III). We set the proportion to be fixed at $ε = 0.05$ and hence, under the null hypothesis ( $Γ = I_{p}$ ), the population $x_{2}$ generates a dataset containing 5% outliers.

depicts empirical sizes and powers of the three tests under Models (III) and (IV). It shows that, despite having small size inflation, the tests $T_{2, tr}$ and ${\bar{T}}_{2, tr}$ are both robust against outliers. However, the test ${\tilde{T}}_{2, tr}$ shows a significant size distortion when outliers are present. This again demonstrates the robustness of Tyler’s M based tests in comparison to sample covariance matrix based tests.

Table 3 Empirical sizes and powers in percentiles of $T_{2, tr}, {\bar{T}}_{2, tr}$ , and ${\tilde{T}}_{2, tr}$ at 5% significance under Models (III) and (IV).

Display Table

Our second experiment is designed such that the difference between the two shape matrices only lies in a finite-dimensional subspace. Hence, we adopt the spiked population model (Johnstone Citation2001) as follows.

Model (V). $Σ_{1} \propto I_{p} + θ uu' and Σ_{2} \propto I_{p} + θ vv', θ \in (- 1, + \infty),$ $u$ and $v$ are two unit vectors.

For this model, when θ is close to –1, the matrix $Σ_{j}$ (j = 1, 2) has an eigenvalue close to zero, implying the existence of collinearity among the components of the population. When θ is large or even divergent along with the dimension p, a dominant principle component of the population exists. Hence, our two-sample test detects whether the direction of collinearity or the dominant component of the population changes over spatial scales and through time. Moreover, the common base component $I_{p}$ in $Σ_{1}$ and $Σ_{2}$ can be extended to any positive definite matrix with bounded eigenvalues.

The experimental settings are as follows. The parameter $θ = - 1 + 1 / \sqrt{p}$ for the case of collinearity and $θ = \sqrt{p}$ for the case of the dominant factor. The vectors are $u = w_{1}$ and $v = cos (α) w_{1} + sin (α) w_{2}$ , where $w_{1}$ and $w_{2}$ are orthogonal vectors chosen randomly from the unit sphere in $R^{p}$ . The parameter α which represents the angle between $u$ and $v$ takes values in $[0, π / 4]$ . Samples are drawn from normal populations with dimensions $(p, n_{1}, n_{2}) = (100, 200, 400)$ , (200, 400, 800) and (300, 600, 1200). Empirical powers of $T_{2, tr}$ and ${\bar{T}}_{2, tr}$ under both cases at 5% significance from 10,000 independent replications are plotted in . It shows that the powers grow to 1 as the angle α and/or the dimensions $(p, n_{1}, n_{2})$ increase, which confirms the capacity of $T_{2, tr}$ and ${\bar{T}}_{2, tr}$ to distinguish the two shape matrices.

Fig. 1 Empirical powers of $T_{2, tr}$ and ${\bar{T}}_{2, tr}$ for $θ = - 1 + 1 / \sqrt{p}$ (Left panel) and $θ = \sqrt{p}$ (Right panel) under Model (V) at 5% significance (red horizontal line) from 10,000 independent replications. The dimensions are $(p, n_{1}, n_{2}) = (100, 200, 400)$ , (200, 400, 800) and $(300, 600, 1200)$ .

Fig. 1 Empirical powers of T2,tr and T¯2,tr for θ=−1+1/p (Left panel) and θ=p (Right panel) under Model (V) at 5% significance (red horizontal line) from 10,000 independent replications. The dimensions are (p,n1,n2)=(100,200,400), (200, 400, 800) and (300,600,1200).

3.3 A Real Data Example

To illustrate the proposed methodology, we apply the proposed tests to an empirical analysis of Fama-French 49 industrial portfolios. The dataset containing average value-weighted daily returns of the portfolios from Years 2002 to 2019 can be downloaded from $_$ library.html\#Benchmarks" class="url" >http://mba.tuck.dartmouth.\edu/pages/faculty/ken.french/data{} $_$ library.html\#Benchmarks. To avoid the weekend effect in the stock market, we consider only data subset consisting of every Wednesday’s returns. Since there was a financial crisis in 2008–2009, we focus on two segments data: one from 2002 to 2007 and the other one from 2010 to 2019. Thus, the sample sizes of the data subsets from 2002–2007 and 2010–2019 are 310 and 515, respectively. We apply the proposed tests to examine whether the shape of data changes over time during these two periods.

To demonstrate the proposed model in (2.14) is more general than the ICM in practice, we conduct an exploratory data analysis to illustrate the ICM does not fit the datasets well. Define $T_{1} = E | | x - E (x) | |^{2} / p$ and $T_{2} = E | | x - E (x) | |^{4} / p^{2} .$ Then it can be shown that under assumptions imposed in Section 2.4, if $x$ has stochastic representation (2.14), $T_{2} / T_{1}^{2} \to E (w^{4}) / {E (w^{2})}^{2}$ almost surely as $p \to \infty$ . Note that the ICM corresponds to a constant w, and therefore the ratio $E (w^{4}) / {E (w^{2})}^{2} = 1$ . In general, the ratio $E (w^{4}) / {E (w^{2})}^{2}$ is greater than 1. This motivates us to examine scatterplots of ( $T_{1}^{2}, T_{2}$ ) based on its sample version.

To construct the scatterplots, we generate B = 200 bootstrap samples and calculate the sample version of ( $T_{1}^{2}, T_{2}$ ) for each bootstrap sample. Specifically, for a given bootstrap sample ${x_{j}^{*}, j = 1, \dots, n}$ . Denote ${\hat{T}}_{2}^{*} = n^{- 1} \sum_{j = 1}^{n} | | x_{j}^{*} - {\bar{x}}^{*} | |^{4} / p^{2}$ , and ${\hat{T}}_{1}^{*} = n^{- 1} \sum_{j = 1}^{n} | | x_{j}^{*} - {\bar{x}}^{*} | |^{2} / p$ , where ${\bar{x}}^{*}$ is the sample mean of the bootstrap sample. We obtain $(T_{1}^{* 2}, T_{2}^{*})$ for each bootstrap sample. The scatterplots of $(T_{1}^{* 2}, T_{2}^{*})$ based on the B = 200 bootstrap samples are depicted in . If ICM is valid, one would expect the points $(T_{1}^{* 2}, T_{2}^{*})$ scattering around the straight line y = x since the ratio of $T_{2}^{*}$ to $T_{1}^{* 2}$ should approximately equal 1 under the ICM model assumption. clearly shows that the ratio of $T_{2}^{*}$ to $T_{1}^{* 2}$ is dramatically greater than 1.

Fig. 2 Scatterplots of the statistics ${(T_{1}^{* 2}, T_{2}^{*})}$ for the data from 2002 to 2007 (left panel) and from 2010 to 2019 (right panel). These statistics are calculated from 200 bootstrap samples.

We now examine whether the shape matrix of the 49 assets would change over Years 2002–2007. To this end, we apply the proposed testing procedure for equality of two shape matrices to examine whether the shape matrix for data from 2002 to 2003 equals the shape matrix for data from 2004 to 2005. The resulting p-value is 0.1745, which implies that the shape does not have dramatically change. We further apply the testing procedure for data from 2004 to 2005 versus data from 2006 to 2007, and the resulting p-value is less than 0.0001. This implies that the shape of data has been changing from 2004–2005 to 2006–2007.

Similarly, we test whether the shape matrix of the 49 assets would change over the period of 2010–2019. We apply the testing procedure for data subsets over every two-year period. The resulting p-values for data from 2010–2011 versus 2012–2013, 2012–2013 versus 2014–2015, 2014–2015 versus 2016–2017, 2016–2017 versus 2018–2019 are 0.0030, 0.0112, 0.0037, and 0.0007, respectively. This implies the shape matrix of the data varies over this period.

4 Theoretical Results on the Limiting Spectrum of Tyler’s M Estimator from a General Population

Recall the definitions of our proposed statistics $T_{1, tr}, T_{1, log}, T_{2, tr}$ and their adjusted versions, which are all functionals of the eigenvalues of Tyler’s M estimator (or their F-ratio). To our best knowledge, limiting properties of the eigenvalue distributions of large-dimensional Tyler’s M estimator with general populations have not been explored fully yet and almost all existing works in the literature on M-estimators assume elliptical distributions, see Maronna (Citation1976), Tyler (Citation1987), Couillet, Pascal, and Silverstein (Citation2015), Kammoun et al. (Citation2016), Zhang, Cheng, and Singer (Citation2016), Goes, Lerman, and Nadler (Citation2020), etc. Note that by assuming the parameter of dimension p grows to infinity proportionally to the sample size n (also referred to as Marčenko-Pastur asymptotic regime in random matrix theory), it has been established only recently that for those elliptical distributions, the estimators of Maronna’s M (Couillet, Pascal, and Silverstein Citation2015), Tyler’s M (Zhang, Cheng, and Singer Citation2016) and regularized Tyler’s M (Kammoun et al. Citation2016) all behave similarly and are asymptotically equivalent to the sample covariance matrix from normal distribution. Such equivalence gives the first-order approximation. That is, the ESDs of these robust estimators from elliptical distributions all converge to the renowned Marčenko-Pastur law (Marčenko and Pastur Citation1967). As for the second-order asymptotics for these robust scatters, existing work is even limited and the only available reference is Couillet, Kammoun, and Pascal (Citation2016), where the authors focused on the fluctuation of bilinear forms of regularized Tyler’s M estimator, also from elliptical populations, and it was again established to be the same as the sample covariance matrix asymptotically. However, in the literature of random matrix theory, the convergence rate of bilinear forms of certain random matrix models is $O_{p} (\sqrt{p})$ (Bai, Miao, and Pan Citation2007; Couillet, Kammoun, and Pascal Citation2016), which is in general much slower than the convergence rate of LSSs Bai and Silverstein Citation2004; Pan and Zhou Citation2008; Zheng Citation2012. Therefore, unlike the case for dealing with spectral norms (Couillet, Pascal, and Silverstein Citation2015; Kammoun et al. Citation2016; Zhang, Cheng, and Singer Citation2016) or bilinear forms (Couillet, Kammoun, and Pascal Citation2016), the LSSs of Tyler’s M estimator may fluctuate differently from those of a sample covariance matrix asymptotically.

Therefore, this section aims to develop new theoretical results on the convergence of ESDs (first-order asymptotic) and the CLT for LSSs (second-order asymptotic) of large-dimensional Tyler’s M estimator from general populations. Note that the technique developed in this article is different from the one in Zhang, Cheng, and Singer (Citation2016). The latter depends on the properties of Gaussian and elliptical distributions, say, some concentration bounds, and thus cannot be directly extended to other distributions. When it comes to the second-order asymptotics, the processing becomes more intricate as it involves obtaining high-precision approximations of certain moments of a resolvent matrix, rather than just comparing two matrices in spectral norms. Our strategy is mainly on the method of moment and a comparison technique. To be concrete, the method of moment helps to get rid of the particular distribution of the underlying population, and the comparison technique allows us to compare Tyler’s M estimator with sample spatial-sign covariance matrix for the first time. We investigate the inner structure of their resolvent matrices and finally show that the two share the same first and second order asymptotics in some sense.

4.1 Notations and Assumptions

First, we introduce some notations that are repeatedly used in this article. Let $A_{p}$ be a p × p symmetric or Hermitian matrix with eigenvalues ${(l_{i})}_{1 \leq i \leq p}$ . The ESD of $A_{p}$ is defined as the random measure $F^{A_{p}} = p^{- 1} \sum_{i = 1}^{p} δ_{l_{i}},$ where $δ_{l_{i}}$ is the Dirac mass at point $l_{i}$ . If the ESD sequence ${F^{A_{p}}}$ has a limit when $p \to \infty$ , this limit is referred to as the limiting spectral distribution (LSD) of the sequence. For a given analytical function f(x) defined on $R$ , we call $\int f (x) d F^{A_{p}} (x) = p^{- 1} \sum_{i = 1}^{p} f (l_{i})$ an LSS of the random matrix $A_{p}$ associated with the test function f. The Stieltjes transform of a probability measure G is defined as $m_{G} (z) = \int {(x - z)}^{- 1} dG (x), z \in C^{+} ≜ {z \in C : I (z) > 0}$ . This definition can be extended to the whole complex plane except the support of G.

Next, we recall our assumptions and definitions for easy reading. Let $x_{1}, \dots, x_{n} \in R^{p}$ be a random sample having the following stochastic representation, (4.1) $x_{j} \overset{d}{=} w_{j} Σ^{\frac{1}{2}} z_{j}, tr (Σ) = p .$ (4.1)

Our main assumptions on this model are listed below.

Assumption

(1). Both the sample size n and population dimension p tend to infinity such that $p \to \infty, n \to \infty, c_{n} ≜ p / n \to c \in (0, 1)$ .

Assumption

(2). The ESD of $Σ$ , denoted as H_p, has a bounded support, that is $Supp (H_{p}) \subset [a, b] \subset (0, \infty)$ , and converges weakly to a probability distribution H.

Assumption

(3). The coordinates $(z_{ij})$ of $z_{j}$ are iid continuous random variables having bounded density and satisfying the following moment conditions for some $Δ > 0, E (z_{ij}) = 0, E (z_{ij}^{2}) = 1, E (z_{ij}^{4}) = τ, E | z_{ij} |^{4 + Δ} < \infty$ .

In the following Sections 4.2 and 4.3, we establish the LSD and the CLT for LSSs of Tyler’s M estimator $C_{n}$ defined by $C_{n} = p n^{- 1} \sum_{j = 1}^{n} {(x_{j}^{'} C_{n}^{- 1} x_{j})}^{- 1} x_{j} x_{j}^{'}$ , $tr C_{n} = p$ , where $x_{1}, \dots, x_{n}$ is a random sample from (4.1) satisfying Assumptions (1)–(3). Finally, our results are generalized to cover the case of unknown location in Section 4.4.

Remark 4.1.

We discuss here the reason for imposing Assumption (1). Initially, the definition of Tyler’s M estimator implies that the matrix $C_{n}$ is well-defined only when p < n. As a result, we set the constraint c < 1. It’s worth noting that recent works, such as Zhang, Cheng, and Singer (Citation2016) and Goes, Lerman, and Nadler (Citation2020), have explored the properties of Tyler’s M estimator within this very regime. To bypass the constraint c < 1, one might consider alternative matrix models like the regularized Tyler’s M estimator, as proposed by Ollila and Tyler (Citation2014). While the techniques developed in this article could potentially address non-explicitly defined random matrix models, such discussions are beyond the scope of this article and could be a good topic for future research.

4.2 Limiting Spectral Distribution of $C_{n}$

The following theorem is on the LSD of Tyler’s M estimator $C_{n}$ .

Theorem 4.1.

Under Assumptions (1)–(3), almost surely, the ESD $F^{C_{n}}$ of Tyler’s M estimator $C_{n}$ converges weakly to a probability measure $F^{c, H}$ , whose Stieltjes transform m(z) is a solution to (4.2) $m (z) = \int \frac{1}{t {1 - c - czm (z)} - z} d H (t), z \in C^{+} .$ (4.2)

The solution is unique in the set ${m (z) \in C : - (1 - c) / z$ $+ cm (z) \in C^{+}}$ .

Remark 4.2.

Theorem 4.1 extends the main conclusion in Zhang, Cheng, and Singer (Citation2016), where the same LSD was derived under elliptical distributions. Indeed, according to Remark 2.1(b), elliptical distributions require those atom random vectors ${z_{j}}$ in (4.1) to be standard multivariate normal $N (0, I_{p})$ . In that case, any finite moment of ${z_{ij}}$ exists. Our Theorem 4.1 is indeed universal. It reaches the same conclusion not only for those standard normal ${z_{j}}$ but also for any continuous distributions with finite $(4 + Δ)$ th moment.

Remark 4.3.

The LSD $F^{c, H}$ , characterized by its Stieltjes transform m(z) as defined in (4.2), is a generalized Marčenko-Pastur law, initially introduced in the seminal paper Marčenko and Pastur (Citation1967) to describe the LSD of sample covariance matrices. Hence, Theorem 4.1 illustrates that, in terms of the first-order asymptotics for the ESD, Tyler’s M estimator $C_{n}$ constructed by samples ${x_{j}}_{j = 1}^{n}$ originating from the model (4.1) with shape matrix $Σ$ , is asymptotically equivalent to the sample covariance matrix $S_{n}$ formed by samples drawn from the independent components model with the same population covariance matrix $Σ$ , that is, $S_{n} = n^{- 1} \sum_{i = 1}^{n} Σ^{\frac{1}{2}} z_{j} z'_{j} Σ^{\frac{1}{2}} .$ Indeed, the auxiliary Lemma S1.1, which is crucial for proving our main results, has yielded a stronger conclusion than the current one in Theorem 4.1. Lemma S1.1 in the supplementary material demonstrates that $| | C_{n} - B_{n} | | \to 0$ , almost surely, where $B_{n} = p n^{- 1} \sum_{j = 1}^{n} | | x_{j} | |^{- 2} x_{j} x_{j}^{'}$ is the spatial-sign covariance matrix constructed by samples ${x_{j}}_{j = 1}^{n}$ . Additionally, it is straightforward to verify that $| | B_{n} - S_{n} | | \to 0$ , almost surely. These facts imply that the three matrices, Tyler’s M estimator $C_{n}$ , the spatial-sign covariance matrix $B_{n}$ , and the sample covariance matrix $S_{n}$ , are asymptotically equivalent in terms of spectral norm.

Remark 4.4.

Note that while Theorem 4.1 is established under Assumption (1), if we allow c approach 0, the result indicates that the eigenvalue’s distribution of Tyler’s M estimator $C_{n}$ converges to that of the shape matrix $Σ$ . This finding aligns with the established result derived in Tyler (Citation1987) for the fixed-p scenario.

Remark 4.5.

Let $\underline{m} (z) ≜ cm (z) - (1 - c) / z$ , which is the Stieltjes transform of ${\underline{F}}^{c, H} ≜ c F^{c, H} + (1 - c) δ_{0}$ . Then, (4.2) can be rewritten using $\underline{m} (z)$ as (4.3) $z = - \frac{1}{\underline{m} (z)} + c \int \frac{t}{1 + t \underline{m} (z)} dH (t), z \in C^{+} .$ (4.3)

This quantity $\underline{m} (z)$ will be frequently recalled in the next subsection. For simplicity of notation, we shall sometimes suppress the independent variable z in the functions $\underline{m} (z)$ and m(z).

4.3 Central Limit Theorem for Linear Spectral Statistics of $C_{n}$

This section aims to establish the CLT for LSSs of Tyler’s M estimator $C_{n}$ , that is, the statistics of the type $\int f (x) d F^{C_{n}} (x),$ where f(x) is a given test function. To establish our main result, we introduce a deterministic matrix T, which relates to the shape matrix $Σ$ in the following way, (4.4) $\begin{matrix} T ≜ Σ - \frac{2}{p} Σ^{2} - \frac{τ - 3}{p} Σ^{\frac{1}{2}} diag (Σ) Σ^{\frac{1}{2}} \\ + (\frac{2}{p^{2}} tr Σ^{2} + \frac{τ - 3}{p^{2}} tr [Σ ° Σ]) Σ, \end{matrix}$ (4.4) where “ $°$ ” denotes the Hadamard product of two matrices. Note that the matrix T is asymptotically equivalent to $Σ$ . However, the remaining terms which are of the order $O (1 / p)$ in spectral norm will still contribute when we study the CLT for LSSs of $C_{n}$ . Let ${\tilde{H}}_{p}$ denote the ESD of T. Then, by replacing (c, H) with $(c_{n}, {\tilde{H}}_{p})$ in (4.2), this new equation defines a unique solution, referred as ${\tilde{m}}_{0} (z)$ , and its induced probability measure is referred as $F^{c_{n}, {\tilde{H}}_{p}}$ . In virtue of this measure, we may normalize the LSS of $C_{n}$ as (4.5) $G_{n} (f) ≜ p (\int f (x) d F^{C_{n}} (x) - \int f (x) d F^{c_{n}, {\tilde{H}}_{p}} (x)) .$ (4.5)

In addition, we need the following Assumption (4) for establishing our general CLT, which indeed is required for handling the case when $E (z_{ij}^{4}) = τ \neq 3$ . Note that such an assumption depends not only on the eigenvalues of $Σ$ , but also on its eigenvectors.

Assumption

(4). As $p \to \infty$ , the limits of the following three auxiliary quantities exist, (4.6) $\begin{matrix} ζ_{p} = \frac{1}{p} tr [Σ ° Σ] \to ζ, \\ h_{p} (z) = \frac{1}{p} tr [(Σ^{\frac{1}{2}} {(Σ - z I)}^{- 1} Σ^{\frac{1}{2}}) ° Σ] \to h (z), \\ g_{p} (z_{1}, z_{2}) = \frac{1}{p} tr [(Σ^{\frac{1}{2}} {(Σ - z_{1} I)}^{- 1} Σ^{\frac{1}{2}}) ° (Σ^{\frac{1}{2}} {(Σ - z_{2} I)}^{- 1} Σ^{\frac{1}{2}})] \\ \to g (z_{1}, z_{2}), \end{matrix}$ (4.6) where $z, z_{1}, z_{2} \in C^{+}$ .

We next show that the linear functional ${G_{n} (f)}$ of the eigenvalues of $C_{n}$ defined in (4.5) converges weakly to a Gaussian process.

Theorem 4.2.

Suppose that Assumptions (1)–(4) hold with some $Δ \geq 2$ . Let $f_{1}, \dots, f_{k}$ be k functions analytic on an open set that includes the interval $I_{c} = [\underset{p \to \infty}{lim inf} λ_{{min}^{Σ}} {(1 - \sqrt{c})}^{2}$ , $\underset{p \to \infty}{lim sup} λ_{{max}^{Σ}} {(1 + \sqrt{c})}^{2}] .$ Then the random vector ${G_{n} (f_{1}), \dots, G_{n} (f_{k})}$ converges weakly to a Gaussian random vector ${G_{f_{1}}, \dots, G_{f_{k}}}$ . Its mean function is given by (4.7) $E (G_{f}) = - \frac{1}{2 π i} \oint_{C} f (z) [μ_{1} (z) + μ_{2} (z)] dz,$ (4.7) where (4.8) $\begin{matrix} μ_{1} (z) = \int \frac{c {(\underline{m}' t)}^{2} dH (t)}{\underline{m} {(1 + \underline{m} t)}^{3}} - \int \frac{2 \underline{m}' (1 + z \underline{m}) t^{2} dH (t)}{{(1 + \underline{m} t)}^{2}} \\ + \int \frac{(β_{2} t - t^{2}) dH (t)}{1 + \underline{m} t} \int \frac{2 c \underline{m} \underline{m}' tdH (t)}{{(1 + \underline{m} t)}^{2}}, \\ + (τ - 3) {\frac{c \underline{m}'}{{\underline{m}}^{2}} g' (\frac{- 1}{\underline{m}}, \frac{- 1}{\underline{m}}) + ζ \int \frac{(1 + z \underline{m}) t \underline{m}' dH (t)}{{(1 + \underline{m} t)}^{2}} \\ - \frac{(1 + z \underline{m}) \underline{m}'}{{\underline{m}}^{2}} h' (\frac{- 1}{\underline{m}}) - \int \frac{c \underline{m}' tdH (t)}{{(1 + \underline{m} t)}^{2}} h (\frac{- 1}{\underline{m}})}, \end{matrix}$ (4.8) (4.9) $\begin{matrix} μ_{2} (z) = (z \underline{m})' {\frac{2}{c (1 - c)} z \underline{m} - \frac{2}{c} β_{2} z \underline{m} - \frac{2}{c} z + \frac{2}{c (1 - c)} + 2} \\ - \frac{2}{c} (1 + z \underline{m}) + (τ - 3) {\frac{d}{dz} [z \underline{m} h (\frac{- 1}{\underline{m}})] \\ - \frac{1}{c} (z \underline{m})' z \underline{m} ζ - \frac{1}{c} (1 + z \underline{m}) (z \underline{m})'}, \end{matrix}$ (4.9) in which $β_{2} = \int t^{2} dH (t)$ . Its covariance function is given by (4.10) $\begin{matrix} cov (G_{f}, G_{g}) \\ = - \frac{1}{4 π^{2}} \oint_{C_{1}} \oint_{C_{2}} f (z_{1}) g (z_{2}) [σ_{1} (z_{1}, z_{2}) + (τ - 3) σ_{2} (z_{1}, z_{2})] d z_{1} d z_{2}, \end{matrix}$ (4.10) where $\begin{matrix} σ_{1} (z_{1}, z_{2}) = \frac{2 \underline{m}' (z_{1}) \underline{m}' (z_{2})}{{(\underline{m} (z_{1}) - \underline{m} (z_{2}))}^{2}} \\ + \frac{2}{c} (β_{2} + \frac{1}{\underline{m} (z_{1})} + \frac{1}{\underline{m} (z_{2})}) (z_{1} \underline{m} (z_{1}))' (z_{2} \underline{m} (z_{2}))' \\ - \frac{2 \underline{m}' (z_{1})}{c {\underline{m}}^{2} (z_{1})} (z_{2} \underline{m} (z_{2}))' (1 + z_{1} \underline{m} (z_{1})) \\ - \frac{2 \underline{m}' (z_{2})}{c {\underline{m}}^{2} (z_{2})} (z_{1} \underline{m} (z_{1}))' (1 + z_{2} \underline{m} (z_{2})), \\ σ_{2} (z_{1}, z_{2}) = \frac{\partial^{2}}{\partial z_{1} \partial z_{2}} {cg (\frac{- 1}{\underline{m} (z_{1})}, \frac{- 1}{\underline{m} (z_{2})}) \\ + \frac{ζ}{c} (1 + z_{1} \underline{m} (z_{1})) (1 + z_{2} \underline{m} (z_{2})) \\ - (1 + z_{1} \underline{m} (z_{1})) h (\frac{- 1}{\underline{m} (z_{2})}) \\ - (1 + z_{2} \underline{m} (z_{2})) h (\frac{- 1}{\underline{m} (z_{1})})} . \end{matrix}$

Here, the contours in (4.7) and (4.10) (two in (4.10), which may be assumed to be nonoverlapping) are closed, counter-clockwise orientated in the complex plane and enclosing the support of $F^{c, H}$ .

Remark 4.6.

Theorem 4.2 indicates that the convergence rate for the LSSs of Tyler’s M estimator is of the order O(n) when the dimension p grows to infinity proportionally with the sample size n, while keeping their ratio away from zero asymptotically. Theoretically, our result does not cover scenarios where p is fixed or p diverges but c = 0. In the fixed-p case, Tyler (Citation1987) demonstrates that under certain conditions, $\sqrt{n} (C_{n} - Σ)$ converges in distribution to a matrix normal distribution as $n \to \infty$ . This large sample behavior of $C_{n}$ resembles the classical theory on sample covariance matrix introduced in the book by Anderson (Citation1984). When p diverges but c = 0, random matrix theory suggests that the convergence rate for LSSs becomes the order of $O (\sqrt{pn})$ , as observed in the results for sample covariance matrices in Chen and Pan (Citation2015). Particularly, in order to determine the centering term, as well as the limiting mean and limiting variance of the CLT for LSSs, we must identify the dominant terms that will contribute by controlling the remaining negligible terms up to certain orders, which depends heavily on the convergence rates. As far as we know, there is currently no unified theory that can easily accommodate all these asymptotic frameworks including fixed p, $p / n \to 0$ and $p / n \to c > 0$ . Nevertheless, for practitioners applying the CLT in real data analysis or applications, it’s worth noting that once a dataset is available, one can directly set the value of c equal to p/n. Our CLT result remains valid even if c is small. We have applied this approach in all our simulation experiments and real data analysis provided in Section 3.

Finally, we discuss the results of two special cases covered by Theorem 4.2.

Special case (1): $E z_{ij}^{4} = 3$ . This special case includes the elliptical family of distributions. From Remark 2.1(b), when the sample ${x_{1}, \dots, x_{n}}$ comes from the family of elliptical distributions, the atom random vectors $z_{1}, \dots, z_{n}$ in (4.1) reduce to the standard multivariate normal vectors $N (0, I_{p})$ , which meets $E (z_{ij}^{4}) = 3$ . According to Theorem 4.2 and by setting τ = 3, all the impacts from the eigenvectors of $Σ$ disappear. We establish in the following Proposition 4.1 that the centering term $\int f (x) d F^{c_{n}, {\tilde{H}}_{p}} (x)$ given in (4.16) can also be simplified to a functional of H_p only. In this way, all the quantities including the centering term, limiting mean and limiting variance in Theorem 4.2 are only relevant to the ESD of $Σ$ , that is, the information contained in ${H_{p}}$ .

Proposition 4.1.

Suppose that Assumptions (1)–(3) hold with τ = 3. We have $\begin{matrix} p (\int f (x) d F^{c_{n}, H_{p}} (x) - \int f (x) d F^{c_{n}, {\tilde{H}}_{p}} (x)) \\ \to - \frac{1}{2 π i} \oint_{C} f (z) μ_{3} (z) dz, \end{matrix}$ where $μ_{3} (z) = 2 \underline{m}' (z) \int \frac{β_{2} t - t^{2}}{{(1 + t \underline{m} (z))}^{2}} dH (t) and β_{2} = \int t^{2} dH (t) .$

Special case (2): identity shape matrix. When the shape matrix $Σ = I_{p}$ ( $H_{p} \equiv δ_{1}$ ), Assumption (4) holds automatically. Indeed, the three auxiliary quantities defined in (4.6) are exactly equal to their limits given by ζ = 1, $h (z) = {(1 - z)}^{- 1}$ and $g (z_{1}, z_{2}) = {(1 - z_{1})}^{- 1} {(1 - z_{2})}^{- 1}$ . After some algebra by setting $H = δ_{1}$ in Remark 4.5, we have all the limiting quantities in Theorem 4.2 involving the factor $τ - 3$ equal zero, that is, the terms in (4.8), (4.9) and $σ_{2} (z_{1}, z_{2})$ all disappear. Moreover, by (4.4), we notice that when $Σ = I_{p}$ , the matrix T reduces to the identity matrix $I_{p}$ which does not depend on the factor $τ - 3$ . Thus, for those samples with identity shape matrix, the CLT for LSSs of Tyler’s M estimator $C_{n}$ is irrelevant to the fourth moment τ. Therefore, it is universal in the sense that it does not depend on the specific underlying distribution of the population.

4.4 Adjustments to the CLT When the Location Vector is Unknown

This section considers the CLT for LSSs of Tyler’s M estimator when the true location vector is unknown. To this end, suppose we have a sequence of iid observations ${x_{1}, \dots, x_{n}} \in R^{p}$ having the form (4.11) $x_{j} \overset{d}{=} μ + w_{j} Σ^{\frac{1}{2}} z_{j}, tr (Σ) = p,$ (4.11) where $μ$ is now unknown. We update Assumption (3) of the model (2.1) as follows.

Assumption

(3’). The coordinates $(z_{ij})$ of $z_{j}$ are iid continuous random variables having bounded density and satisfying $E (z_{ij}) = 0, E (z_{ij}^{2}) = 1, E (z_{ij}^{4}) = τ, E (z_{ij}^{γ}) < \infty$ for some $γ > 8$ . The random variables ${w_{j}}$ are independent of ${z_{j}}$ and satisfying $E (w_{j}^{- r}) < \infty, E (w_{j}^{r}) < \infty$ for some r > 4.

Tyler’s M estimator corresponding to the data sample $x_{1}, \dots, x_{n}$ in (4.11), denoted as ${\bar{C}}_{n} = {\bar{C}}_{n} (\hat{μ})$ , is then defined to be the solution of equation (4.12) ${\bar{C}}_{n} = \frac{p}{n} \sum_{j = 1}^{n} \frac{(x_{j} - \hat{μ}) (x_{j} - \hat{μ})'}{(x_{j} - \hat{μ})' {\bar{C}}_{n}^{- 1} (x_{j} - \hat{μ})}, tr ({\bar{C}}_{n}) = p,$ (4.12) where $\hat{μ}$ is an estimate of the true location $μ$ . We consider a class of location estimates ${\hat{μ}}_{λ}$ of the form (4.13) ${\hat{μ}}_{λ} ≜ \sum_{j = 1}^{n} α_{j λ} x_{j} = μ + \sum_{j = 1}^{n} α_{j λ} w_{j} Σ^{\frac{1}{2}} z_{j},$ (4.13) (4.14) $α_{j λ} ≜ \frac{| | x_{j} - \bar{x} | |^{- λ}}{\sum_{j = 1}^{n} | | x_{j} - \bar{x} | |^{- λ}},$ (4.14) where λ is a pre-determined nonnegative number in $[0, 2]$ and $\bar{x} = \sum_{j = 1}^{n} x_{j} / n$ is the sample mean. Notice that the quantitie ${| | x_{j} - \bar{x} | |^{- λ}}$ is close to ${{(\sqrt{p} w_{j})}^{- λ}}$ and thus, by the construction, the weight ${α_{j λ}}$ attempts to approximate the oracle one $w_{j}^{- λ} / \sum_{j = 1}^{n} w_{j}^{- λ} .$ In particular, ${\hat{μ}}_{λ}$ reduces to the usual sample mean if we set λ = 0. With such a class of location estimates ${\hat{μ}}_{λ}$ involved, we establish the relation between the LSSs of ${\bar{C}}_{n} ({\hat{μ}}_{λ})$ and those of $C_{n}$ in the following theorem.

Theorem 4.3.

Suppose that Assumptions (1)-(2)-(3’) hold. Then, we have $p \int f (x) d F^{{\bar{C}}_{n} ({\hat{μ}}_{λ})} (x) - p \int f (x) d F^{C_{n}} (x) \to - \frac{1}{2 π i} \oint_{C} f (z) μ_{4} (z) dz$ in probability, where $\begin{matrix} μ_{4} (z) = [z \underline{m} (z)]' {\frac{- 1 - {\tilde{α}}_{λ} - 2 {\tilde{α}}_{λ} z \underline{m} (z)}{z \underline{m} (z) + {\tilde{α}}_{λ} z \underline{m} (z) [1 + z \underline{m} (z)]} + {\tilde{α}}_{λ} - 1}, \\ {\tilde{α}}_{λ} = \frac{E w^{- 2} E w^{2 - 2 λ}}{{(E w^{- λ})}^{2}} - 1. \end{matrix}$

Theorem 4.3

demonstrates that substituting the location estimate ${\hat{μ}}_{λ}$ for the true location vector $μ$ in Tyler’s M estimator only introduces a mean shift to the CLT for its LSSs. And the shift is characterized by the function $μ_{4} (z)$ , which depends on certain moments of the random variable w as specified in the parameter ${\tilde{α}}_{λ}$ . In particular, by setting λ = 2 in Theorem 4.3, we get ${\tilde{α}}_{λ} = 0$ and the function $μ_{4} (z)$ then becomes irrelevant to the moments of w and possesses a concise form (4.15) $\begin{matrix} μ_{4} (z) = \frac{(1 + z \underline{m} (z)) (\underline{m} (z) + z \underline{m}' (z))}{- z \underline{m} (z)} \\ = p \int \frac{1}{x - z} [d F^{c_{n - 1}, {\tilde{H}}_{p}} (x) - d F^{c_{n}, {\tilde{H}}_{p}} (x)] + o (1), \end{matrix}$ (4.15) where $F^{c_{n}, {\tilde{H}}_{p}}$ and $F^{c_{n - 1}, {\tilde{H}}_{p}}$ with $c_{n - 1} = p / (n - 1)$ are two finite-dimensional proxies for the LSD $F^{c, H}$ with (c, H) replaced by $(c_{n}, {\tilde{H}}_{p})$ and $(c_{n - 1}, {\tilde{H}}_{p})$ , respectively, see Zheng, Bai, and Yao (Citation2015). Such $μ_{4} (z)$ in (4.15) can be canceled out by a simple replacement of the sample size n with n – 1 in the centering term of the LSS defined in (4.5). We summarize such substitution result in the following proposition.

Proposition 4.2.

Suppose that Assumptions (1), (2), (3’), and (4) hold. Denote the normalized LSS of ${\bar{C}}_{n} ({\hat{μ}}_{2})$ as (4.16) ${\bar{G}}_{n} (f) ≜ p (\int f (x) d F^{{\bar{C}}_{n} ({\hat{μ}}_{2})} (x) - \int f (x) d F^{c_{n - 1}, {\tilde{H}}_{p}} (x)) .$ (4.16)

Then the random vector ${{\bar{G}}_{n} (f_{1}), \dots, {\bar{G}}_{n} (f_{k})}$ converges weakly to the same Gaussian random vector ${G_{f_{1}}, \dots, G_{f_{k}}}$ as that in Theorem 4.2.

Very interestingly, we find for the first time that by substituting the true location vector $μ$ with its estimate ${\hat{μ}}_{λ}$ by setting λ = 2 in (4.13), the CLT for LSSs of their corresponding Tyler’s M estimators do not change. The only difference lies in the centering terms for normalizing the LSSs. This is a parallel result to the substitution principle for LSSs of the sample covariance matrix when the unknown mean is estimated by the sample mean, see Zheng, Bai, and Yao (Citation2015).

Supplementary Materials

This supplementary material consists of proofs of the main results developed in Sections 2 and 4, namely Theorems 2.1, 2.2, 2.3, 2.4, 2.5, 4.1, 4.2, 4.3, and Proposition 4.1. Additionally, it includes some additional lemmas for proving these results and simulations for the simultaneous estimates mentioned in Remark 2.7.

Supplemental material

Supplemental Material

Download Zip (1.2 MB)

Supplemental Material

Download PDF (548 KB)

Acknowledgments

The authors would like to thank the Associate Editor and referees for their constructive comments, which lead to significant improvement of this work.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

R. Li’s research is supported by NIH grants NIH R01AI170249 and R01AI136664, W. Li’s research is supported by NSFC (No. 11971293, 12141107), and Q. Wang’s research is supported by NSFC (No. 12171099) and Natural Science Foundation of Shanghai (23ZR1406200).

References

Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis, New York: Wiley.
Google Scholar
Bai, Z., Jiang, D., Yao, J., and Zheng, S. (2009), “Corrections to LRT on Large-Dimensional Covariance Matrix by RMT,” The Annals of Statistics, 37, 3822–3840. DOI: 10.1214/09-AOS694.
Web of Science ®Google Scholar
Bai, Z., Miao, B., and Pan, G. (2007), “On Asymptotics of Eigenvectors of Large Sample Covariance Matrix,” The Annals of Probability, 35, 1532–1572. DOI: 10.1214/009117906000001079.
Web of Science ®Google Scholar
Bai, Z., and Silverstein, J. W. (2004), “CLT for Linear Spectral Statistics of Large-Dimensional Sample Covariance Matrices,” The Annals of Probability, 32, 553–605. DOI: 10.1214/aop/1078415845.
Web of Science ®Google Scholar
Bai, Z., and Silverstein, J. W. (2010), Spectral Analysis of Large Dimensional Random Matrices (2nd ed.), New York: Springer.
Google Scholar
Chen, B., and Pan, G. (2015), “CLT for Linear Spectral Statistics of Normalized Sample Covariance Matrices With the Dimension Much Larger Than the Sample Size,” Bernoulli, 21, 1089–1133. DOI: 10.3150/14-BEJ599.
Web of Science ®Google Scholar
Chen, S., Zhang, L., and Zhong, P. (2010), “Tests for High-Dimensional Covariance Matrices,” Journal of the American Statistical Association, 105, 810–819. DOI: 10.1198/jasa.2010.tm09560.
Web of Science ®Google Scholar
Couillet, R., Kammoun, A., and Pascal, F. (2016), “Second Order Statistics of Robust Estimators of Scatter. Application to GLRT Detection for Elliptical Signals,” Journal of Multivariate Analysis, 143, 249–274. DOI: 10.1016/j.jmva.2015.08.021.
Web of Science ®Google Scholar
Couillet, R., Pascal, F., and Silverstein, J. W. (2015), “The Random Matrix Regime of Maronna’s M-estimator with Elliptically Distributed Samples,” Journal of Multivariate Analysis, 139, 56–78. DOI: 10.1016/j.jmva.2015.02.020.
Web of Science ®Google Scholar
Fang, K., Kotz, S., and Ng, K. W. (1990), Symmetric Multivariate and Related Distributions, London: Chapman and Hall, Ltd.
Google Scholar
Goes, J., Lerman, G., and Nadler, B. (2020), “Robust Sparse Covariance Estimation by Thresholding Tyler’s M-estimator,” The Annals of Statistics, 48, 86–110. DOI: 10.1214/18-AOS1793.
Web of Science ®Google Scholar
Hallin, M., Oja, H., and Paindaeine, D. (2006), “Semipararamtrically Efficient Rank-based Inference for Shape II. Optimal R-estimation of Shape,” The Annals of Statistics, 34, 2757–2789. DOI: 10.1214/009053606000000948.
Web of Science ®Google Scholar
Hallin, M., and Paindaeine, D. (2006), “Seimpararamtrically Efficient Rank-based Inference for Shape I. Optimal Rank-based Test for Sphericity,” The Annals of Statistics, 34, 2707–2756. DOI: 10.1214/009053606000000731.
Web of Science ®Google Scholar
Huber, P. J. (1964), “Robust Estimation of a Location Parameter,” The Annals of Mathematical Statistics, 35, 73–101. DOI: 10.1214/aoms/1177703732.
Google Scholar
Ilmonen, P., and Paindaeine, D. (2011), “Seimpararamtrically Efficient Inference based Signed Ranks in Symmetric Independent Component Models,” The Annals of Statistics, 39, 2448–2476. DOI: 10.1214/11-AOS906.
Web of Science ®Google Scholar
James, W., and Stein, C. (1961), “Estimation with Quadratic Loss,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability (Vol. I), pp. 361–379.
Google Scholar
John, S. (1971), “Some Optimal Multivariate Tests,” Biometrika, 58, 123–127. DOI: 10.2307/2334322.
Web of Science ®Google Scholar
John, S. (1972), “The Distribution of a Statistic Used for Testing Sphericity of Normal Distributions,” Biometrika, 59, 169–173. DOI: 10.2307/2334628.
Web of Science ®Google Scholar
Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” The Annals of Statistics, 29, 295–327. DOI: 10.1214/aos/1009210544.
Web of Science ®Google Scholar
Kammoun, A., Couillet, R., Pascal, F., and Alouini, M.-S. (2016), “Convergence and Fluctuations of Regularized Tyler Estimators,” IEEE Transactions on Signal Processing, 64, 1048–1060. DOI: 10.1109/TSP.2015.2494858.
Web of Science ®Google Scholar
Kent, J. T., and Tyler, D. E. (1988), “Maximum Likelihood Estimation for the Wrapped Cauchy Distribution,” Journal of Applied Statistics, 15, 247–254. DOI: 10.1080/02664768800000029.
Google Scholar
Kent, J. T., and Tyler, D. E. (1991), “Redescending M-estimates of Multivariate Location and Scatter,” The Annals of Statistics, 19, 2102–2119. DOI: 10.1214/aos/1176348388.
Web of Science ®Google Scholar
Ledoit, O., and Wolf, M. (2002), “Some Hypothesis Tests for the Covariance Matrix When the Dimension is Large Compared to the Sample Size,” The Annals of Statistics, 30, 1081–1102. DOI: 10.1214/aos/1031689018.
Web of Science ®Google Scholar
Li, J., and Chen, S. (2012), “Two Sample Tests for High-Dimensional Covariance Matrices,” The Annals of Statistics, 40, 908–940. DOI: 10.1214/12-AOS993.
Web of Science ®Google Scholar
Liu, B., Xu, L., Zheng, S., and Tian, G. (2014), “A New Test for the Proportionality of Two Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 131, 293–308. DOI: 10.1016/j.jmva.2014.06.008.
Web of Science ®Google Scholar
Locantore, N., Marron, J., Simpson, D., Tripoli, N., Zhang, J., Cohen, K., Boente, G., Fraiman, R., Brumback, B., Croux, C., et al. (1999), “Robust Principal Component Analysis for Functional Data,” Test, 8, 1–73. DOI: 10.1007/BF02595862.
Web of Science ®Google Scholar
Magyar, A. F., and Tyler, D. E. (2014), “The Asymptotic Inadmissibility of the Spatial Sign Covariance Matrix for Elliptically Symmetric Distributions,” Biometrika, 101, 673–688. DOI: 10.1093/biomet/asu020.
Web of Science ®Google Scholar
Maronna, R. A. (1976), “Robust M-estimators of Multivariate Location and Scatter,” The Annals of Statistics, 4, 51–67. DOI: 10.1214/aos/1176343347.
Web of Science ®Google Scholar
Marčenko, V., and Pastur, L. (1967), “The Distribution of Eigenvalues in Certain Sets of Random Matrices,” Math. USSR-Sbornik, 1, 457–483.
Google Scholar
Ollila, E., and Tyler, D. E. (2014), “Regularized M-estimators of Scatter Matrix,” IEEE Transactions on Signal Processing, 62, 6059–6070. DOI: 10.1109/TSP.2014.2360826.
Web of Science ®Google Scholar
Pan, G., and Zhou, W. (2008), “Central Limit Theorem for Signal-to-Interference Ratio of Reduced Rank Linear Receiver,” Annals of Applied Probability, 18, 1232–1270.
Web of Science ®Google Scholar
Rousseeuw, P. (1985), “Multivariate Estimation with High Breakdown Point,” in Mathematical Statistics and Applications: Proceedings 4th Pannonian Symposium on Mathematical Statistics Vol. B, ed. W. Grossman, et al., pp. 283–297.
Google Scholar
Schott, J. R. (2007), “A Test for the Equality of Covariance Matrices When the Dimension is Large Relative to the Sample Sizes,” Computational Statistics & Data Analysis, 51, 6535–6542. DOI: 10.1016/j.csda.2007.03.004.
Web of Science ®Google Scholar
Tyler, D. E. (1987), “A Distribution-Free M-Estimator of Multivariate Scatter,” The Annals of Statistics, 15, 234–251. DOI: 10.1214/aos/1176350263.
Web of Science ®Google Scholar
Zhang, T., Cheng, X., and Singer, A. (2016), “Marčenko-Pastur Law for Tyler’s M-estimator,” Journal of Multivariate Analysis, 149, 114–123. DOI: 10.1016/j.jmva.2016.03.010.
Web of Science ®Google Scholar
Zheng, S. (2012), “Central Limit Theorems for Linear Spectral Statistics of Large Dimensional F-matrices,” Annales de l’Institut Henri Poincaré (B) Probability and Statistics 48, 444–476.
Web of Science ®Google Scholar
Zheng, S., Bai, Z., and Yao, J. (2015), “Substitution Principle for CLT of Linear Spectral Statistics of High-Dimensional Sample Covariance Matrices with Applications to Hypothesis Testing,” The Annals of Statistics, 43(2):546–591. DOI: 10.1214/14-AOS1292.
Web of Science ®Google Scholar
Zou, T., Lin, R., Zheng, S., and Tian, G. L. (2021), “Two-Sample Tests for High-Dimensional Covariance Matrices Using both Difference and Ratio,” Electronic Journal of Statistics, 15, 135–210. DOI: 10.1214/20-EJS1783.
Web of Science ®Google Scholar

Tests for Large-Dimensional Shape Matrices via Tyler’s M Estimators

Abstract

1 Introduction

2 New Test Procedures for Large-Dimensional Shape Matrix

2.1 Model of the Population

2.2 Testing the Identity of a Shape Matrix

2.3 Testing the Equality of Two Shape Matrices

2.4 Adjustments When the True Location is Unknown

3 Simulations and Real Data Analysis

3.1 Simulations for Testing (2.3)

Table 1 Empirical sizes in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

Table 2 Empirical powers in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

3.2 Simulations for Testing (2.9)

Table 3 Empirical sizes and powers in percentiles of $T_{2, tr}, {\bar{T}}_{2, tr}$ , and ${\tilde{T}}_{2, tr}$ at 5% significance under Models (III) and (IV).

3.3 A Real Data Example

4 Theoretical Results on the Limiting Spectrum of Tyler’s M Estimator from a General Population

4.1 Notations and Assumptions

4.2 Limiting Spectral Distribution of $C_{n}$

4.3 Central Limit Theorem for Linear Spectral Statistics of $C_{n}$

4.4 Adjustments to the CLT When the Location Vector is Unknown

Supplementary Materials

Supplemental Material

Supplemental Material

Acknowledgments

Disclosure Statement

References

Information for

Open access

Opportunities

Help and information

Tests for Large-Dimensional Shape Matrices via Tyler’s M Estimators

Abstract

1 Introduction

2 New Test Procedures for Large-Dimensional Shape Matrix

2.1 Model of the Population

2.2 Testing the Identity of a Shape Matrix

2.3 Testing the Equality of Two Shape Matrices

2.4 Adjustments When the True Location is Unknown

3 Simulations and Real Data Analysis

3.1 Simulations for Testing (2.3)

Table 1 Empirical sizes in percentiles of the six tests T1,tr, T1,log, T¯1,tr, T¯1,log, T˜1,tr, and T˜1,log at 5% significance.

Table 2 Empirical powers in percentiles of the six tests T1,tr, T1,log, T¯1,tr, T¯1,log, T˜1,tr, and T˜1,log at 5% significance.

3.2 Simulations for Testing (2.9)

Table 3 Empirical sizes and powers in percentiles of T2,tr, T¯2,tr, and T˜2,tr at 5% significance under Models (III) and (IV).

3.3 A Real Data Example

4 Theoretical Results on the Limiting Spectrum of Tyler’s M Estimator from a General Population

4.1 Notations and Assumptions

4.2 Limiting Spectral Distribution of Cn

4.3 Central Limit Theorem for Linear Spectral Statistics of Cn

4.4 Adjustments to the CLT When the Location Vector is Unknown

Supplementary Materials

Supplemental Material

Supplemental Material

Acknowledgments

Disclosure Statement

Additional information

Funding

References

Related research

To cite this article:

Download citation

Information for

Open access

Opportunities

Help and information

Keep up to date

Your download is now in progress and you may close this window

Login or register to access this feature

Table 1 Empirical sizes in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

Table 2 Empirical powers in percentiles of the six tests $T_{1, tr}, T_{1, log}, {\bar{T}}_{1, tr}, {\bar{T}}_{1, log}, {\tilde{T}}_{1, tr}$ , and ${\tilde{T}}_{1, log}$ at 5% significance.

Table 3 Empirical sizes and powers in percentiles of $T_{2, tr}, {\bar{T}}_{2, tr}$ , and ${\tilde{T}}_{2, tr}$ at 5% significance under Models (III) and (IV).

4.2 Limiting Spectral Distribution of $C_{n}$

4.3 Central Limit Theorem for Linear Spectral Statistics of $C_{n}$