516
Views
0
CrossRef citations to date
0
Altmetric
Theory and Methods

Tests for Large-Dimensional Shape Matrices via Tyler’s M Estimators

ORCID Icon, &
Received 20 Feb 2022, Accepted 24 Apr 2024, Published online: 24 May 2024

Abstract

Tyler’s M estimator, as a robust alternative to the sample covariance matrix, has been widely applied in robust statistics. However, classical theory on Tyler’s M estimator is mainly developed in the low-dimensional regime for elliptical populations. It remains largely unknown when the parameter of dimension p grows proportionally to the sample size n for general populations. By using the eigenvalues of Tyler’s M estimator, this article develops tests for the identity and equality of shape matrices in a large-dimensional framework where the dimension-to-sample size ratio p/n has a limit in (0, 1). The proposed tests can be applied to a broad class of multivariate distributions including the family of elliptical distributions (see model (2.1) for details). To analyze both the null and alternative distributions of the proposed tests, we provide a unified theory on the spectrum of a large-dimensional Tyler’s M estimator when the underlying population is general. Simulation results demonstrate good performance and robustness of our tests. An empirical analysis of the Fama-French 49 industrial portfolios is carried out to demonstrate the shape of the portfolios varying. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

1 Introduction

Tests for shape (or scatter) matrix for multivariate nonnormal data play the same role as tests for covariance matrix in multivariate normal distributions. This is one of the central parts in multivariate analysis (Anderson Citation1984). Many applications including, but not limited to, signal processing, wireless communication system, biology, and financial engineering may impose certain structures on shape or covariance matrix. Thus, test of shape or covariance matrix structure is of primary interest in modern multivariate analysis. Motivated by an empirical analysis of financial economic data in Section 3.3, this article aims to develop tests for shape matrix of large-dimensional data without imposing normality assumption.

As a natural extension of multivariate normal distribution, elliptical symmetric distributions (Fang, Kotz, and Ng Citation1990) are commonly used in the literature of nonparametric multivariate analysis. The shape or scatter matrix of elliptical distribution is proportional to the covariance matrix if its second moment exists and is finite. This motivates researchers to develop statistical inference procedures for the shape matrix rather than the covariance matrix of elliptical distributions. In many real-world applications such as the empirical analysis in Section 3.3, data may exhibit heavy-tailed nature or potentially contaminated by outliers. As it is well known, sample covariance matrix based inference on the shape is sensitive to outliers and heavy tails, and therefore it will lose its efficiency and become less reliable when the underlying populations are elliptically distributed. To cope with this issue, many robust alternatives are used in the literature, including the M-estimators (Huber Citation1964), the minimum volume ellipsoid and minimum determinant estimators (Rousseeuw Citation1985) and Tyler’s M estimator (Tyler Citation1987), etc. Hallin and Paindaeine (Citation2006) and Hallin, Oja, and Paindaeine (Citation2006) proposed a class of rank-based procedures for testing sphericity of elliptical distribution (equivalently testing the shape matrix being an identity matrix), and showed their test for shape is semiparametrically efficient. All aforementioned works assume that the data are finite and fixed-dimensional. The theoretical analyses in these works may not be applicable for large-dimensional data, which are common in many real-world applications nowadays. Thus, it is of great importance in developing new theory that is applicable for establishing asymptotic properties of statistical inference procedures for large-dimensional shape matrix.

Tests for large-dimensional covariance matrix are closely related to tests on shape matrix and have received increasing attention in the literature. Tests for the identity and sphericity of covariance matrix have been studied in the literature Ledoit and Wolf Citation2002; Bai et al. Citation2009; Chen, Zhang, and Zhong Citation2010. Tests for the equality and proportionality of covariance matrices in the two-sample problems have been studied in Schott (Citation2007), Li and Chen (Citation2012), and Zou et al. (Citation2021). A common feature of these conventional tests is that their statistics are constructed from sample covariance matrix or sample correlation matrix, which are both well-behaved for those Gaussian-type and light-tailed populations, but are sensitive to outliers and heavy-tailed distributions. Meanwhile, random matrix theory for sample covariance or correlation matrices have been well developed for Gaussian-type and light-tailed populations, but is less developed for heavy-tailed populations. One of primary goals of this article is to develop new theory for large-dimensional shape matrix estimate with heavy-tailed populations, and then apply the new theory for developing tests on shape matrices.

In this article, we develop tests for shape matrix in a large-dimensional regime where the parameter of dimension p grows to infinity proportionally to the sample size n such that p/nc(0,1). We focus on two fundamental problems in statistical inference on the shape matrix: test whether the shape matrix of a population is identity and test whether the shape matrices of two populations are equal. The proposed test statistics are based on the eigenvalues of Tyler’s M estimator since among all aforementioned robust estimators, Tyler’s M estimator enjoys certain superiorities: it is affine equivariant and, under ellipticity assumption, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). The proposed procedures are applicable to a broad class of multivariate distributions defined in (2.1). In order to analyze both the null and alternative distributions of the proposed statistics, we develop new theory on the limiting spectrum of a large-dimensional Tyler’s M estimator. Specifically, under the population model (2.1), we establish both the convergence of the empirical spectral distribution (ESD) and the central limit theorem (CLT) for linear spectral statistics (LSSs) of a large-dimensional Tyler’s M estimator. Such theoretical results are brand new and are of independent interest in random matrix theory, with potential applications in developing new approaches for other related robust inferences in large dimensions. Let’s briefly summarize the major contributions of this work below.

From methodological point of view, the proposed tests have several appealing properties. First, they are all distribution-free. We show that their limiting null distributions depend only on the dimension p and sample size n of a dataset. Second, Tyler’s M estimator is an affine equivariant estimator of the shape matrix in the sense that when we apply the affine transformation xAxz to the data, their corresponding Tyler’s M estimators, denoted as Cn(x) and Cn(z), satisfy Cn(z)=ACn(x)A for any deterministic full-rank square matrix A. Thus, Tyler’s M estimator behaves similarly to the sample covariance matrix under linear transformations. In particular, for tests of the two-sample shape matrix problem, this affine equivariance property of Tyler’s M estimator can eliminate the impact from the unknown shape matrix under the null hypothesis. This ensures that the proposed tests are universal to the underlying populations. At the same time, it allows us to investigate some nontrivial cases, including those where the shape matrix has an unbounded spectral norm, etc. These findings are also conformed by both theoretical analysis and Monte Carlo studies.

From technical aspects, we develop new strategies and techniques to study the limiting spectrum of a large-dimensional Tyler’s M estimator from general populations. As far as we know, existing theories on Tyler’s M estimator are mainly developed in the low-dimensional regime for elliptical populations, but remain largely unknown when the dimension p grows to infinity and the underlying population deviates away from the elliptical ones. In addition, Tyler’s M estimator is defined to be the solution to a fixed point equation and has no explicit expression. Therefore, standard approaches and tools in random matrix theory given in Bai and Silverstein (Citation2010) cannot be directly imitated here. To solve this problem, we propose a new theoretical development strategy by comparing the resolvent of Tyler’s M estimator to that of sample spatial-sign covariance matrix (Locantore et al. Citation1999) for the first time, which is eventually achieved by employing a series of new techniques, mainly the method of moments. Note that our techniques can be extended similarly to the study of other implicitly defined M-estimators, such as the regularized Tyler’s M estimator (Kammoun et al. Citation2016), where limp/n is then allowed to be larger than 1. However, this is beyond the scope of this article.

The rest of the article is organized as follows. Section 2 introduces our model assumptions, and develops tests for shape matrices in both the one-sample and two-sample cases. In Section 3, we conduct simulation results, and illustrate the proposed methodology via an empirical analysis of a real dataset. Section 4 is dedicated to the development of new theory regarding the limiting spectrum of a large-dimensional Tyler’s M-estimator. These are the core technical tools used to derive both the null and alternative distributions of our test statistics. All the proofs are provided in the supplementary material of this article.

2 New Test Procedures for Large-Dimensional Shape Matrix

2.1 Model of the Population

Consider a random vector xRp admitting the following stochastic representation, (2.1) x=dwΣ12z,(2.1) where w > 0 is a scalar random variable, Σ is a p × p deterministic positive definite matrix normalized by tr(Σ)=p (normalization of Σ guarantees the model identifiability between w and Σ), referred to as the shape matrix or scatter matrix of the population, z=(z1,,zp)Rp is a random vector with iid continuous coordinates (zi) having bounded densities and satisfying (2.2) E(zi)=0,E(zi2)=1,E(zi4)τ,E(zi6)<.(2.2)

Remark 2.1.

It is worth noting that w and z in (2.1) are not necessarily independent. Therefore, our model is quite general and covers several important classes of multivariate distributions.

  1. When the random variable w degenerates to a constant, model (2.1) becomes the well studied independent components model (ICM), which provides the most standard setup for independent component analysis (ICA). See Ilmonen and Paindaeine (Citation2011) for detailed discussion about ICM and ICA. The ICM is also a conventional structure assumption in the study of random matrix theory (Bai and Silverstein Citation2010). As demonstrated by the real data example in Section 3.3, model (2.1) can be more general than ICM in practice.

  2. Model (2.1) includes the elliptical distribution family with zero location vector as a special case. To see this, since if x follows an elliptical distribution with zero location vector, it has the stochastic representation x=dvΣ12u, where v > 0 is a scalar random variable and u is a random vector uniformly distributed on the unit sphere in Rp (Fang, Kotz, and Ng Citation1990). Further, let u=dz/||z|| and w=v/||z||, where z is set to be a standard Gaussian random vector (τ = 3). Thus, we have x=dvΣ12u=dwΣ12z, which is indeed a special case of our model (2.1).

  3. By using the same argument as that in (b), model (2.1) includes multivariate l1-norm symmetric distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of z be iid according to the exponential distribution with mean 1.

  4. With a similar argument to that in (b), model (2.1) includes a subclass of multivariate Liouville distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of z be iid according to a Gamma distribution.

Detailed definitions and properties of elliptical distributions, l1-norm symmetric distributions and multivariate Liouville distributions are referred to Fang, Kotz, and Ng (Citation1990).

2.2 Testing the Identity of a Shape Matrix

Let x1,,xnRp be a sequence of iid observations from the population x defined in (2.1). Consider testing the following hypothesis for the shape matrix Σ, (2.3) H0:Σ=IpversusH1:ΣIp(2.3) in the large-dimensional asymptotic framework where p,n,cnp/nc(0,1).

Our procedures are based on the eigenvalues of Tyler’s M estimator (Tyler Citation1987) of the shape Σ, which is defined to be the matrix Cn satisfying (2.4) Cn=pnj=1nxjxjxjCn1xj,tr Cn=p.(2.4)

Tyler’s M estimator is affine equivariant and under ellipticity assumption and compared with existing robust estimators, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). Kent and Tyler (Citation1988) showed that if any linear subspace in Rp of dimension d (1dp1) contains less than nd/p observations, then there exists a unique solution to the (2.4). In particular, this condition holds with probability one for the random sample {xj,j=1,,n} from model (2.1).

We consider the following two statistics for testing (2.3), T1,tr=tr(Cn2)andT1,log=log det(Cn).

The trace-based statistic T1,tr is motivated by the tests for covariance matrix from John (Citation1971, Citation1972), which measures the squared Frobenius distance between the Tyler’s M estimator Cn and the shape matrix Σ under the null. And the log-based statistic T1,log is an analogue of likelihood ratio statistic (Anderson Citation1984), which is based on the entropy loss L(Cn,Σ)trCnΣ1log det(CnΣ1)p between Cn and Σ under the null (James and Stein Citation1961). In the definition of these test statistics, we use the normalization trCn=p, see (2.4). The null distributions of the two statistics are presented in the following theorem.

Theorem 2.1.

In the large-dimensional asymptotic framework, we have under H0, T1,trp(1+cn)+c(c3)/(1c)2cDN(0,1),T1,log+p(pn)log(1cn)21log(1c)+c/(1c)2log(1c)2cDN(0,1).

We prove this theorem by using the new theory on the CLT for LSSs of Tyler’s M estimator, see Theorem 4.2 in Section 4. In particular, the limiting null distributions of the two statistics are only relevant to the population dimension p and sample size n, independent of the fourth moment τ of (zi) and also the true underlying distribution of the population x.

To analyze the power of the proposed tests, we consider a local alternative such that Σ is diagonal or τ = 3, and compare our Tyler’s M based test T1,tr with John’s test (John Citation1972), defined as T˜1,trtr(Sn2)/[tr(Sn)/p]2, where Sn=n1j=1nxjxj=1nj=1nwj2Σ12zjzjΣ12 is the sample covariance matrix. Under the local alternative, the asymptotical distributions of both T1,tr and T˜1,tr are derived in Theorems 2.2 and 2.3, respectively. Using these two theorems, we further derive the asymptotic power functions of these two tests in Proposition 2.1.

Theorem 2.2.

Suppose that 0<liminfpλmin(Σ)limsuppλmax(Σ)<, and as p, βk,pp1trΣkβk,k=2,3,4. Then, under either of the two conditions: (i) Σ is diagonal or (ii) τ = 3, we have T1,trtr(Σ2)pcnDN(μ,σ2), where the limiting mean and limiting variance are μ=2c(c2)(c1)1cβ2+c(τ1)(β21), σ2=4c2β22+4c(τ1)(β232β2β3+β4).

Theorem 2.3.

In addition to the assumptions of Theorem 2.2, suppose that w is independent of z and has finite eighth moments, that is, βwkE(wk)<,k{1,,8}. Then, we have (2.5) T˜1,trtrΣ2cnpβw22βw4n1trΣ2[(τ1)βw22βw41]σnJDN(0,1),(2.5) where the scaling parameter σnJ is defined by σnJ2={ncn4[4βw4(βw42βw2βw6)βw22+βw8βw42]βw24,var(w2)>0,4c2β22+4c(τ1)(β232β2β3+β4),var(w2)=0.

Proposition 2.1.

Suppose that the assumptions of Theorem 2.3 hold. Then, the asymptotic power functions of T1,tr and T˜1,tr are given by (2.6) βT1,tr=Φ(2czα+(trΣ2p)[1+n1(τ2)]4c2β22+4c(τ1)(β232β2β3+β4)),(2.6) (2.7) βT˜1,tr=Φ(zασnJ0σnJ+(trΣ2p)[1+n1(τ1)βw22βw4n1]σnJ),(2.7) where Φ is the standard normal distribution function, zα=Φ1(1α) is the upper α quantile and (2.8) σnJ02={ncn4[4βw4(βw42βw2βw6)βw22+βw8βw42]βw24,var(w2)>0,4c2,var(w2)=0.(2.8)

Moreover, the asymptotic relative efficiency (ARE) of T1,tr with respect to T˜1,tr is ARE(T1,tr,T˜1,tr)σnJ4c2β22+4c(τ1)(β232β2β3+β4){,var(w2)>0,1,var(w2)=0.

Remark 2.2.

Theorem 2.3 unveils a phase transition phenomenon for T˜1,tr such that its convergence rate is of the order O(n) for the case of nondegenerate w2 such as scale mixture models and multivariate t-distributions, while is of the order O(1) for the degenerate case where w2 is a constant, say, the independent components model.

Remark 2.3.

For the independent components model, Proposition 2.1 demonstrates an equivalent role of Tyler’s M estimator to the sample covariance matrix in terms of the asymptotic power functions (the two are exactly the same). However, in contrast to the test T1,tr, we note that by setting wσ and Σ=Ip in (2.5), the null distribution of the sample covariance matrix based test T˜1,tr still depends on the unknown fourth moment parameter τ of the atom random variable zi, which is not distribution free. Thus, an extra estimation of this parameter should be carried out for the test procedure.

Remark 2.4.

For populations with nondegenerate w2, Proposition 2.1 reveals an overwhelming superiority of Tyler’s M estimator to the sample covariance matrix. In particular, for those local alternatives where trΣ2p=o(p), Proposition 2.1 shows that the power of the Tyler’s M based test T1,tr tends to 1 while the sample covariance matrix based test T˜1,tr has no power. An intuitive explanation for this is that the noise generated by the variable w is stronger than the signal trΣ2p. Therefore, removing the irrelevant variable w from the shape estimation is the main achievement of using Tyler’s M estimator.

2.3 Testing the Equality of Two Shape Matrices

Let x1 and x2 be two p-dimensional populations with stochastic representations x1=dw1Σ112z1 and x2=dw2Σ212z2, respectively, where both z1=(z1j) and z2=(z2j) satisfy the moment conditions in (2.2) with E(zij4)=τi for i = 1, 2. In this section, we consider the following hypothesis for the shape matrices Σ1 and Σ2, (2.9) H0:Σ1=Σ2=ΣversusH1:Σ1Σ2,(2.9) where Σ is unknown, normalized by tr(Σ)=p. Assume Σ1 is invertible and denote by Dp the squared Frobenius distance between the matrix Σ11Σ2/[tr(Σ11Σ2)/p] and Ip, (2.10) Dp1ptr[Σ11Σ2tr(Σ11Σ2)/pIp]2=ptr(Σ11Σ2)2tr2(Σ11Σ2)1,(2.10) which equals zero if and only if H0 is true.

Let {x1j,j=1,,n1} and {x2j,j=1,,n2} be two independent sets of observations from the two populations x1 and x2, respectively. Their sample sizes n1, n2 and population dimension p are assumed to be large such that p, n1, n2, cn1p/n1c1(0,1), cn2p/n2c2(0,1). Denote by M1n1 and M2n2 Tyler’s M estimators of the two shapes. That is, they are the unique solutions to the fixed point equations Mini=pni1j=1ni(xijMini1xij)1xijxij, trMini=p, i = 1, 2. Then, plugging M1n1 and M2n2 into (2.10) yields a naive estimate of Dp, (2.11) D̂pptr(M1n11M2n2)2tr2(M1n11M2n2)1.(2.11)

However, in the large-dimensional asymptotic framework, such D̂p is not a consistent estimator of Dp as demonstrated in the first conclusion of Theorem 2.5. It has a bias cn1/(1cn1)+cn2 which is not negligible asymptotically. By subtracting this bias from D̂p, we finally define the following consistent estimator of Dp for testing (2.9), T2,trD̂pcn11cn1cn2=ptr(M1n11M2n2)2tr2(M1n11M2n2)11cn1cn2,and a large value of such an estimate constitutes an evidence in favor of H1. The limiting null distribution of T2,tr is given in the following theorem, which is also derived from our general theory presented in Section 4.

Theorem 2.4.

In the large-dimensional asymptotic framework, we have under H0, pT2,trDN(μ,σ2), where the limiting mean and limiting variance are μ=3c1+c123c2+8c1c23c12c2+c22c1c22(c21)(c11)2,σ2=4c12+8c13+8c1c28c13c2+4c228c1c22+4c12c22(c11)4.

Remark 2.5.

The matrix M1n11M2n2 in the test statistic T2,tr is the F-ratio of the two Tyler’s M estimators. By their affine equivariance, this test can eliminate the impact from the unknown shape matrix Σ under the null hypothesis. To see this, we denote C1n1 and C2n2 as two Tyler’s M estimators from the atom random vectors {z1j,j=1,,n1} and {z2j,j=1,,n2}, respectively. That is, they satisfy Cini=pni1j=1ni(zijCini1zij)1zijzij, trCini=p, i = 1, 2. From the affine equivariance, we have the relation between Mini and Cini as (2.12) Mini=Σi12CiniΣi12tr(Σi12CiniΣi12)/p,i=1,2.(2.12)

Note that in (2.12), we normalize the matrix Σi12CiniΣi12 so that the trace of the term on the right hand side equals p, which meets the condition trMini=p. Then, under H0:Σ1=Σ2=Σ, we have from (2.11) and (2.12), D̂p=ptr(Σ112C1n11Σ112Σ212C2n2Σ212)2tr2(Σ112C1n11Σ112Σ212C2n2Σ212)1=ptr(C1n11C2n2)2tr2(C1n11C2n2)1,which is indeed independent of the common shape matrix Σ under H0. Therefore, the conclusion of Theorem 2.4 holds true only if the shape matrix Σ is invertible and no other constraints have to be imposed. This point allows us to investigate some nontrivial cases, for example, the one that Σ has an unbounded spectral norm, etc. This case is also studied by our simulations.

Next, we present the convergence of T2,tr for general Σ1 and Σ2, from which its power performance can be evaluated theoretically.

Theorem 2.5.

Suppose that 0<liminfpλmin(Σ11Σ2)limsuppλmax(Σ11Σ2)< and as p, (2.13) 1ptr(Σ11Σ2tr(Σ11Σ2)/p)kγk,k=2,3,4.(2.13)

  1. The statistic T2,tr is a strongly consistent estimator of Dp, that is, T2,trDp0, almost surely.

  2. If the moment conditions in (2.2) hold for the two populations with τ1=τ2=3, then we have pT2,trpDpDN(μ,σ2).

The limiting mean and limiting variance are μ=2c1(c11)2c1c2+c1γ2c112c2c21+c2γ2,σ2=4c2(c12(6γ2+c2)(c11)22c1(γ2c2γ22+2γ3)c11+2γ4+γ2(2γ22+c2γ24γ3)4c13(c11)3)+12c14(c11)416c13γ2(c11)3+4c12(4γ33γ22)(c11)28c1(γ232γ3γ2+γ4)c11.

Remark 2.6.

The first conclusion of Theorem 2.5 demonstrates the consistency of T2,tr toward the distance Dp and the second conclusion provides the non-null distribution of T2,tr for a local alternative with τ1=τ2=3. In particular, the test can achieve full power if the quantity pDp grows to infinity.

2.4 Adjustments When the True Location is Unknown

In this section, we discuss adjustments to the tests developed in Sections 2.2 and 2.3 when the true location vector μ of the population x is unknown, that is, (2.14) x=dμ+wΣ12z,tr(Σ)=p.(2.14)

In addition to the assumptions of the model (2.1), we further assume that the random variable w is independent of z and they satisfy (2.15) E(wr)<,E(wr)<,E(ziγ)<(2.15) for some r > 4 and γ>8.

For the one-sample case, given a sequence of iid observations x1,,xn following the same distribution as x in (2.14), Tyler’s M estimator, denoted by C¯n, is defined to be the solution to the equation C¯n=pn1j=1n((xjμ̂)C¯n1(xjμ̂))1(xjμ̂)(xjμ̂), tr(C¯n)=p, where μ̂ is an estimate of the true location μ. For robustness, we take into account the location estimate μ̂ of the form μ̂j=1nαjxj=μ+j=1nαjwjΣ12zj, where αj=(j=1n||xjx¯||2)1||xjx¯||2 and x¯=n1i=1nxi is the sample mean. Thus, the two test statistics for the hypothesis (2.3) are T¯1,tr=tr(C¯n2)andT¯1,log=log det(C¯n).

For the two-sample case, let μ̂ij=1niαijxij=μi+j=1niαijwijΣi12zij be the robust estimates of the unknown locations μi, i = 1, 2, respectively, where αij=(j=1ni||xijx¯i||2)1||xijx¯i||2 and x¯i=j=1nixij/ni being the sample means. Similarly, the two Tyler’s M estimators M¯1n1 and M¯2n2 are the solutions to the equations M¯ini=pni1j=1ni((xijμ̂i2)M¯ini1(xijμ̂i2))1(xijμ̂i2)(xijμ̂i2), trM¯ini=p. Our adjusted test statistic for the equality hypothesis in (2.9) is denoted by (2.16) T¯2,tr=ptr(M¯1n11M¯2n2)2tr2(M¯1n11M¯2n2)n11n1p1pn21.(2.16)

The asymptotic null distributions of {T¯1,tr,T¯1,log,T¯2,tr} are listed below, which are direct conclusions of our newly developed theory on the CLT for LSSs of mean-adjusted Tyler’s M estimator detailed in Theorem 4.3.

Theorem 2.6.

In the large-dimensional asymptotic framework and under the null hypothesis in (2.3), we have T¯1,trp(1+pn1)DN(3cc21c,4c2),T¯1,log+p(pn+1)log(1pn1)DN(12log(1c)c1c,2log(1c)2c).

Theorem 2.7.

In the large-dimensional asymptotic framework and under the null hypothesis in (2.9), we have pT¯2,trDN(μ,σ2), where the limiting mean and limiting variance are the same as those given in Theorem 2.4.

Comparing Theorems 2.6 and 2.7 with Theorems 2.1 and 2.4, one can see that the three adjusted statistics {T¯1,tr,T¯1,log,T¯2,tr} share the same null distributions with {T1,tr,T1,log,T2,tr} except that we have replaced the sample sizes (n,n1,n2) with (n1,n11,n21) in the centering terms. We will discuss the theoretical foundation of such substitution in Section 4.4.

Remark 2.7.

Note that in Kent and Tyler (Citation1991), the authors studied the existence and uniqueness of simultaneous M-estimates of multivariate location and scatter. These estimates are defined to minimize the objective function L(μ,Σ)=i=1nρ{(xiμ)Σ1(xiμ)}+12nlog|Σ|, and thus have the following form (2.17) {μ̂M=i=1nvixi/i=1nvi,Σ̂M=1ni=1nvi(xiμ̂M)(xiμ̂M),(2.17) where vi=u(si),u(s)=2ρ(s) and si=(xiμ̂M)Σ̂M1(xiμ̂M). By taking ρ(s)=12logs, the solution to (2.17) corresponds to Tyler’s M estimator Σ̂M and another estimate of location μ̂M. However, this choice of ρ(s) does not satisfy the Condition M outlined in Kent and Tyler (Citation1991) for the existence and uniqueness of the solution to (2.17). Although there is potential for further exploration in this direction for simultaneous estimation when the true location is unknown, such theoretical analysis is currently beyond the scope of this article.

Nevertheless, to understand the simultaneous estimates better for future investigations, we have empirically implemented an iterative algorithm to obtain a solution (μ̂M,Σ̂M) to (2.17), temporarily setting aside the issue of existence and uniqueness of the solution. Detailed simulations are presented at the end of the supplementary material, and the simulation results seem to imply that such μ̂M is asymptotically equivalent to our explicitly defined location estimate μ̂, in terms of the l2-norm.

3 Simulations and Real Data Analysis

In this section, we conduct simulation studies to compare the performance of the proposed tests with existing ones, and illustrate the proposed methodology via a real data example.

3.1 Simulations for Testing (2.3)

We numerically examine the performance of the proposed tests T1,tr,T¯1,tr,T1,log, and T¯1,log. For comparison, we also conduct John’s test T˜1,tr from Theorem 2.3 and the likelihood ratio test T˜1,log from Bai et al. (Citation2009), both of which are based on the sample covariance matrix. Simulation data are generated from the following two models.

Model (I). The underlying population is multivariate normal xN(0,Σ) where Σ=Ip+Σ0 with Σ0=(θ|ij|). We set θ = 0 under the null hypothesis and θ=0.2 under the alternative.

Model (II). The underlying population is a scale mixture, that is, x=0.5N(0,Σ)+0.5N(0,4Σ). The matrix Σ is the same as the one in Model (I).

The population dimension p is set to be p = 100, 200, 400 and the dimension-to-sample size ratio is cn = 0.25 and 0.5. The nominal level is fixed at 5%. The test T˜1,tr depends on the mixing parameters of Model (II), which are treated as known in our simulations. All statistics are averaged from 10,000 independent replications.

Empirical sizes of the six tests are summarized in . The results show that the four Tyler’s M based tests T1,tr,T1,log,T¯1,tr,T¯1,log, and the test T˜1,tr all have reasonable sizes under both models. Meanwhile, Tyler’s M based tests have slight size inflation when p and n are small. As the dimensions grow, their sizes reach the nominal level of 0.05. For the test T˜1,log, it has accurate sizes under Model (I) but suffers from serious size distortions under Model (II). This is reasonable since the test T˜1,log is only justified for the independent components model.

Table 1 Empirical sizes in percentiles of the six tests T1,tr, T1,log, T¯1,tr, T¯1,log, T˜1,tr, and T˜1,log at 5% significance.

depicts the empirical powers of the six tests. The results of T˜1,log under the mixture model are excluded due to its size inflation. Under Model (I), the powers of these tests are comparable. That is, the tests based on Tyler’s M estimator keep the same efficiency as those based on the sample covariance matrix, and all achieve full power as the dimensions grow. When the underlying population shifts from the normal distribution to the scale mixture, that is, Model (II), the four Tyler’s M based tests remain well-behaved, but the test T˜1,tr shows a significant loss of power, as predicted by Theorem 2.3. For instance, when (cn,n)=(0.5,400), the power of T˜1,tr declines from 99.08 to 50.40. This demonstrates that Tyler’s M based tests can gain satisfactory robustness against heavy-tailed and tail-dependent data.

Table 2 Empirical powers in percentiles of the six tests T1,tr, T1,log, T¯1,tr, T¯1,log, T˜1,tr, and T˜1,log at 5% significance.

3.2 Simulations for Testing (2.9)

This section evaluates the performance of T2,tr and T¯2,tr. The population dimension is set to be p = 100, 200, 400 and the dimension-to-sample size ratios are (cn1,cn2)=(0.25,0.5) and (0.25,0.25). Our first experiment is to examine the robustness of the proposed tests by comparing them with the one constructed from sample covariance matrix (Liu et al. Citation2014), referred to as T˜2,tr. The following two models are considered.

  • Model (III). The two populations are multivariate normal such that x1N(0,Ip) and x2N(0,Γ), where Γ=diag(γ1,,γp) is diagonal with γj=1θ+2θ(j1)/(p1). We set θ = 0 (Γ=Ip) under the null hypothesis and θ=0.3 under the alternative.

  • Model (IV). The first population is standard normal x1N(0,Ip) and the second is a contaminated normal formulated by a scale mixture x2(1ε)N(0,Γ)+εN(0,4Ip), where ε(0,1) represents the proportion of contamination and the matrix Γ is defined as in Model (III). We set the proportion to be fixed at ε=0.05 and hence, under the null hypothesis (Γ=Ip), the population x2 generates a dataset containing 5% outliers.

depicts empirical sizes and powers of the three tests under Models (III) and (IV). It shows that, despite having small size inflation, the tests T2,tr and T¯2,tr are both robust against outliers. However, the test T˜2,tr shows a significant size distortion when outliers are present. This again demonstrates the robustness of Tyler’s M based tests in comparison to sample covariance matrix based tests.

Table 3 Empirical sizes and powers in percentiles of T2,tr, T¯2,tr, and T˜2,tr at 5% significance under Models (III) and (IV).

Our second experiment is designed such that the difference between the two shape matrices only lies in a finite-dimensional subspace. Hence, we adopt the spiked population model (Johnstone Citation2001) as follows.

  • Model (V). Σ1Ip+θuuand Σ2Ip+θvv,θ(1,+), u and v are two unit vectors.

For this model, when θ is close to –1, the matrix Σj (j = 1, 2) has an eigenvalue close to zero, implying the existence of collinearity among the components of the population. When θ is large or even divergent along with the dimension p, a dominant principle component of the population exists. Hence, our two-sample test detects whether the direction of collinearity or the dominant component of the population changes over spatial scales and through time. Moreover, the common base component Ip in Σ1 and Σ2 can be extended to any positive definite matrix with bounded eigenvalues.

The experimental settings are as follows. The parameter θ=1+1/p for the case of collinearity and θ=p for the case of the dominant factor. The vectors are u=w1 and v=cos(α)w1+sin(α)w2, where w1 and w2 are orthogonal vectors chosen randomly from the unit sphere in Rp. The parameter α which represents the angle between u and v takes values in [0,π/4]. Samples are drawn from normal populations with dimensions (p,n1,n2)=(100,200,400), (200, 400, 800) and (300, 600, 1200). Empirical powers of T2,tr and T¯2,tr under both cases at 5% significance from 10,000 independent replications are plotted in . It shows that the powers grow to 1 as the angle α and/or the dimensions (p,n1,n2) increase, which confirms the capacity of T2,tr and T¯2,tr to distinguish the two shape matrices.

Fig. 1 Empirical powers of T2,tr and T¯2,tr for θ=1+1/p (Left panel) and θ=p (Right panel) under Model (V) at 5% significance (red horizontal line) from 10,000 independent replications. The dimensions are (p,n1,n2)=(100,200,400), (200, 400, 800) and (300,600,1200).

Fig. 1 Empirical powers of T2,tr and T¯2,tr for θ=−1+1/p (Left panel) and θ=p (Right panel) under Model (V) at 5% significance (red horizontal line) from 10,000 independent replications. The dimensions are (p,n1,n2)=(100,200,400), (200, 400, 800) and (300,600,1200).

3.3 A Real Data Example

To illustrate the proposed methodology, we apply the proposed tests to an empirical analysis of Fama-French 49 industrial portfolios. The dataset containing average value-weighted daily returns of the portfolios from Years 2002 to 2019 can be downloaded from _ library.html\#Benchmarks" class="url" >http://mba.tuck.dartmouth.\edu/pages/faculty/ken.french/data{}_ library.html\#Benchmarks. To avoid the weekend effect in the stock market, we consider only data subset consisting of every Wednesday’s returns. Since there was a financial crisis in 2008–2009, we focus on two segments data: one from 2002 to 2007 and the other one from 2010 to 2019. Thus, the sample sizes of the data subsets from 2002–2007 and 2010–2019 are 310 and 515, respectively. We apply the proposed tests to examine whether the shape of data changes over time during these two periods.

To demonstrate the proposed model in (2.14) is more general than the ICM in practice, we conduct an exploratory data analysis to illustrate the ICM does not fit the datasets well. Define T1=E||xE(x)||2/p and T2=E||xE(x)||4/p2. Then it can be shown that under assumptions imposed in Section 2.4, if x has stochastic representation (2.14), T2/T12E(w4)/{E(w2)}2 almost surely as p. Note that the ICM corresponds to a constant w, and therefore the ratio E(w4)/{E(w2)}2=1. In general, the ratio E(w4)/{E(w2)}2 is greater than 1. This motivates us to examine scatterplots of (T12,T2) based on its sample version.

To construct the scatterplots, we generate B = 200 bootstrap samples and calculate the sample version of (T12,T2) for each bootstrap sample. Specifically, for a given bootstrap sample {xj*,j=1,,n}. Denote T̂2*=n1j=1n||xj*x¯*||4/p2, and T̂1*=n1j=1n||xj*x¯*||2/p, where x¯* is the sample mean of the bootstrap sample. We obtain (T1*2,T2*) for each bootstrap sample. The scatterplots of (T1*2,T2*) based on the B = 200 bootstrap samples are depicted in . If ICM is valid, one would expect the points (T1*2,T2*) scattering around the straight line y = x since the ratio of T2* to T1*2 should approximately equal 1 under the ICM model assumption. clearly shows that the ratio of T2* to T1*2 is dramatically greater than 1.

Fig. 2 Scatterplots of the statistics {(T1*2,T2*)} for the data from 2002 to 2007 (left panel) and from 2010 to 2019 (right panel). These statistics are calculated from 200 bootstrap samples.

Fig. 2 Scatterplots of the statistics {(T1*2,T2*)} for the data from 2002 to 2007 (left panel) and from 2010 to 2019 (right panel). These statistics are calculated from 200 bootstrap samples.

We now examine whether the shape matrix of the 49 assets would change over Years 2002–2007. To this end, we apply the proposed testing procedure for equality of two shape matrices to examine whether the shape matrix for data from 2002 to 2003 equals the shape matrix for data from 2004 to 2005. The resulting p-value is 0.1745, which implies that the shape does not have dramatically change. We further apply the testing procedure for data from 2004 to 2005 versus data from 2006 to 2007, and the resulting p-value is less than 0.0001. This implies that the shape of data has been changing from 2004–2005 to 2006–2007.

Similarly, we test whether the shape matrix of the 49 assets would change over the period of 2010–2019. We apply the testing procedure for data subsets over every two-year period. The resulting p-values for data from 2010–2011 versus 2012–2013, 2012–2013 versus 2014–2015, 2014–2015 versus 2016–2017, 2016–2017 versus 2018–2019 are 0.0030, 0.0112, 0.0037, and 0.0007, respectively. This implies the shape matrix of the data varies over this period.

4 Theoretical Results on the Limiting Spectrum of Tyler’s M Estimator from a General Population

Recall the definitions of our proposed statistics T1,tr, T1,log, T2,tr and their adjusted versions, which are all functionals of the eigenvalues of Tyler’s M estimator (or their F-ratio). To our best knowledge, limiting properties of the eigenvalue distributions of large-dimensional Tyler’s M estimator with general populations have not been explored fully yet and almost all existing works in the literature on M-estimators assume elliptical distributions, see Maronna (Citation1976), Tyler (Citation1987), Couillet, Pascal, and Silverstein (Citation2015), Kammoun et al. (Citation2016), Zhang, Cheng, and Singer (Citation2016), Goes, Lerman, and Nadler (Citation2020), etc. Note that by assuming the parameter of dimension p grows to infinity proportionally to the sample size n (also referred to as Marčenko-Pastur asymptotic regime in random matrix theory), it has been established only recently that for those elliptical distributions, the estimators of Maronna’s M (Couillet, Pascal, and Silverstein Citation2015), Tyler’s M (Zhang, Cheng, and Singer Citation2016) and regularized Tyler’s M (Kammoun et al. Citation2016) all behave similarly and are asymptotically equivalent to the sample covariance matrix from normal distribution. Such equivalence gives the first-order approximation. That is, the ESDs of these robust estimators from elliptical distributions all converge to the renowned Marčenko-Pastur law (Marčenko and Pastur Citation1967). As for the second-order asymptotics for these robust scatters, existing work is even limited and the only available reference is Couillet, Kammoun, and Pascal (Citation2016), where the authors focused on the fluctuation of bilinear forms of regularized Tyler’s M estimator, also from elliptical populations, and it was again established to be the same as the sample covariance matrix asymptotically. However, in the literature of random matrix theory, the convergence rate of bilinear forms of certain random matrix models is Op(p) (Bai, Miao, and Pan Citation2007; Couillet, Kammoun, and Pascal Citation2016), which is in general much slower than the convergence rate of LSSs Bai and Silverstein Citation2004; Pan and Zhou Citation2008; Zheng Citation2012. Therefore, unlike the case for dealing with spectral norms (Couillet, Pascal, and Silverstein Citation2015; Kammoun et al. Citation2016; Zhang, Cheng, and Singer Citation2016) or bilinear forms (Couillet, Kammoun, and Pascal Citation2016), the LSSs of Tyler’s M estimator may fluctuate differently from those of a sample covariance matrix asymptotically.

Therefore, this section aims to develop new theoretical results on the convergence of ESDs (first-order asymptotic) and the CLT for LSSs (second-order asymptotic) of large-dimensional Tyler’s M estimator from general populations. Note that the technique developed in this article is different from the one in Zhang, Cheng, and Singer (Citation2016). The latter depends on the properties of Gaussian and elliptical distributions, say, some concentration bounds, and thus cannot be directly extended to other distributions. When it comes to the second-order asymptotics, the processing becomes more intricate as it involves obtaining high-precision approximations of certain moments of a resolvent matrix, rather than just comparing two matrices in spectral norms. Our strategy is mainly on the method of moment and a comparison technique. To be concrete, the method of moment helps to get rid of the particular distribution of the underlying population, and the comparison technique allows us to compare Tyler’s M estimator with sample spatial-sign covariance matrix for the first time. We investigate the inner structure of their resolvent matrices and finally show that the two share the same first and second order asymptotics in some sense.

4.1 Notations and Assumptions

First, we introduce some notations that are repeatedly used in this article. Let Ap be a p × p symmetric or Hermitian matrix with eigenvalues (li)1ip. The ESD of Ap is defined as the random measure FAp=p1i=1pδli, where δli is the Dirac mass at point li. If the ESD sequence {FAp} has a limit when p, this limit is referred to as the limiting spectral distribution (LSD) of the sequence. For a given analytical function f(x) defined on R, we call f(x)dFAp(x)=p1i=1pf(li) an LSS of the random matrix Ap associated with the test function f. The Stieltjes transform of a probability measure G is defined as mG(z)=(xz)1dG(x),zC+{zC:I(z)>0}. This definition can be extended to the whole complex plane except the support of G.

Next, we recall our assumptions and definitions for easy reading. Let x1,,xnRp be a random sample having the following stochastic representation, (4.1) xj=dwjΣ12zj,tr(Σ)=p.(4.1)

Our main assumptions on this model are listed below.

Assumption

(1). Both the sample size n and population dimension p tend to infinity such that p,n,cnp/nc(0,1).

Assumption

(2). The ESD of Σ, denoted as Hp, has a bounded support, that is Supp(Hp)[a,b](0,), and converges weakly to a probability distribution H.

Assumption

(3). The coordinates (zij) of zj are iid continuous random variables having bounded density and satisfying the following moment conditions for some Δ>0, E(zij)=0, E(zij2)=1, E(zij4)=τ, E|zij|4+Δ<.

In the following Sections 4.2 and 4.3, we establish the LSD and the CLT for LSSs of Tyler’s M estimator Cn defined by Cn=pn1j=1n(xjCn1xj)1xjxj, trCn=p, where x1,,xn is a random sample from (4.1) satisfying Assumptions (1)–(3). Finally, our results are generalized to cover the case of unknown location in Section 4.4.

Remark 4.1.

We discuss here the reason for imposing Assumption (1). Initially, the definition of Tyler’s M estimator implies that the matrix Cn is well-defined only when p < n. As a result, we set the constraint c < 1. It’s worth noting that recent works, such as Zhang, Cheng, and Singer (Citation2016) and Goes, Lerman, and Nadler (Citation2020), have explored the properties of Tyler’s M estimator within this very regime. To bypass the constraint c < 1, one might consider alternative matrix models like the regularized Tyler’s M estimator, as proposed by Ollila and Tyler (Citation2014). While the techniques developed in this article could potentially address non-explicitly defined random matrix models, such discussions are beyond the scope of this article and could be a good topic for future research.

4.2 Limiting Spectral Distribution of Cn

The following theorem is on the LSD of Tyler’s M estimator Cn.

Theorem 4.1.

Under Assumptions (1)–(3), almost surely, the ESD FCn of Tyler’s M estimator Cn converges weakly to a probability measure Fc,H, whose Stieltjes transform m(z) is a solution to (4.2) m(z)=1t{1cczm(z)}zdH(t),zC+.(4.2)

The solution is unique in the set {m(z)C:(1c)/z +cm(z)C+}.

Remark 4.2.

Theorem 4.1 extends the main conclusion in Zhang, Cheng, and Singer (Citation2016), where the same LSD was derived under elliptical distributions. Indeed, according to Remark 2.1(b), elliptical distributions require those atom random vectors {zj} in (4.1) to be standard multivariate normal N(0,Ip). In that case, any finite moment of {zij} exists. Our Theorem 4.1 is indeed universal. It reaches the same conclusion not only for those standard normal {zj} but also for any continuous distributions with finite (4+Δ)th moment.

Remark 4.3.

The LSD Fc,H, characterized by its Stieltjes transform m(z) as defined in (4.2), is a generalized Marčenko-Pastur law, initially introduced in the seminal paper Marčenko and Pastur (Citation1967) to describe the LSD of sample covariance matrices. Hence, Theorem 4.1 illustrates that, in terms of the first-order asymptotics for the ESD, Tyler’s M estimator Cn constructed by samples {xj}j=1n originating from the model (4.1) with shape matrix Σ, is asymptotically equivalent to the sample covariance matrix Sn formed by samples drawn from the independent components model with the same population covariance matrix Σ, that is, Sn=n1i=1nΣ12zjzjΣ12. Indeed, the auxiliary Lemma S1.1, which is crucial for proving our main results, has yielded a stronger conclusion than the current one in Theorem 4.1. Lemma S1.1 in the supplementary material demonstrates that ||CnBn||0, almost surely, where Bn=pn1j=1n||xj||2xjxj is the spatial-sign covariance matrix constructed by samples {xj}j=1n. Additionally, it is straightforward to verify that ||BnSn||0, almost surely. These facts imply that the three matrices, Tyler’s M estimator Cn, the spatial-sign covariance matrix Bn, and the sample covariance matrix Sn, are asymptotically equivalent in terms of spectral norm.

Remark 4.4.

Note that while Theorem 4.1 is established under Assumption (1), if we allow c approach 0, the result indicates that the eigenvalue’s distribution of Tyler’s M estimator Cn converges to that of the shape matrix Σ. This finding aligns with the established result derived in Tyler (Citation1987) for the fixed-p scenario.

Remark 4.5.

Let m¯(z)cm(z)(1c)/z, which is the Stieltjes transform of F¯c,HcFc,H+(1c)δ0. Then, (4.2) can be rewritten using m¯(z) as (4.3) z=1m¯(z)+ct1+tm¯(z)dH(t),zC+.(4.3)

This quantity m¯(z) will be frequently recalled in the next subsection. For simplicity of notation, we shall sometimes suppress the independent variable z in the functions m¯(z) and m(z).

4.3 Central Limit Theorem for Linear Spectral Statistics of Cn

This section aims to establish the CLT for LSSs of Tyler’s M estimator Cn, that is, the statistics of the type f(x)dFCn(x), where f(x) is a given test function. To establish our main result, we introduce a deterministic matrix T, which relates to the shape matrix Σ in the following way, (4.4) TΣ2pΣ2τ3pΣ12diag(Σ)Σ12+(2p2trΣ2+τ3p2tr[Σ°Σ])Σ,(4.4) where “°” denotes the Hadamard product of two matrices. Note that the matrix T is asymptotically equivalent to Σ. However, the remaining terms which are of the order O(1/p) in spectral norm will still contribute when we study the CLT for LSSs of Cn. Let H˜p denote the ESD of T. Then, by replacing (c, H) with (cn,H˜p) in (4.2), this new equation defines a unique solution, referred as m˜0(z), and its induced probability measure is referred as Fcn,H˜p. In virtue of this measure, we may normalize the LSS of Cn as (4.5) Gn(f)p(f(x)dFCn(x)f(x)dFcn,H˜p(x)).(4.5)

In addition, we need the following Assumption (4) for establishing our general CLT, which indeed is required for handling the case when E(zij4)=τ3. Note that such an assumption depends not only on the eigenvalues of Σ, but also on its eigenvectors.

Assumption

(4). As p, the limits of the following three auxiliary quantities exist, (4.6) ζp=1ptr[Σ°Σ]ζ,hp(z)=1ptr[(Σ12(ΣzI)1Σ12)°Σ]h(z),gp(z1,z2)=1ptr[(Σ12(Σz1I)1Σ12)°(Σ12(Σz2I)1Σ12)]g(z1,z2),(4.6) where z,z1,z2C+.

We next show that the linear functional {Gn(f)} of the eigenvalues of Cn defined in (4.5) converges weakly to a Gaussian process.

Theorem 4.2.

Suppose that Assumptions (1)–(4) hold with some Δ2. Let f1,,fk be k functions analytic on an open set that includes the interval Ic=[lim infpλminΣ(1c)2, lim suppλmaxΣ(1+c)2]. Then the random vector {Gn(f1),,Gn(fk)} converges weakly to a Gaussian random vector {Gf1,,Gfk}. Its mean function is given by (4.7) E(Gf)=12πiCf(z)[μ1(z)+μ2(z)]dz,(4.7) where (4.8) μ1(z)=c(m¯t)2dH(t)m¯(1+m¯t)32m¯(1+zm¯)t2dH(t)(1+m¯t)2+(β2tt2)dH(t)1+m¯t2cm¯m¯tdH(t)(1+m¯t)2,+(τ3){cm¯m¯2g(1m¯,1m¯)+ζ(1+zm¯)tm¯dH(t)(1+m¯t)2(1+zm¯)m¯m¯2h(1m¯)cm¯tdH(t)(1+m¯t)2h(1m¯)},(4.8) (4.9) μ2(z)=(zm¯){2c(1c)zm¯2cβ2zm¯2cz+2c(1c)+2}    2c(1+zm¯)+(τ3){ddz[zm¯h(1m¯)]    1c(zm¯)zm¯ζ1c(1+zm¯)(zm¯)},(4.9) in which β2=t2dH(t). Its covariance function is given by (4.10) cov(Gf,Gg)=14π2C1C2f(z1)g(z2)[σ1(z1,z2)+(τ3)σ2(z1,z2)]dz1dz2,(4.10) where σ1(z1,z2)=2m¯(z1)m¯(z2)(m¯(z1)m¯(z2))2+2c(β2+1m¯(z1)+1m¯(z2))(z1m¯(z1))(z2m¯(z2))2m¯(z1)cm¯2(z1)(z2m¯(z2))(1+z1m¯(z1))2m¯(z2)cm¯2(z2)(z1m¯(z1))(1+z2m¯(z2)),σ2(z1,z2)=2z1z2{cg(1m¯(z1),1m¯(z2))+ζc(1+z1m¯(z1))(1+z2m¯(z2))(1+z1m¯(z1))h(1m¯(z2))(1+z2m¯(z2))h(1m¯(z1))}.

Here, the contours in (4.7) and (4.10) (two in (4.10), which may be assumed to be nonoverlapping) are closed, counter-clockwise orientated in the complex plane and enclosing the support of Fc,H.

Remark 4.6.

Theorem 4.2 indicates that the convergence rate for the LSSs of Tyler’s M estimator is of the order O(n) when the dimension p grows to infinity proportionally with the sample size n, while keeping their ratio away from zero asymptotically. Theoretically, our result does not cover scenarios where p is fixed or p diverges but c = 0. In the fixed-p case, Tyler (Citation1987) demonstrates that under certain conditions, n(CnΣ) converges in distribution to a matrix normal distribution as n. This large sample behavior of Cn resembles the classical theory on sample covariance matrix introduced in the book by Anderson (Citation1984). When p diverges but c = 0, random matrix theory suggests that the convergence rate for LSSs becomes the order of O(pn), as observed in the results for sample covariance matrices in Chen and Pan (Citation2015). Particularly, in order to determine the centering term, as well as the limiting mean and limiting variance of the CLT for LSSs, we must identify the dominant terms that will contribute by controlling the remaining negligible terms up to certain orders, which depends heavily on the convergence rates. As far as we know, there is currently no unified theory that can easily accommodate all these asymptotic frameworks including fixed p, p/n0 and p/nc>0. Nevertheless, for practitioners applying the CLT in real data analysis or applications, it’s worth noting that once a dataset is available, one can directly set the value of c equal to p/n. Our CLT result remains valid even if c is small. We have applied this approach in all our simulation experiments and real data analysis provided in Section 3.

Finally, we discuss the results of two special cases covered by Theorem 4.2.

Special case (1): Ezij4=3. This special case includes the elliptical family of distributions. From Remark 2.1(b), when the sample {x1,,xn} comes from the family of elliptical distributions, the atom random vectors z1,,zn in (4.1) reduce to the standard multivariate normal vectors N(0,Ip), which meets E(zij4)=3. According to Theorem 4.2 and by setting τ = 3, all the impacts from the eigenvectors of Σ disappear. We establish in the following Proposition 4.1 that the centering term f(x)dFcn,H˜p(x) given in (4.16) can also be simplified to a functional of Hp only. In this way, all the quantities including the centering term, limiting mean and limiting variance in Theorem 4.2 are only relevant to the ESD of Σ, that is, the information contained in {Hp}.

Proposition 4.1.

Suppose that Assumptions (1)–(3) hold with τ = 3. We have p(f(x)dFcn,Hp(x)f(x)dFcn,H˜p(x))12πiCf(z)μ3(z)dz,where μ3(z)=2m¯(z)β2tt2(1+tm¯(z))2dH(t) and β2=t2dH(t).

Special case (2): identity shape matrix. When the shape matrix Σ=Ip (Hpδ1), Assumption (4) holds automatically. Indeed, the three auxiliary quantities defined in (4.6) are exactly equal to their limits given by ζ = 1, h(z)=(1z)1 and g(z1,z2)=(1z1)1(1z2)1. After some algebra by setting H=δ1 in Remark 4.5, we have all the limiting quantities in Theorem 4.2 involving the factor τ3 equal zero, that is, the terms in (4.8), (4.9) and σ2(z1,z2) all disappear. Moreover, by (4.4), we notice that when Σ=Ip, the matrix T reduces to the identity matrix Ip which does not depend on the factor τ3. Thus, for those samples with identity shape matrix, the CLT for LSSs of Tyler’s M estimator Cn is irrelevant to the fourth moment τ. Therefore, it is universal in the sense that it does not depend on the specific underlying distribution of the population.

4.4 Adjustments to the CLT When the Location Vector is Unknown

This section considers the CLT for LSSs of Tyler’s M estimator when the true location vector is unknown. To this end, suppose we have a sequence of iid observations {x1,,xn}Rp having the form (4.11) xj=dμ+wjΣ12zj,tr(Σ)=p,(4.11) where μ is now unknown. We update Assumption (3) of the model (2.1) as follows.

Assumption

(3’). The coordinates (zij) of zj are iid continuous random variables having bounded density and satisfying E(zij)=0, E(zij2)=1, E(zij4)=τ, E(zijγ)< for some γ>8. The random variables {wj} are independent of {zj} and satisfying E(wjr)<, E(wjr)< for some r > 4.

Tyler’s M estimator corresponding to the data sample x1,,xn in (4.11), denoted as C¯n=C¯n(μ̂), is then defined to be the solution of equation (4.12) C¯n=pnj=1n(xjμ̂)(xjμ̂)(xjμ̂)C¯n1(xjμ̂),tr(C¯n)=p,(4.12) where μ̂ is an estimate of the true location μ. We consider a class of location estimates μ̂λ of the form (4.13) μ̂λj=1nαjλxj=μ+j=1nαjλwjΣ12zj,(4.13) (4.14) αjλ||xjx¯||λj=1n||xjx¯||λ,(4.14) where λ is a pre-determined nonnegative number in [0,2] and x¯=j=1nxj/n is the sample mean. Notice that the quantitie {||xjx¯||λ} is close to {(pwj)λ} and thus, by the construction, the weight {αjλ} attempts to approximate the oracle one wjλ/j=1nwjλ. In particular, μ̂λ reduces to the usual sample mean if we set λ = 0. With such a class of location estimates μ̂λ involved, we establish the relation between the LSSs of C¯n(μ̂λ) and those of Cn in the following theorem.

Theorem 4.3.

Suppose that Assumptions (1)-(2)-(3’) hold. Then, we have pf(x)dFC¯n(μ̂λ)(x)pf(x)dFCn(x)12πiCf(z)μ4(z)dzin probability, where μ4(z)=[zm¯(z)]{1α˜λ2α˜λzm¯(z)zm¯(z)+α˜λzm¯(z)[1+zm¯(z)]+α˜λ1},α˜λ=Ew2Ew22λ(Ewλ)21.

Theorem 4.3

demonstrates that substituting the location estimate μ̂λ for the true location vector μ in Tyler’s M estimator only introduces a mean shift to the CLT for its LSSs. And the shift is characterized by the function μ4(z), which depends on certain moments of the random variable w as specified in the parameter α˜λ. In particular, by setting λ = 2 in Theorem 4.3, we get α˜λ=0 and the function μ4(z) then becomes irrelevant to the moments of w and possesses a concise form (4.15) μ4(z)=(1+zm¯(z))(m¯(z)+zm¯(z))zm¯(z)=p1xz[dFcn1,H˜p(x)dFcn,H˜p(x)]+o(1),(4.15) where Fcn,H˜p and Fcn1,H˜p with cn1=p/(n1) are two finite-dimensional proxies for the LSD Fc,H with (c, H) replaced by (cn,H˜p) and (cn1,H˜p), respectively, see Zheng, Bai, and Yao (Citation2015). Such μ4(z) in (4.15) can be canceled out by a simple replacement of the sample size n with n – 1 in the centering term of the LSS defined in (4.5). We summarize such substitution result in the following proposition.

Proposition 4.2.

Suppose that Assumptions (1), (2), (3’), and (4) hold. Denote the normalized LSS of C¯n(μ̂2) as (4.16) G¯n(f)p(f(x)dFC¯n(μ̂2)(x)f(x)dFcn1,H˜p(x)).(4.16)

Then the random vector {G¯n(f1),,G¯n(fk)} converges weakly to the same Gaussian random vector {Gf1,,Gfk} as that in Theorem 4.2.

Very interestingly, we find for the first time that by substituting the true location vector μ with its estimate μ̂λ by setting λ = 2 in (4.13), the CLT for LSSs of their corresponding Tyler’s M estimators do not change. The only difference lies in the centering terms for normalizing the LSSs. This is a parallel result to the substitution principle for LSSs of the sample covariance matrix when the unknown mean is estimated by the sample mean, see Zheng, Bai, and Yao (Citation2015).

Supplementary Materials

This supplementary material consists of proofs of the main results developed in Sections 2 and 4, namely Theorems 2.1, 2.2, 2.3, 2.4, 2.5, 4.1, 4.2, 4.3, and Proposition 4.1. Additionally, it includes some additional lemmas for proving these results and simulations for the simultaneous estimates mentioned in Remark 2.7.

Supplemental material

Supplemental Material

Download Zip (1.2 MB)

Supplemental Material

Download PDF (548 KB)

Acknowledgments

The authors would like to thank the Associate Editor and referees for their constructive comments, which lead to significant improvement of this work.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

R. Li’s research is supported by NIH grants NIH R01AI170249 and R01AI136664, W. Li’s research is supported by NSFC (No. 11971293, 12141107), and Q. Wang’s research is supported by NSFC (No. 12171099) and Natural Science Foundation of Shanghai (23ZR1406200).

References

  • Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis, New York: Wiley.
  • Bai, Z., Jiang, D., Yao, J., and Zheng, S. (2009), “Corrections to LRT on Large-Dimensional Covariance Matrix by RMT,” The Annals of Statistics, 37, 3822–3840. DOI: 10.1214/09-AOS694.
  • Bai, Z., Miao, B., and Pan, G. (2007), “On Asymptotics of Eigenvectors of Large Sample Covariance Matrix,” The Annals of Probability, 35, 1532–1572. DOI: 10.1214/009117906000001079.
  • Bai, Z., and Silverstein, J. W. (2004), “CLT for Linear Spectral Statistics of Large-Dimensional Sample Covariance Matrices,” The Annals of Probability, 32, 553–605. DOI: 10.1214/aop/1078415845.
  • Bai, Z., and Silverstein, J. W. (2010), Spectral Analysis of Large Dimensional Random Matrices (2nd ed.), New York: Springer.
  • Chen, B., and Pan, G. (2015), “CLT for Linear Spectral Statistics of Normalized Sample Covariance Matrices With the Dimension Much Larger Than the Sample Size,” Bernoulli, 21, 1089–1133. DOI: 10.3150/14-BEJ599.
  • Chen, S., Zhang, L., and Zhong, P. (2010), “Tests for High-Dimensional Covariance Matrices,” Journal of the American Statistical Association, 105, 810–819. DOI: 10.1198/jasa.2010.tm09560.
  • Couillet, R., Kammoun, A., and Pascal, F. (2016), “Second Order Statistics of Robust Estimators of Scatter. Application to GLRT Detection for Elliptical Signals,” Journal of Multivariate Analysis, 143, 249–274. DOI: 10.1016/j.jmva.2015.08.021.
  • Couillet, R., Pascal, F., and Silverstein, J. W. (2015), “The Random Matrix Regime of Maronna’s M-estimator with Elliptically Distributed Samples,” Journal of Multivariate Analysis, 139, 56–78. DOI: 10.1016/j.jmva.2015.02.020.
  • Fang, K., Kotz, S., and Ng, K. W. (1990), Symmetric Multivariate and Related Distributions, London: Chapman and Hall, Ltd.
  • Goes, J., Lerman, G., and Nadler, B. (2020), “Robust Sparse Covariance Estimation by Thresholding Tyler’s M-estimator,” The Annals of Statistics, 48, 86–110. DOI: 10.1214/18-AOS1793.
  • Hallin, M., Oja, H., and Paindaeine, D. (2006), “Semipararamtrically Efficient Rank-based Inference for Shape II. Optimal R-estimation of Shape,” The Annals of Statistics, 34, 2757–2789. DOI: 10.1214/009053606000000948.
  • Hallin, M., and Paindaeine, D. (2006), “Seimpararamtrically Efficient Rank-based Inference for Shape I. Optimal Rank-based Test for Sphericity,” The Annals of Statistics, 34, 2707–2756. DOI: 10.1214/009053606000000731.
  • Huber, P. J. (1964), “Robust Estimation of a Location Parameter,” The Annals of Mathematical Statistics, 35, 73–101. DOI: 10.1214/aoms/1177703732.
  • Ilmonen, P., and Paindaeine, D. (2011), “Seimpararamtrically Efficient Inference based Signed Ranks in Symmetric Independent Component Models,” The Annals of Statistics, 39, 2448–2476. DOI: 10.1214/11-AOS906.
  • James, W., and Stein, C. (1961), “Estimation with Quadratic Loss,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability (Vol. I), pp. 361–379.
  • John, S. (1971), “Some Optimal Multivariate Tests,” Biometrika, 58, 123–127. DOI: 10.2307/2334322.
  • John, S. (1972), “The Distribution of a Statistic Used for Testing Sphericity of Normal Distributions,” Biometrika, 59, 169–173. DOI: 10.2307/2334628.
  • Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” The Annals of Statistics, 29, 295–327. DOI: 10.1214/aos/1009210544.
  • Kammoun, A., Couillet, R., Pascal, F., and Alouini, M.-S. (2016), “Convergence and Fluctuations of Regularized Tyler Estimators,” IEEE Transactions on Signal Processing, 64, 1048–1060. DOI: 10.1109/TSP.2015.2494858.
  • Kent, J. T., and Tyler, D. E. (1988), “Maximum Likelihood Estimation for the Wrapped Cauchy Distribution,” Journal of Applied Statistics, 15, 247–254. DOI: 10.1080/02664768800000029.
  • Kent, J. T., and Tyler, D. E. (1991), “Redescending M-estimates of Multivariate Location and Scatter,” The Annals of Statistics, 19, 2102–2119. DOI: 10.1214/aos/1176348388.
  • Ledoit, O., and Wolf, M. (2002), “Some Hypothesis Tests for the Covariance Matrix When the Dimension is Large Compared to the Sample Size,” The Annals of Statistics, 30, 1081–1102. DOI: 10.1214/aos/1031689018.
  • Li, J., and Chen, S. (2012), “Two Sample Tests for High-Dimensional Covariance Matrices,” The Annals of Statistics, 40, 908–940. DOI: 10.1214/12-AOS993.
  • Liu, B., Xu, L., Zheng, S., and Tian, G. (2014), “A New Test for the Proportionality of Two Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 131, 293–308. DOI: 10.1016/j.jmva.2014.06.008.
  • Locantore, N., Marron, J., Simpson, D., Tripoli, N., Zhang, J., Cohen, K., Boente, G., Fraiman, R., Brumback, B., Croux, C., et al. (1999), “Robust Principal Component Analysis for Functional Data,” Test, 8, 1–73. DOI: 10.1007/BF02595862.
  • Magyar, A. F., and Tyler, D. E. (2014), “The Asymptotic Inadmissibility of the Spatial Sign Covariance Matrix for Elliptically Symmetric Distributions,” Biometrika, 101, 673–688. DOI: 10.1093/biomet/asu020.
  • Maronna, R. A. (1976), “Robust M-estimators of Multivariate Location and Scatter,” The Annals of Statistics, 4, 51–67. DOI: 10.1214/aos/1176343347.
  • Marčenko, V., and Pastur, L. (1967), “The Distribution of Eigenvalues in Certain Sets of Random Matrices,Math. USSR-Sbornik, 1, 457–483.
  • Ollila, E., and Tyler, D. E. (2014), “Regularized M-estimators of Scatter Matrix,” IEEE Transactions on Signal Processing, 62, 6059–6070. DOI: 10.1109/TSP.2014.2360826.
  • Pan, G., and Zhou, W. (2008), “Central Limit Theorem for Signal-to-Interference Ratio of Reduced Rank Linear Receiver,” Annals of Applied Probability, 18, 1232–1270.
  • Rousseeuw, P. (1985), “Multivariate Estimation with High Breakdown Point,” in Mathematical Statistics and Applications: Proceedings 4th Pannonian Symposium on Mathematical Statistics Vol. B, ed. W. Grossman, et al., pp. 283–297.
  • Schott, J. R. (2007), “A Test for the Equality of Covariance Matrices When the Dimension is Large Relative to the Sample Sizes,” Computational Statistics & Data Analysis, 51, 6535–6542. DOI: 10.1016/j.csda.2007.03.004.
  • Tyler, D. E. (1987), “A Distribution-Free M-Estimator of Multivariate Scatter,” The Annals of Statistics, 15, 234–251. DOI: 10.1214/aos/1176350263.
  • Zhang, T., Cheng, X., and Singer, A. (2016), “Marčenko-Pastur Law for Tyler’s M-estimator,” Journal of Multivariate Analysis, 149, 114–123. DOI: 10.1016/j.jmva.2016.03.010.
  • Zheng, S. (2012), “Central Limit Theorems for Linear Spectral Statistics of Large Dimensional F-matrices,” Annales de l’Institut Henri Poincaré (B) Probability and Statistics 48, 444–476.
  • Zheng, S., Bai, Z., and Yao, J. (2015), “Substitution Principle for CLT of Linear Spectral Statistics of High-Dimensional Sample Covariance Matrices with Applications to Hypothesis Testing,” The Annals of Statistics, 43(2):546–591. DOI: 10.1214/14-AOS1292.
  • Zou, T., Lin, R., Zheng, S., and Tian, G. L. (2021), “Two-Sample Tests for High-Dimensional Covariance Matrices Using both Difference and Ratio,” Electronic Journal of Statistics, 15, 135–210. DOI: 10.1214/20-EJS1783.