Abstract
Tyler’s M estimator, as a robust alternative to the sample covariance matrix, has been widely applied in robust statistics. However, classical theory on Tyler’s M estimator is mainly developed in the low-dimensional regime for elliptical populations. It remains largely unknown when the parameter of dimension p grows proportionally to the sample size n for general populations. By using the eigenvalues of Tyler’s M estimator, this article develops tests for the identity and equality of shape matrices in a large-dimensional framework where the dimension-to-sample size ratio p/n has a limit in (0, 1). The proposed tests can be applied to a broad class of multivariate distributions including the family of elliptical distributions (see model (2.1) for details). To analyze both the null and alternative distributions of the proposed tests, we provide a unified theory on the spectrum of a large-dimensional Tyler’s M estimator when the underlying population is general. Simulation results demonstrate good performance and robustness of our tests. An empirical analysis of the Fama-French 49 industrial portfolios is carried out to demonstrate the shape of the portfolios varying. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
1 Introduction
Tests for shape (or scatter) matrix for multivariate nonnormal data play the same role as tests for covariance matrix in multivariate normal distributions. This is one of the central parts in multivariate analysis (Anderson Citation1984). Many applications including, but not limited to, signal processing, wireless communication system, biology, and financial engineering may impose certain structures on shape or covariance matrix. Thus, test of shape or covariance matrix structure is of primary interest in modern multivariate analysis. Motivated by an empirical analysis of financial economic data in Section 3.3, this article aims to develop tests for shape matrix of large-dimensional data without imposing normality assumption.
As a natural extension of multivariate normal distribution, elliptical symmetric distributions (Fang, Kotz, and Ng Citation1990) are commonly used in the literature of nonparametric multivariate analysis. The shape or scatter matrix of elliptical distribution is proportional to the covariance matrix if its second moment exists and is finite. This motivates researchers to develop statistical inference procedures for the shape matrix rather than the covariance matrix of elliptical distributions. In many real-world applications such as the empirical analysis in Section 3.3, data may exhibit heavy-tailed nature or potentially contaminated by outliers. As it is well known, sample covariance matrix based inference on the shape is sensitive to outliers and heavy tails, and therefore it will lose its efficiency and become less reliable when the underlying populations are elliptically distributed. To cope with this issue, many robust alternatives are used in the literature, including the M-estimators (Huber Citation1964), the minimum volume ellipsoid and minimum determinant estimators (Rousseeuw Citation1985) and Tyler’s M estimator (Tyler Citation1987), etc. Hallin and Paindaeine (Citation2006) and Hallin, Oja, and Paindaeine (Citation2006) proposed a class of rank-based procedures for testing sphericity of elliptical distribution (equivalently testing the shape matrix being an identity matrix), and showed their test for shape is semiparametrically efficient. All aforementioned works assume that the data are finite and fixed-dimensional. The theoretical analyses in these works may not be applicable for large-dimensional data, which are common in many real-world applications nowadays. Thus, it is of great importance in developing new theory that is applicable for establishing asymptotic properties of statistical inference procedures for large-dimensional shape matrix.
Tests for large-dimensional covariance matrix are closely related to tests on shape matrix and have received increasing attention in the literature. Tests for the identity and sphericity of covariance matrix have been studied in the literature Ledoit and Wolf Citation2002; Bai et al. Citation2009; Chen, Zhang, and Zhong Citation2010. Tests for the equality and proportionality of covariance matrices in the two-sample problems have been studied in Schott (Citation2007), Li and Chen (Citation2012), and Zou et al. (Citation2021). A common feature of these conventional tests is that their statistics are constructed from sample covariance matrix or sample correlation matrix, which are both well-behaved for those Gaussian-type and light-tailed populations, but are sensitive to outliers and heavy-tailed distributions. Meanwhile, random matrix theory for sample covariance or correlation matrices have been well developed for Gaussian-type and light-tailed populations, but is less developed for heavy-tailed populations. One of primary goals of this article is to develop new theory for large-dimensional shape matrix estimate with heavy-tailed populations, and then apply the new theory for developing tests on shape matrices.
In this article, we develop tests for shape matrix in a large-dimensional regime where the parameter of dimension p grows to infinity proportionally to the sample size n such that . We focus on two fundamental problems in statistical inference on the shape matrix: test whether the shape matrix of a population is identity and test whether the shape matrices of two populations are equal. The proposed test statistics are based on the eigenvalues of Tyler’s M estimator since among all aforementioned robust estimators, Tyler’s M estimator enjoys certain superiorities: it is affine equivariant and, under ellipticity assumption, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). The proposed procedures are applicable to a broad class of multivariate distributions defined in (2.1). In order to analyze both the null and alternative distributions of the proposed statistics, we develop new theory on the limiting spectrum of a large-dimensional Tyler’s M estimator. Specifically, under the population model (2.1), we establish both the convergence of the empirical spectral distribution (ESD) and the central limit theorem (CLT) for linear spectral statistics (LSSs) of a large-dimensional Tyler’s M estimator. Such theoretical results are brand new and are of independent interest in random matrix theory, with potential applications in developing new approaches for other related robust inferences in large dimensions. Let’s briefly summarize the major contributions of this work below.
From methodological point of view, the proposed tests have several appealing properties. First, they are all distribution-free. We show that their limiting null distributions depend only on the dimension p and sample size n of a dataset. Second, Tyler’s M estimator is an affine equivariant estimator of the shape matrix in the sense that when we apply the affine transformation to the data, their corresponding Tyler’s M estimators, denoted as and , satisfy for any deterministic full-rank square matrix . Thus, Tyler’s M estimator behaves similarly to the sample covariance matrix under linear transformations. In particular, for tests of the two-sample shape matrix problem, this affine equivariance property of Tyler’s M estimator can eliminate the impact from the unknown shape matrix under the null hypothesis. This ensures that the proposed tests are universal to the underlying populations. At the same time, it allows us to investigate some nontrivial cases, including those where the shape matrix has an unbounded spectral norm, etc. These findings are also conformed by both theoretical analysis and Monte Carlo studies.
From technical aspects, we develop new strategies and techniques to study the limiting spectrum of a large-dimensional Tyler’s M estimator from general populations. As far as we know, existing theories on Tyler’s M estimator are mainly developed in the low-dimensional regime for elliptical populations, but remain largely unknown when the dimension p grows to infinity and the underlying population deviates away from the elliptical ones. In addition, Tyler’s M estimator is defined to be the solution to a fixed point equation and has no explicit expression. Therefore, standard approaches and tools in random matrix theory given in Bai and Silverstein (Citation2010) cannot be directly imitated here. To solve this problem, we propose a new theoretical development strategy by comparing the resolvent of Tyler’s M estimator to that of sample spatial-sign covariance matrix (Locantore et al. Citation1999) for the first time, which is eventually achieved by employing a series of new techniques, mainly the method of moments. Note that our techniques can be extended similarly to the study of other implicitly defined M-estimators, such as the regularized Tyler’s M estimator (Kammoun et al. Citation2016), where is then allowed to be larger than 1. However, this is beyond the scope of this article.
The rest of the article is organized as follows. Section 2 introduces our model assumptions, and develops tests for shape matrices in both the one-sample and two-sample cases. In Section 3, we conduct simulation results, and illustrate the proposed methodology via an empirical analysis of a real dataset. Section 4 is dedicated to the development of new theory regarding the limiting spectrum of a large-dimensional Tyler’s M-estimator. These are the core technical tools used to derive both the null and alternative distributions of our test statistics. All the proofs are provided in the supplementary material of this article.
2 New Test Procedures for Large-Dimensional Shape Matrix
2.1 Model of the Population
Consider a random vector admitting the following stochastic representation, (2.1) (2.1) where w > 0 is a scalar random variable, is a p × p deterministic positive definite matrix normalized by (normalization of guarantees the model identifiability between w and ), referred to as the shape matrix or scatter matrix of the population, is a random vector with iid continuous coordinates having bounded densities and satisfying (2.2) (2.2)
Remark 2.1.
It is worth noting that w and in (2.1) are not necessarily independent. Therefore, our model is quite general and covers several important classes of multivariate distributions.
When the random variable w degenerates to a constant, model (2.1) becomes the well studied independent components model (ICM), which provides the most standard setup for independent component analysis (ICA). See Ilmonen and Paindaeine (Citation2011) for detailed discussion about ICM and ICA. The ICM is also a conventional structure assumption in the study of random matrix theory (Bai and Silverstein Citation2010). As demonstrated by the real data example in Section 3.3, model (2.1) can be more general than ICM in practice.
Model (2.1) includes the elliptical distribution family with zero location vector as a special case. To see this, since if follows an elliptical distribution with zero location vector, it has the stochastic representation where v > 0 is a scalar random variable and u is a random vector uniformly distributed on the unit sphere in (Fang, Kotz, and Ng Citation1990). Further, let and , where is set to be a standard Gaussian random vector (τ = 3). Thus, we have which is indeed a special case of our model (2.1).
By using the same argument as that in (b), model (2.1) includes multivariate -norm symmetric distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of be iid according to the exponential distribution with mean 1.
With a similar argument to that in (b), model (2.1) includes a subclass of multivariate Liouville distributions (Fang, Kotz, and Ng Citation1990) as special cases by taking all elements of be iid according to a Gamma distribution.
Detailed definitions and properties of elliptical distributions, -norm symmetric distributions and multivariate Liouville distributions are referred to Fang, Kotz, and Ng (Citation1990).
2.2 Testing the Identity of a Shape Matrix
Let be a sequence of iid observations from the population defined in (2.1). Consider testing the following hypothesis for the shape matrix , (2.3) (2.3) in the large-dimensional asymptotic framework where
Our procedures are based on the eigenvalues of Tyler’s M estimator (Tyler Citation1987) of the shape , which is defined to be the matrix satisfying (2.4) (2.4)
Tyler’s M estimator is affine equivariant and under ellipticity assumption and compared with existing robust estimators, it is the “most robust” one that minimizes the maximum asymptotic variance in the finite-dimensional asymptotic framework (Tyler Citation1987; Magyar and Tyler Citation2014). Kent and Tyler (Citation1988) showed that if any linear subspace in of dimension d () contains less than nd/p observations, then there exists a unique solution to the (2.4). In particular, this condition holds with probability one for the random sample from model (2.1).
We consider the following two statistics for testing (2.3),
The trace-based statistic is motivated by the tests for covariance matrix from John (Citation1971, Citation1972), which measures the squared Frobenius distance between the Tyler’s M estimator and the shape matrix under the null. And the log-based statistic is an analogue of likelihood ratio statistic (Anderson Citation1984), which is based on the entropy loss between and under the null (James and Stein Citation1961). In the definition of these test statistics, we use the normalization , see (2.4). The null distributions of the two statistics are presented in the following theorem.
Theorem 2.1.
In the large-dimensional asymptotic framework, we have under H0,
We prove this theorem by using the new theory on the CLT for LSSs of Tyler’s M estimator, see Theorem 4.2 in Section 4. In particular, the limiting null distributions of the two statistics are only relevant to the population dimension p and sample size n, independent of the fourth moment τ of and also the true underlying distribution of the population .
To analyze the power of the proposed tests, we consider a local alternative such that is diagonal or τ = 3, and compare our Tyler’s M based test with John’s test (John Citation1972), defined as where is the sample covariance matrix. Under the local alternative, the asymptotical distributions of both and are derived in Theorems 2.2 and 2.3, respectively. Using these two theorems, we further derive the asymptotic power functions of these two tests in Proposition 2.1.
Theorem 2.2.
Suppose that , and as , Then, under either of the two conditions: (i) is diagonal or (ii) τ = 3, we have where the limiting mean and limiting variance are
Theorem 2.3.
In addition to the assumptions of Theorem 2.2, suppose that w is independent of and has finite eighth moments, that is, Then, we have (2.5) (2.5) where the scaling parameter σnJ is defined by
Proposition 2.1.
Suppose that the assumptions of Theorem 2.3 hold. Then, the asymptotic power functions of and are given by (2.6) (2.6) (2.7) (2.7) where is the standard normal distribution function, is the upper α quantile and (2.8) (2.8)
Moreover, the asymptotic relative efficiency (ARE) of with respect to is
Remark 2.2.
Theorem 2.3 unveils a phase transition phenomenon for such that its convergence rate is of the order for the case of nondegenerate w2 such as scale mixture models and multivariate t-distributions, while is of the order O(1) for the degenerate case where w2 is a constant, say, the independent components model.
Remark 2.3.
For the independent components model, Proposition 2.1 demonstrates an equivalent role of Tyler’s M estimator to the sample covariance matrix in terms of the asymptotic power functions (the two are exactly the same). However, in contrast to the test , we note that by setting and in (2.5), the null distribution of the sample covariance matrix based test still depends on the unknown fourth moment parameter τ of the atom random variable zi, which is not distribution free. Thus, an extra estimation of this parameter should be carried out for the test procedure.
Remark 2.4.
For populations with nondegenerate w2, Proposition 2.1 reveals an overwhelming superiority of Tyler’s M estimator to the sample covariance matrix. In particular, for those local alternatives where , Proposition 2.1 shows that the power of the Tyler’s M based test tends to 1 while the sample covariance matrix based test has no power. An intuitive explanation for this is that the noise generated by the variable w is stronger than the signal . Therefore, removing the irrelevant variable w from the shape estimation is the main achievement of using Tyler’s M estimator.
2.3 Testing the Equality of Two Shape Matrices
Let and be two p-dimensional populations with stochastic representations and respectively, where both and satisfy the moment conditions in (2.2) with for i = 1, 2. In this section, we consider the following hypothesis for the shape matrices and , (2.9) (2.9) where is unknown, normalized by . Assume is invertible and denote by Dp the squared Frobenius distance between the matrix and , (2.10) (2.10) which equals zero if and only if H0 is true.
Let and be two independent sets of observations from the two populations and , respectively. Their sample sizes n1, n2 and population dimension p are assumed to be large such that . Denote by and Tyler’s M estimators of the two shapes. That is, they are the unique solutions to the fixed point equations , , i = 1, 2. Then, plugging and into (2.10) yields a naive estimate of Dp, (2.11) (2.11)
However, in the large-dimensional asymptotic framework, such is not a consistent estimator of Dp as demonstrated in the first conclusion of Theorem 2.5. It has a bias which is not negligible asymptotically. By subtracting this bias from , we finally define the following consistent estimator of Dp for testing (2.9), and a large value of such an estimate constitutes an evidence in favor of H1. The limiting null distribution of is given in the following theorem, which is also derived from our general theory presented in Section 4.
Theorem 2.4.
In the large-dimensional asymptotic framework, we have under H0, where the limiting mean and limiting variance are
Remark 2.5.
The matrix in the test statistic is the F-ratio of the two Tyler’s M estimators. By their affine equivariance, this test can eliminate the impact from the unknown shape matrix under the null hypothesis. To see this, we denote and as two Tyler’s M estimators from the atom random vectors and , respectively. That is, they satisfy , , i = 1, 2. From the affine equivariance, we have the relation between and as (2.12) (2.12)
Note that in (2.12), we normalize the matrix so that the trace of the term on the right hand side equals p, which meets the condition . Then, under , we have from (2.11) and (2.12), which is indeed independent of the common shape matrix under H0. Therefore, the conclusion of Theorem 2.4 holds true only if the shape matrix is invertible and no other constraints have to be imposed. This point allows us to investigate some nontrivial cases, for example, the one that has an unbounded spectral norm, etc. This case is also studied by our simulations.
Next, we present the convergence of for general and , from which its power performance can be evaluated theoretically.
Theorem 2.5.
Suppose that and as , (2.13) (2.13)
The statistic is a strongly consistent estimator of Dp, that is, , almost surely.
If the moment conditions in (2.2) hold for the two populations with , then we have
The limiting mean and limiting variance are
Remark 2.6.
The first conclusion of Theorem 2.5 demonstrates the consistency of toward the distance Dp and the second conclusion provides the non-null distribution of for a local alternative with . In particular, the test can achieve full power if the quantity pDp grows to infinity.
2.4 Adjustments When the True Location is Unknown
In this section, we discuss adjustments to the tests developed in Sections 2.2 and 2.3 when the true location vector of the population is unknown, that is, (2.14) (2.14)
In addition to the assumptions of the model (2.1), we further assume that the random variable w is independent of and they satisfy (2.15) (2.15) for some r > 4 and .
For the one-sample case, given a sequence of iid observations following the same distribution as in (2.14), Tyler’s M estimator, denoted by , is defined to be the solution to the equation , where is an estimate of the true location . For robustness, we take into account the location estimate of the form , where and is the sample mean. Thus, the two test statistics for the hypothesis (2.3) are
For the two-sample case, let be the robust estimates of the unknown locations , i = 1, 2, respectively, where and being the sample means. Similarly, the two Tyler’s M estimators and are the solutions to the equations , . Our adjusted test statistic for the equality hypothesis in (2.9) is denoted by (2.16) (2.16)
The asymptotic null distributions of are listed below, which are direct conclusions of our newly developed theory on the CLT for LSSs of mean-adjusted Tyler’s M estimator detailed in Theorem 4.3.
Theorem 2.6.
In the large-dimensional asymptotic framework and under the null hypothesis in (2.3), we have
Theorem 2.7.
In the large-dimensional asymptotic framework and under the null hypothesis in (2.9), we have , where the limiting mean and limiting variance are the same as those given in Theorem 2.4.
Comparing Theorems 2.6 and 2.7 with Theorems 2.1 and 2.4, one can see that the three adjusted statistics share the same null distributions with except that we have replaced the sample sizes with in the centering terms. We will discuss the theoretical foundation of such substitution in Section 4.4.
Remark 2.7.
Note that in Kent and Tyler (Citation1991), the authors studied the existence and uniqueness of simultaneous M-estimates of multivariate location and scatter. These estimates are defined to minimize the objective function and thus have the following form (2.17) (2.17) where and . By taking , the solution to (2.17) corresponds to Tyler’s M estimator and another estimate of location . However, this choice of does not satisfy the Condition M outlined in Kent and Tyler (Citation1991) for the existence and uniqueness of the solution to (2.17). Although there is potential for further exploration in this direction for simultaneous estimation when the true location is unknown, such theoretical analysis is currently beyond the scope of this article.
Nevertheless, to understand the simultaneous estimates better for future investigations, we have empirically implemented an iterative algorithm to obtain a solution to (2.17), temporarily setting aside the issue of existence and uniqueness of the solution. Detailed simulations are presented at the end of the supplementary material, and the simulation results seem to imply that such is asymptotically equivalent to our explicitly defined location estimate , in terms of the -norm.
3 Simulations and Real Data Analysis
In this section, we conduct simulation studies to compare the performance of the proposed tests with existing ones, and illustrate the proposed methodology via a real data example.
3.1 Simulations for Testing (2.3)
We numerically examine the performance of the proposed tests , and . For comparison, we also conduct John’s test from Theorem 2.3 and the likelihood ratio test from Bai et al. (Citation2009), both of which are based on the sample covariance matrix. Simulation data are generated from the following two models.
Model (I). The underlying population is multivariate normal where with . We set θ = 0 under the null hypothesis and under the alternative.
Model (II). The underlying population is a scale mixture, that is, . The matrix is the same as the one in Model (I).
The population dimension p is set to be p = 100, 200, 400 and the dimension-to-sample size ratio is cn = 0.25 and 0.5. The nominal level is fixed at 5%. The test depends on the mixing parameters of Model (II), which are treated as known in our simulations. All statistics are averaged from 10,000 independent replications.
Empirical sizes of the six tests are summarized in . The results show that the four Tyler’s M based tests , and the test all have reasonable sizes under both models. Meanwhile, Tyler’s M based tests have slight size inflation when p and n are small. As the dimensions grow, their sizes reach the nominal level of 0.05. For the test , it has accurate sizes under Model (I) but suffers from serious size distortions under Model (II). This is reasonable since the test is only justified for the independent components model.
depicts the empirical powers of the six tests. The results of under the mixture model are excluded due to its size inflation. Under Model (I), the powers of these tests are comparable. That is, the tests based on Tyler’s M estimator keep the same efficiency as those based on the sample covariance matrix, and all achieve full power as the dimensions grow. When the underlying population shifts from the normal distribution to the scale mixture, that is, Model (II), the four Tyler’s M based tests remain well-behaved, but the test shows a significant loss of power, as predicted by Theorem 2.3. For instance, when , the power of declines from 99.08 to 50.40. This demonstrates that Tyler’s M based tests can gain satisfactory robustness against heavy-tailed and tail-dependent data.
3.2 Simulations for Testing (2.9)
This section evaluates the performance of and . The population dimension is set to be p = 100, 200, 400 and the dimension-to-sample size ratios are and . Our first experiment is to examine the robustness of the proposed tests by comparing them with the one constructed from sample covariance matrix (Liu et al. Citation2014), referred to as . The following two models are considered.
Model (III). The two populations are multivariate normal such that and , where is diagonal with . We set θ = 0 () under the null hypothesis and under the alternative.
Model (IV). The first population is standard normal and the second is a contaminated normal formulated by a scale mixture where represents the proportion of contamination and the matrix is defined as in Model (III). We set the proportion to be fixed at and hence, under the null hypothesis (), the population generates a dataset containing 5% outliers.
depicts empirical sizes and powers of the three tests under Models (III) and (IV). It shows that, despite having small size inflation, the tests and are both robust against outliers. However, the test shows a significant size distortion when outliers are present. This again demonstrates the robustness of Tyler’s M based tests in comparison to sample covariance matrix based tests.
Our second experiment is designed such that the difference between the two shape matrices only lies in a finite-dimensional subspace. Hence, we adopt the spiked population model (Johnstone Citation2001) as follows.
Model (V). and are two unit vectors.
For this model, when θ is close to –1, the matrix (j = 1, 2) has an eigenvalue close to zero, implying the existence of collinearity among the components of the population. When θ is large or even divergent along with the dimension p, a dominant principle component of the population exists. Hence, our two-sample test detects whether the direction of collinearity or the dominant component of the population changes over spatial scales and through time. Moreover, the common base component in and can be extended to any positive definite matrix with bounded eigenvalues.
The experimental settings are as follows. The parameter for the case of collinearity and for the case of the dominant factor. The vectors are and , where and are orthogonal vectors chosen randomly from the unit sphere in . The parameter α which represents the angle between and takes values in . Samples are drawn from normal populations with dimensions , (200, 400, 800) and (300, 600, 1200). Empirical powers of and under both cases at 5% significance from 10,000 independent replications are plotted in . It shows that the powers grow to 1 as the angle α and/or the dimensions increase, which confirms the capacity of and to distinguish the two shape matrices.
3.3 A Real Data Example
To illustrate the proposed methodology, we apply the proposed tests to an empirical analysis of Fama-French 49 industrial portfolios. The dataset containing average value-weighted daily returns of the portfolios from Years 2002 to 2019 can be downloaded from library.html\#Benchmarks" class="url" >http://mba.tuck.dartmouth.\edu/pages/faculty/ken.french/data{} library.html\#Benchmarks. To avoid the weekend effect in the stock market, we consider only data subset consisting of every Wednesday’s returns. Since there was a financial crisis in 2008–2009, we focus on two segments data: one from 2002 to 2007 and the other one from 2010 to 2019. Thus, the sample sizes of the data subsets from 2002–2007 and 2010–2019 are 310 and 515, respectively. We apply the proposed tests to examine whether the shape of data changes over time during these two periods.
To demonstrate the proposed model in (2.14) is more general than the ICM in practice, we conduct an exploratory data analysis to illustrate the ICM does not fit the datasets well. Define and Then it can be shown that under assumptions imposed in Section 2.4, if has stochastic representation (2.14), almost surely as . Note that the ICM corresponds to a constant w, and therefore the ratio . In general, the ratio is greater than 1. This motivates us to examine scatterplots of () based on its sample version.
To construct the scatterplots, we generate B = 200 bootstrap samples and calculate the sample version of () for each bootstrap sample. Specifically, for a given bootstrap sample . Denote , and , where is the sample mean of the bootstrap sample. We obtain for each bootstrap sample. The scatterplots of based on the B = 200 bootstrap samples are depicted in . If ICM is valid, one would expect the points scattering around the straight line y = x since the ratio of to should approximately equal 1 under the ICM model assumption. clearly shows that the ratio of to is dramatically greater than 1.
We now examine whether the shape matrix of the 49 assets would change over Years 2002–2007. To this end, we apply the proposed testing procedure for equality of two shape matrices to examine whether the shape matrix for data from 2002 to 2003 equals the shape matrix for data from 2004 to 2005. The resulting p-value is 0.1745, which implies that the shape does not have dramatically change. We further apply the testing procedure for data from 2004 to 2005 versus data from 2006 to 2007, and the resulting p-value is less than 0.0001. This implies that the shape of data has been changing from 2004–2005 to 2006–2007.
Similarly, we test whether the shape matrix of the 49 assets would change over the period of 2010–2019. We apply the testing procedure for data subsets over every two-year period. The resulting p-values for data from 2010–2011 versus 2012–2013, 2012–2013 versus 2014–2015, 2014–2015 versus 2016–2017, 2016–2017 versus 2018–2019 are 0.0030, 0.0112, 0.0037, and 0.0007, respectively. This implies the shape matrix of the data varies over this period.
4 Theoretical Results on the Limiting Spectrum of Tyler’s M Estimator from a General Population
Recall the definitions of our proposed statistics and their adjusted versions, which are all functionals of the eigenvalues of Tyler’s M estimator (or their F-ratio). To our best knowledge, limiting properties of the eigenvalue distributions of large-dimensional Tyler’s M estimator with general populations have not been explored fully yet and almost all existing works in the literature on M-estimators assume elliptical distributions, see Maronna (Citation1976), Tyler (Citation1987), Couillet, Pascal, and Silverstein (Citation2015), Kammoun et al. (Citation2016), Zhang, Cheng, and Singer (Citation2016), Goes, Lerman, and Nadler (Citation2020), etc. Note that by assuming the parameter of dimension p grows to infinity proportionally to the sample size n (also referred to as Marčenko-Pastur asymptotic regime in random matrix theory), it has been established only recently that for those elliptical distributions, the estimators of Maronna’s M (Couillet, Pascal, and Silverstein Citation2015), Tyler’s M (Zhang, Cheng, and Singer Citation2016) and regularized Tyler’s M (Kammoun et al. Citation2016) all behave similarly and are asymptotically equivalent to the sample covariance matrix from normal distribution. Such equivalence gives the first-order approximation. That is, the ESDs of these robust estimators from elliptical distributions all converge to the renowned Marčenko-Pastur law (Marčenko and Pastur Citation1967). As for the second-order asymptotics for these robust scatters, existing work is even limited and the only available reference is Couillet, Kammoun, and Pascal (Citation2016), where the authors focused on the fluctuation of bilinear forms of regularized Tyler’s M estimator, also from elliptical populations, and it was again established to be the same as the sample covariance matrix asymptotically. However, in the literature of random matrix theory, the convergence rate of bilinear forms of certain random matrix models is (Bai, Miao, and Pan Citation2007; Couillet, Kammoun, and Pascal Citation2016), which is in general much slower than the convergence rate of LSSs Bai and Silverstein Citation2004; Pan and Zhou Citation2008; Zheng Citation2012. Therefore, unlike the case for dealing with spectral norms (Couillet, Pascal, and Silverstein Citation2015; Kammoun et al. Citation2016; Zhang, Cheng, and Singer Citation2016) or bilinear forms (Couillet, Kammoun, and Pascal Citation2016), the LSSs of Tyler’s M estimator may fluctuate differently from those of a sample covariance matrix asymptotically.
Therefore, this section aims to develop new theoretical results on the convergence of ESDs (first-order asymptotic) and the CLT for LSSs (second-order asymptotic) of large-dimensional Tyler’s M estimator from general populations. Note that the technique developed in this article is different from the one in Zhang, Cheng, and Singer (Citation2016). The latter depends on the properties of Gaussian and elliptical distributions, say, some concentration bounds, and thus cannot be directly extended to other distributions. When it comes to the second-order asymptotics, the processing becomes more intricate as it involves obtaining high-precision approximations of certain moments of a resolvent matrix, rather than just comparing two matrices in spectral norms. Our strategy is mainly on the method of moment and a comparison technique. To be concrete, the method of moment helps to get rid of the particular distribution of the underlying population, and the comparison technique allows us to compare Tyler’s M estimator with sample spatial-sign covariance matrix for the first time. We investigate the inner structure of their resolvent matrices and finally show that the two share the same first and second order asymptotics in some sense.
4.1 Notations and Assumptions
First, we introduce some notations that are repeatedly used in this article. Let be a p × p symmetric or Hermitian matrix with eigenvalues . The ESD of is defined as the random measure where is the Dirac mass at point . If the ESD sequence has a limit when , this limit is referred to as the limiting spectral distribution (LSD) of the sequence. For a given analytical function f(x) defined on , we call an LSS of the random matrix associated with the test function f. The Stieltjes transform of a probability measure G is defined as . This definition can be extended to the whole complex plane except the support of G.
Next, we recall our assumptions and definitions for easy reading. Let be a random sample having the following stochastic representation, (4.1) (4.1)
Our main assumptions on this model are listed below.
Assumption
(1). Both the sample size n and population dimension p tend to infinity such that .
Assumption
(2). The ESD of , denoted as Hp, has a bounded support, that is , and converges weakly to a probability distribution H.
Assumption
(3). The coordinates of are iid continuous random variables having bounded density and satisfying the following moment conditions for some .
In the following Sections 4.2 and 4.3, we establish the LSD and the CLT for LSSs of Tyler’s M estimator defined by , , where is a random sample from (4.1) satisfying Assumptions (1)–(3). Finally, our results are generalized to cover the case of unknown location in Section 4.4.
Remark 4.1.
We discuss here the reason for imposing Assumption (1). Initially, the definition of Tyler’s M estimator implies that the matrix is well-defined only when p < n. As a result, we set the constraint c < 1. It’s worth noting that recent works, such as Zhang, Cheng, and Singer (Citation2016) and Goes, Lerman, and Nadler (Citation2020), have explored the properties of Tyler’s M estimator within this very regime. To bypass the constraint c < 1, one might consider alternative matrix models like the regularized Tyler’s M estimator, as proposed by Ollila and Tyler (Citation2014). While the techniques developed in this article could potentially address non-explicitly defined random matrix models, such discussions are beyond the scope of this article and could be a good topic for future research.
4.2 Limiting Spectral Distribution of
The following theorem is on the LSD of Tyler’s M estimator .
Theorem 4.1.
Under Assumptions (1)–(3), almost surely, the ESD of Tyler’s M estimator converges weakly to a probability measure , whose Stieltjes transform m(z) is a solution to (4.2) (4.2)
The solution is unique in the set .
Remark 4.2.
Theorem 4.1 extends the main conclusion in Zhang, Cheng, and Singer (Citation2016), where the same LSD was derived under elliptical distributions. Indeed, according to Remark 2.1(b), elliptical distributions require those atom random vectors in (4.1) to be standard multivariate normal . In that case, any finite moment of exists. Our Theorem 4.1 is indeed universal. It reaches the same conclusion not only for those standard normal but also for any continuous distributions with finite th moment.
Remark 4.3.
The LSD , characterized by its Stieltjes transform m(z) as defined in (4.2), is a generalized Marčenko-Pastur law, initially introduced in the seminal paper Marčenko and Pastur (Citation1967) to describe the LSD of sample covariance matrices. Hence, Theorem 4.1 illustrates that, in terms of the first-order asymptotics for the ESD, Tyler’s M estimator constructed by samples originating from the model (4.1) with shape matrix , is asymptotically equivalent to the sample covariance matrix formed by samples drawn from the independent components model with the same population covariance matrix , that is, Indeed, the auxiliary Lemma S1.1, which is crucial for proving our main results, has yielded a stronger conclusion than the current one in Theorem 4.1. Lemma S1.1 in the supplementary material demonstrates that , almost surely, where is the spatial-sign covariance matrix constructed by samples . Additionally, it is straightforward to verify that , almost surely. These facts imply that the three matrices, Tyler’s M estimator , the spatial-sign covariance matrix , and the sample covariance matrix , are asymptotically equivalent in terms of spectral norm.
Remark 4.4.
Note that while Theorem 4.1 is established under Assumption (1), if we allow c approach 0, the result indicates that the eigenvalue’s distribution of Tyler’s M estimator converges to that of the shape matrix . This finding aligns with the established result derived in Tyler (Citation1987) for the fixed-p scenario.
Remark 4.5.
Let , which is the Stieltjes transform of . Then, (4.2) can be rewritten using as (4.3) (4.3)
This quantity will be frequently recalled in the next subsection. For simplicity of notation, we shall sometimes suppress the independent variable z in the functions and m(z).
4.3 Central Limit Theorem for Linear Spectral Statistics of
This section aims to establish the CLT for LSSs of Tyler’s M estimator , that is, the statistics of the type where f(x) is a given test function. To establish our main result, we introduce a deterministic matrix T, which relates to the shape matrix in the following way, (4.4) (4.4) where “” denotes the Hadamard product of two matrices. Note that the matrix T is asymptotically equivalent to . However, the remaining terms which are of the order in spectral norm will still contribute when we study the CLT for LSSs of . Let denote the ESD of T. Then, by replacing (c, H) with in (4.2), this new equation defines a unique solution, referred as , and its induced probability measure is referred as . In virtue of this measure, we may normalize the LSS of as (4.5) (4.5)
In addition, we need the following Assumption (4) for establishing our general CLT, which indeed is required for handling the case when . Note that such an assumption depends not only on the eigenvalues of , but also on its eigenvectors.
Assumption
(4). As , the limits of the following three auxiliary quantities exist, (4.6) (4.6) where .
We next show that the linear functional of the eigenvalues of defined in (4.5) converges weakly to a Gaussian process.
Theorem 4.2.
Suppose that Assumptions (1)–(4) hold with some . Let be k functions analytic on an open set that includes the interval , Then the random vector converges weakly to a Gaussian random vector . Its mean function is given by (4.7) (4.7) where (4.8) (4.8) (4.9) (4.9) in which . Its covariance function is given by (4.10) (4.10) where
Here, the contours in (4.7) and (4.10) (two in (4.10), which may be assumed to be nonoverlapping) are closed, counter-clockwise orientated in the complex plane and enclosing the support of .
Remark 4.6.
Theorem 4.2 indicates that the convergence rate for the LSSs of Tyler’s M estimator is of the order O(n) when the dimension p grows to infinity proportionally with the sample size n, while keeping their ratio away from zero asymptotically. Theoretically, our result does not cover scenarios where p is fixed or p diverges but c = 0. In the fixed-p case, Tyler (Citation1987) demonstrates that under certain conditions, converges in distribution to a matrix normal distribution as . This large sample behavior of resembles the classical theory on sample covariance matrix introduced in the book by Anderson (Citation1984). When p diverges but c = 0, random matrix theory suggests that the convergence rate for LSSs becomes the order of , as observed in the results for sample covariance matrices in Chen and Pan (Citation2015). Particularly, in order to determine the centering term, as well as the limiting mean and limiting variance of the CLT for LSSs, we must identify the dominant terms that will contribute by controlling the remaining negligible terms up to certain orders, which depends heavily on the convergence rates. As far as we know, there is currently no unified theory that can easily accommodate all these asymptotic frameworks including fixed p, and . Nevertheless, for practitioners applying the CLT in real data analysis or applications, it’s worth noting that once a dataset is available, one can directly set the value of c equal to p/n. Our CLT result remains valid even if c is small. We have applied this approach in all our simulation experiments and real data analysis provided in Section 3.
Finally, we discuss the results of two special cases covered by Theorem 4.2.
Special case (1): . This special case includes the elliptical family of distributions. From Remark 2.1(b), when the sample comes from the family of elliptical distributions, the atom random vectors in (4.1) reduce to the standard multivariate normal vectors , which meets . According to Theorem 4.2 and by setting τ = 3, all the impacts from the eigenvectors of disappear. We establish in the following Proposition 4.1 that the centering term given in (4.16) can also be simplified to a functional of Hp only. In this way, all the quantities including the centering term, limiting mean and limiting variance in Theorem 4.2 are only relevant to the ESD of , that is, the information contained in .
Proposition 4.1.
Suppose that Assumptions (1)–(3) hold with τ = 3. We have where
Special case (2): identity shape matrix. When the shape matrix (), Assumption (4) holds automatically. Indeed, the three auxiliary quantities defined in (4.6) are exactly equal to their limits given by ζ = 1, and . After some algebra by setting in Remark 4.5, we have all the limiting quantities in Theorem 4.2 involving the factor equal zero, that is, the terms in (4.8), (4.9) and all disappear. Moreover, by (4.4), we notice that when , the matrix T reduces to the identity matrix which does not depend on the factor . Thus, for those samples with identity shape matrix, the CLT for LSSs of Tyler’s M estimator is irrelevant to the fourth moment τ. Therefore, it is universal in the sense that it does not depend on the specific underlying distribution of the population.
4.4 Adjustments to the CLT When the Location Vector is Unknown
This section considers the CLT for LSSs of Tyler’s M estimator when the true location vector is unknown. To this end, suppose we have a sequence of iid observations having the form (4.11) (4.11) where is now unknown. We update Assumption (3) of the model (2.1) as follows.
Assumption
(3’). The coordinates of are iid continuous random variables having bounded density and satisfying for some . The random variables are independent of and satisfying for some r > 4.
Tyler’s M estimator corresponding to the data sample in (4.11), denoted as , is then defined to be the solution of equation (4.12) (4.12) where is an estimate of the true location . We consider a class of location estimates of the form (4.13) (4.13) (4.14) (4.14) where λ is a pre-determined nonnegative number in and is the sample mean. Notice that the quantitie is close to and thus, by the construction, the weight attempts to approximate the oracle one In particular, reduces to the usual sample mean if we set λ = 0. With such a class of location estimates involved, we establish the relation between the LSSs of and those of in the following theorem.
Theorem 4.3.
Suppose that Assumptions (1)-(2)-(3’) hold. Then, we have in probability, where
Theorem 4.3
demonstrates that substituting the location estimate for the true location vector in Tyler’s M estimator only introduces a mean shift to the CLT for its LSSs. And the shift is characterized by the function , which depends on certain moments of the random variable w as specified in the parameter . In particular, by setting λ = 2 in Theorem 4.3, we get and the function then becomes irrelevant to the moments of w and possesses a concise form (4.15) (4.15) where and with are two finite-dimensional proxies for the LSD with (c, H) replaced by and , respectively, see Zheng, Bai, and Yao (Citation2015). Such in (4.15) can be canceled out by a simple replacement of the sample size n with n – 1 in the centering term of the LSS defined in (4.5). We summarize such substitution result in the following proposition.
Proposition 4.2.
Suppose that Assumptions (1), (2), (3’), and (4) hold. Denote the normalized LSS of as (4.16) (4.16)
Then the random vector converges weakly to the same Gaussian random vector as that in Theorem 4.2.
Very interestingly, we find for the first time that by substituting the true location vector with its estimate by setting λ = 2 in (4.13), the CLT for LSSs of their corresponding Tyler’s M estimators do not change. The only difference lies in the centering terms for normalizing the LSSs. This is a parallel result to the substitution principle for LSSs of the sample covariance matrix when the unknown mean is estimated by the sample mean, see Zheng, Bai, and Yao (Citation2015).
Supplementary Materials
This supplementary material consists of proofs of the main results developed in Sections 2 and 4, namely Theorems 2.1, 2.2, 2.3, 2.4, 2.5, 4.1, 4.2, 4.3, and Proposition 4.1. Additionally, it includes some additional lemmas for proving these results and simulations for the simultaneous estimates mentioned in Remark 2.7.
Supplemental Material
Download Zip (1.2 MB)Supplemental Material
Download PDF (548 KB)Acknowledgments
The authors would like to thank the Associate Editor and referees for their constructive comments, which lead to significant improvement of this work.
Disclosure Statement
The authors report there are no competing interests to declare.
Additional information
Funding
References
- Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis, New York: Wiley.
- Bai, Z., Jiang, D., Yao, J., and Zheng, S. (2009), “Corrections to LRT on Large-Dimensional Covariance Matrix by RMT,” The Annals of Statistics, 37, 3822–3840. DOI: 10.1214/09-AOS694.
- Bai, Z., Miao, B., and Pan, G. (2007), “On Asymptotics of Eigenvectors of Large Sample Covariance Matrix,” The Annals of Probability, 35, 1532–1572. DOI: 10.1214/009117906000001079.
- Bai, Z., and Silverstein, J. W. (2004), “CLT for Linear Spectral Statistics of Large-Dimensional Sample Covariance Matrices,” The Annals of Probability, 32, 553–605. DOI: 10.1214/aop/1078415845.
- Bai, Z., and Silverstein, J. W. (2010), Spectral Analysis of Large Dimensional Random Matrices (2nd ed.), New York: Springer.
- Chen, B., and Pan, G. (2015), “CLT for Linear Spectral Statistics of Normalized Sample Covariance Matrices With the Dimension Much Larger Than the Sample Size,” Bernoulli, 21, 1089–1133. DOI: 10.3150/14-BEJ599.
- Chen, S., Zhang, L., and Zhong, P. (2010), “Tests for High-Dimensional Covariance Matrices,” Journal of the American Statistical Association, 105, 810–819. DOI: 10.1198/jasa.2010.tm09560.
- Couillet, R., Kammoun, A., and Pascal, F. (2016), “Second Order Statistics of Robust Estimators of Scatter. Application to GLRT Detection for Elliptical Signals,” Journal of Multivariate Analysis, 143, 249–274. DOI: 10.1016/j.jmva.2015.08.021.
- Couillet, R., Pascal, F., and Silverstein, J. W. (2015), “The Random Matrix Regime of Maronna’s M-estimator with Elliptically Distributed Samples,” Journal of Multivariate Analysis, 139, 56–78. DOI: 10.1016/j.jmva.2015.02.020.
- Fang, K., Kotz, S., and Ng, K. W. (1990), Symmetric Multivariate and Related Distributions, London: Chapman and Hall, Ltd.
- Goes, J., Lerman, G., and Nadler, B. (2020), “Robust Sparse Covariance Estimation by Thresholding Tyler’s M-estimator,” The Annals of Statistics, 48, 86–110. DOI: 10.1214/18-AOS1793.
- Hallin, M., Oja, H., and Paindaeine, D. (2006), “Semipararamtrically Efficient Rank-based Inference for Shape II. Optimal R-estimation of Shape,” The Annals of Statistics, 34, 2757–2789. DOI: 10.1214/009053606000000948.
- Hallin, M., and Paindaeine, D. (2006), “Seimpararamtrically Efficient Rank-based Inference for Shape I. Optimal Rank-based Test for Sphericity,” The Annals of Statistics, 34, 2707–2756. DOI: 10.1214/009053606000000731.
- Huber, P. J. (1964), “Robust Estimation of a Location Parameter,” The Annals of Mathematical Statistics, 35, 73–101. DOI: 10.1214/aoms/1177703732.
- Ilmonen, P., and Paindaeine, D. (2011), “Seimpararamtrically Efficient Inference based Signed Ranks in Symmetric Independent Component Models,” The Annals of Statistics, 39, 2448–2476. DOI: 10.1214/11-AOS906.
- James, W., and Stein, C. (1961), “Estimation with Quadratic Loss,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability (Vol. I), pp. 361–379.
- John, S. (1971), “Some Optimal Multivariate Tests,” Biometrika, 58, 123–127. DOI: 10.2307/2334322.
- John, S. (1972), “The Distribution of a Statistic Used for Testing Sphericity of Normal Distributions,” Biometrika, 59, 169–173. DOI: 10.2307/2334628.
- Johnstone, I. M. (2001), “On the Distribution of the Largest Eigenvalue in Principal Components Analysis,” The Annals of Statistics, 29, 295–327. DOI: 10.1214/aos/1009210544.
- Kammoun, A., Couillet, R., Pascal, F., and Alouini, M.-S. (2016), “Convergence and Fluctuations of Regularized Tyler Estimators,” IEEE Transactions on Signal Processing, 64, 1048–1060. DOI: 10.1109/TSP.2015.2494858.
- Kent, J. T., and Tyler, D. E. (1988), “Maximum Likelihood Estimation for the Wrapped Cauchy Distribution,” Journal of Applied Statistics, 15, 247–254. DOI: 10.1080/02664768800000029.
- Kent, J. T., and Tyler, D. E. (1991), “Redescending M-estimates of Multivariate Location and Scatter,” The Annals of Statistics, 19, 2102–2119. DOI: 10.1214/aos/1176348388.
- Ledoit, O., and Wolf, M. (2002), “Some Hypothesis Tests for the Covariance Matrix When the Dimension is Large Compared to the Sample Size,” The Annals of Statistics, 30, 1081–1102. DOI: 10.1214/aos/1031689018.
- Li, J., and Chen, S. (2012), “Two Sample Tests for High-Dimensional Covariance Matrices,” The Annals of Statistics, 40, 908–940. DOI: 10.1214/12-AOS993.
- Liu, B., Xu, L., Zheng, S., and Tian, G. (2014), “A New Test for the Proportionality of Two Large-Dimensional Covariance Matrices,” Journal of Multivariate Analysis, 131, 293–308. DOI: 10.1016/j.jmva.2014.06.008.
- Locantore, N., Marron, J., Simpson, D., Tripoli, N., Zhang, J., Cohen, K., Boente, G., Fraiman, R., Brumback, B., Croux, C., et al. (1999), “Robust Principal Component Analysis for Functional Data,” Test, 8, 1–73. DOI: 10.1007/BF02595862.
- Magyar, A. F., and Tyler, D. E. (2014), “The Asymptotic Inadmissibility of the Spatial Sign Covariance Matrix for Elliptically Symmetric Distributions,” Biometrika, 101, 673–688. DOI: 10.1093/biomet/asu020.
- Maronna, R. A. (1976), “Robust M-estimators of Multivariate Location and Scatter,” The Annals of Statistics, 4, 51–67. DOI: 10.1214/aos/1176343347.
- Marčenko, V., and Pastur, L. (1967), “The Distribution of Eigenvalues in Certain Sets of Random Matrices,” Math. USSR-Sbornik, 1, 457–483.
- Ollila, E., and Tyler, D. E. (2014), “Regularized M-estimators of Scatter Matrix,” IEEE Transactions on Signal Processing, 62, 6059–6070. DOI: 10.1109/TSP.2014.2360826.
- Pan, G., and Zhou, W. (2008), “Central Limit Theorem for Signal-to-Interference Ratio of Reduced Rank Linear Receiver,” Annals of Applied Probability, 18, 1232–1270.
- Rousseeuw, P. (1985), “Multivariate Estimation with High Breakdown Point,” in Mathematical Statistics and Applications: Proceedings 4th Pannonian Symposium on Mathematical Statistics Vol. B, ed. W. Grossman, et al., pp. 283–297.
- Schott, J. R. (2007), “A Test for the Equality of Covariance Matrices When the Dimension is Large Relative to the Sample Sizes,” Computational Statistics & Data Analysis, 51, 6535–6542. DOI: 10.1016/j.csda.2007.03.004.
- Tyler, D. E. (1987), “A Distribution-Free M-Estimator of Multivariate Scatter,” The Annals of Statistics, 15, 234–251. DOI: 10.1214/aos/1176350263.
- Zhang, T., Cheng, X., and Singer, A. (2016), “Marčenko-Pastur Law for Tyler’s M-estimator,” Journal of Multivariate Analysis, 149, 114–123. DOI: 10.1016/j.jmva.2016.03.010.
- Zheng, S. (2012), “Central Limit Theorems for Linear Spectral Statistics of Large Dimensional F-matrices,” Annales de l’Institut Henri Poincaré (B) Probability and Statistics 48, 444–476.
- Zheng, S., Bai, Z., and Yao, J. (2015), “Substitution Principle for CLT of Linear Spectral Statistics of High-Dimensional Sample Covariance Matrices with Applications to Hypothesis Testing,” The Annals of Statistics, 43(2):546–591. DOI: 10.1214/14-AOS1292.
- Zou, T., Lin, R., Zheng, S., and Tian, G. L. (2021), “Two-Sample Tests for High-Dimensional Covariance Matrices Using both Difference and Ratio,” Electronic Journal of Statistics, 15, 135–210. DOI: 10.1214/20-EJS1783.