A formal statistical test for the number of factors in. the approximate factor models

Size: px

Start display at page:

Download "A formal statistical test for the number of factors in. the approximate factor models"

Eileen Wilcox
6 years ago
Views:

1 A formal statistical test for the number of factors in the approximate factor models Alexei Onatski Economics Department, Columbia University September 26, 2006 Abstract In this paper we study i.i.d. sequences of n 1 vectors X t ; t = 1; :::; T such that, for each t; X t has an approximate factor structure with correlated Gaussian idiosyncratic terms. We develop a test of the null of r factors vs. the alternative that the number of factors is larger than k but smaller than k max +1; where k max is an a priori maximum number of factors. Our test statistic is equal to the ratio of k+1 kmax+1 to kmax+1 kmax+2; where i is the i-th largest eigenvalue of the sample covariance matrix 1 P T T t=1 X txt: 0 We describe the asymptotic distribution of the test statistic as n and T go to in nity proportionally as a function of the Tracy-Widom distribution. We tabulate the critical values of the test corresponding to k and k max relevant for nancial and macroeconomic applications. As an application, we test di erent hypotheses about the number of factors in arbitrage pricing theory. We reject the nulls of no or a single factor against alternatives of more than one factors by the test of 5%-level. We cannot reject the null of 5 factors by the 5%-level test. 1

2 1 Introduction Approximate factor analysis of high-dimensional data has attracted a lot of recent attention from researchers in macroeconomics and nance. Factors non-trivially in uencing thousands of stock returns observed over hundreds of time periods have been used to study the pricing of nancial assets, to evaluate the performance of nancial portfolios, and to test arbitrage pricing theory. In macroeconomics, pervasive factors extracted from half a century of data on hundreds of macroeconomic indicators have been used to monitor business cycles, to forecast individual macroeconomic time series, and to augment vector autoregressions used for monetary policy analysis. An important question to be addressed by any study which uses factor analysis is how many factors there are. Although there have been many recent studies which develop consistent estimators of the number of factors in the empirically relevant context of large and comparable time and cross-sectional dimensions of the data (see, for example, Forni et al (2000), Bai and Ng (2002, 2005), Stock and Watson (2005), Hallin and Liska (2005), Onatski (2005), and Watson and Amengual (2006)), the corresponding estimates of the number of factors driving stock returns and macroeconomic time series often considerably disagree. Such a disagreement indicates that there exists a large amount of statistical uncertainty about the point estimates of the number of factors. Unfortunately, none of the above studies proposes a formal statistical test of di erent hypotheses about the number of factors that would quantify the amount of the uncertainty. The purpose of this paper is to develop such a test. In this paper, we study i.i.d. observations of ndimensional vectors X t ; t = 1; :::; T; of data which admit the approximate factor structure (see Chamberlain and Rothschild, 1983) with correlated complex Gaussian idiosyncratic terms. If, as in the majority of the empirical applications, the original data is real, we construct X by adding the rst half of the original sample and the product of the imaginary unit and the second half of the original sample. We develop a test of the null of k factors vs. the alternative that the number of factors is larger 2

3 than k but smaller than k max +1; where k max is an a priori maximum number of factors. Our test statistics is equal to the ratio of k+1 kmax+1 to kmax+1 kmax+2; where i is the i-th P largest eigenvalue of the sample covariance matrix 1 T T t=1 X txt. 0 We describe the asymptotic distribution of the test statistics as n and T go to in nity proportionally as a function of the Tracy-Widom distribution (see Tracy and Widom, 1994), and tabulate the critical values of the test corresponding to k and k max relevant for nancial and macroeconomic applications. The logic of our test is based on the standard identi cation assumption (see Chamberlain and Rothschild, 1983) that the portion of the data s variation per cross-sectional unit explained by the factors remains non-trivial as n tends to in nity, whereas the portion of the variation per cross-sectional unit due to any idiosyncratic in uence tends to zero. This assumption is equivalent to the requirement that the rst r eigenvalues of the data s covariance matrix (where r is the true number of factors) rise proportionally to n, whereas the rest of the eigenvalues stay bounded. If k < r; we would, therefore, expect that k+1 kmax+1 is much larger than kmax+1 kmax+2: In contrast, when k = r; the eigenvalues k+1 ; kmax+1; and kmax+2 remain bounded as n tends to in nity, and we do not have a reason to expect the ratio of k+1 kmax+1 to kmax+1 kmax+2 to be large. In this paper, we establish the fact that, as both n and T tend to in nity, the joint distribution of the centered and scaled eigenvalues 1 r+1 ; :::; 1 kmax+1 ; where the centering and scaling constants and depend on n; T; and the covariance structure of the idiosyncratic terms, converges to the same limit as the joint distribution of the similarly centered and scaled k max r + 1 largest eigenvalues of the sample covariance matrix of the idiosyncratic terms. Further, we extend recent results of El Karoui (2006) to show that the latter joint distribution converges to the distribution described by Tracy and Widom (1994), which does not depend on the parameters of the data generating process. We conclude that, under the null hypothesis that k = r; the asymptotic distribution of the ratio of 1 k+1 1 kmax+1 to 1 kmax+1 1 kmax+2 is a function of the Tracy-Widom distribution. Finally, we note that the latter ratio is equal to the ratio of 3

4 k+1 kmax+1 to kmax+1 kmax+2: Therefore, its computation does not require the knowledge of the centering and the scaling constants, and hence, the knowledge of the covariance structure of the idiosyncratic terms. We take the ratio of k+1 kmax+1 to kmax+1 kmax+2 as our test statistics and tabulate the corresponding critical values. We study the nite sample performance of our test by running several Monte Carlo experiments. We nd that the test has correct size and good power for our Monte Carlo design and samples as small as n = 50 and T = 50: Moreover, the test seems to be reasonably robust with respect to di erent distributional assumptions about the idiosyncratic terms. Using Monte Carlo experiments we also study the performance of an analog of our test which does not require transforming the original real data into the complex form. Although we are unable to formally establish the asymptotic distribution of the corresponding test statistics, we conjecture the form of the asymptotic distribution and nd by simulation that the test works as well as our test designed for the complex data. As an application, we test di erent hypotheses about the number of factors in arbitrage pricing theory. Our test rejects the nulls of zero or only one factors against alternatives of more factors at 5% levels. We also can reject the nulls of 2, 3 and 4 factors at 5% level at least against some alternatives. We cannot reject the null of 5 factors against alternatives of more factors at 5% level. Although this paper is the rst to develop a formal statistical test of hypotheses about the number of factors in a situation when n and T tend to in nity simultaneously, Connor and Korajczyk (1993) were the rst to develop another test of similar hypotheses under a sequential asymptotics when rst n goes to in nity and then T goes to in nity. The Connor-Korajczyk test is not directly based on the eigenvalue-separation idea which forms the basis for our test, although their test s logic can be traced to this idea. We compare the performance of the Connor-Korajczyk test and our test using simulated data. We nd that in samples of various sizes our test has a much less distorted size than the Connor-Korajczyk test. At the same time, the power of our test is similar to that of the Connor-Korajczyk 4

5 test. The logic of our test (and that of Connor and Korajzyk, 1993) di ers from the logic of the standard likelihood ratio test (see Anderson (1984), chapter 14), which is based on the fact that the idiosyncratic covariance matrix is diagonal under the classical k-factor structure if the true number of factors is less than or equal to k: The approximate factor structure that we are concerned with in this paper does not require that all the cross-sectional correlation be due to the factors as is the case for the classical factor structure. Since the identi cation of factors is fundamentally di erent in the classical and the approximate factor model cases, it is not natural, in general, to compare our test and the classical test. The rest of the paper is organized as follows. In the next section we state our assumptions and develop the test. Section 3 contains Monte Carlo experiments. Section 4 tests di erent hypotheses about the number of factors in arbitrage pricing theory. Section 5 concludes. Technical proofs are contained in the Appendix. 2 The number of factors test We consider a sequence of approximate factor models indexed by n : X (n) = L (n) F (n)0 + e (n) (1) where X (n) is an n T (n) matrix of data; F (n) is a T (n) r matrix of T (n) observations of r factors; where r does not depend on n; L (n) is an n r deterministic matrix of factor loadings; and e (n) is an nt (n) noise matrix with i.i.d. N C (0; (n) ) columns independent from the elements of F (n). Here N C (0; (n) ) denotes a complex normal distribution, which is the distribution of a complex random variable whose real and imaginary parts are independent and identically distributed normals N(0; (n) =2): We assume that (1) satis es Assumptions 1 and 2 formulated below. In what follows we will omit superscript (n) over the variables that change with the dimensionality of data to simplify notations. 5

6 Assumption 1. There exist a positive de nite matrix B and a positive number b such that (L 0 L=n) (F 0 F=T ) p! B as n and T tend to in nity so that n=t! b: Assumption 1 can be thought of as a part of the identi cation restriction which allows us to identify the systematic component LF 0 of the data. The rest of the identi cation restriction is given by the rst inequality of Assumption 2 formulated below. If we further normalize factors so that F 0 F=T = I r ; then the assumption can be used to separately identify factors and factor loadings and may be interpreted as a requirement that the e ects of the factors per cross-sectional unit measured by L 0 L=n remain non-trivial as n tends to in nity. In this paper we do not impose any separate restrictions on the convergence or divergence of L 0 L=n and F 0 F=T because we are not going to separately identify factors and factor loadings. One consequence of putting no separate restrictions on factors and factor loadings is that much room is left for modeling factors. For example, they are allowed to be deterministic, or random and stationary, or random and non-stationary. The proportional n and T asymptotics di ers from the asymptotic assumptions made in the previous literature. Connor and Korajczyk (1993) develop their test using sequential limit asymptotics when, rst, n tends to in nity and, then, T tends to in nity. Bai and Ng (2002) allow n and T to go to in nity without any restrictions on the relative growth rates. The proportional asymptotics allows us to use the machinery of random matrix theory to establish the asymptotic distribution of our test statistics. Note that the limit b may be any positive number so that the asymptotics is consistent with a variety of empirically relevant nite sample situations. Our second assumption restricts the asymptotic behavior of the matrix of the crosssectional covariance of the idiosyncratic terms : Let l 1 ::: l n be the eigenvalues of : Denote by H the spectral distribution of ; that is H() = 1 1 n # fi n : l i > g ; where # fg denotes the number of elements in the indicated set. Further, let c be the unique root in 0; l 1 1 of the equation R (c= (1 c)) 2 dh() = T=n: Assumption 2. As n and T tend to in nity so that n=t! b, lim sup l 1 < 1; 6

7 lim inf l n > 0; and lim sup l 1 c < 1: Note that l 1 is the variance of the most variable or, loosly speaking, the most in uential weighted average of the idiosyncratic terms. Hence, the rst of the three inequalities of Assumption 2 requires that the strongest cumulative idiosyncratic in uence on the crosssectional units remains bounded as the number of the units tends to in nity. This requirement taken together with Assumption 1 constitute a slight modi cation of the standard (see Chamberlain and Rothschild, 1983) identi cation restriction imposed on model (1). The second inequality of Assumption 2 requires that, whatever the dimensionality of the data is, there is no multicollinearity among the idiosyncratic terms. The third inequality is crucial for the analytic apparatus developed in El Karoui (2006) for the analysis (which we rely on in this paper) of the spectral distribution of large complex Wishart matrices. In a nutshell, the inequality requires that the right tail of the spectral distribution of is not too thin. More precisely, it is not di cult to see that for the inequality lim sup l 1 c < 1 to hold, it is su cient that H weakly converges to a distribution H 1 with the upper boundary of support equal to lim sup l 1 and the density bounded away from zero in the vicinity of lim sup l 1 : As is pointed out by El Karoui (2006), such a su cient condition would hold, for example, for that have a symmetric Toeplitz structure with parameters a 0 ; a 1 ; ::: such that P k ja k j < 1 and for that have uniformly spaced eigenvalues on a given bounded segment. In this version of the paper we assume, in addition to Assumptions 1 and 2, that b < 1: We will get rid of this additional assumption in the later versions of the paper. Let 1 ::: n be the eigenvalues of ee 0 =T: De ne a centering constant and a scaling constant so that c = 1 + (n=t ) R (c) = (1 c) dh() and (c) 3 = 1 + (n=t ) R (c) 3 = (1 c) 3 dh () ; and let ~ i = T 2=3 1 ( i ) : Further, let W be a T T Hermitian matrix with i.i.d. N C (0; 1=T ) lower triangular entries and (independent from them) i.i.d N (0; 1=T ) diagonal entries. The collection of such matrices is called Gaussian Unitary Ensemble. It plays an important role in random matrix theory (see Mehta, 2004). Let d 1 ::: d T be the eigenvalues of W; and de ne ~ d i = T 2=3 (d i 2). We, rst, establish 7

8 the following Theorem 1. Let Assumptions 1 and 2 hold and let b < 1: Then, as n and T go to in nity so that n=t! b; for any nite positive integer j; the limiting joint distribution of ~ 1 ; :::; ~ j is equal to the limiting joint distribution of d1 ~ ; :::; d ~ j described by Tracy and Widom (1994). Theorem 1 is, essentially, due to El Karoui (2006). Although he proves the theorem only for the case of j = 1; his derivations imply that it holds for any xed positive integer j: An argument which establishes this fact is the same as that in Soshnikov (2002) which shows that Johnstone s (2001) result stated for the largest eigenvalue of the sample covariance matrix of uncorrelated data easily generalizes to the several largest eigenvalues. We will sketch the argument in the future version of the paper which will deal with the case b 1: Tracy and Widom (1994) studied the asymptotic distribution of a few largest eigenvalues of matrices from the Gaussian Unitary Ensemble when the dimensionality of the matrices tend to in nity. They showed that under appropriate centering and scaling the asymptotic distribution exists and can be expressed in terms of a solution of a system of partial di erential equations. The system simpli es to a single ordinary di erential equation q 00 (s) = sq(s) + 2q 3 (s) when we are interested in the asymptotic distribution of the largest eigenvalue only. In such a case, the asymptotic distribution is equal to F (x) exp R 1 x (x s)q2 (s)ds ; where q(s) is the solution of the above ordinary differential equation which is asymptotically equivalent to the Airy function Ai(s) as s! 1: 1 We describe the Tracy-Widom distribution in more detail in the Monte Carlo section of our paper. Our next theorem shows the equivalence of the joint asymptotic distribution of unobservable eigenvalues of ee 0 =T and that of observable eigenvalues of XX 0 =T: Let 1 ::: n be the eigenvalues of the sample covariance matrix XX 0 =T of the data, and let ~ i = T 2=3 1 ( i ) ; where and are as de ned above. Then, we have: 1 For the de nition and properties of the Airy function see Olver (1974). 8

9 Theorem 2. Let Assumptions 1 and 2 hold. Then, as n and T go to in nity so that n=t! b; for any nite positive integer j; the limiting joint distribution of ~ r+1 ; :::; ~ r+j is equal to the limiting joint distribution of ~ 1 ; :::; ~ j : A proof of the theorem is given in the Appendix. Theorem 2 is the basis of our test of the number of factors. To test a hypothesis that the number of factors is k vs. an alternative that the number of factors is larger than k but smaller than k max + 1; we form a test statistics k+1 kmax+1 = kmax+1 kmax+2 : Note that the test statistics is equal to the ratio of the normalized and centered eigenvalues ~ k+1 ~ kmax+1 = ~kmax+1 ~ kmax+2 : Therefore, by Theorem 2, the asymptotic distribution of our test statistics under the null of k factors is the same as the asymptotic distribution of the ratio ~1 kmax ~ k+1 = ~kmax k+1 ~ kmax k+2 : By Theorem 1 and the continuous mapping theorem, the asymptotic distribution of the latter ratio is equal to the distribution of (x 1 x kmax k+1) = (x kmax k+1 x kmax k+2), where x 1 ; :::; x j are random variables with the Tracy-Widom joint distribution. The critical values of the test statistics can therefore be tabulated based on the knowledge of the Tracy-Widom law. We do such a tabulation in the Monte Carlo section of the paper. In contrast, when the alternative hypothesis is true, k+1 rises proportionally to n while kmax+1 and kmax+2 remain bounded. Therefore, under the alternative hypothesis our test statistics explodes. Theorem 3 summarizes the properties of our test. Theorem 3. Let Assumptions 1 and 2 hold and let b < 1: Then, if the true number of factors is equal to k; the distribution of the ratio k+1 kmax+1 = kmax+1 kmax+2 converges to the distribution of (x 1 x kmax k+1) = (x kmax k+1 x kmax k+2), where x 1 ; :::; x j are random variables with the Tracy-Widom joint distribution. The convergence takes place as n and T go to in nity so that n=t! b: In contrast, when the true number of factors is larger than k but smaller than k max + 1; the ratio k+1 kmax+1 = kmax+1 kmax+2 diverges in probability to in nity. Theorem 3 is a simple consequence of Theorem 2. 9

10 3 Monte Carlo Study In this section we use Monte Carlo simulations to describe in some detail the Tracy-Widom distribution, to tabulate the critical values of our test, and to study its nite sample properties. We approximate the Tracy-Widom distribution by the distribution of a few largest eigenvalues of a matrix from the Gaussian Unitary Ensemble. We obtain an approximation for the latter distribution by simulating 30,000 independent matrices from the ensemble and numerically computing their 10 rst eigenvalues. The left panel of Figure 1 shows the empirical distribution function of the largest eigenvalue centered by 2 and scaled by T 2=3 = =3. The right panel of Figure 1 shows a scatterplot of 30,000 observations of the di erence between the rst and the second eigenvalues vs. the di erence between the second and the third eigenvalues (both di erences scaled by T 2=3 ): Tracy and Widom (2002) report that the mean of their univariate distribution 2 is about -1.77, the standard deviation is close to 0.90, the skewness is slightly larger than 0.22, and the kurtosis is around These characteristics are consistent with the left panel of Figure 1. The right panel of Figure 1 shows that it is not unlikely that x 1 x 2 is substantially larger than x 2 x 3 ; where x 1 ; x 2 ; and x 3 have joint Tracy-Widom distribution. This observation suggests that ad hoc methods of the determination of the number of factors based on visual inspection of the eigenvalues of the sample covariance matrix, and their separation into a group of large and a group of small eigenvalues may be misleading. It may happen, for example, that even though data have no factors, the rst eigenvalue of the sample covariance matrix is substantially larger than the second one, while the second eigenvalue is not much di erent from the third one. Table 1 contains approximate percentiles of the distributions of random variables (x 1 x j 1 ) = (x j 1 x j ) for j = 3; 4; :::; 10; where x 1 ; :::; x j have the joint Tracy-Widom distribution. The approximate percentiles were obtained as the Jackknifed sectioning es- 2 They denote the distribution as F 2 to distinguish it from the distributions F 1 and F 4 that correspond to the limiting distributions of the largest eigenvalues of matrices from the so called Gaussian Orthogonal and Simplectic Ensenbles, respectively. 10

11 Figure 1: Left panel: a univariate Tracy-Widom distribution. Right panel: 30,000 draws from the joint distribution of x 1 x 2 and x 2 x 3 ; where x 1 ; x 2 ; x 3 have the joint Tracy-Widom distribution 11

12 99% 8:81 (7:77;9:84) 95% 4:52 (4:43;4:61) 90% 3:33 (3:28;3:38) j = k max k :10 (13:93;16:28) 8:08 (7:89;8:28) 6:18 (6:09;6:27) 21:30 (19:38;23:22) 11:90 (11:67;12:14) 9:02 (8:91;9:12) 28:10 (23:73;32:47) 15:66 (15:42;15:90) 12:04 (11:81;12:27) 34:61 (30:26;38:96) 19:09 (18:69;19:49) 14:72 (14:62;14:83) 42:57 (37:87;47:28) 22:90 (22:18;23:61) 17:81 (17:45;18:18) 45:84 (43:54;48:13) 26:72 (26:22;27:21) 20:75 (20:48;21:02) 54:58 (51:97;57:17) 30:54 (30:03;31:04) 23:68 (23:41;23:96) Table 1: Approximate percentiles of the test statistics for the tests of k factors vs. alternative of more than k but less than kmax factors an timator (reference here) of the corresponding percentiles of the distribution of the ratio (y 1 y j 1 ) = (y j 1 y j ) ; where y 1 ; :::; y j are the rst j eigenvalues of matrices from the Gaussian Unitary Ensemble. The sectioning estimator uses 5 equal-length sections of 10,000 i.i.d. draws of (y 1 y j 1 ) = (y j 1 y j ) : The 90% con dence intervals for the true percentiles of the distribution of (y 1 y j 1 ) = (y j 1 y j ) are given in the parenthesis. The con dence intervals reported in Table 1 underestimate the uncertainty about the true percentiles of the asymptotic distribution of our test statistics because y 1 ; :::; y j are distributed according the joint Tracy-Widom law only asymptotically, as the dimensionality of the matrix of which y i s are eigenvalues increases to in nity. We hope that the amount of the additional uncertainty due to the fact that we used matrices to simulate y i s is small and do not try to improve the uncertainty estimates. Our hope is supported by El Karoui s (2006a) nding that the cumulative distribution function of y 1 converges to that of the univariate Tracy-Widom law at the very fast rate of n 2=3 ; where n is the dimensionality of the corresponding matrix: Table 1 can be used to nd critical values of our test that the number of factors is k vs. an alternative that it is more than k but less than k max + 1: According to Theorem 3, to determine, say, 5% critical value of the test, we need to consult Table 1 and nd the 95-th percentile of the distribution of (x 1 x j 1 ) = (x j 1 x j ) ; where j = k max k + 2: For example, the 5% critical value of our test of the hypothesis that there are 3 factors vs. an alternative that there are more than 3 factors but less than 7 factors is equal to To study the nite sample properties of our test, we simulated 1000 data sets having 12

13 k max % 0:008 0:010 0:011 0:009 0:012 0:011 0:008 0:013 95% 0:044 0:058 0:050 0:056 0:048 0:058 0:053 0:070 90% 0:090 0:112 0:111 0:102 0:097 0:103 0:125 0:126 Table 2: Empirical size of the tests of 3 factors vs. di erent alternatives, n=t=100 factor structure X = F + e; where is an n r matrix with i.i.d. standard normal entries, F is a r ~ T matrix with standard normal entries (hence, the true number of factors is r), and e is an n T ~ matrix with i.i.d. N (0; ) columns, where ij = ji jj : We considered three di erent choices of n and T ~ : n; T ~ = (100; 200) ; (50; 100) ; and (25; 50) ; ten di erent choices of : = 0; 0:1; 0:2; :::; 0:9; and nine di erent choices of r : r = 3; 4; :::; 11:.To create complex-valued data sets we added the rst ~ T =2 columns of X and p 1 times the last ~ T =2 columns of X: Hence, the dimensionality of our complex data sets were nt; where T = ~ T =2: Using the simulated data we tested a hypothesis that there are 3 factors vs. alternatives that there are more than 3, but less than 5, 6,..., 12 factors. Table 2 reports the empirical size of the performed tests for n = 100; ~ T = 200; and = 0. The last 3 rows of the table correspond to the tests with theoretical size equal to 1%, 5%, and 10%. Di erent columns of the table correspond to the tests that share the null of 3 factors but have di erent alternatives indexed by k max : We see that the size distortions due to the small sample size are small. Figure 2 shows contour plots of the empirical size and power of our tests (with theoretical size 5%) for di erent choices of and n; T ~ : The contour plots are drawn in the space of (horizontal axis) and k max 1 (vertical axis). The left panel of Figure 2 shows the tests size, and the right panel shows the tests power when the true number of factors is equal to k max : The rst row of the gure corresponds to the sample size n = 100; T ~ = 200; the second row corresponds to n = 50; T ~ = 100; and the third row corresponds to n = 25; T ~ = 50: We see that the size of our tests deteriorates when the amount of dependence in the idiosyncratic terms rises. This is especially noticeable for tests corresponding to large k max and for n = 25: 13

14 Figure 2: Contour plots of the empirical size and power of the tests of 3 factors vs. alternatives of less than k max + 1 factors. Horizontal axis: : Vertical axis: k max 1: Left panel: size. Right panel: power when the true number of fators is k max : First row: n = T = 100; second row: n = T = 50; third row: n = T = 25: For n 50 and k max = 4; the size distortions are small for all : For n 50 and relatively large k max ; the size distortions are larger. The power of the tests is very good for n 50 and virtually all and k max ; but becomes small for n = 25; large ; and small k max. The fact that the power of our test deteriorates as the amount of the idiosyncratic dependence rises and the sample size falls accords with the following intuition. When the true number of factors is k max ; the eigenvalue kmax+1 measures the largest contribution of the idiosyncratic terms into the data s variance. As the amount of the dependence in the idiosyncratic terms rises, this contribution increases so that it may become comparable to the factors shares of the data s variance measured by 1 ; :::; kmax : Furthermore, since factors 14

15 are common to all idiosyncratic units, their share in the data s variance rises proportionally to the number of the cross-sectional units. Therefore, eigenvalues 1 ; :::; kmax would be relatively small when the sample size is small and the gap between k and kmax+1 can become small even for relatively small amount of the idiosyncratic dependence. Hence, for a large amount of the idiosyncratic dependence or for small sample sizes, the numerator of our test statistics k+1 kmax+1 = kmax+1 kmax+2 may become relatively small even when the true number of factors is k max. This explains deterioration of the power of our test when the amount of the idiosyncratic dependence rises or the sample size falls. To check the robustness of our test to violations of Gaussianity and time independence of the idiosyncratic terms, we perform the following three experiments. First, we use an AR(1) process e i;t = e i 1;t +" it ; where " it are i.i.d. random variables having Student s t distribution with 5 degrees of freedom, 3 to generate relatively heavy-tailed idiosyncratic terms. Second, we use the same AR(1) process but assume that " it are i.i.d. random variables having 2 (1) distribution centered so that its mean is equal to zero. Hence, in the second experiment the idiosyncratic terms are skewed to the right. Finally, we simulate idiosyncratic terms that satisfy a version of the constant conditional correlation multivariate GARCH model of Bollerslev (1990): e i;t = p h i;t u i;t ; u i;t = u i 1;t +" it ; " i;t are i.i.d. N (0; 1 2 ) ; and h it = 0:1 + 0:8h i;t 1 + 0:1u 2 i;t 1: In the rst two experiments, we simulate data using the idiosyncratic terms after normalizing them so that they have unit sample variance. In the last experiment, the theoretical unconditional variance of e it is equal to 1, so we do not normalize the idiosyncratic terms. An equivalent of Figure 2 for the simulated data looks very similar to Figure 2 for all three experiments and it is not reported here. Instead, Figure 3 reports the size and the power of the test of the null of 3 factors vs. the alternative that the number of factors is 4 for the case N = 50; ~ T = 100. The left panel of Figure 3 shows the size of the test and the right panel shows the 3 We chose to consider Student s t distribution with 5 degrees of freedom because we wanted the distribution to have 4 moments, which is a necessary condition for many Large Random Matrix theory results to hold (see Bai, 1999). 15

16 Figure 3: Size and power of our test of 3 factors vs. 4 factors. Left panel: size. Right panel: power. Horizontal axis: : N = 50; ~ T = 100: Solid line: no violations of the assumptions. Dashed line: multivarite GARCH model for the idiosyncratic terms. Dash-dot line: skewed idiosyncratic terms. Dotted line: heavy tails. 16

17 power. The solid line corresponds to the benchmark simulations when no assumptions of the paper were violated. The dashed line corresponds to the idiosyncratic terms satisfying the multivariate GARCH model. The dash-dot line corresponds to the idiosyncratic terms heavily skewed to the right. The dotted line corresponds to the idiosyncratic terms which have heavy tails. We see that the small sample properties of our test are reasonably robust to di erent violations of our assumptions. The worst distortion of the size and the power happens when heavy tails are assumed for the idiosyncratic process. However, the distortions do not become prohibitively large until the amount of the cross-sectional correlation in the idiosyncratic terms measured by becomes as large as 0.7 even in the heavy-tail case. Our next task is to compare the nite sample properties of our test and that proposed by Connor and Korajczyk (1993). The procedure of the Connor-Korajczyk test of a null of k factors vs. an alternative of k + 1 factors is as follows. First, a k-factor model and a k + 1- factor model are estimated from the data, say, by principle components estimator. Then, the average over the cross-sectional units or the squared residuals for each of the models and each time period are obtained. Let us denote these averages as 1;t and 2;t ; where the rst subscript is equal to 1 if the average corresponds to the k-factor model and it is equal to 2 if the average corresponds to the k + 1-factor model. After that, a vector with components 1;2 1 2;2 ; = 1; :::; T=2 is formed. Connor and Korajzcyk (1993) show that, under the null, when n tends to in nity and T remains xed, the asymptotic joint distribution of the components of is zero mean Gaussian with some unspeci ed covariance matrix. In contrast, under the alternative, we would expect the residuals from the k + 1-factor model to be much smaller than those from the k-factor model. Therefore, the components of should be large. Hence, Connor and Korajzcyk propose as their test statistics a ratio of the mean of the components of and the heteroskedasticity and autocorrelation robust estimate of its standard error. Under the null, such a ratio should be asymptotically distributed as a standard normal random variable as rst n goes to in nity and then T goes to in nity. We implemented the Connor-Korajzcyk test of 3 factors relative to the alternative of 4 17

18 factors using the same 1000 simulations of the real data that we used to study the nite sample properties of our test. We focused on those simulations for which the true number of factors equals 3 or 4. Figure 4 shows the size and the power of our test (thick lines) and that of the Connor-Korajzcyk test (thin lines) for di erent amounts of the idiosyncratic dependence. The left panel of the gure shows the size and the right panel of the gure shows the power of the tests. The solid lines correspond to the sample size n = 100; T ~ = 200; the solid lines with dot markers on them correspond to the sample size n = 50; T ~ = 100; and the dashed lines correspond to the smallest sample size n = 25; T ~ = 50: We see that the Connor-Korajzcyk test has much too large a size (the theoretical size is equal to 5%) for all (horizontal axis). The size of the test becomes extremely distorted even for moderate amounts of the idiosyncratic dependence. The empirical size of our test behaves strikingly better than that of the Connor-Korajzcyk test. The power of both tests is very good for small amounts of the idiosyncratic dependence. For relatively large amounts of the idiosyncratic dependence and for relatively small sample sizes, the power of our test becomes much worse than that of the Connor-Korajzcyk test. However, given the huge size distortions of the latter, such a power advantage cannot exploited. Overall, the properties of our test are much better than those of the Connor-Korajzcyk at least in our Monte Carlo experiments. Before we turn to an application of our test, we would like formulate a conjecture which is an analog Theorem 1 for the case of real data. Conjecture 1: Assume that the columns of the matrix of idiosyncratic terms e (n) are multivariate real random variables with covariance matrix :Let Assumptions 1 and 2 hold. Then, as n and T go to in nity so that n=t! b; for any nite positive integer j; the limiting joint distribution of ~ 1 ; :::; ~ j is equal to the limiting joint distribution of T 2=3 (d i 2) ; i = 1; :::; j; where d i is the i-th largest eigenvalue of a matrix from the Gaussian Orthogonal Ensemble described by Tracy and Widom (1996) (denoted as TW1).. Conjecture 1 has been supported by numerical simulations in El Karoui (2006). If the conjecture is correct, then we have: 18

19 Figure 4: Empirical size and power of our test of 3 vs. 4 factors (thick lines) and the Connor- Korajzcyk test of 3 vs. 4 factors (thin lines). Left panel: size. Right panel: power. Solid lines: n = 100; ~ T = 200: Solid lines with dot markers: n = 50; ~ T = 100: Dashed lines: n = 25; ~ T = 50: Horizontla axis: : 19

20 99% 16:56 (14:79;18:33) 95% 6:89 (6:64;71:5) 90% 4:54 (4:45;4:63) j = k max k :75 (25:97;31:53) 12:41 (12:07;12:75) 8:53 (8:44;8:62) 41:57 (36:02;47:12) 18:16 (17:49;18:83) 12:46 (12:13;12:78) 55:07 (48:49;61:65) 23:99 (23:26;24:73) 16:40 (16:18;16:62) 67:53 (61:28;73:78) 29:41 (28:33;30:50) 20:29 (20:03;20:55) 79:13 (72:87;85:40) 35:05 (34:30;35:79) 24:34 (23:98;24:71) 91:90 (74:13;109:67) 39:89 (38:27;41:52) 27:42 (26:37;28:46) 106:01 (89:03;122:99) 47:35 (45:49;49:21) 32:62 (31:87;33:37) Table 3: Approximate percentiles of the test statistics for the tests of k factors vs. alternative of more than k but less than kmax factors. Real data. an Conjecture 2. Assume that the columns of the matrix of idiosyncratic terms e (n) are multivariate real random variables with covariance matrix :Let Assumptions 1 and 2 hold. Then, if the true number of factors is equal to k; the distribution of the ratio k+1 kmax+1 = kmax+1 kmax+2 converges to the distribution of (x 1 x kmax k+1) = (x kmax k+1 x kmax k+2), where x 1 ; :::; x j are random variables with the Tracy- Widom (TW1) joint distribution. The convergence takes place as n and T go to in nity so that n=t! b: In contrast, when the true number of factors is larger than k but smaller than k max + 1; the ratio k+1 kmax+1 = kmax+1 kmax+2 diverges in probability to in nity. Table 3 below is the analog of Table 1 for the test based on Conjecture 2. 4 The number of factors in asset returns As an application of our test, we would like to test di erent hypotheses about the number of pervasive factors driving stock returns. There is a large amount of controversy about this number in the literature. For example, Connor and Korajcyk (1993) nd evidence for between one and six pervasive factors in the stock returns. Trzcinka (1986) nds some support to the existence of ve pervasive factors. Five seems also to be a preferred number for Roll and Ross (1980) and Reinganum (1981). A study by Brown and Weinstein (1983) also suggested that the number of factors is unlikely to be greater than ve: 4 Huang and Jo (1995), Bai and Ng (2002), and Onatski (2005) identify only two common factors 4 Dhrymes, Friend, and Gultekin (1984) nd that the estimated number of factors grows with the sample size. However, their setting was the classical factor model as opposed to the approximate factor model. 20

21 To test hypotheses about the number of factors in stock returns we use CRSP data on monthly returns of 171 company for a period from January 1960 to December Our data set includes those and only those companies for which CRSP provides monthly holding period return data for all months in the studied time interval. To compute the excess returns we used monthly returns on 6-month Treasury bills provided by the Board of Governors of the Federal Reserve System. The cross-sectional dimensionality of our data is n = 171; and the time series dimensionality is T ~ = 552: To get a complex valued data set, we divide the real-valued data set into two periods: the rst containing all observations from January 1960 to December 1982, and the second containing all observations from January 1983 to December Then we add the data from the rst time period and square root from -1 times the data from the second period. The dimensionality of the obtained complex-valued data set is n = 171 and T = 276: Using the obtained data, we test hypotheses that the number of factors is 0,1,...,7 vs. alternatives that the number of factors is more than the null-hypothesis number but no more than up to 8 factors. the resulting test statistics are reported in Table 4. The numbers in parentheses given below the values of the test statistics are approximate p-values of the test. The approximate p-values were computed as the proportion of 30,000 simulated draws from the null distribution that fall above the corresponding test statistics. We see that the null hypothesis of no factors vs. any of the alternatives of the form: positive number of factors no larger than k max is overwhelmingly rejected by the data. The null of one factor vs. more factors is rejected at 5% level for all corresponding alternatives except the alternative that the number of factors is larger than 1 but no larger than 4. The null of 2 factors is not rejected at 5% level for all corresponding alternatives except the alternative that the number of factors is more than 2 but no larger than 5. The situation is the same for the null of 3 and 4 factors. The nulls of 5, 6, and 7 factors are not rejected by the data even at at 29% level. To interpret these results it is useful to analyze the largest eigenvalues of the sample 21

22 k max H 0 number of factors :72 (0:008) 2 75:19 (0:000) 3 126:71 (0:000) 4 46:88 (0:002) 5 592:27 (0:000) 6 204:54 (0:000) 7 257:62 (0:000) 8 349:00 (0:000) 7:01 (0:012) 13:33 (0:010) 5:26 (0:279) 77:44 (0:000) 27:04 (0:016) 35:15 (0:011) 48:78 (0:007) 1:66 (0:286) 0:98 (0:944) 24:46 (0:005) 8:78 (0:155) 12:57 (0:121) 17:89 (0:072) 0:37 (0:923) 16:91 (0:004) 6:18 (0:171) 8:99 (0:138) 13:49 (0:087) 12:37 (0:002) 4:61 (0:132) 7:03 (0:120) 10:84 (0:080) 0:34 (0:928) 1:69 (0:682) 3:62 (0:449) 1:25 (0:401) 3:04 (0:295) 1:35 (0:360) Table 4: Value of test statistics and approximate p-values for the tests of the number of factors in the second row vs. alternative of more factors but less than or equal to the number in the rst column covariance matrix of the data. Note that the eigenvalues can be interpreted as reductions in the mean square error which result from using an extra principal component to model systematic part of the data. The eigenvalues turned out to be (in percentage units relative to the largest eigenvalue): 100:00; 18:40; 10:00; 8:80; 8:09; 6:12; 5:96; 5:51; 5:14; 4:87; etc. The rejection of the null of no factors means that the gap between the explanatory power of the rst principal component and any consequent principal components is too large relative to the gaps between the explanatory power of more distant components. It is too large in the sense that such a gap is not consistent with the null that all principal components, including the rst one, represent just idiosyncratic in uences. The rst principal component must, therefore, contain a systematic or pervasive in uence on the cross-sectional units. Similarly, the gap between the explanatory power of the second principal component and virtually all consequent principal components is too large (in relative terms) to be consistent with the idiosyncratic nature of the second component. 22

23 As to the nulls of 2, 3 and 4 factors, they are rejected at 5% level only vs. an alternative of no more than 5 factors. Technically, this happens because the gaps between the explanatory power of, say, the second principal component and that of the 6th, 7th, 8th, and 9th principal components do not seem large (at 5% level) relative to all gaps but the gap between the explanatory power of the 6th and the 7th components. An important question arises whether we should consider the closeness of the 6th and 7th eigenvalues as a uke in the data, and, therefore, accept the null of only two factors, or we should interpret our results as suggesting that there are likely to be ve factors. We favor the latter interpretation. It is because, although we cannot reject the nulls of 2, 3 and 4 factors vs. alternatives of more factors but less than 7, 8, and 9 factors at 5% level, we reject these nulls vs. the alternatives at about 17% level, which is not a high level. Had the relative closeness of the 6th and 7th eigenvalues been just a uke, we would not expect the p-values of the tests against the alternatives of less than 7, 8, and 9 factors to be consistently at the low end. Note in this context, that the nulls of 5, 6 and 7 factors cannot be rejected with the corresponding p-values being everywhere in the range from to To check our nding that there likely be 5 factors driving the stock returns, we run the real data tests (which are conjectured but not proven to be correct procedures) on the untransformed real data. We get very similar results to those reported in Table 4. To check robustness of the nding to the choice of the data, we run our complex-data test on the CRSP data for 1148 stocks traded on the NYSE, AMEX, and NASDAQ during the period from January 1983 to December 2003, which we use in our previous work (see Onatski (2005)). For these data, the nulls of zero and one factors are, again, overwhelmingly rejected. The nulls of 2, 3, and 4 factors are rejected (at 5% level) again only vs. the alternatives of no more than 5 factors. This time, however, the p-values corresponding the alternatives of less than 7, 8, and 9 factors are not consistently small. They span a range between and Moreover, when we run the real-data test on the new data, we still overwhelmingly 23

24 reject hypotheses of zero and one factors, but no longer reject the hypotheses of 2, 3, etc. factors at any reasonable critical levels against any alternatives (the smallest p-value being for the test of two factors vs. the alternative of more than two but no more than four factors). The partial robustness of the our nding that there likely be 5 factors in the stock returns may be due either to the fact that much more stocks were included into the second data set, or to a possibility that the number of factors driving the stock returns decreased over time so that the second data set which spans more recent data period does not support the 5-factor hypothesis as strongly as the rst data set. To check the latter possibility, we split the original data set into two periods and then run our tests separately for the complex data constructed from the rst and the second periods. For both periods the nulls of zero and one factors are rejected, as usual. However, the nulls of 2 and 3 factors are now rejected at about 5% level against two alternatives: no more than 4 factors and no more than 7 factors for both time periods. The null of 4 factors is not rejected for any conventional critical levels against any of the studied alternatives for both time periods. Hence, the 5-factor phenomenon does not hold in both subsamples. To summarize, we nd that our tests overwhelmingly reject the hypotheses that there are either no factors or only one factor driving stock returns. There is some evidence that the number of factors may be ve, but this evidence is only partially robust with respect to the choice of the stocks to be included to the data set and with respect to the di erent choice of time periods. 5 Conclusion The new test shows good nite sample properties and uses simultaneous large n- large T asymptotics. It strongly outperforms the Connor-Korajczyk (1993) test for a variety of nite sample situations. The biggest weakness of the test is that it is developed for the situation 24

25 when the data are Gaussian and independent over time. The Monte Carlo experiments show, however, that the test is robust with respect to violations of the Gaussianity and the independence assumptions. We apply the test to test di erent hypotheses about the number of pervasive factors driving US stock returns. Our test rejects the nulls of zero or only one factors against alternatives of more factors at 5% levels. We also can reject the nulls of 2, 3 and 4 factors at 5% level at least against some alternatives. We cannot reject the null of 5 factors against alternatives of more factors at 5% level. The 5-factor nding turns out to be only partially robust with respect to di erent choices of the time interval and stocks to be included into the dataset. The rejection of zero and 1 factor hypotheses is very robust. 6 Appendix Proof of Theorem 2: We will rst formulate and prove a lemma, which our proof of Theorem 2 will be based upon. Let A (1) be a symmetric non-negative de nite nn matrix and A be an nn diagonal matrix of the form A = diag (a 1 ; :::; a k ; 0; 0; :::; 0) ; a 1 a 2 ::: a k > 0: Note that A and A (1) can be interpreted as matrix representations of linear operators acting in the space R n : For any linear bounded operator on R n ; B; we de ne its norm as kbk = (max eval (B B)) 1=2 ; which is the operator norm induced by the standard Euclidean norm of vectors in R n : We denote the j-th largest by absolute value eigenvalue of B as j (B); and the j-th largest by absolute value eigenvalue of B restricted to its invariant subspace M as i (BjM). Let M 0 be the invariant subspace of A corresponding to the eigenvalue 0 and let P 0 be the orthogonal projection on M 0 : We have the following Lemma 1: Let A ({) = A + {A (1) and let r 0 = a k =2: For real { such that 0 < { < r 0 = A (1) we have: k+1 (A ({)) { 1 P 0 A (1) P 0 jm 0 j{j 2 A (1) 2 3r0 (r 0 j{j ka (1) k) 2 25

26 Proof of Lemma 1: Let R (z; {) = (A ({) zi n ) 1 be the resolvent of A ({) de ned for all complex z not equal to any of the eigenvalues of A ({) : Then R (z; 0) is the resolvent of A: We will denote R (z; 0) as R(z): Let be a positively oriented circle in the complex plain with center at 0 and radius r 0 : According to Kato (1980), pages 67-68, R (z; {) can be expanded as R (z; {) = R (z) + X 1 ( j=1 {)j R (z) A (1) R (z) j ; where the sum on the right hand side is uniformly convergent on for j{j < min z2 A (1) 1 kr (z)k = r0 = A (1) ; where the last equality follows from the fact that kr (z)k = r 1 0 (2) for any z 2 : Furthermore, for j{j < r 0 = A (1) ; there are exactly n k eigenvalues (counted as many times as their algebraic multiplicity) inside the circle : We will call these eigenvalues 0-group eigenvalues and denote them as a (1) ({) ::: a (n k) ({) : Denote the direct sum of the invariant subspaces of A ({) corresponding to the 0-group eigenvalues as M 0 ({). Let P 0 be the orthogonal projection on M 0 and P 0 ({) be a projection R on M 0 ({) de ned as P 0 ({) = 1 2i R (z; {) dz: We have: P 0 ({) = P 0 1 2i 1X Z ( {) j j=1 R (z) A (1) R(z) j dz (3) where the series absolutely converges for j{j < r 0 = A (1) (see Kato (1980), pages 75-76). Equations (3) and (2) imply, that kp 0 ({) P 0 k 1X A j{j j (1) j j=1 r j 0 = j{j A (1) r 0 j{j ka (1) k : (4) Now consider an operator ~ A (1) ({) 1 { A ({) P 0 ({) : Since M 0 ({) is the invariant subspace of A ({) corresponding to the 0-group eigenvalues we have: 1 ~A (1) ({) jm 0 ({) = 1 { a(1) ({) : (5) 26

Testing hypotheses about the number of factors in. large factor models.

Testing hypotheses about the number of factors in large factor models. Alexei Onatski Economics Department, Columbia University April 5, 2009 Abstract In this paper we study high-dimensional time series