Power Enhancement in High Dimensional Cross-Sectional Tests

arXiv:1310.3899v2 [stat.ME], Aug 2014

Jianqing Fan, Yuan Liao and Jiawei Yao

Department of Operations Research and Financial Engineering, Princeton University
Bendheim Center for Finance, Princeton University
Department of Mathematics, University of Maryland

The authors are grateful for the comments from seminar and conference participants at UChicago, Princeton, Georgetown, George Washington, the 2014 Econometric Society North America Summer Meeting, the UCL workshop on High-dimensional Econometrics Models, the 2014 Annual Meeting of the Royal Economics Society, the 2014 Asian Meeting of the Econometric Society, the 2014 International Conference on Financial Engineering and Risk Management, and the 2014 Midwest Econometric Group meeting. Address: Department of Operations Research and Financial Engineering, Sherrerd Hall, Princeton University, Princeton, NJ 08544, USA; Department of Mathematics, University of Maryland, College Park, MD 20742, USA. E-mail: jqfan@princeton.edu, yuanliao@umd.edu, jiaweiy@princeton.edu. The research was partially supported by National Science Foundation grants DMS-1206464 and DMS-1406266, and National Institute of Health grants R01GM100474-01 and R01-GM072611.

Abstract

We propose a novel technique to boost the power of testing a high-dimensional vector $H_0: \theta = 0$ against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low power due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a power enhancement component, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the

pivotal statistic. As a byproduct, the power enhancement component also consistently identifies the elements that violate the null hypothesis. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models.

Keywords: sparse alternatives, thresholding, large covariance matrix estimation, Wald test, screening, cross-sectional independence, factor pricing model

JEL code: C12, C33, C58

1 Introduction

High-dimensional cross-sectional models have received growing attention in both theoretical and applied econometrics. These models typically involve a structural parameter whose dimension can be either comparable to or much larger than the sample size. This paper addresses testing a high-dimensional structural parameter:

$$H_0: \theta = 0,$$

where $N = \dim(\theta)$ is allowed to grow faster than the sample size $T$. We are particularly interested in boosting the power in sparse alternatives under which $\theta$ is approximately a sparse vector. This type of alternative is of particular interest, as the null hypothesis typically represents some economic theory and violations are expected to come only from some exceptional individuals.

A showcase example is the factor pricing model in financial economics. Let $y_{it}$ be the excess return of the $i$-th asset at time $t$, and $f_t = (f_{1t}, \ldots, f_{Kt})'$ be the excess returns of $K$ tradable market risk factors. Then the excess return has the following decomposition:

$$y_{it} = \theta_i + b_i' f_t + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,$$

where $b_i = (b_{i1}, \ldots, b_{iK})'$ is a vector of factor loadings and $u_{it}$ represents the idiosyncratic error. The key implication from the multi-factor pricing theory is that the intercept $\theta_i$ should be zero, known as the mean-variance efficiency pricing, for any asset $i$. An important question is then whether such a pricing theory can be validated by empirical data, namely we wish

to test the null hypothesis $H_0: \theta = 0$, where $\theta = (\theta_1, \ldots, \theta_N)'$ is the vector of intercepts for all $N$ financial assets. As the factor pricing model is derived from theories of financial economics (Merton, 1973; Ross, 1976), one would expect that inefficient pricing by the market should only occur for a small fraction of exceptional assets. Indeed, our empirical study of the constituents in the S&P 500 index indicates that there are only a couple of significant nonzero-alpha stocks, corresponding to a small portion of mis-priced stocks instead of systematic mis-pricing of the whole market. Therefore, it is important to construct tests that have high power when $\theta$ is sparse.

Most of the conventional tests for $H_0: \theta = 0$ are based on a quadratic form:

$$W = \hat\theta' V \hat\theta.$$

Here $\hat\theta$ is an element-wise consistent estimator of $\theta$, and $V$ is a high-dimensional positive definite weight matrix, often taken to be the inverse of the asymptotic covariance matrix of $\hat\theta$ (e.g., the Wald test). After a proper standardization, the standardized $W$ is asymptotically pivotal under the null hypothesis. In high-dimensional testing problems, however, various difficulties arise when using a quadratic statistic. First, when $N > T$, estimating $V$ is challenging, as the sample analogue of the covariance matrix is singular. More fundamentally, tests based on $W$ have low power under sparse alternatives. The reason is that the quadratic statistic accumulates high-dimensional estimation errors under $H_0$, which results in large critical values that can dominate the signals in the sparse alternatives. A formal proof of this will be given in Section 3.3.

To overcome the aforementioned drawbacks, this paper introduces a novel technique for high-dimensional cross-sectional testing problems, called the power enhancement. Let $J_1$ be a test statistic that has a correct asymptotic size (e.g., Wald statistic), which may suffer from low power under sparse alternatives.
Let us augment the test by adding a power enhancement component $J_0 \ge 0$, which satisfies the following three properties:

Power Enhancement Properties:
(a) Non-negativity: $J_0 \ge 0$ almost surely.
(b) No-size-distortion: Under $H_0$, $P(J_0 = 0 \mid H_0) \to 1$.
(c) Power-enhancement: $J_0$ diverges in probability under some specific regions of alternatives $H_a$.

Our constructed power enhancement test takes the form

$$J = J_0 + J_1.$$

The non-negativity property of $J_0$ ensures that $J$ is at least as powerful as $J_1$. Property (b) guarantees that the asymptotic null distribution of $J$ is determined by that of $J_1$, and the size distortion due to adding $J_0$ is negligible, and property (c) guarantees significant power improvement under the designated alternatives. The power enhancement principle is thus summarized as follows:

Given a standard test statistic with a correct asymptotic size, its power is substantially enhanced with little size distortion; this is achieved by adding a component $J_0$ that is asymptotically zero under the null, but diverges and dominates $J_1$ under some specific regions of alternatives.

An example of such a $J_0$ is a screening statistic:

$$J_0 = \sqrt{N} \sum_{j \in \hat S} \hat\theta_j^2 \hat v_j^{-1} = \sqrt{N} \sum_{j=1}^{N} \hat\theta_j^2 \hat v_j^{-1} 1\{|\hat\theta_j| > \hat v_j^{1/2} \delta_{N,T}\},$$

where $\hat S = \{j \le N : |\hat\theta_j| > \hat v_j^{1/2} \delta_{N,T}\}$, and $\hat v_j$ denotes a data-dependent normalizing factor, taken as the estimated asymptotic variance of $\hat\theta_j$. The threshold $\delta_{N,T}$, depending on $(N, T)$, is a high-criticism threshold, chosen to be slightly larger than the noise level $\max_{j \le N} |\hat\theta_j - \theta_j| / \hat v_j^{1/2}$ so that under $H_0$, $J_0 = 0$ with probability approaching one. In addition, we take $J_1$ as a pivotal statistic, e.g., the standardized Wald statistic or other quadratic forms such as the sum of the squared marginal t-statistics (Bai and Saranadasa, 1996; Chen and Qin, 2010; Pesaran and Yamagata, 2012). As a byproduct, the screening set $\hat S$ also consistently identifies indices where the null hypothesis is violated.

One of the major differences of our test from most of the thresholding tests (Fan, 1996; Hansen, 2005) is that it enhances the power substantially by adding a screening statistic, which does not introduce extra difficulty in deriving the asymptotic null distribution. Since $J_0 = 0$ under $H_0$ with probability approaching one, it relies on the pivotal statistic $J_1$ to determine its null distribution.
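The screening statistic described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `power_enhancement_component` and its arguments are hypothetical, and the inputs (the estimates, their estimated variances, and the threshold) are assumed to have been computed elsewhere.

```python
import numpy as np

def power_enhancement_component(theta_hat, v_hat, delta):
    """Illustrative screening statistic J0 (names are assumptions, not from the paper).

    theta_hat : estimates of the N components of theta
    v_hat     : estimated asymptotic variances of the components (positive)
    delta     : high-criticism threshold delta_{N,T}
    Returns (J0, indices in the screening set S_hat).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    N = theta_hat.size
    # Keep component j only if |theta_hat_j| exceeds v_hat_j^{1/2} * delta.
    selected = np.abs(theta_hat) > np.sqrt(v_hat) * delta
    # sqrt(N) times the sum of squared standardized estimates over the screened set.
    J0 = np.sqrt(N) * np.sum(theta_hat[selected] ** 2 / v_hat[selected])
    return J0, np.flatnonzero(selected)
```

Given any pivotal statistic `J1` computed separately, the enhanced statistic would then simply be `J0 + J1`, compared with the same critical value that is used for `J1` alone.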
In contrast, the existing thresholding tests and extreme value tests often require stringent conditions to derive their asymptotic null distributions, making them restrictive in econometric applications, due to slow rates of convergence. Moreover, the asymptotic null distributions are inaccurate at finite sample. As pointed out by Hansen (2003), these statistics are non-pivotal even asymptotically, and require bootstrap methods to simulate

the null distributions.

As for specific applications, this paper studies the tests of the aforementioned factor pricing model, and of cross-sectional independence in mixed effect panel data models:

$$y_{it} = \alpha + x_{it}'\beta + \mu_i + u_{it}, \quad i \le n, \; t \le T.$$

Let $\rho_{ij}$ denote the correlation between $u_{it}$ and $u_{jt}$, assumed to be time invariant. The cross-sectional independence test is concerned with the following null hypothesis:

$$H_0: \rho_{ij} = 0 \quad \text{for all } i \ne j,$$

that is, under the null hypothesis, the $n \times n$ covariance matrix $\Sigma_u$ of $\{u_{it}\}_{i \le n}$ is diagonal. In empirical applications, weak cross-sectional correlations are often present, which results in a sparse covariance $\Sigma_u$ with just a few nonzero off-diagonal elements. This results in a sparse vector $\theta = (\rho_{12}, \rho_{13}, \ldots, \rho_{n-1,n})'$. The dimensionality $N = n(n-1)/2$ can be much larger than the number of observations. Therefore, the power enhancement in sparse alternatives is very important to the testing problem.

There has been a large literature on high-dimensional cross-sectional tests. For instance, the literature on testing the factor pricing model is found in Gibbons et al. (1989), MacKinlay and Richardson (1991), Beaulieu et al. (2007) and Pesaran and Yamagata (2012), all in quadratic forms. Moreover, for the mixed effect panel data model, most of the existing statistics in the literature are based on the sum of squared residual correlations, which also accumulates many off-diagonal estimation errors in the covariance matrix of $(u_{1t}, \ldots, u_{nt})$. The literature includes Breusch and Pagan (1980), Pesaran et al. (2008), Baltagi et al. (2012), etc. In addition, our problem is also related to the test with a restricted parameter space, previously considered by Andrews (1998), who improves the power by directing towards the relevant alternatives (also see Hansen (2003) for a related idea). Recently, Chernozhukov et al.
(2013) proposed a high-dimensional inequality test, and employed an extreme value statistic, whose critical value is determined through applying the moderate deviation theory on an upper bound of the rejection probability. In contrast, the asymptotic distribution of our proposed power enhancement statistic is determined through the pivotal statistic $J_1$, and the power is improved via screening off most of the noise under sparse alternatives.

Two of the referees kindly reminded us of a related recent paper by Gagliardini et al. (2011), which studied estimating and testing the risk premia in a CAPM model. While we also study a large panel of stock returns as a specific example and double asymptotics (as

$N, T \to \infty$), the problems and approaches being considered are very different. This paper addresses a general problem of enhancing power under high-dimensional sparse alternatives.

The remainder of the paper is organized as follows. Section 2 sets up the preliminaries and highlights the major differences from existing tests. Section 3 presents the main result of the power enhancement test. As applications to specific cases, Section 4 and Section 5 respectively study the factor pricing model and the test of cross-sectional independence. Simulation results are presented in Section 6, along with an empirical application to the stocks in the S&P 500 index in Section 7. Section 8 concludes. All the proofs are given in the appendix.

Throughout the paper, for a symmetric matrix $A$, let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ represent its minimum and maximum eigenvalues. Let $\|A\|_2$ and $\|A\|_1$ denote its operator norm and $l_1$-norm respectively, defined by $\|A\|_2 = \lambda_{\max}^{1/2}(A'A)$ and $\|A\|_1 = \max_i \sum_j |A_{ij}|$. For a vector $\theta$, define $\|\theta\| = (\sum_j \theta_j^2)^{1/2}$ and $\|\theta\|_{\max} = \max_j |\theta_j|$. For two deterministic sequences $a_T$ and $b_T$, we write $a_T \ll b_T$ (or equivalently $b_T \gg a_T$) if $a_T = o(b_T)$. Also, $a_T \asymp b_T$ if there are constants $C_1, C_2 > 0$ so that $C_1 b_T \le a_T \le C_2 b_T$ for all large $T$. Finally, we denote $|S|_0$ as the number of elements in a set $S$.

2 Power Enhancement in high dimensions

This section introduces power enhancement techniques and provides heuristics to justify the techniques. Their differences from related ideas in the literature are also highlighted.

2.1 Power enhancement

Consider a testing problem:

$$H_0: \theta = 0, \quad H_a: \theta \in \Theta_a,$$

where $\Theta_a \subset \mathbb{R}^N \setminus \{0\}$ is an alternative set. A typical example is $\Theta_a = \{\theta : \theta \ne 0\}$. Suppose we observe a stationary process $D = \{D_t\}_{t=1}^T$ of size $T$. Let $J_1(D)$ be a certain test statistic, and for notational simplicity, we write $J_1 = J_1(D)$. Often $J_1$ is constructed such that under $H_0$, it has a non-degenerate limiting distribution $F$: as $T, N \to \infty$,

$$J_1 \mid H_0 \to^d F. \quad (2.1)$$

For the significance level $q \in (0, 1)$, let $F_q$ be the critical value for $J_1$. Then the critical region is taken as $\{D : J_1 > F_q\}$ and satisfies

$$\limsup_{T, N \to \infty} P(J_1 > F_q \mid H_0) = q. \quad (2.2)$$

This ensures that $J_1$ has a correct asymptotic size. In addition, it is often the case that $J_1$ has high power against $H_0$ on a subset $\Theta(J_1) \subset \Theta_a$, namely,

$$\liminf_{T, N \to \infty} \inf_{\theta \in \Theta(J_1)} P(J_1 > F_q \mid \theta) \to 1. \quad (2.3)$$

Typically, $\Theta(J_1)$ consists of those $\theta$'s whose $l_2$-norm is relatively large, as $J_1$ is normally an omnibus test (e.g. Wald test).

In a data-rich environment, econometric models often involve high-dimensional parameters in which $\dim(\theta) = N$ can grow fast with the sample size. We are particularly interested in sparse alternatives $\Theta_s \subset \Theta_a$ under which $H_0$ is violated only on a couple of exceptional components of $\theta$. Specifically, when $\theta \in \Theta_s$, the number of non-vanishing components is much less than $N$. As a result, its $l_2$-norm is relatively small. Therefore, under the sparse alternative $\Theta_s$, the omnibus test $J_1$ typically has lower power, due to the accumulation of high-dimensional estimation errors. Detailed explanations are given in Section 3.3 below.

We introduce a power enhancement principle for high-dimensional sparse testing, by bringing in a data-dependent component $J_0$ that satisfies the Power Enhancement Properties as defined in Section 1. The introduced component $J_0$ does not serve as a test statistic on its own, but is added to a classical statistic $J_1$ that is often pivotal (e.g., Wald statistic), so the proposed test statistic is defined by

$$J = J_0 + J_1.$$

Our introduced power enhancement principle is explained as follows.

1. The critical region of $J$ is defined by $\{D : J > F_q\}$. As $J_0 \ge 0$, $P(J > F_q \mid \theta) \ge P(J_1 > F_q \mid \theta)$ for all $\theta \in \Theta_a$. Hence the power of $J$ is at least as large as that of $J_1$.

2. When $\theta \in \Theta_s$ is a sparse high-dimensional vector under the alternative, the classical test $J_1$ may have low power as $\|\theta\|$ is typically relatively small. On the other hand, for $\theta \in \Theta_s$, $J_0$ stochastically dominates $J_1$. As a result, $P(J > F_q \mid \theta) > P(J_1 > F_q \mid \theta)$ strictly holds, so the power of $J_1$ over the set $\Theta_s$ is enhanced after adding $J_0$. Often $J_0$ diverges fast under sparse alternatives $\Theta_s$, which ensures $P(J > F_q \mid \theta) \to 1$ for $\theta \in \Theta_s$. In contrast, the classical test only has $P(J_1 > F_q \mid \theta) < c < 1$ for some $c \in (0, 1)$ and $\theta \in \Theta_s$, and when $\|\theta\|$ is sufficiently small, $P(J_1 > F_q \mid \theta)$ is approximately $q$.

3. Under mild conditions, $P(J_0 = 0 \mid H_0) \to 1$. Hence when (2.1) is satisfied, we have

$$\limsup_{T, N \to \infty} P(J > F_q \mid H_0) = q.$$

Therefore, adding $J_0$ to $J_1$ does not affect the size of the standard test statistic asymptotically. Both $J$ and $J_1$ have the same limiting distribution under $H_0$.

It is important to note that the power is enhanced without sacrificing the size asymptotically. In fact, the power enhancement principle can be asymptotically fulfilled under a weaker condition $J_0 \mid H_0 \to^p 0$. However, we construct $J_0$ so that $P(J_0 = 0 \mid H_0) \to 1$ to ensure a good finite sample size.

2.2 Construction of power enhancement component

We construct a specific power enhancement component $J_0$ that satisfies (a)-(c) of the power enhancement properties simultaneously, and identify the sparse alternatives in $\Theta_s$. Such a component can be constructed via screening as follows. Suppose we have a consistent estimator $\hat\theta$ such that $\max_{j \le N} |\hat\theta_j - \theta_j| = o_P(1)$. For some slowly growing sequence $\delta_{N,T} \to \infty$ (as $T, N \to \infty$), define a screening set:

$$\hat S = \{j : |\hat\theta_j| > \hat v_j^{1/2} \delta_{N,T}, \; j = 1, \ldots, N\}, \quad (2.4)$$

where $\hat v_j > 0$ is a data-dependent normalizing constant, often taken as the estimated asymptotic variance of $\hat\theta_j$. The sequence $\delta_{N,T}$, called high criticism, is chosen to be slightly larger than the maximum noise level, satisfying (recall that $\Theta_a$ denotes the alternative set):

$$\inf_{\theta \in \Theta_a \cup \{0\}} P\Big(\max_{j \le N} |\hat\theta_j - \theta_j| / \hat v_j^{1/2} < \delta_{N,T}/2 \;\Big|\; \theta\Big) \to 1 \quad (2.5)$$

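for $\theta$ under both null and alternative hypotheses. The screening statistic $J_0$ is then defined as

$$J_0 = \sqrt{N} \sum_{j \in \hat S} \hat\theta_j^2 \hat v_j^{-1} = \sqrt{N} \sum_{j=1}^{N} \hat\theta_j^2 \hat v_j^{-1} 1\{|\hat\theta_j| > \hat v_j^{1/2} \delta_{N,T}\}.$$

By (2.4) and (2.5), under $H_0: \theta = 0$,

$$P(J_0 = 0 \mid H_0) \ge P(\hat S = \emptyset \mid H_0) = P\Big(\max_{j \le N} |\hat\theta_j| / \hat v_j^{1/2} \le \delta_{N,T} \;\Big|\; H_0\Big) \to 1.$$

Therefore $J_0$ satisfies the non-negativity and no-size-distortion properties.

Let $\{v_j\}_{j \le N}$ be the population counterpart of $\{\hat v_j\}_{j \le N}$. For instance, one can take $v_j$ as the asymptotic variance of $\hat\theta_j$, and $\hat v_j$ as its estimator. To satisfy the power-enhancement property, note that the screening set mimics

$$S(\theta) = \{j : |\theta_j| > 2 v_j^{1/2} \delta_{N,T}, \; j = 1, \ldots, N\}, \quad (2.6)$$

and in particular $S(0) = \emptyset$. We shall show in Theorem 3.1 below that $P(\hat S = S(\theta) \mid \theta) \to 1$ for all $\theta \in \Theta_a \cup \{0\}$. Thus, the subvector $\hat\theta_{\hat S} = (\hat\theta_j : j \in \hat S)$ behaves like $\theta_S = (\theta_j : j \in S(\theta))$, which can be interpreted as estimated significant signals. If $S(\theta) \ne \emptyset$, then by the definition of $\hat S$ and $\delta_{N,T} \to \infty$, we have

$$P(J_0 > \sqrt{N} \mid S(\theta) \ne \emptyset) \ge P\Big(\sqrt{N} \sum_{j \in \hat S} \delta_{N,T}^2 > \sqrt{N} \;\Big|\; S(\theta) \ne \emptyset\Big) \to 1.$$

Thus, the power of $J_1$ is enhanced on the subset

$$\Theta_s \equiv \{\theta \in \mathbb{R}^N : S(\theta) \ne \emptyset\} = \Big\{\theta \in \mathbb{R}^N : \max_{j \le N} \frac{|\theta_j|}{v_j^{1/2}} > 2\delta_{N,T}\Big\}.$$

As a byproduct, the screening set consistently identifies the elements of $\theta$ that violate the null hypothesis.

The introduced $J_0$ can be combined with any other test statistic with an accurate asymptotic size. Suppose $J_1$ is a classical test statistic. Our power enhancement test is simply

$$J = J_0 + J_1.$$

For instance, suppose we can consistently estimate the asymptotic inverse covariance matrix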

of $\hat\theta$, denoted by $\widehat{\mathrm{var}}(\hat\theta)^{-1}$; then $J_1$ can be chosen as the standardized Wald statistic:

$$J_1 = \frac{\hat\theta' \, \widehat{\mathrm{var}}(\hat\theta)^{-1} \hat\theta - N}{\sqrt{2N}}.$$

As a result, the asymptotic distribution of $J$ is $\mathcal{N}(0, 1)$ under the null hypothesis.

In sparse alternatives where $\|\theta\|$ may not grow fast with $N$ but $\theta \in \Theta_s$, the combined test $J_0 + J_1$ can be very powerful. In contrast, we will formally show in Theorem 3.4 below that the conventional Wald test $J_1$ can have very low power on its own. On the other hand, when the alternative is dense in the sense that $\|\theta\|$ grows fast with $N$, the conventional test $J_1$ itself is consistent. In this case, $J$ is still as powerful as $J_1$. Therefore, if we denote $\Theta(J_1) \subset \mathbb{R}^N \setminus \{0\}$ as the set of alternative $\theta$'s against which the classical $J_1$ test has power converging to one, then the combined $J = J_0 + J_1$ test has power converging to one against $\theta$ on

$$\Theta_s \cup \Theta(J_1).$$

We shall show in Section 3 that the power is enhanced uniformly over $\theta \in \Theta_s \cup \Theta(J_1)$.

2.3 Comparisons with thresholding and extreme-value tests

One of the fundamental differences between our power enhancement component $J_0$ and existing tests with good power under sparse alternatives is that existing test statistics have a non-degenerate distribution under the null, and often require either bootstrap or strong conditions to derive the null distribution. Such convergence is typically slow, and serious size distortion appears at finite sample. In contrast, our screening statistic $J_0$ uses the high criticism sequence $\delta_{N,T}$ to make $P(J_0 = 0 \mid H_0) \to 1$, hence does not serve as a test statistic on its own. Therefore, the asymptotic null distribution is determined by that of $J_1$, which may not be difficult to derive, especially when $J_1$ is asymptotically pivotal. As we shall see in the sections below, the required regularity condition is relatively mild, which makes the power enhancement test applicable to many econometric problems.

In the high-dimensional testing literature, there are mainly two types of statistics with good power under sparse alternatives: the extreme value test and the thresholding test.
The test based on extreme values studies the maximum deviation from the null hypothesis across the components of $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_N)$, and forms the statistic based on $\max_{j \le N} |\hat\theta_j / w_j|^{\delta}$ for some $\delta > 0$ and a weight $w_j$ (e.g., Cai et al. (2013), Chernozhukov et al. (2013)). Such a test statistic typically converges slowly to its asymptotic counterpart. An alternative test is

based on thresholding: for some $\delta > 0$ and pre-determined threshold level $t_T$,

$$R = \sum_{j=1}^{N} \Big|\frac{\hat\theta_j}{w_j}\Big|^{\delta} 1\{|\hat\theta_j| > t_T w_j\}. \quad (2.7)$$

The accumulation of estimation errors is prevented by the threshold $1\{|\hat\theta_j| > t_T w_j\}$ (see, e.g., Fan (1996) and Zhong et al. (2013)) for sufficiently large $t_T$. In a low-dimensional setting, Hansen (2005) suggested using a threshold to enhance the power in a similar way.

Although (2.7) looks similar to $J_0$, the ideas behind them are very different. Both the extreme value test and the thresholding test require regularity conditions that may be restrictive in econometric applications. For instance, it can be difficult to employ the central limit theorem directly on (2.7), as it requires the covariance between $\hat\theta_j$ and $\hat\theta_{j+k}$ to decay fast enough as $k \to \infty$ (Zhong et al., 2013). In cross-sectional testing problems, this essentially requires an explicit ordering among the cross-sectional units which is, however, often unavailable in panel data applications. In addition, as (2.7) involves effectively limited terms of summation due to thresholding, the asymptotic theory does not provide adequate approximations, resulting in size distortion in applications. For example, when $t_T$ is taken slightly less than $\max_{j \le N} |\hat\theta_j| / w_j$, $R$ becomes the extreme statistic. When $t_T$ is small (e.g. 0), $R$ becomes a traditional test, which is not powerful in detecting sparse alternatives, though it can have good size properties.

3 Asymptotic properties

3.1 Main results

This section presents the regularity conditions and formally establishes the claimed power enhancement properties. Below we use $P(\cdot \mid \theta)$ to denote the probability measure defined from the sampling distribution with parameter $\theta$. Let $\Theta \subset \mathbb{R}^N$ be the parameter space of $\theta$. When we write $\inf_{\theta \in \Theta} P(\cdot \mid \theta)$, the infimum is taken in the space that covers the union of both null and alternative space.

We begin with a high-level assumption. In specific applications, it can be verified with primitive conditions.

Assumption 3.1. As $T, N \to \infty$, the sequence $\delta_{N,T} \to \infty$, and the estimators $\{\hat\theta_j, \hat v_j\}_{j \le N}$ are such that

(i) $\inf_{\theta \in \Theta} P(\max_{j \le N} |\hat\theta_j - \theta_j| / \hat v_j^{1/2} < \delta_{N,T}/2 \mid \theta) \to 1$;
(ii) $\inf_{\theta \in \Theta} P(4/9 < \hat v_j / v_j < 16/9, \; \forall j = 1, \ldots, N \mid \theta) \to 1$.

The normalizing constant $v_j$ is often taken as the asymptotic variance of $\hat\theta_j$, with $\hat v_j$ being its consistent estimator. The constants 4/9 and 16/9 in condition (ii) are not optimally chosen, as this condition only requires $\{\hat v_j\}_{j \le N}$ to be not-too-bad estimators of their population counterparts.

In many high-dimensional problems with strictly stationary data that satisfy strong mixing conditions, following from the large-deviation theory, typically $\max_{j \le N} |\hat\theta_j - \theta_j| / \hat v_j^{1/2} = O_P(\sqrt{\log N})$. Therefore, we shall fix

$$\delta_{N,T} = \log(\log T) \sqrt{\log N}, \quad (3.1)$$

which is a high criticism that slightly dominates the standardized noise level. We shall provide primitive conditions for this choice of $\delta_{N,T}$ in the subsequent sections, so that Assumption 3.1 holds.

Recall that $\hat S$ and $S(\theta)$ are defined by (2.4) and (2.6) respectively for a given $\theta \in \Theta$ and its consistent estimator $\hat\theta$. In particular, $S(\theta) = \{j : |\theta_j| > 2 v_j^{1/2} \delta_{N,T}, \; j = 1, \ldots, N\}$, so under $H_0: \theta = 0$, $S(\theta) = \emptyset$. Note that $\Theta$ denotes the parameter space containing both the null and alternative hypotheses. The following theorem characterizes the asymptotic behavior of $J_0 = \sqrt{N} \sum_{j \in \hat S} \hat\theta_j^2 \hat v_j^{-1}$ under both the null and alternative hypotheses. Define the grey area set as

$$G(\theta) = \{j : |\theta_j| / v_j^{1/2} \asymp \delta_{N,T}, \; j = 1, \ldots, N\}.$$

Theorem 3.1. Let Assumption 3.1 hold. As $T, N \to \infty$, we have, under $H_0: \theta = 0$, $P(\hat S = \emptyset \mid H_0) \to 1$. Hence

$$P(J_0 = 0 \mid H_0) \to 1 \quad \text{and} \quad \inf_{\{\theta \in \Theta : S(\theta) \ne \emptyset\}} P(J_0 > \sqrt{N} \mid \theta) \to 1.$$

In addition,

$$\inf_{\theta \in \Theta} P(S(\theta) \subset \hat S \mid \theta) \to 1 \quad \text{and} \quad \inf_{\theta \in \Theta} P(\hat S \setminus S(\theta) \subset G(\theta) \mid \theta) \to 1.$$

Besides the asymptotic behavior of $J_0$, Theorem 3.1 also provides a sure screening property of $\hat S$. Sometimes we wish to find out the identities of the elements in $S(\theta)$, which

represent the components of $\theta$ that deviate from zero. Therefore, we are particularly interested in a type of alternative hypothesis that satisfies the following empty grey area condition.

Assumption 3.2 (Empty grey area). For any $\theta \in \Theta$, $G(\theta) = \emptyset$.

Theorem 3.1 shows that the large $\theta_j$'s can be selected with no missing discoveries, and Corollary 3.1 below further asserts that the selection is consistent with no false discoveries either, under both the null and alternative hypotheses.

Corollary 3.1. Under Assumptions 3.1, 3.2, as $T, N \to \infty$,

$$\inf_{\theta \in \Theta} P(\hat S = S(\theta) \mid \theta) \to 1.$$

Proof. Corollary 3.1 follows immediately from Theorem 3.1 and Assumption 3.2:

$$\inf_{\theta \in \Theta} P(\hat S \setminus S(\theta) = \emptyset \mid \theta) \ge \inf_{\theta \in \Theta} P(\hat S \setminus S(\theta) \subset G(\theta) \mid \theta) \to 1.$$

Remark 3.1. Corollary 3.1 and its required assumptions (Assumptions 3.1 and 3.2) are stated uniformly over $\theta \in \Theta$. The empty grey area condition (Assumption 3.2) rules out $\theta$'s that have components on the boundary of the screening set. Intuitively, when a component $\theta_j$ is on the boundary of the screening, it is hard to decide whether or not to eliminate it from the screening step. Note that the boundary of the screening depends on $(N, T)$, which is similar in spirit to the local alternatives in classical testing problems, and is also a common practice for asymptotic analysis of high-dimensional tests (e.g., Cai et al. (2010); Chernozhukov et al. (2013)).

We are now ready to formally show the power enhancement argument. The enhancement is achieved uniformly on the following set:

$$\Theta_s = \Big\{\theta \in \Theta : \max_{j \le N} \frac{|\theta_j|}{v_j^{1/2}} > 2\delta_{N,T}\Big\}. \quad (3.2)$$

In particular, if $\hat\theta_j$ is $\sqrt{T}$-consistent, and $v_j^{1/2}$ is the asymptotic standard deviation of $\hat\theta_j$, then $\sigma_j = \sqrt{T v_j}$ is bounded away from both zero and infinity. Using (3.1), we have

$$\Theta_s = \Big\{\theta \in \Theta : \max_{j \le N} |\theta_j| / \sigma_j > 2 \log(\log T) \sqrt{\frac{\log N}{T}}\Big\}.$$
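To get a feel for the size of the high-criticism threshold and of the resulting detection boundary, the two formulas can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and it assumes $T \ge 3$ so that $\log(\log T)$ is positive.

```python
import math

def high_criticism_threshold(N, T):
    """delta_{N,T} = log(log T) * sqrt(log N); assumes T >= 3 and N >= 2."""
    return math.log(math.log(T)) * math.sqrt(math.log(N))

def detection_boundary(N, T):
    """Smallest standardized signal max_j |theta_j| / sigma_j that lies in Theta_s,
    i.e. 2 * log(log T) * sqrt(log N / T)."""
    return 2.0 * high_criticism_threshold(N, T) / math.sqrt(T)

# Example with hypothetical dimensions: N = 500 assets, T = 60 monthly observations.
delta = high_criticism_threshold(500, 60)
boundary = detection_boundary(500, 60)
```

For fixed $N$, the boundary shrinks as $T$ grows in this range, since the factor $\log(\log T)$ increases far more slowly than $\sqrt{T}$.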

This is a relatively weak condition on the strength of the maximal signal in order to be detected by $J_0$.

A test is said to have high power uniformly on a set $\Theta^* \subset \mathbb{R}^N \setminus \{0\}$ if

$$\inf_{\theta \in \Theta^*} P(\text{reject } H_0 \text{ by the test} \mid \theta) \to 1.$$

For a given distribution function $F$, let $F_q$ denote its $q$th quantile.

Theorem 3.2. Let Assumptions 3.1-3.2 hold. Suppose there is a test $J_1$ such that
(i) it has an asymptotic non-degenerate null distribution $F$, and the critical region takes the form $\{D : J_1 > F_q\}$ for the significance level $q \in (0, 1)$,
(ii) it has high power uniformly on some set $\Theta(J_1) \subset \Theta$,
(iii) there is $c > 0$ so that $\inf_{\theta \in \Theta_s} P(c\sqrt{N} + J_1 > F_q \mid \theta) \to 1$, as $T, N \to \infty$.
Then the power enhancement test $J = J_0 + J_1$ has the asymptotic null distribution $F$, and has high power uniformly on the set $\Theta_s \cup \Theta(J_1)$: as $T, N \to \infty$,

$$\inf_{\theta \in \Theta_s \cup \Theta(J_1)} P(J > F_q \mid \theta) \to 1.$$

The three required conditions for $J_1$ are easy to understand: Conditions (i) and (ii) respectively require the size and power conditions for $J_1$. Condition (iii) requires $J_1$ to be dominated by $J_0$ under $\Theta_s$. This condition is not restrictive since $J_1$ is typically standardized (e.g., Donald et al. (2003)).

Theorem 3.2 also shows that $J_1$ and $J$ have the critical regions $\{D : J_1 > F_q\}$ and $\{D : J > F_q\}$ respectively, but the power is enhanced from $\Theta(J_1)$ to $\Theta_s \cup \Theta(J_1)$. In high-dimensional testing problems with a fast-growing dimension, $\Theta_s \cup \Theta(J_1)$ can be much larger than $\Theta(J_1)$. As a result, the power of $J_1$ can be substantially enhanced by adding $J_0$.

3.2 Power enhancement for quadratic tests

As an example of $J_1$, we consider the widely used quadratic test statistic, which is asymptotically pivotal:

$$J_Q = \frac{T \hat\theta' V \hat\theta - N(1 + \mu_{N,T})}{\xi_{N,T} \sqrt{N}},$$

where $\mu_{N,T}$ and $\xi_{N,T}$ are deterministic sequences that may depend on $(N, T)$ and $\mu_{N,T} \to 0$, $\xi_{N,T} \to \xi \in (0, \infty)$. The weight matrix $V$ is positive definite, whose eigenvalues are bounded away from both zero and infinity. Here $TV$ is often taken to be the inverse of the asymptotic covariance matrix of $\hat\theta$. Other popular choices are $V = \mathrm{diag}(\sigma_1^{-2}, \ldots, \sigma_N^{-2})$ with $\sigma_j = \sqrt{T v_j}$ (Bai and Saranadasa, 1996; Chen and Qin, 2010; Pesaran and Yamagata, 2012) and $V = I_N$, the $N \times N$ identity matrix.

We set $J_1 = J_Q$, whose power enhancement version is $J = J_0 + J_Q$. For the moment, we shall assume $V$ to be known, and just focus on the power enhancement properties. We will deal with unknown $V$ for testing the factor pricing problem in the next section.

Assumption 3.3.
(i) There is a non-degenerate distribution $F$ so that under $H_0$, $J_Q \to^d F$.
(ii) The critical value $F_q = O(1)$ and the critical region of $J_Q$ is $\{D : J_Q > F_q\}$.
(iii) $V$ is positive definite, and there exist two positive constants $C_1$ and $C_2$ such that $C_1 \le \lambda_{\min}(V) \le \lambda_{\max}(V) \le C_2$.
(iv) $C_3 \le T v_j \le C_4$, $j = 1, \ldots, N$, for positive constants $C_3$ and $C_4$.

Analyzing the power properties of $J_Q$ and applying Theorem 3.2, we obtain the following theorem. Recall that $\delta_{N,T}$ and $\Theta_s$ are defined by (3.1) and (3.2).

Theorem 3.3. Under Assumptions 3.1-3.3, the power enhancement test $J = J_0 + J_Q$ satisfies: as $T, N \to \infty$,
(i) under the null hypothesis $H_0: \theta = 0$, $J \to^d F$,
(ii) there is $C > 0$ so that $J$ has high power uniformly on the set

$$\Theta_s \cup \{\theta \in \Theta : \|\theta\|^2 > C \delta_{N,T}^2 N / T\} \equiv \Theta_s \cup \Theta(J_Q);$$

that is, $\inf_{\theta \in \Theta_s \cup \Theta(J_Q)} P(J > F_q \mid \theta) \to 1$ for any $q \in (0, 1)$.

3.3 Low power of quadratic statistics under sparse alternatives

When $J_Q$ is used on its own, it can suffer from low power under sparse alternatives if $N$ grows much faster than the sample size, even though it has been commonly used in the econometric literature. Mainly, $T \hat\theta' V \hat\theta$ aggregates high-dimensional estimation errors under $H_0$, which become large with a non-negligible probability and potentially override the sparse signals under the alternative. The following result gives this intuition a more precise description.
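The intuition above can be illustrated with a back-of-the-envelope calculation in a deliberately simplified setting. The sketch assumes, purely for illustration and not as part of the paper's conditions, that $\hat\theta \sim \mathcal{N}(\theta, I_N / T)$ and $V = I_N$, so that $E[T \hat\theta' \hat\theta] = N + T\|\theta\|^2$ and the standardized statistic $(T \hat\theta' \hat\theta - N)/\sqrt{2N}$ has mean $T\|\theta\|^2 / \sqrt{2N}$. The function name and arguments are hypothetical.

```python
import math

def quadratic_mean_shift(T, N, s, c):
    """Mean of (T * theta_hat' theta_hat - N) / sqrt(2N) when theta has s nonzero
    entries, each of size c, under the illustrative assumptions
    theta_hat ~ N(theta, I/T) and V = I (assumptions of this sketch only)."""
    return T * s * c ** 2 / math.sqrt(2 * N)

# Sparse alternative: two sizeable signals among a million components.
sparse_shift = quadratic_mean_shift(T=50, N=10**6, s=2, c=2.0)

# Dense alternative: every component carries the same signal.
dense_shift = quadratic_mean_shift(T=50, N=10**6, s=10**6, c=2.0)
```

In the sparse case the shift is a small fraction of the unit standard deviation of the limiting normal distribution, so the rejection probability stays close to the nominal level; in the dense case the shift is enormous and the quadratic test rejects with near certainty.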

To simplify our discussion, we shall focus on the Wald test with $TV$ being the inverse of the asymptotic covariance matrix of $\hat\theta$, assumed to exist. Specifically, we assume the standardized $T \hat\theta' V \hat\theta$ to be asymptotically normal under $H_0$:

$$\frac{T \hat\theta' V \hat\theta - N}{\sqrt{2N}} \;\Big|\; H_0 \to^d \mathcal{N}(0, 1). \quad (3.3)$$

This is one of the most commonly seen cases in various testing problems. The diagonal entries of $\frac{1}{T} V^{-1}$ are given by $\{v_j\}_{j \le N}$.

Theorem 3.4. Suppose that (3.3) holds with $\|V\|_1 < C$ and $\|V^{-1}\|_1 < C$ for some $C > 0$. Under Assumptions 3.1-3.3, $T = o(\sqrt{N})$ and $\log N = o(T^{\gamma})$ for some $0 < \gamma < 1$, the quadratic test $J_Q$ has low power at the sparse alternative $\Theta_b$ given by

$$\Theta_b = \Big\{\theta \in \Theta : \sum_{j=1}^{N} 1\{\theta_j \ne 0\} = o(\sqrt{N}/T), \; \|\theta\|_{\max} = O(1)\Big\}.$$

In other words, $\forall \theta \in \Theta_b$, for any significance level $q$,

$$\lim_{T, N \to \infty} P(J_Q > z_q \mid \theta) = q,$$

where $z_q$ is the $q$th quantile of the standard normal distribution.

In the above theorem, the alternative is a sparse vector. However, using the quadratic test itself, the asymptotic power of the test is as low as $q$. This is because the signals in the sparse alternative are dominated by the aggregated high-dimensional estimation errors: $T \sum_{i : \theta_i = 0} \hat\theta_i^2$. In contrast, the nonzero components of $\theta$ (fixed constants) are actually detectable by using $J_0$. The power enhancement test $J_0 + J_Q$ takes this into account, and has a substantially improved power.

4 Application: Testing Factor Pricing Models

4.1 The model

The multi-factor pricing model, derived by Ross (1976) and Merton (1973), is one of the most fundamental results in finance. It postulates how financial returns are related to market risks, and has many important practical applications. Let $y_{it}$ be the excess return of

the $i$-th asset at time $t$ and $f_t = (f_{1t}, \ldots, f_{Kt})'$ be the observable excess returns of $K$ market risk factors. Then, the excess return has the following decomposition:

$$y_{it} = \theta_i + b_i' f_t + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,$$

where $b_i = (b_{i1}, \ldots, b_{iK})'$ is a vector of factor loadings and $u_{it}$ represents the idiosyncratic error. To keep the notation consistent, we continue to use $\theta$ to represent the commonly used "alpha" in the finance literature.

The key implication from the multi-factor pricing theory for tradable factors is that under no-arbitrage restrictions, the intercept $\theta_i$ should be zero for any asset $i$ (Ross, 1976; Merton, 1973; Chamberlain and Rothschild, 1983). An important question is then testing the null hypothesis

$$H_0: \theta = 0, \quad (4.1)$$

namely, whether the factor pricing model is consistent with empirical data, where $\theta = (\theta_1, \ldots, \theta_N)'$ is the vector of intercepts for all $N$ financial assets. One typically picks five-year monthly data, because the factor pricing model is technically a one-period model whose factor loadings can be time-varying; see Gagliardini et al. (2011) on how to model the time-varying effects using firm characteristics and market variables. As the theory of the factor pricing model applies to all tradable assets, rather than a handful of selected portfolios, the number of assets $N$ should be much larger than $T$. This ameliorates the selection biases in the construction of testing portfolios. On the other hand, if the theory does not hold, it is expected that there are only a few significant nonzero components of $\theta$, corresponding to a small portion of mis-priced stocks instead of systematic mis-pricing of the whole market. Our empirical studies on the S&P 500 index lend further support to such kinds of sparse alternatives, under which there are only a few nonzero components of $\theta$ compared to $N$.

Most existing tests for the problem (4.1) are based on the quadratic statistic $W = \hat\theta' V \hat\theta$, where $\hat\theta$ is the OLS estimator for $\theta$, and $V$ is some positive definite matrix. Prominent examples are given by Gibbons et al.
(989), MacKinlay and Richardson (99) and Bealie et al. (2007). When is possibly mch larger than, Pesaran and Yamagata (202) showed that, nder reglarity conditions (Assmption 4. below), J = a f, θ Σ θ d (0, ). 2 where a f, > 0 is a constant that depends only on factors empirical moments, and Σ is the 7

covariance matrix of t = ( t,..., t ), assmed to be time-invariant. Recently, Gagliardini et al. (20) propose a novel approach to modeling and estimating time-varying risk premims sing two-pass least-sqares method nder asset pricing restrictions. heir problems and approaches differ sbstantially from ors, thogh both papers stdy similar problems in finance. As a part of their model validation, they develop test statistics against the asset pricing restrictions and weak risk factors. heir test statistics are based on a weighted sm of sqared residals of the cross-sectional regression, which, like all classical test statistics, have power only when there are many violations of the asset pricing restrictions. hey do not consider the isse of enhancing the power nder sparse alternatives, nor do they involve a Wald statistic that depends on a high-dimensional covariance matrix. In fact, their testing power can be enhanced by sing or techniqes. 4.2 Power enhancement component We propose a new statistic that depends on (i) the power enhancement component J 0, and (ii) a feasible Wald component based on a consistent covariance estimator for Σ, which controls the size nder the nll even when /. Denote by f = t= f t and w = ( t= f tf t) f. Also define a f, = f w, and a f = Ef t(ef t f t) Ef t. he OLS estimator of θ can be expressed as θ = ( θ,..., θ ), θj = a f, y jt ( f tw). (4.2) t= When cov(f t ) is positive definite, nder mild reglarity conditions (Assmption 4. below), a f, consistently estimates a f, and a f > 0. In addition, withot serial correlations, the conditional variance of θ j (given {f t }) converges in probability to v j = var( jt )/( a f ), which can be estimated by v j based on the residals of OLS estimator: v j = û 2 jt/( a f, ), where û jt = y jt θ j b jf t. t= 8
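The closed form (4.2) and the variance estimate $\hat v_j$ can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not the authors' code: the function name `ols_alpha`, the array layout (`Y` is $N\times T$, `F` is $T\times K$) and the simulated inputs are assumptions made for the example.

```python
import numpy as np

def ols_alpha(Y, F):
    """Intercepts from (4.2) and their variance estimates v_hat.

    Y : (N, T) array of excess returns y_jt (hypothetical layout).
    F : (T, K) array of factor returns f_t.
    """
    N, T = Y.shape
    f_bar = F.mean(axis=0)                        # \bar f
    w = np.linalg.solve(F.T @ F / T, f_bar)       # w = (T^{-1} sum f_t f_t')^{-1} \bar f
    a_fT = 1.0 - f_bar @ w                        # a_{f,T} = 1 - \bar f' w
    theta_hat = Y @ (1.0 - F @ w) / (T * a_fT)    # equation (4.2)

    # Residuals of the time-series regression of y_jt on (1, f_t).
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)    # shape (K + 1, N)
    U_hat = Y - (X @ coef).T
    v_hat = (U_hat ** 2).mean(axis=1) / (T * a_fT)    # \hat v_j
    return theta_hat, v_hat, a_fT, U_hat

# Simulated panel under H_0 (theta = 0); all numbers are illustrative.
rng = np.random.default_rng(0)
N, T, K = 50, 60, 3
F = rng.normal(0.5, 1.0, size=(T, K))
B = rng.normal(size=(N, K))
Y = B @ F.T + rng.normal(size=(N, T))
theta_hat, v_hat, a_fT, U_hat = ols_alpha(Y, F)
```

The value returned by (4.2) coincides with the intercept of an ordinary least-squares regression of $y_{jt}$ on a constant and $f_t$, which is a convenient sanity check on any implementation.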

We show in Proposition 4.1 below that $\max_{j\le N}|\hat\theta_j - \theta_j|/\hat v_j^{1/2} = O_P(\sqrt{\log N})$. Therefore, $\delta_{N,T} = \log(\log T)\sqrt{\log N}$ slightly dominates the maximum estimation noise. The screening set and the power enhancement component are defined as

\[ \hat S = \{j : |\hat\theta_j| > \hat v_j^{1/2}\delta_{N,T},\ j = 1,\dots,N\}, \qquad J_0 = \sqrt{N}\sum_{j\in\hat S}\hat\theta_j^2\,\hat v_j^{-1}. \]

4.3 Feasible Wald test in high dimensions

Assuming no serial correlations among $\{u_t\}_{t=1}^{T}$ and conditional homoskedasticity (Assumption 4.1 below), given the observed factors, the conditional covariance of $\hat\theta$ is $\Sigma_u/(T a_{f,T})$. If the covariance matrix $\Sigma_u$ of $u_t$ were known, the standardized Wald test statistic would be

\[ \frac{a_{f,T}\, T\hat\theta'\Sigma_u^{-1}\hat\theta - N}{\sqrt{2N}}. \tag{4.3} \]

Under $H_0 : \theta = 0$, it converges in distribution to $\mathcal{N}(0,1)$. Note that the idiosyncratic errors $(u_{1t},\dots,u_{Nt})$ are often cross-sectionally correlated, which leads to a non-diagonal inverse covariance matrix $\Sigma_u^{-1}$. When $N/T\to\infty$, it is practically difficult to estimate $\Sigma_u^{-1}$, as there are $O(N^2)$ free off-diagonal parameters.

To consistently estimate $\Sigma_u^{-1}$ when $N/T\to\infty$, without parametrizing the off-diagonal elements, we assume $\Sigma_u = \mathrm{cov}(u_t)$ to be a sparse matrix. This assumption is natural for large covariance estimation in factor models, and was previously considered by Fan et al. (2011). Since the common factors dictate preliminarily the co-movement across the whole panel, a particular asset's idiosyncratic shock is usually correlated significantly only with a few other assets. For example, some shocks only exert influences on a particular industry, but are not pervasive for the whole economy (Connor and Korajczyk, 1993).

Following the approach of Bickel and Levina (2008), we can consistently estimate $\Sigma_u^{-1}$ via thresholding: let $s_{ij} = \frac{1}{T}\sum_{t=1}^{T}\hat u_{it}\hat u_{jt}$. Define the covariance estimator as

\[ (\hat\Sigma_u)_{ij} = \begin{cases} s_{ij}, & \text{if } i = j,\\ h_{ij}(s_{ij}), & \text{if } i \neq j,\end{cases} \]

where $h_{ij}(\cdot)$ is a generalized thresholding function (Antoniadis and Fan, 2001; Rothman et al., 2009), with threshold value $\tau_{ij} = C\big(s_{ii}s_{jj}\frac{\log N}{T}\big)^{1/2}$ for some constant $C > 0$, designed to keep only the sample correlations whose magnitude exceeds $C\big(\frac{\log N}{T}\big)^{1/2}$. The hard-thresholding function, for example, is $h_{ij}(x) = x\,1\{|x| > \tau_{ij}\}$, and many other thresholding functions such as soft-thresholding and SCAD (Fan and Li, 2001) are specific examples. In general, $h_{ij}(\cdot)$ should satisfy:

(i) $h_{ij}(z) = 0$ if $|z| < \tau_{ij}$;
(ii) $|h_{ij}(z) - z| \le \tau_{ij}$;
(iii) there are constants $a > 0$ and $b > 1$ such that $|h_{ij}(z) - z| \le a\tau_{ij}^2$ if $|z| > b\tau_{ij}$.

The thresholded covariance matrix estimator sets most of the off-diagonal estimation noises in $\big(\frac{1}{T}\sum_{t=1}^{T}\hat u_{it}\hat u_{jt}\big)$ to zero. As studied in Fan et al. (2013), the constant $C$ in the threshold can be chosen in a data-driven manner so that $\hat\Sigma_u$ is strictly positive definite in finite samples even when $N > T$.

With $\hat\Sigma_u^{-1}$, we are ready to define the feasible standardized Wald statistic:

\[ J_{wald} = \frac{a_{f,T}\, T\hat\theta'\hat\Sigma_u^{-1}\hat\theta - N}{\sqrt{2N}}, \tag{4.4} \]

whose power can be enhanced under sparse alternatives by:

\[ J = J_0 + J_{wald}. \tag{4.5} \]

4.4 Does the thresholded covariance estimator affect the size?

A natural but technical question to address is: when $\Sigma_u$ indeed admits a sparse structure, is the thresholded estimator $\hat\Sigma_u^{-1}$ accurate enough so that the feasible $J_{wald}$ is still asymptotically normal? The answer is affirmative if $N(\log N)^4 = o(T^2)$, and we can still allow $N/T\to\infty$. However, such a simple question is far more technically involved than anticipated, as we now explain.

When $\Sigma_u$ is a sparse matrix, under regularity conditions (Assumption 4.2 below), Fan et al. (2011) showed that

\[ \|\Sigma_u^{-1} - \hat\Sigma_u^{-1}\|_2 = O_P\Big(\sqrt{\frac{\log N}{T}}\Big). \tag{4.6} \]

By the lower bound derived by Cai et al. (2010), the convergence rate is minimax optimal for sparse covariance estimation. On the other hand, when replacing $\Sigma_u^{-1}$ in (4.3) by $\hat\Sigma_u^{-1}$, one needs to show that the effect of such a replacement is asymptotically negligible, namely, under $H_0$,

\[ T\hat\theta'(\Sigma_u^{-1} - \hat\Sigma_u^{-1})\hat\theta/\sqrt{N} = o_P(1). \tag{4.7} \]

However, when $\theta = 0$, a careful analysis gives $\|\hat\theta\|^2 = O_P(N/T)$. Using this and (4.6), by the Cauchy-Schwarz inequality, we have

\[ T\,|\hat\theta'(\Sigma_u^{-1} - \hat\Sigma_u^{-1})\hat\theta|/\sqrt{N} = O_P\Big(\sqrt{\frac{N\log N}{T}}\Big). \]

We see that this requires $N\log N = o(T)$ to converge, which is basically a low-dimensional scenario.

The above simple derivation uses, however, a Cauchy-Schwarz bound, which is too crude for a large $N$. In fact, $\hat\theta'(\Sigma_u^{-1} - \hat\Sigma_u^{-1})\hat\theta$ is a weighted estimation error of $\Sigma_u^{-1} - \hat\Sigma_u^{-1}$, where the weights $\hat\theta$ average down the accumulated estimation errors in estimating the elements of $\Sigma_u^{-1}$, and result in an improved rate of convergence. The formalization of this argument requires further regularity conditions and novel technical arguments. These are formally presented in the following subsection.

4.5 Regularity conditions

We are now ready to present the regularity conditions. These conditions are imposed for three technical purposes: (i) achieving the uniform convergence for $\hat\theta - \theta$ as required in Assumption 3.1, (ii) defining the sparsity of $\Sigma_u$ so that $\hat\Sigma_u^{-1}$ is consistent, and (iii) showing (4.7), so that the errors from estimating $\Sigma_u^{-1}$ do not affect the size of the test.

Let $\mathcal{F}_{-\infty}^{0}$ and $\mathcal{F}_{T}^{\infty}$ denote the $\sigma$-algebras generated by $\{f_t : t \le 0\}$ and $\{f_t : t \ge T\}$ respectively. In addition, define the $\alpha$-mixing coefficient

\[ \alpha(T) = \sup_{A\in\mathcal{F}_{-\infty}^{0},\, B\in\mathcal{F}_{T}^{\infty}} |P(A)P(B) - P(AB)|. \]

Assumption 4.1.
(i) $\{u_t\}_{t\le T}$ is i.i.d. $\mathcal{N}(0,\Sigma_u)$, where both $\|\Sigma_u\|_1$ and $\|\Sigma_u^{-1}\|_1$ are bounded;
(ii) $\{f_t\}_{t\le T}$ is strictly stationary, independent of $\{u_t\}_{t\le T}$, and there are $r_1, b_1 > 0$ so that $\max_{i\le K} P(|f_{it}| > s) \le \exp(-(s/b_1)^{r_1})$.

(iii) There exists $r_2 > 0$ such that $r_1^{-1} + r_2^{-1} > 0.5$ and $C > 0$, for all $T\in\mathbb{Z}^{+}$, $\alpha(T) \le \exp(-C T^{r_2})$.
(iv) $\mathrm{cov}(f_t)$ is positive definite, and $\max_{i\le N}\|b_i\| < c_1$ for some $c_1 > 0$.

Some remarks are in order for the conditions in Assumption 4.1.

Remark 4.1. Condition (i), perhaps somewhat restrictive, substantially facilitates our technical analysis. Here $u_t$ is required to be serially uncorrelated across $t$. Under this condition, the conditional covariance of $\hat\theta$, given the factors, has a simple expression $\Sigma_u/(T a_{f,T})$. On the other hand, if serial correlations are present in $u_t$, there would be additional autocovariance terms in the covariance matrix, which need to be further estimated via regularizations. Moreover, given that $\Sigma_u$ is a sparse matrix, the Gaussianity ensures that most of the idiosyncratic errors are cross-sectionally independent so that $\mathrm{cov}(u_{it}^{2}, u_{jt}^{l}) = 0$, $l = 1, 2$, for most of the pairs in $\{(i,j) : i \neq j\}$.

Note that we do allow the factors $\{f_t\}_{t\le T}$ to be weakly correlated across $t$, provided they satisfy the strong mixing condition in Assumption 4.1 (iii).

Remark 4.2. The conditional homoskedasticity $E(u_t u_t' \mid f_t) = E(u_t u_t')$ is assumed, granted by condition (ii). We admit that handling conditional heteroskedasticity, while important in empirical applications, is very technically challenging in our context. Allowing the high-dimensional covariance matrix $E(u_t u_t' \mid f_t)$ to be time-varying is possible with a suitable continuum of sparsity conditions on the time domain. In that case, one can require the sparsity condition to hold uniformly across $t$ and continuously apply thresholding. However, unlike in the traditional case, estimating the family of large inverse covariances $\{E(u_t u_t' \mid f_t)^{-1} : t = 1, 2, \dots\}$ uniformly over $t$ is technically highly challenging. As we shall see in the proof of Proposition 4.2, even in the homoskedastic case, proving the effect of estimating $\Sigma_u^{-1}$ to be first-order negligible when $N/T\to\infty$ requires delicate technical analysis.
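Returning to the thresholding estimator of Section 4.3, the following sketch shows one way to compute $\hat\Sigma_u$ with the soft-thresholding rule $h_{ij}(z) = \mathrm{sign}(z)(|z| - \tau_{ij})_+$. It is an illustration under stated assumptions rather than the authors' implementation: the function name `soft_threshold_cov` is hypothetical, and the constant `C = 0.5` is an arbitrary placeholder, whereas the text recommends choosing $C$ in a data-driven way so that $\hat\Sigma_u$ is positive definite.

```python
import numpy as np

def soft_threshold_cov(U_hat, C=0.5):
    """Soft-thresholded covariance estimate from OLS residuals.

    U_hat : (N, T) array of residuals u_hat_it.
    C     : threshold constant (placeholder value, an assumption).
    """
    N, T = U_hat.shape
    S = U_hat @ U_hat.T / T                       # s_ij = T^{-1} sum_t u_it u_jt
    d = np.diag(S).copy()
    # Entry-dependent threshold tau_ij = C * sqrt(s_ii * s_jj * log(N) / T).
    tau = C * np.sqrt(np.outer(d, d) * np.log(N) / T)
    Sigma = np.sign(S) * np.maximum(np.abs(S) - tau, 0.0)
    np.fill_diagonal(Sigma, d)                    # diagonal entries are not thresholded
    return Sigma

# Tiny deterministic example: rows 0 and 1 are identical, row 2 is orthogonal.
U_demo = np.array([[1.0, -1.0,  1.0, -1.0],
                   [1.0, -1.0,  1.0, -1.0],
                   [1.0,  1.0, -1.0, -1.0]])
Sigma_demo = soft_threshold_cov(U_demo, C=0.5)
```

In the demonstration the sample covariance between rows 0 and 2 is exactly zero and stays zero, while the unit covariance between rows 0 and 1 is shrunk by the threshold $\tau_{01}$.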
To characterize the sparsity of $\Sigma_u$ in our context, define

\[ m_N = \max_{i\le N}\sum_{j=1}^{N} 1\{(\Sigma_u)_{ij}\neq 0\}, \qquad D_N = \sum_{i\neq j} 1\{(\Sigma_u)_{ij}\neq 0\}. \]

Here $m_N$ represents the maximum number of nonzeros in each row, and $D_N$ represents the total number of nonzero off-diagonal entries. Formally, we assume:

Assumption 4.2. Suppose $N^{1/2}(\log N)^{\gamma} = o(T)$ for some $\gamma > 2$, and
(i) $\min_{(\Sigma_u)_{ij}\neq 0}|(\Sigma_u)_{ij}| \gg \sqrt{(\log N)/T}$;
(ii) at least one of the following cases holds:
(a) $D_N = O(N^{1/2})$, and $m_N^2 = O\Big(\dfrac{T}{N^{1/2}(\log N)^{\gamma}}\Big)$;
(b) $D_N = O(N)$, and $m_N^2 = O(1)$.

As regulated in Assumption 4.2, we consider two kinds of sparse matrices, and develop our results for both cases. In the first case (Assumption 4.2 (ii)(a)), $\Sigma_u$ is required to have no more than $O(N^{1/2})$ off-diagonal nonzero entries, but allows a diverging $m_N$. One typical example of this case is that there are only a small portion (e.g., finitely many) of firms whose individual shocks ($u_{it}$) are correlated with many other firms. In the second case (Assumption 4.2 (ii)(b)), $m_N$ should be bounded, but $\Sigma_u$ can have $O(N)$ off-diagonal nonzero entries. This allows block-diagonal matrices with finite block sizes or banded matrices with a finite number of bands. This case typically arises when firms' individual shocks are correlated only within industries but not across industries.

Moreover, we require $N^{1/2}(\log N)^{\gamma} = o(T)$, which is the price to pay for estimating a large error covariance matrix. But we still allow $N/T\to\infty$. It is also required that the minimal signal for the nonzero components be larger than the noise level (Assumption 4.2 (i)), so that nonzero components are not thresholded off when estimating $\Sigma_u$.

4.6 Asymptotic properties

The following result verifies the uniform convergence required in Assumption 3.1 over the entire parameter space that contains both the null and alternative hypotheses. Recall that the OLS estimator and its asymptotic standard error are defined in (4.2).

Proposition 4.1. Suppose the distribution of $(f_t, u_t)$ is independent of $\theta$. Under Assumption 4.1, for $\delta_{N,T} = \log(\log T)\sqrt{\log N}$, as $T, N\to\infty$,

\[ \inf_{\theta\in\Theta} P\Big(\max_{j\le N}|\hat\theta_j - \theta_j|/\hat v_j^{1/2} < \delta_{N,T}/2 \,\Big|\, \theta\Big) \to 1, \]
\[ \inf_{\theta\in\Theta} P\big(4/9 < \hat v_j/v_j < 16/9,\ \forall j = 1,\dots,N \,\big|\, \theta\big) \to 1. \]

Proposition 4.2. Under Assumptions 3.2, 4.1, 4.2, and $H_0$,

\[ J_{wald} = \frac{a_{f,T}\, T\hat\theta'\hat\Sigma_u^{-1}\hat\theta - N}{\sqrt{2N}} \xrightarrow{d} \mathcal{N}(0,1). \]
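To make the pieces concrete, the sketch below assembles the screening set $\hat S$, the component $J_0$, the feasible Wald statistic (4.4) and the combined statistic (4.5) from already computed inputs. The function name `power_enhanced_stat` and its argument names are assumptions introduced for this illustration; the formulas follow the definitions in Sections 4.2 and 4.3.

```python
import numpy as np

def power_enhanced_stat(theta_hat, v_hat, Sigma_u_hat, a_fT, T):
    """Return (J0, J_wald, J, S_hat) for the factor pricing test.

    theta_hat   : (N,) OLS intercepts.
    v_hat       : (N,) estimated variances of theta_hat.
    Sigma_u_hat : (N, N) positive definite estimate of Sigma_u.
    """
    N = theta_hat.size
    delta = np.log(np.log(T)) * np.sqrt(np.log(N))           # delta_{N,T}
    S_hat = np.flatnonzero(np.abs(theta_hat) > np.sqrt(v_hat) * delta)
    J0 = np.sqrt(N) * np.sum(theta_hat[S_hat] ** 2 / v_hat[S_hat])
    # Quadratic form theta' Sigma^{-1} theta via a linear solve (no explicit inverse).
    quad = theta_hat @ np.linalg.solve(Sigma_u_hat, theta_hat)
    J_wald = (a_fT * T * quad - N) / np.sqrt(2.0 * N)         # equation (4.4)
    return J0, J_wald, J0 + J_wald, S_hat                     # J from equation (4.5)

# Toy input: one large intercept among N = 4 assets, identity covariance.
theta_demo = np.array([0.0, 0.0, 0.0, 1.0])
v_demo = np.full(4, 0.01)
J0, J_wald, J, S_hat = power_enhanced_stat(theta_demo, v_demo, np.eye(4), a_fT=1.0, T=100)
```

In this toy input only the last asset passes the screening step, so $J_0 = \sqrt{4}\cdot 1/0.01 = 200$ is added to the Wald component. The test rejects when $J$ exceeds the standard normal critical value.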

As shown, the effect of replacing $\Sigma_u^{-1}$ by its thresholded estimator is asymptotically negligible and the size of the standard Wald statistic can be well controlled.

We are now ready to apply Theorem 3.3 to obtain the asymptotic properties of $J = J_0 + J_{wald}$ as follows. For $\delta_{N,T} = \log(\log T)\sqrt{\log N}$, let

\[ \Theta_s = \Big\{\theta\in\Theta : \max_{j\le N}\frac{T^{1/2}|\theta_j|}{\mathrm{var}^{1/2}(u_{jt})} > 2a_f^{-1/2}\delta_{N,T}\Big\}, \qquad \Theta(J_{wald}) = \{\theta\in\Theta : \|\theta\|^2 > C\delta_{N,T}^2 N/T\}. \]

Theorem 4.1. Suppose the assumptions of Propositions 4.1 and 4.2 hold.
(i) Under the null hypothesis $H_0 : \theta = 0$, as $T, N\to\infty$,

\[ P(J_0 = 0 \mid H_0) \to 1, \qquad J_{wald} \xrightarrow{d} \mathcal{N}(0,1), \]

and hence $J = J_0 + J_{wald} \xrightarrow{d} \mathcal{N}(0,1)$.
(ii) There is $C > 0$ so that for any $q\in(0,1)$, as $T, N\to\infty$,

\[ \inf_{\theta\in\Theta_s} P(J_0 > \sqrt{N} \mid \theta) \to 1, \qquad \inf_{\theta\in\Theta(J_{wald})} P(J_{wald} > z_q \mid \theta) \to 1, \]

and hence

\[ \inf_{\theta\in\Theta_s\cup\Theta(J_{wald})} P(J > z_q \mid \theta) \to 1, \]

where $z_q$ denotes the $q$th quantile of the standard normal distribution.

We see that the power is substantially enhanced after $J_0$ is added, as the region where the test has power is enlarged from $\Theta(J_{wald})$ to $\Theta_s\cup\Theta(J_{wald})$.

5 Application: Testing Cross-Sectional Independence

5.1 The model

Consider a mixed effect panel data model

\[ y_{it} = \alpha + x_{it}'\beta + \mu_i + u_{it}, \qquad i\le n,\ t\le T, \]

where the idiosyncratic error $u_{it}$ is assumed to be Gaussian. The regressor $x_{it}$ could be correlated with the individual random effect $\mu_i$, but is uncorrelated with $u_{it}$. Let $\rho_{ij}$ denote the correlation between $u_{it}$ and $u_{jt}$, assumed to be time invariant. The goal is to test the following hypothesis:

\[ H_0 : \rho_{ij} = 0, \quad \text{for all } i\neq j, \]

that is, whether cross-sectional dependence is present. It is commonly known that cross-sectional dependence leads to efficiency loss for OLS, and sometimes it may even cause inconsistent estimation (Andrews, 2005). Thus testing $H_0$ is an important problem in applied panel data models. If we let $N = n(n-1)/2$, and let $\theta = (\rho_{12},\dots,\rho_{1n},\rho_{23},\dots,\rho_{2n},\dots,\rho_{n-1,n})'$ be an $N\times 1$ vector stacking all the mutual correlations, then the problem is equivalent to testing about a high-dimensional vector $H_0 : \theta = 0$. Note that often the cross-sectional dependences are weakly present. Hence the alternative hypothesis of interest is often a sparse vector $\theta$, corresponding to a sparse covariance matrix $\Sigma_u$ of $u_{it}$.

Most of the existing tests are based on the quadratic statistic $W = \sum_{i<j} T\hat\rho_{ij}^2 = T\hat\theta'\hat\theta$, where $\hat\rho_{ij}$ is the sample correlation between $u_{it}$ and $u_{jt}$, estimated by the within-OLS (Baltagi, 2008), and $\hat\theta = (\hat\rho_{12},\dots,\hat\rho_{n-1,n})'$. Pesaran et al. (2008) and Baltagi et al. (2012) studied the rescaled $W$, and showed that after a proper standardization, the rescaled $W$ is asymptotically normal when both $n, T\to\infty$. However, the quadratic test suffers from low power if $\Sigma_u$ is a sparse matrix under the alternative. In particular, as is shown in Theorem 3.4, when $n/T\to\infty$, the quadratic test cannot detect the sparse alternatives with $\sum_{i<j} 1\{\rho_{ij}\neq 0\} = o(n/T)$, which is very restrictive. Such a sparse structure is present, for instance, when $\Sigma_u$ is a block-diagonal sparse matrix with finitely many blocks and finite block sizes.

5.2 Power enhancement test

Following the conventional notation of panel data models, let $\tilde y_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\tilde x_{it} = x_{it} - \frac{1}{T}\sum_{t=1}^{T} x_{it}$, and $\tilde u_{it} = u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it}$. Then $\tilde y_{it} = \tilde x_{it}'\beta + \tilde u_{it}$.
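The within-OLS estimator $\hat\beta$ is obtained by regressing $\tilde y_{it}$ on $\tilde x_{it}$, which leads to the estimated residual $\hat u_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$. Then $\rho_{ij}$ is estimated by

\[ \hat\rho_{ij} = \frac{\hat\sigma_{ij}}{\hat\sigma_{ii}^{1/2}\hat\sigma_{jj}^{1/2}}, \qquad \hat\sigma_{ij} = \frac{1}{T}\sum_{t=1}^{T}\hat u_{it}\hat u_{jt}. \]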
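The within transformation and the residual correlations $\hat\rho_{ij}$ can be sketched as follows. The function name `within_residual_corr`, the array shapes and the simulated data are assumptions chosen for this illustration; they are not taken from the paper.

```python
import numpy as np

def within_residual_corr(Y, X):
    """Within-OLS slope and residual correlation matrix.

    Y : (n, T) array of outcomes y_it.
    X : (n, T, d) array of regressors x_it.
    """
    n, T = Y.shape
    d = X.shape[2]
    Y_til = Y - Y.mean(axis=1, keepdims=True)        # remove individual means over t
    X_til = X - X.mean(axis=1, keepdims=True)
    beta_hat, *_ = np.linalg.lstsq(X_til.reshape(n * T, d),
                                   Y_til.reshape(n * T), rcond=None)
    U_hat = Y_til - X_til @ beta_hat                 # residuals u_hat_it, shape (n, T)
    sigma = U_hat @ U_hat.T / T                      # sigma_hat_ij
    s = np.sqrt(np.diag(sigma))
    rho_hat = sigma / np.outer(s, s)                 # rho_hat_ij
    return beta_hat, rho_hat, U_hat

# Illustrative panel with independent errors and slope 2.
rng = np.random.default_rng(1)
n, T = 5, 200
X = rng.normal(size=(n, T, 1))
mu = rng.normal(0.0, 0.5, size=(n, 1))
Y = -1.0 + 2.0 * X[:, :, 0] + mu + rng.normal(size=(n, T))
beta_hat, rho_hat, U_hat = within_residual_corr(Y, X)
```

Demeaning over $t$ removes both the intercept and the individual effect $\mu_i$, so only the slope is estimated in the pooled regression.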

For the within-OLS, the asymptotic variance of $\hat\rho_{ij}$ is given by $v_{ij} = (1-\rho_{ij}^2)^2/T$, and is estimated by $\hat v_{ij} = (1-\hat\rho_{ij}^2)^2/T$. Therefore the screening statistic for the power enhancement test is defined as

\[ J_0 = \sqrt{N}\sum_{(i,j)\in\hat S}\hat\rho_{ij}^2\,\hat v_{ij}^{-1}, \qquad \hat S = \{(i,j) : |\hat\rho_{ij}|/\hat v_{ij}^{1/2} > \delta_{N,T},\ i < j \le n\}, \tag{5.1} \]

where $\delta_{N,T} = \log(\log T)\sqrt{\log N}$ as before. The set $\hat S$ screens off most of the estimation errors.

To control the size, we employ the bias-corrected quadratic statistic of Baltagi et al. (2012):

\[ J_1 = \sqrt{\frac{1}{n(n-1)}}\sum_{i<j}(T\hat\rho_{ij}^2 - 1) - \frac{n}{2(T-1)}. \tag{5.2} \]

Under regularity conditions (Assumptions 5.1, 5.2 below), $J_1 \xrightarrow{d} \mathcal{N}(0,1)$ under $H_0$. Then the power enhancement test can be constructed as $J = J_0 + J_1$. The power is substantially enhanced to cover the region

\[ \Theta_s = \Big\{\theta : \max_{i<j}\frac{\sqrt{T}\,|\rho_{ij}|}{1-\rho_{ij}^2} > 2\log(\log T)\sqrt{\log N}\Big\}, \tag{5.3} \]

in addition to the region detectable by $J_1$ itself. As a byproduct, it also identifies pairs $(i,j)$ with $\rho_{ij}\neq 0$ through $\hat S$. Empirically, this set helps us better understand the underlying pattern of cross-sectional correlations.

5.3 Asymptotic properties

In order for the power to be uniformly enhanced, the parameter space of $\theta = (\rho_{12},\dots,\rho_{1n},\rho_{23},\dots,\rho_{2n},\dots,\rho_{n-1,n})'$ is required to be element-wise bounded away from $\pm 1$: there is $\rho_{\max}\in(0,1)$ such that

\[ \Theta = \{\theta\in\mathbb{R}^{N} : \|\theta\|_{\max}\le\rho_{\max}\}. \]

We denote by $E(u_{it}^{r}\mid\theta)$ the $r$th moment of $u_{it}$ when the correlation vector of the underlying data generating process is $\theta$. The following regularity conditions are imposed.

Assumption 5.1. There are $C_1, C_2 > 0$, so that
(i) $\sup_{\theta\in\Theta}\sum_{i\neq j\le n}|E\tilde x_{it}'\tilde x_{jt}\,E(u_{it}u_{jt}\mid\theta)| < C_1 n$;

(ii) $\sup_{\theta\in\Theta}\max_{j\le n}E(u_{jt}^4\mid\theta) < C_1$, and $\inf_{\theta\in\Theta}\min_{j\le n}E(u_{jt}^2\mid\theta) > C_2$.

Condition (i) is needed for the within-OLS to be $\sqrt{nT}$-consistent (see, e.g., Baltagi (2008)). It is usually satisfied by weak cross-sectional correlations (sparse alternatives) among the error terms, or weak dependence among the regressors. We require the second moment of $u_{jt}$ to be bounded away from zero uniformly in $j\le n$ and $\theta\in\Theta$, so that the cross-sectional correlations can be estimated stably.

The following conditions are assumed in Baltagi et al. (2012), and are needed for the asymptotic normality of $J_1$ under $H_0$.

Assumption 5.2.
(i) $\{u_t\}_{t\le T}$ are i.i.d. $\mathcal{N}(0,\Sigma_u)$, and $E(u_t\mid\{f_t\}_{t\le T},\theta) = 0$ almost surely.
(ii) With probability approaching one, all the eigenvalues of $\frac{1}{T}\sum_{t=1}^{T}\tilde x_{jt}\tilde x_{jt}'$ are bounded away from both zero and infinity uniformly in $j\le n$.

Proposition 5.1. Under Assumptions 5.1 and 5.2, for $\delta_{N,T} = \log(\log T)\sqrt{\log N}$ and $N = n(n-1)/2$, as $T, N\to\infty$,

\[ \inf_{\theta\in\Theta} P\Big(\max_{ij}|\hat\rho_{ij} - \rho_{ij}|/\hat v_{ij}^{1/2} < \delta_{N,T}/2 \,\Big|\, \theta\Big) \to 1, \]
\[ \inf_{\theta\in\Theta} P\big(4/9 < \hat v_{ij}/v_{ij} < 16/9,\ \forall i\neq j \,\big|\, \theta\big) \to 1. \]

Define

\[ \Theta(J_1) = \Big\{\theta\in\Theta : \sum_{i<j}\rho_{ij}^2 \ge C n^2\log n/T\Big\}. \]

For $J_1$ defined in (5.2), let

\[ J = J_0 + J_1. \tag{5.4} \]

The main result is presented as follows.

Theorem 5.1. Suppose Assumptions 3.2, 5.1, 5.2 hold. As $T, N\to\infty$,
(i) under the null hypothesis $H_0 : \theta = 0$,

\[ P(J_0 = 0 \mid H_0) \to 1, \qquad J_1 \xrightarrow{d} \mathcal{N}(0,1), \]

and hence $J = J_0 + J_1 \xrightarrow{d} \mathcal{N}(0,1)$;

(ii) there is $C > 0$ in the definition of $\Theta(J_1)$ so that for any $q\in(0,1)$,

\[ \inf_{\theta\in\Theta_s} P(J_0 > \sqrt{N} \mid \theta) \to 1, \qquad \inf_{\theta\in\Theta(J_1)} P(J_1 > z_q \mid \theta) \to 1, \]

and hence

\[ \inf_{\theta\in\Theta_s\cup\Theta(J_1)} P(J > z_q \mid \theta) \to 1. \]

Therefore the power is enhanced from $\Theta(J_1)$ to $\Theta_s\cup\Theta(J_1)$ uniformly over sparse alternatives. In particular, the required signal strength of $\Theta_s$ in (5.3) is mild: the maximum cross-sectional correlation is only required to exceed a magnitude of $\log(\log T)\sqrt{(\log N)/T}$.

6 Monte Carlo Experiments

In this section, Monte Carlo simulations are employed to examine the finite sample performance of the power enhancement tests. We study the factor pricing model and the cross-sectional independence test in turn.

6.1 Testing factor pricing models

To mimic the real data application, we consider the Fama and French (1992) three-factor model:

\[ y_{it} = \theta_i + b_i' f_t + u_{it}. \]

We simulate $\{b_i\}_{i=1}^{N}$, $\{f_t\}_{t=1}^{T}$ and $\{u_t\}_{t=1}^{T}$ independently from $\mathcal{N}_3(\mu_B,\Sigma_B)$, $\mathcal{N}_3(\mu_f,\Sigma_f)$, and $\mathcal{N}_N(0,\Sigma_u)$ respectively. The parameters are set to be the same as those in the simulations of Fan et al. (2013), which are calibrated using daily returns of the S&P 500's top 100 constituents, for the period from July 1st, 2008 to June 29th, 2012. These parameters are listed in the following table.

Table 1: Means and covariances used to generate $b_i$ and $f_t$

   mu_B              Sigma_B                  mu_f              Sigma_f
  0.9833    0.0921  -0.0178   0.0436        0.0260    3.2351   0.1783   0.7783
 -0.1233   -0.0178   0.0862  -0.0211        0.0211    0.1783   0.5069   0.0102
  0.0839    0.0436  -0.0211   0.7624       -0.0043    0.7783   0.0102   0.6586

Set $\Sigma_u = \mathrm{diag}\{A_1,\dots,A_{N/4}\}$ to be a block-diagonal covariance matrix. Each diagonal block $A_j$ is a $4\times 4$ positive definite matrix, whose correlation matrix has equi-off-diagonal entry $\rho_j$, generated from Uniform$[0, 0.5]$. The diagonal entries of $A_j$ are obtained via $(\Sigma_u)_{ii} = 1 + \|v_i\|^2$, where $v_i$ is generated independently from $\mathcal{N}_3(0, 0.01 I_3)$.

We evaluate the power of the test under two specific alternatives (we set $N > T$):

\[ \text{sparse alternative } H_a^1 : \theta_i = \begin{cases} 0.3, & i\le N/T,\\ 0, & i > N/T,\end{cases} \qquad \text{weak theta } H_a^2 : \theta_i = \begin{cases} \sqrt{\frac{\log N}{T}}, & i\le N^{0.4},\\ 0, & i > N^{0.4}.\end{cases} \]

Under $H_a^1$, there are only a few nonzero $\theta$'s with a relatively large magnitude. Under $H_a^2$, there are many non-vanishing $\theta$'s, but their magnitudes are all relatively small. In our simulation setup, $\sqrt{\log N/T}$ varies from 0.05 to 0.10. We therefore expect that under $H_a^1$, $P(\hat S = \emptyset)$ is close to zero because most of the first $N/T$ estimated $\theta$'s should survive the screening step. These surviving $\hat\theta$'s contribute importantly to the rejection of the null hypothesis. In contrast, $P(\hat S = \emptyset)$ should be much larger under $H_a^2$ because the non-vanishing $\theta$'s are too weak to be detected.

For each test, we calculate the relative frequency of rejection under $H_0$, $H_a^1$ and $H_a^2$ based on 2000 replications, with significance level $q = 0.05$. We also calculate the relative frequency of $\hat S$ being empty, which approximates $P(\hat S = \emptyset)$. We use soft-thresholding to estimate the error covariance matrix.

Table 2 presents the empirical size and power of the feasible standardized Wald test $J_{wald}$ as well as those of the power enhanced test $J = J_0 + J_{wald}$. First of all, the size of $J_{wald}$ is close to the significance level. Under $H_0$, $P(\hat S = \emptyset)$ is close to one, implying that the power enhancement component $J_0$ screens off most of the estimation errors. The power enhanced test (PE) has approximately the same size as the original test $J_{wald}$. Under $H_a^1$, the PE test significantly improves the power of the standardized Wald test. In this case, $P(\hat S = \emptyset)$ is nearly zero because the screening set manages to capture the big thetas. Under $H_a^2$, as the non-vanishing thetas are very weak, $\hat S$ has a large probability of being empty. But, whenever $\hat S$ is non-empty, it contributes to the power of the test.
The PE test still slightly improves the power of the quadratic test.
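The simulation design of this subsection, namely the block-diagonal $\Sigma_u$ and the two alternatives $H_a^1$ and $H_a^2$, can be sketched as below. The helper names (`make_sigma_u`, `theta_sparse`, `theta_weak`) are hypothetical, and rounding $N/T$ and $N^{0.4}$ down to integers is an assumption of this sketch, since the text does not say how non-integer cutoffs are handled.

```python
import numpy as np

def make_sigma_u(N, rng):
    """Block-diagonal Sigma_u with 4x4 equicorrelated blocks."""
    if N % 4 != 0:
        raise ValueError("N must be a multiple of 4")
    V = rng.normal(0.0, np.sqrt(0.01), size=(N, 3))      # v_i ~ N_3(0, 0.01 I_3)
    sd = np.sqrt(1.0 + (V ** 2).sum(axis=1))             # (Sigma_u)_ii = 1 + ||v_i||^2
    Sigma = np.zeros((N, N))
    for j in range(N // 4):
        rho = rng.uniform(0.0, 0.5)                      # equi-off-diagonal correlation
        R = np.full((4, 4), rho)
        np.fill_diagonal(R, 1.0)
        idx = slice(4 * j, 4 * j + 4)
        Sigma[idx, idx] = R * np.outer(sd[idx], sd[idx])
    return Sigma

def theta_sparse(N, T):
    """H_a^1: theta_i = 0.3 for i <= N/T (cutoff rounded down), else 0."""
    theta = np.zeros(N)
    theta[: N // T] = 0.3
    return theta

def theta_weak(N, T):
    """H_a^2: theta_i = sqrt(log N / T) for i <= N^0.4 (rounded down), else 0."""
    theta = np.zeros(N)
    theta[: int(np.floor(N ** 0.4))] = np.sqrt(np.log(N) / T)
    return theta

rng = np.random.default_rng(2)
Sigma_u = make_sigma_u(500, rng)
th1, th2 = theta_sparse(500, 300), theta_weak(500, 300)
```

With $N = 500$ and $T = 300$ the sparse alternative has a single nonzero intercept under this rounding convention, which illustrates how few violations the test is asked to detect.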

Table 2: Size and power (%) of tests for simulated Fama-French three-factor model

                       H_0                        H_a^1                       H_a^2
 T     N      J_wald   PE    P(S=empty)   J_wald   PE     P(S=empty)   J_wald   PE    P(S=empty)
 300   500     5.2     5.4     99.8        48.0    97.6      2.6        69.0    76.4     64.6
       800     4.9     5.1     99.8        60.0    99.0      1.2        69.2    76.2     62.2
       1000    4.6     4.7     99.8        54.6    98.4      2.6        75.8    82.6     63.2
       1200    5.0     5.4     99.6        64.2    99.2      0.8        74.2    81.0     63.6
 500   500     5.2     5.3     99.8        33.8    99.2      0.8        73.4    77.2     77.8
       800     4.8     5.0     99.8        67.4   100.0      0.0        72.4    76.4     75.0
       1000    5.0     5.2     99.8        65.0   100.0      0.2        76.8    80.4     74.0
       1200    5.2     5.2    100.0        58.0   100.0      0.2        74.2    78.4     77.0

Note: This table reports the frequencies of rejection and of $\hat S = \emptyset$ (written S=empty above) based on 2000 replications. Here $J_{wald}$ is the standardized Wald test, and PE the power enhanced test. These tests are conducted at the 5% significance level.

6.2 Testing cross-sectional independence

We use the following data generating process in our experiments:

\[ y_{it} = \alpha + \beta x_{it} + \mu_i + u_{it}, \qquad i\le n,\ t\le T, \tag{6.1} \]
\[ x_{it} = \xi x_{i,t-1} + \mu_i + \varepsilon_{it}. \tag{6.2} \]

Note that we model the $\{x_{it}\}$'s as AR(1) processes, so that $x_{it}$ is possibly correlated with $\mu_i$, but not with $u_{it}$, as was the case in Im et al. (1999). For each $i$, initialize $x_{it} = 0.5$ at $t = 1$. We specify the parameters as follows: $\mu_i$ is drawn from $\mathcal{N}(0, 0.25)$ for $i = 1,\dots,n$. The parameters $\alpha$ and $\beta$ are set to $-1$ and $2$ respectively. In regression (6.2), $\xi = 0.7$ and $\varepsilon_{it}\sim\mathcal{N}(0,1)$.

We generate $\{u_t\}_{t=1}^{T}$ from $\mathcal{N}_n(0,\Sigma_u)$. Under the null hypothesis, $\Sigma_u$ is set to be a diagonal matrix $\Sigma_{u,0} = \mathrm{diag}\{\sigma_1^2,\dots,\sigma_n^2\}$. Following Baltagi et al. (2012), consider the heteroscedastic errors

\[ \sigma_i^2 = \sigma^2(1 + \kappa\bar x_i)^2 \tag{6.3} \]

with $\kappa = 0.5$, where $\bar x_i$ is the average of $x_{it}$ across $t$. Here $\sigma^2$ is scaled to fix the average of the $\sigma_i^2$'s at one.

For alternative specifications, we use a spatial model for the errors $u_{it}$. Baltagi et al. (2012) considered a tri-diagonal error covariance matrix in this case. We extend it by allowing for higher order spatial autocorrelations, but require that not all the errors be spatially correlated with their immediate neighbors. Specifically, we start with $\Sigma_{u,1} = \mathrm{diag}\{\Sigma_1,\dots,\Sigma_{n/4}\}$ as a block-diagonal matrix with $4\times 4$ blocks located along the main diagonal. Each $\Sigma_i$ is assumed to be $I_4$ initially. We then randomly choose $\lfloor n^{0.3}\rfloor$ blocks among them and make them non-diagonal by setting $\Sigma_i(m,n) = \rho^{|m-n|}$ $(m, n\le 4)$, with $\rho = 0.2$. To allow for cross-sectional heteroscedasticity of the errors, we set $\Sigma_u = \Sigma_{u,0}^{1/2}\Sigma_{u,1}\Sigma_{u,0}^{1/2}$, where $\Sigma_{u,0} = \mathrm{diag}\{\sigma_1^2,\dots,\sigma_n^2\}$ as specified in (6.3).

The Monte Carlo experiments are conducted for different pairs of $(n, T)$ with significance level $q = 0.05$ based on 2000 replications. The empirical size, power and the frequency of $\hat S = \emptyset$ as in (5.1) are recorded.

Table 3: Size and power (%) of tests for cross-sectional independence (each cell reports $J_1$ / PE / $P(\hat S = \emptyset)$)

 H_0
 T      n = 200             n = 400             n = 600             n = 800
 100    4.7/5.5/99.1        4.9/5.3/99.6        5.5/5.7/99.7        4.9/5.2/99.7
 200    5.3/5.3/100.0       5.5/5.9/99.6        4.7/5.1/99.4        4.9/5.1/99.8
 300    5.2/5.2/100.0       5.2/5.2/100.0       4.6/4.6/100.0       4.9/4.9/100.0
 500    4.7/4.7/100.0       5.5/5.5/100.0       5.0/5.0/100.0       5.1/5.1/100.0

 H_a
 T      n = 200             n = 400             n = 600             n = 800
 100    26.4/95.5/5.0       19.8/98.0/2.3       13.5/98.2/2.0       12.2/99.2/0.9
 200    54.6/98.8/1.6       40.3/99.6/0.5       24.8/99.6/0.4       2/99.7/0.3
 300    78.9/99.25/1.1      65.3/100.0/0.1      41.7/99.9/0.2       37.2/100.0/0.1
 500    93.5/99.85/0.2      89.0/100.0/0.0      69.1/100.0/0.0      61.8/100.0/0.0

Note: This table reports the frequencies of rejection by $J_1$ in (5.2) and PE in (5.4) under the null and alternative hypotheses, based on 2000 replications. The frequency of $\hat S$ being empty is also recorded. These tests are conducted at the 5% significance level.

Table 3 gives the size and power of the bias-corrected quadratic test $J_1$ in (5.2) and those of the power enhanced test $J$ in (5.4). The sizes of both tests are close to 5%. In particular, the power enhancement test has little distortion of the original size.
The bottom panel shows the power of the two tests under the alternative specification. The PE test demonstrates almost full power under all combinations of $(n, T)$. In contrast, the quadratic test $J_1$ as in (5.2) only gains power when $T$ gets large. As $n$ increases, the proportion of nonzero off-diagonal elements in $\Sigma_u$ gradually decreases. It becomes harder for $J_1$ to effectively detect those deviations from the null hypothesis. This explains the low power exhibited by the quadratic test when facing a high sparsity level.
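For reference, the two statistics compared in Table 3, the screening component $J_0$ in (5.1) and the bias-corrected statistic $J_1$ in (5.2), can be computed from a residual correlation matrix as in the following sketch. The function name `cross_section_pe` and the toy correlation matrices are assumptions made for this illustration.

```python
import numpy as np

def cross_section_pe(rho_hat, T):
    """Return (J0, J1, J, n_selected) from an (n, n) residual correlation matrix."""
    n = rho_hat.shape[0]
    r = rho_hat[np.triu_indices(n, k=1)]              # rho_hat_ij for i < j
    N = r.size                                        # N = n(n-1)/2
    delta = np.log(np.log(T)) * np.sqrt(np.log(N))    # delta_{N,T}
    v_hat = (1.0 - r ** 2) ** 2 / T                   # v_hat_ij
    selected = np.abs(r) / np.sqrt(v_hat) > delta     # screening set S_hat in (5.1)
    J0 = np.sqrt(N) * np.sum(r[selected] ** 2 / v_hat[selected])
    # Bias-corrected quadratic statistic (5.2).
    J1 = np.sqrt(1.0 / (n * (n - 1))) * np.sum(T * r ** 2 - 1.0) - n / (2.0 * (T - 1))
    return J0, J1, J0 + J1, int(selected.sum())

# Toy check: no correlation versus a single strongly correlated pair.
rho_null = np.eye(3)
rho_alt = np.eye(3)
rho_alt[0, 1] = rho_alt[1, 0] = 0.9
out_null = cross_section_pe(rho_null, T=50)
out_alt = cross_section_pe(rho_alt, T=50)
```

When no pair passes the screening step, $J_0 = 0$ and the combined statistic reduces to $J_1$, which is why the size is essentially unchanged. A single strong pair is enough to make $J_0$ large.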

7 Empirical Study

As an empirical application, we consider a test of Carhart (1997)'s four-factor model on the S&P 500 index. Our empirical findings show that there are only a few significant nonzero "alpha" components, corresponding to a small portion of mis-priced stocks instead of systematic mis-pricing of the whole market.

We collect monthly excess returns on all the S&P 500 constituents from the CRSP database for the period January 1980 to December 2012. We test whether $\theta = 0$ (all alphas are zero) in the factor pricing model on a rolling window basis: for each month, we evaluate our test statistics $J_{wald}$ and $J$ (as in (4.4) and (4.5) respectively) using the preceding 60 months' returns ($T = 60$). The panel at each testing month consists of stocks without missing observations in the past five years, which yields a balanced panel with the cross-sectional dimension larger than the time-series dimension ($N > T$). In this manner we not only capture the up-to-date information in the market, but also mitigate the impact of time-varying factor loadings and sampling biases. In particular, for testing months $\tau = 1984.12,\dots,2012.12$, we run the regressions

\[ r_{it}^{\tau} - r_{ft}^{\tau} = \theta_i^{\tau} + \beta_{i,MKT}^{\tau}(MKT_t^{\tau} - r_{ft}^{\tau}) + \beta_{i,SMB}^{\tau}SMB_t^{\tau} + \beta_{i,HML}^{\tau}HML_t^{\tau} + \beta_{i,MOM}^{\tau}MOM_t^{\tau} + u_{it}^{\tau}, \tag{7.1} \]

for $i = 1,\dots,N_\tau$ and $t = \tau-59,\dots,\tau$, where $r_{it}$ represents the return for stock $i$ at month $t$, $r_{ft}$ the risk-free rate, and MKT, SMB, HML and MOM constitute the market, size, value and momentum factors. The time series of factors are downloaded from Kenneth French's website. To keep the notation consistent, we use $\theta_i^{\tau}$ to represent the "alpha" of stock $i$.

Table 4: Summary of descriptive statistics and testing results

 Variables                                  Mean      Std dev.   Median    Min       Max
 N_tau                                      617.70    26.3       62        574       665
 |S_hat|_0                                  5.20      3.50       5         0         20
 |theta_hat_i^tau| (%)                      0.9767    0.59       0.9308    0.7835    1.386
 |theta_hat_i^tau|, i in S_hat (%)          4.5569    1.4305     4.549     1.7839    10.8393
 p-value of J_wald                          0.235     0.2907     0.0853    0         0.9992
 p-value of J (PE)                          0.48      0.264      0.0050    0         0.9982

Table 4 summarizes descriptive statistics for different components and estimates in the model.
On average, 618 stocks (which is more than 500 because we are recording stocks that have ever become constituents of the index) enter the panel of the regression during each five-year estimation window. Of those, merely 5.2 stocks are selected by the screening set $\hat S$, which directly implies the presence of sparse alternatives. The threshold $\delta_{N,T} = \sqrt{\log N}\,\log(\log T)$ varies as the panel size $N$ changes at the end of each month, and is about 3.5 on average, a high-criticism thresholding. The selected stocks have much larger alphas ($\theta$) than other stocks do. In addition, 64.05% of all the estimated alphas are positive, whereas 87.33% of the selected alphas in $\hat S$ are positive. This indicates that the power enhancement component in our test is primarily contributed by stocks with extra returns. We also notice that the p-values of the Wald test $J_{wald}$ are generally larger than those of the power enhanced test $J$, as expected since $J_0 \ge 0$ implies $J \ge J_{wald}$.

Figure 1: Dynamics of p-values and percents of selected stocks

Similar to Pesaran and Yamagata (2012), we plot the running p-values of $J_{wald}$ and the PE test from December 1984 to December 2012. We also add the dynamics of the percentage of selected stocks ($|\hat S|_0/N$) to the plot, as shown in Figure 1. There is a strong negative correlation between the stock selection percentage and the p-values of these tests. In other words, the months at which the null hypothesis is rejected typically correspond to a few stocks with alphas exceeding the threshold. Such evidence of sparse alternatives originally motivated our study. We also observe that the p-values of the PE test lie beneath