DETERMINING THE NUMBER OF FACTORS IN APPROXIMATE FACTOR MODELS. By Jushan Bai and Serena Ng¹

Econometrica, Vol. 70, No. 1 (January, 2002), 191-221

In this paper we develop some econometric theory for factor models of large dimensions. The focus is the determination of the number of factors ($r$), which is an unresolved issue in the rapidly growing literature on multifactor models. We first establish the convergence rate for the factor estimates that will allow for consistent estimation of $r$. We then propose some panel criteria and show that the number of factors can be consistently estimated using the criteria. The theory is developed under the framework of large cross-sections ($N$) and large time dimensions ($T$). No restriction is imposed on the relation between $N$ and $T$. Simulations show that the proposed criteria have good finite sample properties in many configurations of the panel data encountered in practice.

Keywords: Factor analysis, asset pricing, principal components, model selection.

1. INTRODUCTION

The idea that variations in a large number of economic variables can be modeled by a small number of reference variables is appealing and is used in many economic analyses. For example, asset returns are often modeled as a function of a small number of factors. Stock and Watson (1989) used one reference variable to model the comovements of four main macroeconomic aggregates. Cross-country variations are also found to have common components; see Gregory and Head (1999) and Forni, Hallin, Lippi, and Reichlin (2000b). More recently, Stock and Watson (1999) showed that the forecast error of a large number of macroeconomic variables can be reduced by including diffusion indexes, or factors, in structural as well as nonstructural forecasting models. In demand analysis, Engel curves can be expressed in terms of a finite number of factors. Lewbel (1991) showed that if a demand system has one common factor, budget shares should be independent of the level of income. In such a case, the number of factors is an object of economic interest since if more than one factor is found, homothetic preferences can be rejected.
Factor analysis also provides a convenient way to study the aggregate implications of microeconomic behavior, as shown in Forni and Lippi (1997). Central to both the theoretical and the empirical validity of factor models is the correct specification of the number of factors. To date, this crucial parameter is often assumed rather than determined by the data.² This paper develops a formal statistical procedure that can consistently estimate the number of factors from observed data. We demonstrate that the penalty for overfitting must be a function of both $N$ and $T$ (the cross-section dimension and the time dimension, respectively) in order to consistently estimate the number of factors. Consequently the usual AIC and BIC, which are functions of $N$ or $T$ alone, do not work when both dimensions of the panel are large. Our theory is developed under the assumption that both $N$ and $T$ converge to infinity. This flexibility is of empirical relevance because the time dimension of datasets relevant to factor analysis, although small relative to the cross-section dimension, is too large to justify the assumption of a fixed $T$.

¹ We thank three anonymous referees for their very constructive comments, which led to a much improved presentation. The first author acknowledges financial support from the National Science Foundation under Grant SBR-9709508. We would like to thank participants in the econometrics seminars at Harvard-MIT, Cornell University, the University of Rochester, and the University of Pennsylvania for helpful suggestions and comments. Remaining errors are our own.

A small number of papers in the literature have also considered the problem of determining the number of factors, but the present analysis differs from these works in important ways. Lewbel (1991) and Donald (1997) used the rank of a matrix to test for the number of factors, but these theories assume either $N$ or $T$ is fixed. Cragg and Donald (1997) considered the use of information criteria when the factors are functions of a set of observable explanatory variables, but the data still have a fixed dimension. For large dimensional panels, Connor and Korajczyk (1993) developed a test for the number of factors in asset returns, but their test is derived under sequential limit asymptotics, i.e., $N$ converges to infinity with a fixed $T$ and then $T$ converges to infinity. Furthermore, because their test is based on a comparison of variances over different time periods, covariance stationarity and homoskedasticity are not only technical assumptions, but are crucial for the validity of their test. Under the assumption that $N \to \infty$ for fixed $T$, Forni and Reichlin (1998) suggested a graphical approach to identify the number of factors, but no theory is available. Assuming $N, T \to \infty$ with $\sqrt{N}/T \to \infty$, Stock and Watson (1998) showed that a modification to the BIC can be used to select the number of factors optimal for forecasting a single series.
Their criterion is restrictive not only because it requires $N \gg T$, but also because there can be factors that are pervasive for a set of data and yet have no predictive ability for an individual data series. Thus, their rule may not be appropriate outside of the forecasting framework. Forni, Hallin, Lippi, and Reichlin (2000a) suggested a multivariate variant of the AIC but neither the theoretical nor the empirical properties of the criterion are known.

We set up the determination of factors as a model selection problem. In consequence, the proposed criteria depend on the usual trade-off between good fit and parsimony. However, the problem is nonstandard not only because account needs to be taken of the sample size in both the cross-section and the time-series dimensions, but also because the factors are not observed. The theory we developed does not rely on sequential limits, nor does it impose any restrictions between $N$ and $T$. The results hold under heteroskedasticity in both the time and the cross-section dimensions. The results also hold under weak serial and cross-section dependence. Simulations show that the criteria have good finite sample properties.

² Lehmann and Modest (1988), for example, tested the APT for 5, 10, and 15 factors. Stock and Watson (1989) assumed there is one factor underlying the coincident index. Ghysels and Ng (1998) tested the affine term structure model assuming two factors.

The rest of the paper is organized as follows. Section 2 sets up the preliminaries and introduces notation and assumptions. Estimation of the factors is considered in Section 3 and the estimation of the number of factors is studied in Section 4. Specific criteria are considered in Section 5 and their finite sample properties are considered in Section 6, along with an empirical application to asset returns. Concluding remarks are provided in Section 7. All the proofs are given in the Appendix.

2. FACTOR MODELS

Let $X_{it}$ be the observed data for the $i$th cross-section unit at time $t$, for $i = 1, \ldots, N$, and $t = 1, \ldots, T$. Consider the following model:

$$X_{it} = \lambda_i' F_t + e_{it}, \tag{1}$$

where $F_t$ is a vector of common factors, $\lambda_i$ is a vector of factor loadings associated with $F_t$, and $e_{it}$ is the idiosyncratic component of $X_{it}$. The product $\lambda_i' F_t$ is called the common component of $X_{it}$. Equation (1) is then the factor representation of the data. Note that the factors, their loadings, as well as the idiosyncratic errors are not observable.

Factor analysis allows for dimension reduction and is a useful statistical tool. Many economic analyses fit naturally into the framework given by (1).

1. Arbitrage pricing theory. In the finance literature, the arbitrage pricing theory (APT) of Ross (1976) assumes that a small number of factors can be used to explain a large number of asset returns. In this case, $X_{it}$ represents the return of asset $i$ at time $t$, $F_t$ represents the vector of factor returns, and $e_{it}$ is the idiosyncratic component of returns. Although analytical convenience makes it appealing to assume one factor, there is growing evidence against the adequacy of a single factor in explaining asset returns.³ The shifting interest towards use of multifactor models inevitably calls for a formal procedure to determine the number of factors. The analysis to follow allows the number of factors to be determined even when $N$ and $T$ are both large.
This is especially suited for financial applications when data are widely available for a large number of assets over an increasingly long span. Once the number of factors is determined, the factor returns $F_t$ can also be consistently estimated (up to an invertible transformation).

³ Cochrane (1999) stressed that financial economists now recognize that there are multiple sources of risk, or factors, that give rise to high returns. Backus, Foresi, Mozumdar, and Wu (1997) made similar conclusions in the context of the market for foreign assets.

2. The rank of a demand system. Let $p$ be a price vector for $J$ goods and services, $e_h$ be total spending on the $J$ goods by household $h$. Consumer theory postulates that Marshallian demand for good $j$ by consumer $h$ is $X_{jh} = g_j(p, e_h)$. Let $w_{jh} = X_{jh}/e_h$ be the budget share for household $h$ on the $j$th good. The rank of a demand system holding prices fixed is the smallest integer $r$ such that $w_j(e) = \lambda_{j1} G_1(e) + \cdots + \lambda_{jr} G_r(e)$. Demand systems are of the form (1) where the $r$ factors, common across goods, are $F_h = [G_1(e_h), \ldots, G_r(e_h)]'$. When the number of households, $H$, converges to infinity with a fixed $J$, $G_1(e), \ldots, G_r(e)$ can be estimated simultaneously, such as by nonparametric methods developed in Donald (1997). This approach will not work when the number of goods, $J$, also converges to infinity. However, the theory to be developed in this paper will still provide a consistent estimation of $r$ and without the need for nonparametric estimation of the $G(\cdot)$ functions. Once the rank of the demand system is determined, the nonparametric functions evaluated at $e_h$ allow $F_h$ to be consistently estimable (up to a transformation). Then functions $G_1(e), \ldots, G_r(e)$ may be recovered (also up to a matrix transformation) from $\hat F_h$ ($h = 1, \ldots, H$) via nonparametric estimation.

3. Forecasting with diffusion indices. Stock and Watson (1998, 1999) considered forecasting inflation with diffusion indices ("factors") constructed from a large number of macroeconomic series. The underlying premise is that these series may be driven by a small number of unobservable factors. Consider the forecasting equation for a scalar series

$$y_{t+1} = \alpha' F_t + \beta' W_t + \epsilon_t.$$

The variables $W_t$ are observable. Although we do not observe $F_t$, we observe $X_{it}$, $i = 1, \ldots, N$. Suppose $X_{it}$ bears relation with $F_t$ as in (1). In the present context, we interpret (1) as the reduced-form representation of $X_{it}$ in terms of the unobservable factors. We can first estimate $F_t$ from (1). Denote it by $\hat F_t$. We can then regress $y_t$ on $\hat F_{t-1}$ and $W_{t-1}$ to obtain the coefficients $\hat\alpha$ and $\hat\beta$, from which a forecast $\hat y_{T+1|T} = \hat\alpha' \hat F_T + \hat\beta' W_T$ can be formed. Stock and Watson (1998, 1999) showed that this approach of forecasting outperforms many competing forecasting methods. But as pointed out earlier, the dimension of $F$ in Stock and Watson (1998, 1999) was determined using a criterion that minimizes the mean squared forecast errors of $y$. This may not be the same as the number of factors underlying $X_{it}$, which is the focus of this paper.
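The two-step forecasting procedure in the third example can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not code from the paper: the data are simulated, the sample sizes, coefficients, noise scale, and seed are arbitrary, the number of factors `r` is treated as known, and the factors are estimated by principal components as a convenient stand-in for "estimate $F_t$ from (1)".

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 120, 60, 2                          # illustrative sizes; r is assumed known here

# Simulated data (hypothetical): a panel X with a factor structure as in (1),
# one observed regressor W, and a target y driven by lagged factors and lagged W.
F = rng.standard_normal((T, r))               # unobserved factors
Lam = rng.standard_normal((N, r))             # factor loadings
X = F @ Lam.T + rng.standard_normal((T, N))   # observed panel, T x N
W = rng.standard_normal((T, 1))
alpha, beta = np.array([1.0, -0.5]), np.array([0.3])
y = np.zeros(T)
y[1:] = F[:-1] @ alpha + W[:-1] @ beta + 0.1 * rng.standard_normal(T - 1)

# Step 1: estimate the factors by principal components of the T x T matrix XX'.
_, vecs = np.linalg.eigh(X @ X.T)             # eigenvalues are returned in ascending order
F_hat = np.sqrt(T) * vecs[:, ::-1][:, :r]     # sqrt(T) times the r leading eigenvectors

# Step 2: regress y_t on F_hat_{t-1} and W_{t-1}. The factor coefficients estimate
# a rotation of alpha, because the factors are identified only up to a rotation.
Z = np.hstack([F_hat[:-1], W[:-1]])
coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)

# Step 3: one-step-ahead forecast of y_{T+1} from the last observed F_hat_T and W_T.
forecast = np.hstack([F_hat[-1], W[-1]]) @ coef

fitted = Z @ coef
r2 = 1.0 - ((y[1:] - fitted) ** 2).sum() / ((y[1:] - y[1:].mean()) ** 2).sum()
```

The in-sample fit `r2` gives a rough check that the estimated factors span the space of the simulated ones; it says nothing about how `r` should be chosen, which is the question the paper addresses.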
2.1. Notation and Preliminaries

Let $F_t^0$, $\lambda_i^0$, and $r$ denote the true common factors, the factor loadings, and the true number of factors, respectively. Note that $F_t^0$ is $r$ dimensional. We assume that $r$ does not depend on $N$ or $T$. At a given $t$, we have

$$\underset{(N \times 1)}{X_t} = \underset{(N \times r)}{\Lambda^0}\;\underset{(r \times 1)}{F_t^0} + \underset{(N \times 1)}{e_t}, \tag{2}$$

where $X_t = (X_{1t}, X_{2t}, \ldots, X_{Nt})'$, $\Lambda^0 = (\lambda_1^0, \lambda_2^0, \ldots, \lambda_N^0)'$, and $e_t = (e_{1t}, e_{2t}, \ldots, e_{Nt})'$. Our objective is to determine the true number of factors, $r$. In classical factor analysis (e.g., Anderson (1984)), $N$ is assumed fixed, the factors are independent of the errors $e_t$, and the covariance of $e_t$ is diagonal. Normalizing the covariance matrix of $F_t$ to be an identity matrix, we have $\Sigma = \Lambda^0 \Lambda^{0\prime} + \Omega$, where $\Sigma$ and $\Omega$ are the covariance matrices of $X_t$ and $e_t$, respectively. Under these assumptions, a root-$T$ consistent and asymptotically normal estimator of $\Sigma$, say, the sample covariance matrix $\hat\Sigma = (1/T) \sum_{t=1}^T (X_t - \bar X)(X_t - \bar X)'$ can be obtained. The essentials of classical factor analysis carry over to the case of large $N$ but fixed $T$ since the $N \times N$ problem can be turned into a $T \times T$ problem, as noted by Connor and Korajczyk (1993) and others.

Inference on $r$ under classical assumptions can, in theory, be based on the eigenvalues of $\hat\Sigma$ since a characteristic of a panel of data that has an $r$ factor representation is that the first $r$ largest population eigenvalues of the $N \times N$ covariance of $X_t$ diverge as $N$ increases to infinity, but the $(r+1)$th eigenvalue is bounded; see Chamberlain and Rothschild (1983). But it can be shown that all nonzero sample eigenvalues (not just the first $r$) of the matrix $\hat\Sigma$ increase with $N$, and a test based on the sample eigenvalues is thus not feasible. A likelihood ratio test can also, in theory, be used to select the number of factors if, in addition, normality of $e_t$ is assumed. But as found by Dhrymes, Friend, and Gultekin (1984), the number of statistically significant factors determined by the likelihood ratio test increases with $N$ even if the true number of factors is fixed. Other methods have also been developed to estimate the number of factors assuming the size of one dimension is fixed. But Monte Carlo simulations in Cragg and Donald (1997) show that these methods tend to perform poorly for moderately large $N$ and $T$. The fundamental problem is that the theory developed for classical factor models does not apply when both $N$ and $T \to \infty$. This is because consistent estimation of $\Sigma$ (whether it is an $N \times N$ or a $T \times T$ matrix) is not a well defined problem. For example, when $N > T$, the rank of $\hat\Sigma$ is no more than $T$, whereas the rank of $\Sigma$ can always be $N$. New theories are thus required to analyze large dimensional factor models.
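The rank statement in the last example is easy to verify numerically. The snippet below is a small sketch rather than anything from the paper: it assumes an arbitrary panel with $N = 50$ and $T = 20$ whose population covariance is the identity (so its rank is $N$) and computes the rank of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 20                          # illustrative sizes with N > T

# Row t of X holds X_t'. The population covariance is the N x N identity, rank N.
X = rng.standard_normal((T, N))
Xc = X - X.mean(axis=0)                # subtract the time average of each series
sigma_hat = Xc.T @ Xc / T              # N x N sample covariance matrix

# sigma_hat is built from only T centered observations, so its rank cannot exceed T.
rank_hat = np.linalg.matrix_rank(sigma_hat)
```

Because the centered observations sum to zero, the rank here is in fact at most $T - 1$, which is consistent with the bound "no more than $T$" quoted in the text.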
In this paper, we develop asymptotic results for consistent estimation of the number of factors when $N$ and $T \to \infty$. Our results complement the sparse but growing literature on large dimensional factor analysis. Forni and Lippi (2000) and Forni et al. (2000a) obtained general results for dynamic factor models, while Stock and Watson (1998) provided some asymptotic results in the context of forecasting. As in these papers, we allow for cross-section and serial dependence. In addition, we also allow for heteroskedasticity in $e_t$ and some weak dependence between the factors and the errors. These latter generalizations are new in our analysis. Evidently, our assumptions are more general than those used when the sample size is fixed in one dimension.

Let $X_i$ be a $T \times 1$ vector of time-series observations for the $i$th cross-section unit. For a given $i$, we have

$$\underset{(T \times 1)}{X_i} = \underset{(T \times r)}{F^0}\;\underset{(r \times 1)}{\lambda_i^0} + \underset{(T \times 1)}{e_i}, \tag{3}$$

where $X_i = (X_{i1}, X_{i2}, \ldots, X_{iT})'$, $F^0 = (F_1^0, F_2^0, \ldots, F_T^0)'$, and $e_i = (e_{i1}, e_{i2}, \ldots, e_{iT})'$. For the panel of data $X = (X_1, \ldots, X_N)$, we have

$$\underset{(T \times N)}{X} = \underset{(T \times r)}{F^0}\;\underset{(r \times N)}{\Lambda^{0\prime}} + \underset{(T \times N)}{e}, \tag{4}$$

with $e = (e_1, \ldots, e_N)$.

Let $\operatorname{tr}(A)$ denote the trace of $A$. The norm of the matrix $A$ is then $\|A\| = [\operatorname{tr}(A'A)]^{1/2}$. The following assumptions are made:

Assumption A (Factors): $E\|F_t^0\|^4 < \infty$ and $T^{-1} \sum_{t=1}^T F_t^0 F_t^{0\prime} \to \Sigma_F$ as $T \to \infty$ for some positive definite matrix $\Sigma_F$.

Assumption B (Factor Loadings): $\|\lambda_i\| \le \bar\lambda < \infty$, and $\|\Lambda^{0\prime}\Lambda^0/N - D\| \to 0$ as $N \to \infty$ for some $r \times r$ positive definite matrix $D$.

Assumption C (Time and Cross-Section Dependence and Heteroskedasticity): There exists a positive constant $M < \infty$, such that for all $N$ and $T$,
1. $E(e_{it}) = 0$, $E|e_{it}|^8 \le M$;
2. $E(e_s' e_t/N) = E(N^{-1} \sum_{i=1}^N e_{is} e_{it}) = \gamma_N(s,t)$, $|\gamma_N(s,s)| \le M$ for all $s$, and $T^{-1} \sum_{s=1}^T \sum_{t=1}^T |\gamma_N(s,t)| \le M$;
3. $E(e_{it} e_{jt}) = \tau_{ij,t}$ with $|\tau_{ij,t}| \le |\tau_{ij}|$ for some $\tau_{ij}$ and for all $t$; in addition, $N^{-1} \sum_{i=1}^N \sum_{j=1}^N |\tau_{ij}| \le M$;
4. $E(e_{it} e_{js}) = \tau_{ij,ts}$ and $(NT)^{-1} \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^T \sum_{s=1}^T |\tau_{ij,ts}| \le M$;
5. for every $(t,s)$, $E\,|N^{-1/2} \sum_{i=1}^N [e_{is} e_{it} - E(e_{is} e_{it})]|^4 \le M$.

Assumption D (Weak Dependence between Factors and Idiosyncratic Errors):

$$E\left( \frac{1}{N} \sum_{i=1}^N \left\| \frac{1}{\sqrt{T}} \sum_{t=1}^T F_t^0 e_{it} \right\|^2 \right) \le M.$$

Assumption A is standard for factor models. Assumption B ensures that each factor has a nontrivial contribution to the variance of $X_t$. We only consider nonrandom factor loadings for simplicity. Our results still hold when the $\lambda_i$ are random, provided they are independent of the factors and idiosyncratic errors, and $E\|\lambda_i\|^4 \le M$. Assumption C allows for limited time-series and cross-section dependence in the idiosyncratic component. Heteroskedasticity in both the time and cross-section dimensions is also allowed. Under stationarity in the time dimension, $\gamma_N(s,t) = \gamma_N(s-t)$, though the condition is not necessary. Given Assumption C1, the remaining assumptions in C are easily satisfied if the $e_{it}$ are independent for all $i$ and $t$. The allowance for some correlation in the idiosyncratic components sets up the model to have an approximate factor structure. It is more general than a strict factor model, which assumes $e_{it}$ is uncorrelated across $i$, the framework on which the APT theory of Ross (1976) is based. Thus, the results to be developed will also apply to strict factor models. When the factors and idiosyncratic errors are independent (a standard assumption for conventional factor models), Assumption D is implied by Assumptions A and C. Independence is not required for D to be true. For example, suppose that $e_{it} = \epsilon_{it} \|F_t\|$ with $\epsilon_{it}$ being independent of $F_t$ and $\epsilon_{it}$ satisfies Assumption C; then Assumption D holds. Finally, the developments proceed assuming that the panel is balanced. We also note that the model being analyzed is static, in the sense that $X_{it}$ has a contemporaneous relationship with the factors. The analysis of dynamic models is beyond the scope of this paper.

For a factor model to be an approximate factor model in the sense of Chamberlain and Rothschild (1983), the largest eigenvalue (and hence all of the eigenvalues) of the $N \times N$ covariance matrix $\Omega = E(e_t e_t')$ must be bounded. Note that Chamberlain and Rothschild focused on the cross-section behavior of the model and did not make explicit assumptions about the time-series behavior of the model. Our framework allows for serial correlation and heteroskedasticity and is more general than their setup. But if we assume $e_t$ is stationary with $E(e_{it} e_{jt}) = \tau_{ij}$, then from matrix theory, the largest eigenvalue of $\Omega$ is bounded by $\max_i \sum_{j=1}^N |\tau_{ij}|$. Thus if we assume $\sum_{j=1}^N |\tau_{ij}| \le M$ for all $i$ and all $N$, which implies Assumption C3, then (2) will be an approximate factor model in the sense of Chamberlain and Rothschild.

3. ESTIMATION OF THE COMMON FACTORS

When $N$ is small, factor models are often expressed in state space form, normality is assumed, and the parameters are estimated by maximum likelihood. For example, Stock and Watson (1989) used $N = 4$ variables to estimate one factor, the coincident index. The drawback is that because the number of parameters increases with $N$,⁴ computational difficulties make it necessary to abandon information on many series even though they are available.

We estimate common factors in large panels by the method of asymptotic principal components.⁵ The number of factors that can be estimated by this (nonparametric) method is $\min\{N, T\}$, much larger than permitted by estimation of state space models.
But to determine which of these factors are statistically important, it is necessary to first establish consistency of all the estimated common factors when both $N$ and $T$ are large. We start with an arbitrary number $k$ ($k < \min\{N, T\}$). The superscript in $\lambda_i^k$ and $F_t^k$ signifies the allowance of $k$ factors in the estimation. Estimates of $\lambda^k$ and $F^k$ are obtained by solving the optimization problem

$$V(k) = \min_{\Lambda, F^k} \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \left( X_{it} - \lambda_i^{k\prime} F_t^k \right)^2$$

subject to the normalization of either $\Lambda^{k\prime}\Lambda^k/N = I_k$ or $F^{k\prime}F^k/T = I_k$. If we concentrate out $\Lambda^k$ and use the normalization that $F^{k\prime}F^k/T = I_k$, the optimization problem is identical to maximizing $\operatorname{tr}(F^{k\prime}(XX')F^k)$. The estimated factor matrix, denoted by $\tilde F^k$, is $\sqrt{T}$ times the eigenvectors corresponding to the $k$ largest eigenvalues of the $T \times T$ matrix $XX'$. Given $\tilde F^k$, $\tilde\Lambda^{k\prime} = (\tilde F^{k\prime}\tilde F^k)^{-1}\tilde F^{k\prime}X = \tilde F^{k\prime}X/T$ is the corresponding matrix of factor loadings.

⁴ Gregory, Head, and Raynauld (1997) estimated a world factor and seven country specific factors from output, consumption, and investment for each of the G7 countries. The exercise involved estimation of 92 parameters and perhaps stretched the state-space model to its limit.
⁵ The method of asymptotic principal components was studied by Connor and Korajczyk (1986) and Connor and Korajczyk (1988) for fixed $T$. Forni et al. (2000a) and Stock and Watson (1998) considered the method for large $T$.

The solution to the above minimization problem is not unique, even though the sum of squared residuals $V(k)$ is unique. Another solution is given by $(\bar F^k, \bar\Lambda^k)$, where $\bar\Lambda^k$ is constructed as $\sqrt{N}$ times the eigenvectors corresponding to the $k$ largest eigenvalues of the $N \times N$ matrix $X'X$. The normalization that $\bar\Lambda^{k\prime}\bar\Lambda^k/N = I_k$ implies $\bar F^k = X\bar\Lambda^k/N$. The second set of calculations is computationally less costly when $T > N$, while the first is less intensive when $T < N$.⁶

Define $\hat F^k = \bar F^k (\bar F^{k\prime}\bar F^k/T)^{1/2}$, a rescaled estimator of the factors. The following theorem summarizes the asymptotic properties of the estimated factors.

Theorem 1: For any fixed $k \ge 1$, there exists a ($r \times k$) matrix $H^k$ with $\operatorname{rank}(H^k) = \min\{k, r\}$, and $C_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$, such that

$$C_{NT}^2 \left( \frac{1}{T} \sum_{t=1}^T \left\| \hat F_t^k - H^{k\prime} F_t^0 \right\|^2 \right) = O_p(1). \tag{5}$$

Because the true factors ($F^0$) can only be identified up to scale, what is being considered is a rotation of $F^0$. The theorem establishes that the time average of the squared deviations between the estimated factors and those that lie in the true factor space vanish as $N, T \to \infty$. The rate of convergence is determined by the smaller of $N$ or $T$, and thus depends on the panel structure.

Under the additional assumption that $\sum_{s=1}^T \gamma_N(s,t)^2 \le M$ for all $t$ and $T$, the result⁷

$$C_{NT}^2 \left\| \hat F_t - H^{k\prime} F_t^0 \right\|^2 = O_p(1) \quad \text{for each } t \tag{6}$$

can be obtained. Neither Theorem 1 nor (6) implies uniform convergence in $t$. Uniform convergence is considered by Stock and Watson (1998). These authors obtained a much slower convergence rate than $C_{NT}^2$, and their result requires $\sqrt{N} \gg T$. An important insight of this paper is that, to consistently estimate the number of factors, neither (6) nor uniform convergence is required. It is the average convergence rate of Theorem 1 that is essential.
However, (6) could be useful for statistical analysis on the estimated factors and is thus a result of independent interest.

⁶ A more detailed account of computation issues, including how to deal with unbalanced panels, is given in Stock and Watson (1998).
⁷ The proof is actually simpler than that of Theorem 1 and is thus omitted to avoid repetition.
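The principal components estimator described in this section can be written compactly with NumPy. The following is a minimal sketch, not the authors' code: the function name `pc_factors` is hypothetical, the panel is stored as a $T \times N$ array, the first normalization ($F^{k\prime}F^k/T = I_k$) is used, and the simulated data (sizes, seed, unit-variance errors) are illustrative assumptions.

```python
import numpy as np


def pc_factors(X, k):
    """Principal components estimates for a T x N panel X, allowing k factors.

    Returns the factors (T x k, normalized so that F'F/T = I_k), the loadings
    (N x k), and V(k), the sum of squared residuals divided by N*T.
    """
    T, N = X.shape
    _, vecs = np.linalg.eigh(X @ X.T)        # eigenvalues come back in ascending order
    F = np.sqrt(T) * vecs[:, ::-1][:, :k]    # sqrt(T) times the k leading eigenvectors of XX'
    Lam = X.T @ F / T                        # loadings, using F'F/T = I_k
    resid = X - F @ Lam.T
    return F, Lam, (resid ** 2).sum() / (N * T)


# Simulated panel with r = 2 factors (all numbers here are illustrative).
rng = np.random.default_rng(0)
T, N, r = 60, 40, 2
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))

F_tilde, Lam_tilde, V2 = pc_factors(X, 2)
V1 = pc_factors(X, 1)[2]
V3 = pc_factors(X, 3)[2]
```

In this sketch `V1 >= V2 >= V3`, since adding a factor can never increase the sum of squared residuals. When $T > N$ it would be cheaper to work with the $N \times N$ matrix $X'X$ instead, as the text notes.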

4. ESTIMATING THE NUMBER OF FACTORS

Suppose for the moment that we observe all potentially informative factors but not the factor loadings. Then the problem is simply to choose $k$ factors that best capture the variations in $X$ and estimate the corresponding factor loadings. Since the model is linear and the factors are observed, $\lambda_i$ can be estimated by applying ordinary least squares to each equation. This is then a classical model selection problem. A model with $k + 1$ factors can fit no worse than a model with $k$ factors, but efficiency is lost as more factor loadings are being estimated. Let $F^k$ be a matrix of $k$ factors, and

$$V(k, F^k) = \min_{\Lambda} \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \left( X_{it} - \lambda_i^{k\prime} F_t^k \right)^2$$

be the sum of squared residuals (divided by $NT$) from time-series regressions of $X_i$ on the $k$ factors for all $i$. Then a loss function $V(k, F^k) + k\,g(N,T)$, where $g(N,T)$ is the penalty for overfitting, can be used to determine $k$. Because the estimation of $\lambda_i$ is classical, it can be shown that the BIC with $g(N,T) = \ln(T)/T$ can consistently estimate $r$. On the other hand, the AIC with $g(N,T) = 2/T$ may choose $k > r$ even in large samples. The result is the same as in Geweke and Meese (1981) derived for $N = 1$ because when the factors are observed, the penalty factor does not need to take into account the sample size in the cross-section dimension. Our main result is to show that this will no longer be true when the factors have to be estimated, and even the BIC will not always consistently estimate $r$.

Without loss of generality, we let

$$V(k, \hat F^k) = \min_{\Lambda} \frac{1}{NT} \sum_{i=1}^N \sum_{t=1}^T \left( X_{it} - \lambda_i^{k\prime} \hat F_t^k \right)^2 \tag{7}$$

denote the sum of squared residuals (divided by $NT$) when $k$ factors are estimated. This sum of squared residuals does not depend on which estimate of $F$ is used because they span the same vector space. That is, $V(k, \tilde F^k) = V(k, \bar F^k) = V(k, \hat F^k)$. We want to find penalty functions, $g(N,T)$, such that criteria of the form

$$PC(k) = V(k, \hat F^k) + k\,g(N,T)$$

can consistently estimate $r$. Let $kmax$ be a bounded integer such that $r \le kmax$.

Theorem 2: Suppose that Assumptions A-D hold and that the $k$ factors are estimated by principal components. Let $\hat k = \arg\min_{0 \le k \le kmax} PC(k)$. Then $\lim_{N,T \to \infty} \operatorname{Prob}[\hat k = r] = 1$ if (i) $g(N,T) \to 0$ and (ii) $C_{NT}^2 \cdot g(N,T) \to \infty$ as $N, T \to \infty$, where $C_{NT} = \min\{\sqrt{N}, \sqrt{T}\}$.
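To make the selection rule of Theorem 2 concrete, the sketch below minimizes a criterion of the form $PC(k) = V(k, \hat F^k) + k\,g(N,T)$ over $k = 0, \ldots, kmax$. It is an illustration under stated assumptions rather than the authors' implementation: the function name `choose_k` is hypothetical, the penalty is $\hat\sigma^2 \frac{N+T}{NT} \ln\frac{NT}{N+T}$ (the form that Section 5 of the paper labels $PC_{p1}$) with $\hat\sigma^2$ replaced by $V(kmax, \hat F^{kmax})$, and the simulated panel with $r = 3$, $N = T = 100$, and unit-variance errors is an arbitrary choice.

```python
import numpy as np


def choose_k(X, kmax):
    """Return (k_hat, PC values) for a T x N panel X, searching k = 0, ..., kmax."""
    T, N = X.shape
    eig = np.linalg.eigvalsh(X @ X.T)[::-1]          # eigenvalues of XX', largest first
    # With principal components, V(k) = (tr(XX') - sum of the k largest eigenvalues) / (NT).
    V = (eig.sum() - np.concatenate(([0.0], np.cumsum(eig[:kmax])))) / (N * T)
    g = (N + T) / (N * T) * np.log(N * T / (N + T))  # assumed penalty shape g(N, T)
    sigma2 = V[kmax]                                 # scale estimate from the largest model
    pc = V + np.arange(kmax + 1) * sigma2 * g
    return int(np.argmin(pc)), pc


# Simulated panel with r = 3 factors; sizes and seed are illustrative only.
rng = np.random.default_rng(0)
T, N, r = 100, 100, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))

k_hat, pc = choose_k(X, kmax=8)
```

The function uses the fact that the best rank-$k$ fit leaves a residual sum of squares equal to the sum of the eigenvalues of $XX'$ beyond the $k$ largest, so no factors need to be formed explicitly. The penalty shrinks to zero as $N$ and $T$ grow, but more slowly than $1/\min\{N, T\}$, which is what conditions (i) and (ii) require.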
Conditions (i) and (ii) are necessary in the sense that if one of the conditions is violated, then there will exist a factor model satisfying Assumptions A-D, and yet the number of factors cannot be consistently estimated. However, conditions (i) and (ii) are not always required to obtain a consistent estimate of $r$.

A formal proof of Theorem 2 is provided in the Appendix. The crucial element in consistent estimation of $r$ is a penalty factor that vanishes at an appropriate rate such that under and overparameterized models will not be chosen. An implication of Theorem 2 is the following:

Corollary 1: Under the Assumptions of Theorem 2, the class of criteria defined by

$$IC(k) = \ln\left(V(k, \hat F^k)\right) + k\,g(N,T)$$

will also consistently estimate $r$.

Note that $V(k, \hat F^k)$ is simply the average residual variance when $k$ factors are assumed for each cross-section unit. The IC criteria thus resemble information criteria frequently used in time-series analysis, with the important difference that the penalty here depends on both $N$ and $T$.

Thus far, it has been assumed that the common factors are estimated by the method of principal components. Forni and Reichlin (1998) and Forni et al. (2000a) studied alternative estimation methods. However the proof of Theorem 2 mainly uses the fact that $\hat F_t$ satisfies Theorem 1, and does not rely on principal components per se. We have the following corollary:

Corollary 2: Let $\hat G^k$ be an arbitrary estimator of $F^0$. Suppose there exists a matrix $\tilde H^k$ such that $\operatorname{rank}(\tilde H^k) = \min\{k, r\}$, and for some $\tilde C_{NT}^2 \le C_{NT}^2$,

$$\tilde C_{NT}^2 \left( \frac{1}{T} \sum_{t=1}^T \left\| \hat G_t^k - \tilde H^{k\prime} F_t^0 \right\|^2 \right) = O_p(1). \tag{8}$$

Then Theorem 2 still holds with $\hat F^k$ replaced by $\hat G^k$ and $C_{NT}$ replaced by $\tilde C_{NT}$.

The sequence of constants $\tilde C_{NT}^2$ does not need to equal $C_{NT}^2 = \min\{N, T\}$. Theorem 2 holds for any estimation method that yields estimators $\hat G_t$ satisfying (8).⁸ Naturally, the penalty would then depend on $\tilde C_{NT}^2$, the convergence rate for $\hat G_t$.

⁸ We are grateful for a referee whose question led to the results reported here.

5. THE $PC_p$ AND THE $IC_p$

In this section, we assume that the method of principal components is used to estimate the factors and propose specific formulations of $g(N,T)$ to be used in

practice. Let $\hat{\sigma}^2$ be a consistent estimate of $(NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} E(e_{it})^2$. Consider the following criteria:

$$PC_{p1}(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{N+T}{NT} \right) \ln\left( \frac{NT}{N+T} \right);$$

$$PC_{p2}(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{N+T}{NT} \right) \ln C_{NT}^2;$$

$$PC_{p3}(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{\ln C_{NT}^2}{C_{NT}^2} \right).$$

Since $V(k, \hat{F}^k) = N^{-1} \sum_{i=1}^{N} \hat{\sigma}_i^2$, where $\hat{\sigma}_i^2 = \hat{e}_i' \hat{e}_i / T$, the criteria generalize the $C_p$ criterion of Mallows (1973), developed for selection of models in strict time-series or cross-section contexts, to a panel data setting. For this reason, we refer to these statistics as Panel $C_p$ ($PC_p$) criteria. Like the $C_p$ criterion, $\hat{\sigma}^2$ provides the proper scaling to the penalty term. In applications, it can be replaced by $V(kmax, \hat{F}^{kmax})$. The proposed penalty functions are based on the sample size in the smaller of the two dimensions. All three criteria satisfy conditions (i) and (ii) of Theorem 2 since $C_{NT}^{-2} \approx (N+T)/NT \to 0$ as $N, T \to \infty$. However, in finite samples, $C_{NT}^{-2} \le (N+T)/NT$. Hence, the three criteria, although asymptotically equivalent, will have different properties in finite samples.⁹

⁹ Note that $PC_{p1}$ and $PC_{p2}$, and likewise $IC_{p1}$ and $IC_{p2}$, apply specifically to the principal components estimator because $C_{NT}^2 = \min\{N, T\}$ is used in deriving them. For alternative estimators satisfying Corollary 2, criteria $PC_{p3}$ and $IC_{p3}$ are still applicable with $C_{NT}$ replaced by $\tilde{C}_{NT}$.

Corollary 1 leads to consideration of the following three criteria:

$$IC_{p1}(k) = \ln\left(V(k, \hat{F}^k)\right) + k \left( \frac{N+T}{NT} \right) \ln\left( \frac{NT}{N+T} \right); \tag{9}$$

$$IC_{p2}(k) = \ln\left(V(k, \hat{F}^k)\right) + k \left( \frac{N+T}{NT} \right) \ln C_{NT}^2;$$

$$IC_{p3}(k) = \ln\left(V(k, \hat{F}^k)\right) + k \left( \frac{\ln C_{NT}^2}{C_{NT}^2} \right).$$

The main advantage of these three panel information criteria ($IC_p$) is that they do not depend on the choice of $kmax$ through $\hat{\sigma}^2$, which could be desirable in practice. The scaling by $\hat{\sigma}^2$ is implicitly performed by the logarithmic transformation of $V(k, \hat{F}^k)$ and is thus not required in the penalty term.

The proposed criteria differ from the conventional $C_p$ and information criteria used in time-series analysis in that $g(N,T)$ is a function of both $N$ and $T$. To understand why the penalty must be specified as a function of the sample size in

both dimensions, consider the following:

$$AIC_1(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{2}{T} \right);$$

$$BIC_1(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{\ln T}{T} \right);$$

$$AIC_2(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{2}{N} \right);$$

$$BIC_2(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{\ln N}{N} \right);$$

$$AIC_3(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( 2\,\frac{N+T-k}{NT} \right);$$

$$BIC_3(k) = V(k, \hat{F}^k) + k\,\hat{\sigma}^2 \left( \frac{(N+T-k) \ln(NT)}{NT} \right).$$

The penalty factors in $AIC_1$ and $BIC_1$ are standard in time-series applications. Although $g(N,T) \to 0$ as $T \to \infty$, $AIC_1$ fails the second condition of Theorem 2 for all $N$ and $T$. When $N \ll T$ and $N \log(T)/T \not\to \infty$, the $BIC_1$ also fails condition (ii) of Theorem 2. Thus we expect that the $AIC_1$ will not work for all $N$ and $T$, while the $BIC_1$ will not work for small $N$ relative to $T$. By analogy, $AIC_2$ also fails the conditions of Theorem 2, while $BIC_2$ will work only if $N \ll T$.

The next two criteria, $AIC_3$ and $BIC_3$, take into account the panel nature of the problem. The two specifications of $g(N,T)$ reflect, first, that the effective number of observations is $N \cdot T$ and, second, that the total number of parameters being estimated is $k(N+T-k)$. It is easy to see that $AIC_3$ fails the second condition of Theorem 2. While the $BIC_3$ satisfies this condition, $g(N,T)$ does not always vanish. For example, if $N = \exp(T)$, then $g(N,T) \to 1$ and the first condition of Theorem 2 will not be satisfied. Similarly, $g(N,T)$ does not vanish when $T = \exp(N)$. Therefore $BIC_3$ may perform well for some but not all configurations of the data. In contrast, the proposed criteria satisfy both conditions stated in Theorem 2.

6. SIMULATIONS AND AN EMPIRICAL APPLICATION

We first simulate data from the following model:

$$X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it} = c_{it} + \sqrt{\theta}\, e_{it},$$

where the factors are $T \times r$ matrices of $N(0,1)$ variables, and the factor loadings are $N(0,1)$ variates. Hence, the common component of $X_{it}$, denoted by $c_{it}$, has variance $r$. Results with $\lambda_{ij}$ uniformly distributed are similar and will not

be reported. Our base case assumes that the idiosyncratic component has the same variance as the common component (i.e., $\theta = r$). We consider thirty configurations of the data. The first five simulate plausible asset pricing applications with five years of monthly data ($T = 60$) on 100 to 2000 asset returns. We then increase $T$ to 100. Configurations with $N = 60$, $T = 100$ and 200 are plausible sizes of datasets for sectors, states, regions, and countries. Other configurations are considered to assess the general properties of the proposed criteria.

All computations were performed using Matlab Version 5.3. Reported in Tables I to III are the averages of $\hat{k}$ over 1000 replications, for $r = 1$, 3, and 5 respectively, assuming that $e_{it}$ is homoskedastic $N(0,1)$. For all cases, the maximum number of factors, $kmax$, is set to 8.¹⁰ Prior to computation of the eigenvectors, each series is demeaned and standardized to have unit variance. Of the three $PC_p$ criteria that satisfy Theorem 2, $PC_{p3}$ is less robust than $PC_{p1}$ and $PC_{p2}$ when $N$ or $T$ is small. The $IC_p$ criteria generally have properties very similar to the $PC_p$ criteria. The term $NT/(N+T)$ provides a small sample correction to the asymptotic convergence rate of $C_{NT}^2$ and has the effect of adjusting the penalty upwards. The simulations show this adjustment to be desirable. When $\min\{N, T\}$ is 40 or larger, the proposed tests give precise estimates of the number of factors. Since our theory is based on large $N$ and $T$, it is not surprising that for very small $N$ or $T$, the proposed criteria are inadequate. Results reported in the last five rows of each table indicate that the $IC_p$ criteria tend to underparameterize, while the $PC_p$ tend to overparameterize, but the problem is still less severe than for the AIC and the BIC, which we now consider.

The AIC and BIC variants that are functions of only $N$ or $T$ have the tendency to choose too many factors. The $AIC_3$ performs somewhat better than $AIC_1$ and $AIC_2$, but still tends to overparameterize. At first glance, the $BIC_3$ appears to perform well. Although $BIC_3$ resembles $PC_{p2}$, the former penalizes an extra factor more heavily since $\ln(NT) > \ln C_{NT}^2$.
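To make the comparison of penalties concrete, the following sketch tabulates the per-factor penalty $g(N,T)$ implied by each criterion (without the $\hat{\sigma}^2$ scaling) for a few sample sizes, together with $C_{NT}^2 \cdot g(N,T)$, the quantity that condition (ii) of Theorem 2 requires to diverge. The function name `penalties`, the dictionary keys, and the choice $k = 1$ in the $AIC_3$ and $BIC_3$ terms are assumptions made for illustration.

```python
import numpy as np

def penalties(N, T, k=1):
    """Per-factor penalty g(N, T) for each criterion, without the sigma^2 scaling."""
    C2 = min(N, T)                    # C_NT^2 = min{N, T}
    NT = N * T
    return {
        "PCp1/ICp1": (N + T) / NT * np.log(NT / (N + T)),
        "PCp2/ICp2": (N + T) / NT * np.log(C2),
        "PCp3/ICp3": np.log(C2) / C2,
        "AIC1": 2 / T,
        "BIC1": np.log(T) / T,
        "AIC2": 2 / N,
        "BIC2": np.log(N) / N,
        "AIC3": 2 * (N + T - k) / NT,
        "BIC3": (N + T - k) * np.log(NT) / NT,
    }

# Print g and C_NT^2 * g for a few of the configurations used in the simulations.
for N, T in [(100, 40), (100, 60), (1000, 60), (60, 1000)]:
    print(f"N={N}, T={T}")
    for name, g in penalties(N, T).items():
        print(f"  {name:10s} g = {g:.4f}   C2*g = {min(N, T) * g:.3f}")
```

For the AIC-type penalties the product $C_{NT}^2 \cdot g$ stays bounded however large the panel is, which is the sense in which they fail condition (ii).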
As can be seen from Tables II and III, the $BIC_3$ tends to underestimate $r$, and the problem becomes more severe as $r$ increases.

¹⁰ In time-series analysis, a rule such as $8\,\mathrm{int}[(T/100)^{1/4}]$ considered in Schwert (1989) is sometimes used to set $kmax$, but no such guide is available for panel analysis. Until further results are available, a rule that replaces $T$ in Schwert's rule by $\min\{N, T\}$ could be considered.

Table IV relaxes the assumption of homoskedasticity. Instead, we let $e_{it} = e_{it}^1$ for $t$ odd, and $e_{it} = e_{it}^1 + e_{it}^2$ for $t$ even, where $e_{it}^1$ and $e_{it}^2$ are independent $N(0,1)$. Thus, the variance in the even periods is twice as large as in the odd periods. Without loss of generality, we only report results for $r = 5$. $PC_{p1}$, $PC_{p2}$, $IC_{p1}$, and $IC_{p2}$ continue to select the true number of factors very accurately and dominate the remaining criteria considered.

We then vary the variance of the idiosyncratic errors relative to the common component. When $\theta < r$, the variance of the common component is relatively large. Not surprisingly, the proposed criteria give precise estimates of $r$. The results will not be reported without loss of generality. Table V considers the case $\theta = 2r$. Since the variance of the idiosyncratic component is larger than the

TABLE I
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $r = 1$; $\theta = 1$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  1.02  1.00  2.97  1.00  1.00  1.00  8.00  2.97  8.00  8.00  7.57  1.00
 100    60  1.00  1.00  2.41  1.00  1.00  1.00  8.00  2.41  8.00  8.00  7.11  1.00
 200    60  1.00  1.00  1.00  1.00  1.00  1.00  8.00  1.00  8.00  8.00  5.51  1.00
 500    60  1.00  1.00  1.00  1.00  1.00  1.00  5.21  1.00  8.00  8.00  1.57  1.00
1000    60  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
2000    60  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
 100   100  1.00  1.00  3.24  1.00  1.00  1.00  8.00  3.24  8.00  3.24  6.68  1.00
 200   100  1.00  1.00  1.00  1.00  1.00  1.00  8.00  1.00  8.00  8.00  5.43  1.00
 500   100  1.00  1.00  1.00  1.00  1.00  1.00  8.00  1.00  8.00  8.00  1.55  1.00
1000   100  1.00  1.00  1.00  1.00  1.00  1.00  1.08  1.00  8.00  8.00  1.00  1.00
2000   100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
  40   100  1.01  1.00  2.69  1.00  1.00  1.00  8.00  8.00  8.00  2.69  7.33  1.00
  60   100  1.00  1.00  2.25  1.00  1.00  1.00  8.00  8.00  8.00  2.25  6.99  1.00
  60   200  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  8.00  1.00  5.14  1.00
  60   500  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  4.67  1.00  1.32  1.00
  60  1000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
  60  2000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
4000    60  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
4000   100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
8000    60  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
8000   100  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00
  60  4000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
 100  4000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
  60  8000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
 100  8000  1.00  1.00  1.00  1.00  1.00  1.00  8.00  8.00  1.00  1.00  1.00  1.00
  10    50  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.18
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  5.88
  20   100  4.73  3.94  6.29  1.00  1.00  1.00  8.00  8.00  8.00  6.29  8.00  1.00
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
 100    20  5.62  4.81  7.16  1.00  1.00  1.00  8.00  7.16  8.00  8.00  8.00  1.00

Notes: Table I to Table VIII report the estimated number of factors ($\hat{k}$) averaged over 1000 simulations. The true number of factors is $r$ and $kmax = 8$.
When the average of $\hat{k}$ is an integer, the corresponding standard error is zero. In the few cases when the averaged $\hat{k}$ over replications is not an integer, the standard errors are no larger than 0.6. In view of the precision of the estimates in the majority of cases, the standard errors in the simulations are not reported. The last five rows of each table are for models of small dimensions (either $N$ or $T$ is small).

common component, one might expect the common factors to be estimated with less precision. Indeed, $IC_{p1}$ and $IC_{p2}$ underestimate $r$ when $\min\{N, T\} < 60$, but the criteria still select values of $k$ that are very close to $r$ for other configurations of the data.

The models considered thus far have idiosyncratic errors that are uncorrelated across units and across time. For these strict factor models, the preferred criteria are $PC_{p1}$, $PC_{p2}$, $IC_{p1}$, and $IC_{p2}$. It should be emphasized that the results reported are the averages of $\hat{k}$ over 1000 simulations. We do not report the standard deviations of these averages because they are identically zero except for a few

TABLE II
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $r = 3$; $\theta = 3$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  3.00  3.00  3.90  3.00  3.00  3.00  8.00  3.90  8.00  8.00  7.82  2.90
 100    60  3.00  3.00  3.54  3.00  3.00  3.00  8.00  3.54  8.00  8.00  7.53  2.98
 200    60  3.00  3.00  3.00  3.00  3.00  3.00  8.00  3.00  8.00  8.00  6.14  3.00
 500    60  3.00  3.00  3.00  3.00  3.00  3.00  5.95  3.00  8.00  8.00  3.13  3.00
1000    60  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00
2000    60  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00
 100   100  3.00  3.00  4.23  3.00  3.00  3.00  8.00  4.23  8.00  4.23  7.20  3.00
 200   100  3.00  3.00  3.00  3.00  3.00  3.00  8.00  3.00  8.00  8.00  6.21  3.00
 500   100  3.00  3.00  3.00  3.00  3.00  3.00  8.00  3.00  8.00  8.00  3.15  3.00
1000   100  3.00  3.00  3.00  3.00  3.00  3.00  3.01  3.00  8.00  8.00  3.00  3.00
2000   100  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00
  40   100  3.00  3.00  3.70  3.00  3.00  3.00  8.00  8.00  8.00  3.70  7.63  2.92
  60   100  3.00  3.00  3.42  3.00  3.00  3.00  8.00  8.00  8.00  3.42  7.39  2.99
  60   200  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  8.00  3.00  5.83  3.00
  60   500  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  5.44  3.00  3.03  3.00
  60  1000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  3.00
  60  2000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  3.00
4000    60  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  2.98
4000   100  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00
8000    60  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  2.97
8000   100  3.00  3.00  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00
  60  4000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  2.99
 100  4000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  3.00
  60  8000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  2.98
 100  8000  3.00  3.00  3.00  3.00  3.00  3.00  8.00  8.00  3.00  3.00  3.00  3.00
  10    50  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.21
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.01
  20   100  5.22  4.57  6.62  2.95  2.92  2.98  8.00  8.00  8.00  6.62  8.00  2.68
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
 100    20  6.00  5.29  7.39  2.95  2.91  2.99  8.00  7.39  8.00  8.00  8.00  2.72

cases for which the average itself is not an integer. Even for these latter cases, the standard deviations do not exceed 0.6.
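The strict factor model experiments can be approximated with a short script. The sketch below draws one panel from $X_{it} = \sum_j \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$, standardizes each series, and reports the $k$ chosen by $PC_{p1}$, $PC_{p2}$, $IC_{p1}$, $IC_{p2}$, and $AIC_1$, with $\hat{\sigma}^2$ set to $V(kmax, \hat{F}^{kmax})$. This is a Python approximation under stated assumptions, not the authors' Matlab code: the function names `simulate_panel` and `estimate_k`, the seed, and the small number of replications are all choices made for this example.

```python
import numpy as np

def simulate_panel(N, T, r, theta, rng):
    """Draw a T x N panel with r N(0,1) factors, N(0,1) loadings, and N(0,1) errors."""
    F = rng.standard_normal((T, r))
    Lam = rng.standard_normal((N, r))
    e = rng.standard_normal((T, N))
    return F @ Lam.T + np.sqrt(theta) * e

def estimate_k(X, kmax=8):
    """Number of factors chosen by several criteria (principal components fit)."""
    T, N = X.shape
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # demean and standardize each series
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    V = (s2.sum() - np.concatenate(([0.0], np.cumsum(s2)[:kmax]))) / (N * T)
    k = np.arange(kmax + 1)
    sigma2 = V[kmax]                               # sigma^2 replaced by V(kmax, F^kmax)
    g1 = (N + T) / (N * T) * np.log(N * T / (N + T))
    g2 = (N + T) / (N * T) * np.log(min(N, T))
    return {
        "PCp1": int(np.argmin(V + k * sigma2 * g1)),
        "PCp2": int(np.argmin(V + k * sigma2 * g2)),
        "ICp1": int(np.argmin(np.log(V) + k * g1)),
        "ICp2": int(np.argmin(np.log(V) + k * g2)),
        "AIC1": int(np.argmin(V + k * sigma2 * 2 / T)),
    }

# A small Monte Carlo in the spirit of Table II (r = 3, theta = 3), 20 replications only.
rng = np.random.default_rng(2002)
runs = [estimate_k(simulate_panel(N=100, T=60, r=3, theta=3.0, rng=rng)) for _ in range(20)]
for name in runs[0]:
    print(name, np.mean([run[name] for run in runs]))
```

Averages from a script like this can be compared with the corresponding rows of the tables, keeping in mind that the tables use 1000 replications.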
We next modify the assumption on the idiosyncratic errors to allow for serial and cross-section correlation. These errors are generated from the process

$$e_{it} = \rho\, e_{i,t-1} + v_{it} + \sum_{j=-J,\; j \ne 0}^{J} \beta\, v_{i-j,t}.$$

The case of pure serial correlation obtains when the cross-section correlation parameter $\beta$ is zero. Since for each $i$, the unconditional variance of $e_{it}$ is $1/(1-\rho^2)$, the more persistent are the idiosyncratic errors, the larger are their variances relative to the common factors, and the precision of the estimates can be expected to fall. However, even with $\rho = 0.5$, Table VI shows that the estimates provided by the proposed criteria are still very good. The case of pure cross-

TABLE III
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $r = 5$; $\theta = 5$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  4.99  4.98  5.17  4.88  4.68  4.99  8.00  5.17  8.00  8.00  7.94  3.05
 100    60  5.00  5.00  5.07  4.99  4.94  5.00  8.00  5.07  8.00  8.00  7.87  3.50
 200    60  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  6.91  3.80
 500    60  5.00  5.00  5.00  5.00  5.00  5.00  6.88  5.00  8.00  8.00  5.01  3.88
1000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.82
2000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.59
 100   100  5.00  5.00  5.42  5.00  5.00  5.01  8.00  5.42  8.00  5.42  7.75  4.16
 200   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  7.06  4.80
 500   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  5.02  4.97
1000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.98
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.98
  40   100  5.00  4.99  5.09  4.86  4.69  5.00  8.00  8.00  8.00  5.09  7.86  2.96
  60   100  5.00  5.00  5.05  4.99  4.94  5.00  8.00  8.00  8.00  5.05  7.81  3.46
  60   200  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  8.00  5.00  6.71  3.83
  60   500  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  6.44  5.00  5.00  3.91
  60  1000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.79
  60  2000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.58
4000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.37
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.96
8000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.10
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.93
  60  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.35
 100  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  4.96
  60  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.12
 100  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  4.93
  10    50  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.28
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.30
  20   100  5.88  5.41  6.99  4.17  3.79  4.68  8.00  8.00  8.00  6.99  8.00  2.79
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
 100    20  6.49  5.94  7.62  4.24  3.87  4.81  8.00  7.62  8.00  8.00  8.00  2.93

section dependence obtains with $\rho = 0$. As in Chamberlain and Rothschild (1983), our theory permits some degree of cross-section correlation.
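The error process with serial and cross-section correlation, $e_{it} = \rho e_{i,t-1} + v_{it} + \sum_{j \ne 0, |j| \le J} \beta v_{i-j,t}$, can be generated along the following lines. The text does not say how the first and last $J$ units or the initial condition are treated, so this sketch makes two assumptions: innovations are drawn for $J$ extra units on each side of the panel, and a burn-in period is discarded. The function name `correlated_errors` and the `burn` parameter are likewise hypothetical.

```python
import numpy as np

def correlated_errors(N, T, rho, beta, J, rng, burn=100):
    """Return a T x N array of errors with AR(1) dynamics and local cross-section correlation."""
    # Innovations for N + 2J units, so every unit i has neighbours i-J..i+J (assumption).
    v = rng.standard_normal((T + burn, N + 2 * J))
    # u_it = v_it + beta * sum over 0 < |j| <= J of v_{i-j,t}
    kernel = np.full(2 * J + 1, beta)
    kernel[J] = 1.0
    u = np.stack([np.convolve(row, kernel, mode="valid") for row in v])
    # AR(1) recursion in the time dimension, started at u_0 and followed by a burn-in.
    e = np.empty_like(u)
    e[0] = u[0]
    for t in range(1, T + burn):
        e[t] = rho * e[t - 1] + u[t]
    return e[burn:]

rng = np.random.default_rng(7)
e = correlated_errors(N=100, T=60, rho=0.5, beta=0.2, J=10, rng=rng)
print(e.shape, e.var())
```

Under these assumptions the stationary variance of each $e_{it}$ is $(1 + 2J\beta^2)/(1 - \rho^2)$, which a long simulated sample should reproduce approximately.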
Given the assumed process for $e_{it}$, the amount of cross correlation depends on the number of units that are cross correlated ($2J$), as well as the magnitude of the pairwise correlation ($\beta$). We set $\beta$ to 0.2 and $J$ to $\max\{N/20, 10\}$. Effectively, when $N \le 200$, 10 percent of the units are cross correlated, and when $N > 200$, $20/N$ of the sample is cross correlated. As the results in Table VII indicate, the proposed criteria still give very good estimates of $r$ and continue to do so for small variations in $\beta$ and $J$. Table VIII reports results that allow for both serial and cross-section correlation. The variance of the idiosyncratic errors is now $(1 + 2J\beta^2)/(1 - \rho^2)$ times larger than the variance of the common component. While this reduces the precision of the estimates somewhat, the results generally confirm that a small degree of correlation in the idiosyncratic errors will not affect the properties of

TABLE IV
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $e_{it} = e_{it}^1 + \delta_t e_{it}^2$ ($\delta_t = 1$ for $t$ even, $\delta_t = 0$ for $t$ odd); $r = 5$; $\theta = 5$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  4.96  4.86  6.09  4.09  3.37  4.93  8.00  6.09  8.00  8.00  8.00  1.81
 100    60  4.99  4.90  5.85  4.69  4.18  5.01  8.00  5.85  8.00  8.00  8.00  2.08
 200    60  5.00  4.99  5.00  4.93  4.87  5.00  8.00  5.00  8.00  8.00  8.00  2.22
 500    60  5.00  5.00  5.00  4.99  4.98  5.00  8.00  5.00  8.00  8.00  7.91  2.23
1000    60  5.00  5.00  5.00  5.00  5.00  5.00  7.97  5.00  8.00  8.00  6.47  2.02
2000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.51  5.00  8.00  8.00  5.03  1.72
 100   100  5.00  4.98  6.60  4.98  4.79  5.24  8.00  6.60  8.00  6.60  8.00  2.56
 200   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  3.33
 500   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  7.94  3.93
1000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  6.13  3.98
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.36  5.00  8.00  8.00  5.00  3.85
  40   100  4.94  4.80  5.39  4.04  3.30  4.90  8.00  8.00  8.00  5.39  7.99  1.68
  60   100  4.98  4.88  5.41  4.66  4.14  5.00  8.00  8.00  8.00  5.41  7.99  2.04
  60   200  5.00  4.99  5.00  4.95  4.87  5.00  8.00  8.00  8.00  5.00  7.56  2.14
  60   500  5.00  5.00  5.00  4.99  4.98  5.00  8.00  8.00  7.29  5.00  5.07  2.13
  60  1000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.90
  60  2000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.59
4000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  1.46
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.67
8000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  1.16
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  3.37
  60  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.30
 100  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.62
  60  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.08
 100  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.29
  10    50  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.27
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.34
  20   100  6.13  5.62  7.23  2.85  2.23  3.93  8.00  8.00  8.00  7.23  8.00  1.86
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
 100    20  7.52  6.99  7.99  3.31  2.64  6.17  8.00  7.99  8.00  8.00  8.00  2.30

the estimates.
However, it will generally be true that for the proposed criteria to be as precise in approximate as in strict factor models, $N$ has to be fairly large relative to $J$, $\beta$ cannot be too large, and the errors cannot be too persistent, as required by theory. It is also noteworthy that the $BIC_3$ has very good properties in the presence of cross-section correlations (see Tables VII and VIII), and the criterion can be useful in practice even though it does not satisfy all the conditions of Theorem 2.

6.1. Application to Asset Returns

Factor models for asset returns are extensively studied in the finance literature. An excellent summary of multifactor asset pricing models can be found in Campbell, Lo, and MacKinlay (1997). Two basic approaches are employed. One

TABLE V
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $r = 5$; $\theta = r \times 2$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  4.63  4.29  5.14  2.79  1.91  4.47  8.00  8.00  8.00  5.14  7.93  0.82
 100    60  4.78  4.41  5.06  3.73  2.61  4.96  8.00  8.00  8.00  5.06  7.86  0.92
 200    60  4.90  4.80  5.00  4.42  4.03  4.94  8.00  8.00  8.00  5.00  6.92  0.93
 500    60  4.96  4.94  4.99  4.77  4.68  4.92  8.00  8.00  6.88  4.99  5.01  0.77
1000    60  4.97  4.97  4.98  4.88  4.86  4.93  8.00  8.00  5.00  4.98  5.00  0.56
2000    60  4.98  4.98  4.99  4.91  4.89  4.92  8.00  8.00  5.00  4.99  5.00  0.34
 100   100  4.96  4.67  5.42  4.64  3.61  5.01  8.00  5.42  8.00  5.42  7.74  1.23
 200   100  5.00  4.99  5.00  4.98  4.90  5.00  8.00  8.00  8.00  5.00  7.05  1.80
 500   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  8.00  5.00  5.02  2.19
1000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  2.17
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  2.06
  40   100  4.61  4.25  5.07  2.65  1.84  4.48  8.00  5.07  8.00  8.00  7.83  0.74
  60   100  4.76  4.38  5.05  3.66  2.60  4.97  8.00  5.05  8.00  8.00  7.81  0.92
  60   200  4.90  4.78  5.00  4.43  4.07  4.95  8.00  5.00  8.00  8.00  6.70  0.88
  60   500  4.97  4.95  4.99  4.78  4.71  4.93  6.44  4.99  8.00  8.00  5.00  0.74
  60  1000  4.98  4.97  4.99  4.87  4.84  4.92  5.00  4.99  8.00  8.00  5.00  0.51
  60  2000  4.99  4.98  4.99  4.89  4.88  4.92  5.00  4.99  8.00  8.00  5.00  0.32
4000    60  4.99  4.99  4.99  4.92  4.92  4.93  8.00  8.00  5.00  4.99  5.00  0.18
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.72
8000    60  4.99  4.99  4.99  4.92  4.92  4.93  8.00  8.00  5.00  4.99  5.00  0.08
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.40
  60  4000  4.99  4.99  4.99  4.93  4.92  4.95  5.00  4.99  8.00  8.00  5.00  0.15
 100  4000  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  1.70
  60  8000  4.99  4.99  4.99  4.92  4.92  4.93  5.00  4.99  8.00  8.00  5.00  0.08
 100  8000  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  1.40
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.24
 100    20  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.18
  10    50  5.73  5.22  6.90  1.67  1.33  2.79  8.00  6.90  8.00  8.00  8.00  1.12
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
  20   100  6.39  5.79  7.57  1.85  1.44  3.04  8.00  8.00  8.00  7.57  8.00  1.31

is statistical factor analysis of unobservable factors, and the other is regression analysis on observable factors.
For the first approach, most studies use grouped data (portfolios) in order to satisfy the small $N$ restriction imposed by classical factor analysis, with exceptions such as Connor and Korajczyk (1993). The second approach uses macroeconomic and financial market variables that are thought to capture systematic risks as observable factors. With the method developed in this paper, we can estimate the number of factors for the broad U.S. stock market, without the need to group the data and without being specific about which observed series are good proxies for systematic risks.

Monthly data between 1994.1 and 1998.12 are available for the returns of 8436 stocks traded on the New York Stock Exchange, AMEX, and NASDAQ. The data include all lived stocks on the last trading day of 1998 and are obtained from the CRSP data base. Of these, returns for 4883 firms are available for each

TABLE VI
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $e_{it} = \rho\, e_{i,t-1} + v_{it} + \sum_{j=-J,\; j \ne 0}^{J} \beta\, v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = 0.5$, $\beta = 0$, $J = 0$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  7.31  6.59  8.00  5.52  4.53  8.00  8.00  8.00  8.00  8.00  8.00  2.97
 100    60  6.11  5.27  8.00  5.00  4.76  8.00  8.00  8.00  8.00  8.00  8.00  3.09
 200    60  5.94  5.38  7.88  5.01  4.99  7.39  8.00  7.88  8.00  8.00  8.00  3.31
 500    60  5.68  5.39  6.79  5.00  5.00  5.11  8.00  6.79  8.00  8.00  8.00  3.41
1000    60  5.41  5.27  6.02  5.00  5.00  5.00  8.00  6.02  8.00  8.00  8.00  3.27
2000    60  5.21  5.14  5.50  5.00  5.00  5.00  8.00  5.50  8.00  8.00  8.00  3.06
 100   100  5.04  5.00  8.00  5.00  4.97  8.00  8.00  8.00  8.00  8.00  8.00  3.45
 200   100  5.00  5.00  7.75  5.00  5.00  7.12  8.00  7.75  8.00  8.00  8.00  4.26
 500   100  5.00  5.00  5.21  5.00  5.00  5.00  8.00  5.21  8.00  8.00  8.00  4.68
1000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  4.73
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  4.69
  40   100  5.37  5.05  7.30  4.58  4.08  5.82  8.00  8.00  8.00  7.30  8.00  2.45
  60   100  5.13  4.99  7.88  4.93  4.67  7.40  8.00  8.00  8.00  7.88  8.00  2.80
  60   200  5.00  5.00  5.02  4.99  4.96  5.00  8.00  8.00  8.00  5.02  8.00  2.84
  60   500  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  8.00  5.00  7.53  2.72
  60  1000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.72  5.00  5.04  2.54
  60  2000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  2.28
4000    60  5.11  5.08  5.22  5.00  5.00  5.00  8.00  5.22  8.00  8.00  8.00  2.81
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  4.62
8000    60  5.05  5.05  5.08  5.00  5.00  5.00  8.00  5.08  8.00  8.00  8.00  2.55
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  4.37
  60  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.92
 100  4000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  4.21
  60  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  1.64
 100  8000  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00  5.00  3.97
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.47
 100    20  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.69
  10    50  7.16  6.68  7.89  3.57  2.92  5.70  8.00  8.00  8.00  7.89  8.00  2.42
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
  20   100  8.00  7.99  8.00  7.93  7.58  8.00  8.00  8.00  8.00  8.00  8.00  3.92

of the 60 months. We use the proposed criteria to determine the number of factors.
We transform the data so that each series is mean zero. For this balanced panel with $T = 60$, $N = 4883$, and $kmax = 15$, the recommended criteria, namely $PC_{p1}$, $PC_{p2}$, $IC_{p1}$, and $IC_{p2}$, all suggest the presence of two factors.

7. CONCLUDING REMARKS

In this paper, we propose criteria for the selection of factors in large dimensional panels. The main appeal of our results is that they are developed under the assumption that $N, T \to \infty$ and are thus appropriate for many datasets typically used in macroeconomic analysis. Some degree of correlation in the errors is also allowed. The criteria should be useful in applications in which the number of factors has traditionally been assumed rather than determined by the data.

TABLE VII
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $e_{it} = \rho\, e_{i,t-1} + v_{it} + \sum_{j=-J,\; j \ne 0}^{J} \beta\, v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = 0.0$, $\beta = 0.20$, $J = \max\{N/20, 10\}$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  5.50  5.27  6.02  5.09  5.01  5.63  8.00  6.02  8.00  8.00  7.98  4.24
 100    60  5.57  5.24  6.03  5.15  5.02  5.96  8.00  6.03  8.00  8.00  7.96  4.72
 200    60  5.97  5.94  6.00  5.88  5.76  5.99  8.00  6.00  8.00  8.00  7.63  4.89
 500    60  5.01  5.01  5.10  5.00  5.00  5.01  7.44  5.10  8.00  8.00  6.00  4.93
1000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.98  5.00  8.00  8.00  5.93  4.93
2000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.05  5.00  8.00  8.00  5.01  4.88
 100   100  5.79  5.30  6.31  5.43  5.04  6.03  8.00  6.31  8.00  6.31  7.95  4.98
 200   100  6.00  6.00  6.00  6.00  5.98  6.00  8.00  6.00  8.00  8.00  7.84  5.00
 500   100  5.21  5.11  5.64  5.06  5.03  5.41  8.00  5.64  8.00  8.00  6.02  5.00
1000   100  5.00  5.00  5.00  5.00  5.00  5.00  6.00  5.00  8.00  8.00  6.00  5.00
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.72  5.00  8.00  8.00  5.41  5.00
  40   100  5.17  5.06  5.95  5.00  4.98  5.30  8.00  8.00  8.00  5.95  7.96  4.22
  60   100  5.30  5.06  6.01  5.03  5.00  5.87  8.00  8.00  8.00  6.01  7.94  4.69
  60   200  5.35  5.16  5.95  5.04  5.01  5.65  8.00  8.00  8.00  5.95  7.39  4.89
  60   500  5.43  5.29  5.83  5.05  5.02  5.35  8.00  8.00  7.49  5.83  6.04  4.94
  60  1000  5.55  5.45  5.79  5.08  5.05  5.25  8.00  8.00  6.01  5.79  6.00  4.93
  60  2000  5.64  5.59  5.76  5.07  5.04  5.17  8.00  8.00  6.00  5.76  6.00  4.91
4000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.84
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00
8000    60  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  4.72
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  5.00  5.00  8.00  8.00  5.00  5.00
  60  4000  5.65  5.63  5.72  5.05  5.04  5.09  8.00  8.00  6.00  5.72  6.00  4.85
 100  4000  6.00  6.00  6.00  6.00  6.00  6.00  8.00  8.00  6.14  6.00  6.02  5.00
  60  8000  5.67  5.66  5.71  5.04  5.04  5.05  8.00  8.00  6.00  5.71  6.00  4.77
 100  8000  6.00  6.00  6.00  6.00  6.00  6.00  8.00  8.00  6.00  6.00  6.00  5.00
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.34
 100    20  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.49
  10    50  6.23  5.84  7.18  4.82  4.67  5.14  8.00  8.00  8.00  7.18  8.00  3.72
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
  20   100  6.75  6.27  7.75  4.97  4.73  5.71  8.00  7.75  8.00  8.00  8.00  3.81

Our discussion has focused on balanced panels.
However, as discussed in Rubin and Thayer (1982) and Stock and Watson (1998), an iterative EM algorithm can be used to handle missing data. The idea is to replace $X_{it}$ by its value as predicted by the parameters obtained from the last iteration when $X_{it}$ is not observed. Thus, if $\lambda_i(j)$ and $F_t(j)$ are estimated values of $\lambda_i$ and $F_t$ from the $j$th iteration, let $X_{it}^*(j-1) = X_{it}$ if $X_{it}$ is observed, and $X_{it}^*(j-1) = \lambda_i'(j-1) F_t(j-1)$ otherwise. We then minimize $V^*(k)$ with respect to $F(j)$ and $\Lambda(j)$, where

$$V^*(k) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( X_{it}^*(j-1) - \lambda_i^{k\prime}(j) F_t^k(j) \right)^2.$$

Essentially, eigenvalues are computed for the matrix $X^*(j-1) X^*(j-1)'$. This process is iterated until convergence is achieved.

Many issues in factor analysis await further research. Except for some results derived for classical factor models, little is known about the limiting distribution

TABLE VIII
DGP: $X_{it} = \sum_{j=1}^{r} \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$; $e_{it} = \rho\, e_{i,t-1} + v_{it} + \sum_{j=-J,\; j \ne 0}^{J} \beta\, v_{i-j,t}$; $r = 5$; $\theta = 5$, $\rho = 0.50$, $\beta = 0.20$, $J = \max\{N/20, 10\}$.

   N     T  PCp1  PCp2  PCp3  ICp1  ICp2  ICp3  AIC1  BIC1  AIC2  BIC2  AIC3  BIC3
 100    40  7.54  6.92  8.00  6.43  5.52  8.00  8.00  8.00  8.00  8.00  8.00  4.14
 100    60  6.57  5.93  8.00  5.68  5.28  8.00  8.00  8.00  8.00  8.00  8.00  4.39
 200    60  6.52  6.15  7.97  6.00  5.91  7.84  8.00  7.97  8.00  8.00  8.00  4.68
 500    60  6.16  5.97  7.12  5.40  5.30  5.92  8.00  7.12  8.00  8.00  8.00  4.76
1000    60  5.71  5.56  6.20  5.03  5.02  5.08  8.00  6.20  8.00  8.00  8.00  4.76
2000    60  5.33  5.26  5.61  5.00  5.00  5.00  8.00  5.61  8.00  8.00  8.00  4.69
 100   100  5.98  5.71  8.00  5.72  5.27  8.00  8.00  8.00  8.00  8.00  8.00  4.80
 200   100  6.01  6.00  7.95  6.00  5.99  7.78  8.00  7.95  8.00  8.00  8.00  5.03
 500   100  5.89  5.81  6.06  5.59  5.46  5.94  8.00  6.06  8.00  8.00  8.00  5.00
1000   100  5.13  5.09  5.37  5.01  5.01  5.09  8.00  5.37  8.00  8.00  8.00  5.00
2000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  5.00
  40   100  5.88  5.46  7.55  5.07  4.93  6.57  8.00  8.00  8.00  7.55  8.00  3.76
  60   100  5.84  5.45  7.96  5.24  5.05  7.79  8.00  8.00  8.00  7.96  8.00  4.25
  60   200  5.67  5.44  5.99  5.20  5.07  5.83  8.00  8.00  8.00  5.99  8.00  4.42
  60   500  5.59  5.47  5.88  5.13  5.08  5.48  8.00  8.00  8.00  5.88  7.91  4.50
  60  1000  5.61  5.54  5.81  5.13  5.08  5.34  8.00  8.00  6.91  5.81  6.15  4.40
  60  2000  5.64  5.60  5.74  5.11  5.08  5.22  8.00  8.00  6.00  5.74  6.00  4.27
4000    60  5.12  5.10  5.24  5.00  5.00  5.00  8.00  5.24  8.00  8.00  8.00  4.56
4000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  5.00
8000    60  5.05  5.05  5.08  5.00  5.00  5.00  8.00  5.08  8.00  8.00  8.00  4.37
8000   100  5.00  5.00  5.00  5.00  5.00  5.00  8.00  5.00  8.00  8.00  8.00  5.00
  60  4000  5.63  5.61  5.70  5.07  5.06  5.12  8.00  8.00  6.00  5.70  6.00  4.04
 100  4000  6.00  6.00  6.00  6.00  6.00  6.00  8.00  8.00  6.44  6.00  6.17  5.00
  60  8000  5.63  5.62  5.68  5.06  5.05  5.07  8.00  8.00  6.00  5.68  6.00  3.83
 100  8000  6.00  6.00  6.00  6.00  6.00  6.00  8.00  8.00  6.08  6.00  6.02  5.00
 100    10  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  7.54
 100    20  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  6.85
  10    50  7.34  6.87  7.93  4.84  4.37  6.82  8.00  8.00  8.00  7.93  8.00  3.41
  10   100  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00  8.00
  20   100  8.00  8.00  8.00  7.99  7.84  8.00  8.00  8.00  8.00  8.00  8.00  4.54

of the estimated common factors and common components (i.e., $\hat{\lambda}_i' \hat{F}_t$).
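Returning briefly to the unbalanced-panel case raised in the concluding remarks, the iterative scheme for missing data can be sketched as follows: fill the missing cells, fit $k$ factors by principal components, replace the missing cells with the fitted common component, and repeat until the filled values stop changing. The starting values (column means of the observed data), the convergence tolerance, and the name `em_impute` are assumptions of this sketch, and the rank-$k$ fit is obtained through a singular value decomposition, which yields the same fitted values as the eigenvalue computation.

```python
import numpy as np

def em_impute(X, k, n_iter=500, tol=1e-8):
    """Iteratively impute missing entries (np.nan) of a T x N panel with a k-factor fit.

    Returns the completed panel and the fitted common component from the last iteration.
    """
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(miss, col_means, X)           # starting values (an assumption)
    for _ in range(n_iter):
        # Rank-k principal components fit of the current completed panel.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        fit = (U[:, :k] * s[:k]) @ Vt[:k]
        # Observed cells keep their data; missing cells take the predicted value.
        updated = np.where(miss, fit, X)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled, fit

# Illustrative use: hide 10 percent of a noiseless two-factor panel, then impute.
rng = np.random.default_rng(5)
full = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
X = full.copy()
X[rng.random(full.shape) < 0.1] = np.nan
completed, common = em_impute(X, k=2)
print(np.abs(completed - full).max())
```

The completed panel can then be passed to whichever criterion is used to choose $k$; how the choice of $k$ inside the imputation step interacts with that selection is left open here.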
But using Theorem 1, it may be possible to obtain these limiting distributions. For example, the rate of convergence of $\tilde{F}_t$ derived in this paper could be used to examine the statistical property of the forecast $\hat{y}_{T+1|T}$ in Stock and Watson's framework. It would be useful to show that $\hat{y}_{T+1|T}$ is not only a consistent but a $\sqrt{T}$ consistent estimator of $y_{T+1}$, conditional on the information up to time $T$ (provided that $N$ is of no smaller order of magnitude than $T$). Additional asymptotic results are currently being investigated by the authors.

The foregoing analysis has assumed a static relationship between the observed data and the factors. Our model allows $F_t$ to be a dependent process, e.g., $A(L) F_t = \epsilon_t$, where $A(L)$ is a polynomial matrix of the lag operator. However, we do not consider the case in which the dynamics enter into $X_t$ directly. If the