Maximum Likelihood Estimation for Factor Analysis. Yuan Liao

1 Maximum Likelihood Estimation for Factor Analysis
Yuan Liao, University of Maryland
Joint work with Jushan Bai
June 15, 2013

2 High Dim. Factor Model
$y_{it} = \lambda_i' f_t + u_{it}$, $i \le N$, $t \le T$
$f_t$: common factor
$\lambda_i$: loading
$u_{it}$: idiosyncratic component, correlated across $i$
$y_{it}$ is the only observable.
Approx. factor model: $\Sigma_u = \mathrm{cov}(u_t)$ is high-dimensional, non-diagonal

3 This talk
(1) Estimate factors and loadings efficiently
PC $= \arg\min_{\lambda_i, f_t} \sum_{i,t} (y_{it} - \lambda_i' f_t)^2$ ignores cross-sectional heteroskedasticity and correlation of $u_{it}$
(2) Estimate large covariance $\Sigma_u$
Key assumption: sparsity
(3) Understand the impact of large covariance estimation on statistical inference
Improves efficiency, but technically challenging when $N > T$
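The PC estimator in item (1) has a closed form through an eigen-decomposition. The sketch below is one common way to compute it, assuming the panel is stored as an N x T array and using the normalization $F'F/T = I$ (in line with $\mathrm{cov}(f_t) = I$ used later in the talk). The function name `pc_estimate` and the array layout are illustrative choices, not taken from the slides.

```python
import numpy as np

def pc_estimate(Y, K):
    """Principal-components estimate of loadings and factors.

    Y : (N, T) array, row i holds y_{i1}, ..., y_{iT} (assumed layout).
    K : number of factors (assumed known here).
    Returns Lam (N, K) and F (T, K) with F'F / T = I_K.
    """
    N, T = Y.shape
    # Eigen-decomposition of the T x T matrix Y'Y (symmetric, so eigh applies).
    eigvals, eigvecs = np.linalg.eigh(Y.T @ Y)
    # eigh returns eigenvalues in ascending order; pick the K largest.
    top = np.argsort(eigvals)[::-1][:K]
    F = np.sqrt(T) * eigvecs[:, top]   # factors, scaled so that F'F / T = I_K
    Lam = Y @ F / T                    # least-squares loadings given F
    return Lam, F
```

Because every (i, t) residual gets equal weight in the least-squares criterion, this estimator does not use any information about $\Sigma_u$, which is the inefficiency the slide points to.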

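4 Maximum Likelihood Estimation
$\mathrm{cov}(y_t) = \Lambda\,\mathrm{cov}(f_t)\,\Lambda' + \Sigma_u$
Suppose $y_t \sim N(0, \mathrm{cov}(y_t))$
Suppose $\mathrm{cov}(f_t) = I$: identification, reduces # parameters
log-likelihood (Bai and Li 2012):
$L(\Lambda, \Sigma_u) = -\frac{1}{N}\log\det(\Lambda\Lambda' + \Sigma_u) - \frac{1}{N}\mathrm{tr}\big(S_y(\Lambda\Lambda' + \Sigma_u)^{-1}\big)$
Highly non-trivial: (1) highly nonlinear in $\Lambda$, (2) too many parameters in $\Sigma_u$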
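To make the objective on this slide concrete, here is a minimal sketch that evaluates the Gaussian quasi-log-likelihood for given $\Lambda$, $\Sigma_u$ and sample covariance $S_y$, with both terms entering negatively and scaled by $1/N$. The function name `gaussian_loglik` is hypothetical; this only evaluates the criterion and is not an optimizer.

```python
import numpy as np

def gaussian_loglik(Lam, Sigma_u, S_y):
    """Evaluate -(1/N) log det(Lam Lam' + Sigma_u)
                -(1/N) tr(S_y (Lam Lam' + Sigma_u)^{-1}).

    Lam : (N, K) loadings, Sigma_u : (N, N), S_y : (N, N) sample covariance.
    """
    N = S_y.shape[0]
    Sigma_y = Lam @ Lam.T + Sigma_u
    # slogdet is numerically safer than log(det(.)) when N is large.
    sign, logdet = np.linalg.slogdet(Sigma_y)
    if sign <= 0:
        raise ValueError("Lam Lam' + Sigma_u must be positive definite")
    # tr(S_y Sigma_y^{-1}) = tr(Sigma_y^{-1} S_y), computed without an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Sigma_y, S_y))
    return -(logdet + trace_term) / N
```

With $N(N+1)/2$ free entries in $\Sigma_u$, maximizing this over both arguments is not feasible without structure, which is why the talk imposes sparsity.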

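5 Estimating $\Sigma_u$
(1) Run SVD: $S_y = \sum_{j=1}^{p} \hat\lambda_j \hat\xi_j \hat\xi_j' = \sum_{j=1}^{K} \hat\lambda_j \hat\xi_j \hat\xi_j' + \sum_{j=K+1}^{p} \hat\lambda_j \hat\xi_j \hat\xi_j'$
(2) Compute $\sum_{j=K+1}^{p} \hat\lambda_j \hat\xi_j \hat\xi_j' = (\hat r_{ij})$
(3) Apply (adaptive) thresholding (Cai and Liu, 11): $\hat\Sigma_u = (s_{ij}(\hat r_{ij}))$
$s_{ij}(\cdot)$: hard, soft, SCAD, adaptive Lasso, ...
Assume $\Sigma_u$ to be sparse; Fan, Liao, Mincheva (2013) showed:
$\|\hat\Sigma_u - \Sigma_u\| = O_p\Big(m_N\sqrt{\frac{\log N}{T}} + m_N\frac{1}{\sqrt{N}}\Big) \approx O_p\Big(\sqrt{\frac{\log N}{T}}\Big)$
finite-sample positive definite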
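The three steps on this slide can be sketched as follows. This is a simplified illustration, not the estimator from the cited papers: it applies soft thresholding with a single constant threshold `tau` to the off-diagonal entries, whereas the adaptive rule of Cai and Liu uses entry-dependent thresholds. The function name and the `tau` parameter are assumptions made for the example.

```python
import numpy as np

def thresholded_residual_cov(S_y, K, tau):
    """Principal-orthogonal-complement estimate of Sigma_u with soft thresholding.

    S_y : (p, p) sample covariance, K : number of factors,
    tau : constant threshold (simplification; adaptive rules vary it by entry).
    """
    # Step 1: spectral decomposition of S_y, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(S_y)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 2: residual covariance (r_ij) from components K+1, ..., p.
    V = eigvecs[:, K:]
    R = (V * eigvals[K:]) @ V.T
    # Step 3: soft-threshold the off-diagonal entries, keep the diagonal as is.
    Sigma_u = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(Sigma_u, np.diag(R))
    return Sigma_u
```

A constant threshold does not by itself guarantee positive definiteness in finite samples; in practice the threshold level is chosen with that constraint in mind.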

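6 Impact of large covariance estimation on inference
Likelihood: $\frac{1}{N}\log\det(\Lambda\Lambda' + \Sigma_u) + \frac{1}{N}\mathrm{tr}\big(S_y(\Lambda\Lambda' + \Sigma_u)^{-1}\big)$
Incorporating covariance matrices often improves efficiency
$\hat\theta^{*} = f(D, \Sigma_u^{-1})$, $\hat\theta = f(D, \hat\Sigma_u^{-1})$
$(\hat\theta^{*} - \theta_0)$, suitably normalized, is asymptotically Gaussian
Effect of estimating $\Sigma_u$ is negligible: under the same normalization, $(\hat\theta - \hat\theta^{*}) = A_1(\hat\Sigma_u^{-1} - \Sigma_u^{-1})A_2 + o_p(1) = o_p(1)$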

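7 Goal: to show $A_1(\hat\Sigma_u^{-1} - \Sigma_u^{-1})A_2 = o_p(1)$
Classical low-dim. problems: consistency suffices, $\hat\Sigma_u^{-1} - \Sigma_u^{-1} = o_p(1)$, e.g., efficient GMM
However, in high dimensions ($N > T$): highly NON-trivial, even if $\hat\Sigma_u^{-1}$ achieves the optimal rate, nearly root-$T$.
Optimal absolute convergence is restrictive for inference when $N > T$
For efficient factor analysis, we need $\frac{\sqrt{T}}{N}\Lambda'(\hat\Sigma_u^{-1} - \Sigma_u^{-1})\Lambda = o_p(1)$
However, $\frac{\sqrt{T}}{N}\|\Lambda\|^2\,\|\hat\Sigma_u^{-1} - \Sigma_u^{-1}\| = O_p\Big(\frac{\sqrt{T}}{N}\cdot N\cdot\frac{1}{\sqrt{T}}\Big) \neq o_p(1)$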

8 For inference in high-dimensional models, need to directly evaluate weighted convergence $A_1(\hat\Sigma_u^{-1} - \Sigma_u^{-1})A_2$
Intuition: averaged error converges faster
weighted convergence $\|A_1(\hat\Sigma_u^{-1} - \Sigma_u^{-1})A_2\|$ is faster than absolute convergence $\|A_1\|\,\|A_2\|\,\|\hat\Sigma_u^{-1} - \Sigma_u^{-1}\|$
Need sparsistency: with probability approaching one,
if $\sigma_{u,ij} \neq 0$, $\hat\sigma_{ij} \neq 0$; if $\sigma_{u,ij} = 0$, $\hat\sigma_{ij} = 0$
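The gap between weighted and absolute convergence can be seen in a toy calculation. The example below is purely illustrative and is not the argument used in the talk: it takes a hypothetical diagonal error matrix `E` whose entries have alternating signs, so the errors cancel when averaged by the weights, while the product-of-norms bound grows with N.

```python
import numpy as np

def weighted_vs_absolute(a1, E, a2):
    """Return (|a1' E a2|, ||a1|| * ||E|| * ||a2||) for vectors a1, a2
    and a matrix E, where ||E|| is the spectral norm."""
    weighted = abs(a1 @ E @ a2)
    absolute = np.linalg.norm(a1) * np.linalg.norm(E, 2) * np.linalg.norm(a2)
    return weighted, absolute

# Toy setup (all values are assumptions chosen for the illustration).
N = 100
e = 0.1                                                  # size of each entry's error
signs = np.where(np.arange(N) % 2 == 0, 1.0, -1.0)       # alternating +1, -1
E = np.diag(e * signs)                                   # stand-in for the estimation error
a = np.ones(N)                                           # stand-in for the weights
weighted, absolute = weighted_vs_absolute(a, E, a)
```

Here the weighted quantity is zero because the errors cancel, while the norm bound equals N times e. Bounding the weighted term by the product of norms therefore discards exactly the averaging effect the slide refers to.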

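9 $\hat\Lambda = \arg\min_{\Lambda} \log\det(\Lambda\Lambda' + \hat\Sigma_u) + \mathrm{tr}\big(S_y(\Lambda\Lambda' + \hat\Sigma_u)^{-1}\big)$
$\hat f_t = \arg\min_{f_t} (y_t - \hat\Lambda f_t)'\hat\Sigma_u^{-1}(y_t - \hat\Lambda f_t)$
Theorem: for $j \le N$, $t \le T$,
$\sqrt{T}(\hat\lambda_j - \lambda_j) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T} f_t u_{jt} + o_p(1)$
$\sqrt{N}(\hat f_t - f_t) = \frac{1}{\sqrt{N}}\sum_{j=1}^{N} \xi_j u_{jt} + o_p(1)$
$\max_{j \le N}\|\hat\lambda_j - \lambda_j\| = O_p\Big(m_N\sqrt{\frac{\log N}{T}} + \frac{m_N}{\sqrt{N}}\Big)$
$\max_{t \le T}\|\hat f_t - f_t\| = O_p\Big(m_N\sqrt{\frac{\log N}{T}} + \frac{m_N}{\sqrt{N}}\Big)(\log T)^{r}$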
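The factor estimate on this slide is a generalized least squares problem in $f_t$, whose minimizer has the standard closed form $(\hat\Lambda'\hat\Sigma_u^{-1}\hat\Lambda)^{-1}\hat\Lambda'\hat\Sigma_u^{-1}y_t$. A minimal sketch, assuming estimates of the loadings and of a symmetric positive definite $\Sigma_u$ are already available; the function name `gls_factors` and the N x T layout of `Y` are illustrative.

```python
import numpy as np

def gls_factors(Y, Lam, Sigma_u):
    """GLS factor estimates for every t.

    Y : (N, T) data, Lam : (N, K) loadings, Sigma_u : (N, N) symmetric PD.
    Returns F_hat of shape (T, K), row t being the minimizer of
    (y_t - Lam f)' Sigma_u^{-1} (y_t - Lam f).
    """
    W = np.linalg.solve(Sigma_u, Lam)        # Sigma_u^{-1} Lam, shape (N, K)
    A = Lam.T @ W                            # Lam' Sigma_u^{-1} Lam, shape (K, K)
    # W.T @ Y equals Lam' Sigma_u^{-1} Y because Sigma_u is symmetric.
    return np.linalg.solve(A, W.T @ Y).T
```

Setting `Sigma_u` to the identity reduces this to ordinary least squares on the loadings, which is the sense in which the weighting by $\hat\Sigma_u^{-1}$ is where the efficiency gain enters.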

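10 Alternative: Penalized maximum likelihood
Estimate $\Lambda, \Sigma_u$ simultaneously:
$\max\; -\frac{1}{N}\log\det(\Lambda\Lambda' + \Sigma_u) - \frac{1}{N}\mathrm{tr}\big(S_y(\Lambda\Lambda' + \Sigma_u)^{-1}\big) - \sum_{i \neq j} w_{ij}|\sigma_{u,ij}|$
weight $w_{ij}$: Lasso, adaptive lasso, SCAD, MCP, ...
Still, technically highly non-trivial:
$\frac{1}{N}\|\hat\Lambda - \Lambda\|^2 = o_p(1)$, $\frac{1}{N}\|\hat\Sigma_u - \Sigma_u\|^2 = o_p(1)$
Jushan and I spent months proving this.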

11 Numerical Examples

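12 4 estimators are compared:
PC: $\min \sum_{i,t}(y_{it} - \lambda_i' f_t)^2$
diag ML (Bai and Li 12): $-\log\det(\Lambda\Lambda' + \mathrm{diag}(\Sigma_u)) - \mathrm{tr}\big(S_y(\Lambda\Lambda' + \mathrm{diag}(\Sigma_u))^{-1}\big)$
two-step: $-\log\det(\Lambda\Lambda' + \hat\Sigma_u) - \mathrm{tr}\big(S_y(\Lambda\Lambda' + \hat\Sigma_u)^{-1}\big)$
penalized ML: $-\log\det(\Lambda\Lambda' + \Sigma_u) - \mathrm{tr}\big(S_y(\Lambda\Lambda' + \Sigma_u)^{-1}\big) - \sum_{i \neq j} w_{ij}|\sigma_{u,ij}|$
Algorithm: either EM (Bai and Li) or EM + majorize-minimize (Bien and Tibshirani 11)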

13 [Table: results by N for PCA, DML, and Penalized ML with γ = 1 (µ = 0.08, µ = 0.3) and γ = 5 (µ = 0.08, µ = ); numerical entries not preserved]

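14 Testing CAPM
$y_{it} = \alpha_i + \lambda_i' f_t + u_{it}$, $H_0: \alpha_i = 0$ for all $i$
Wald test $c\,\hat\alpha'\Sigma_u^{-1}\hat\alpha$; under $H_0$:
$\frac{c\,\hat\alpha'\Sigma_u^{-1}\hat\alpha - N}{\sqrt{2N}} \to^{d} N(0,1)$
Replacing $\Sigma_u^{-1}$ with $\hat\Sigma_u^{-1}$ is difficult:
$\frac{c\,\hat\alpha'(\hat\Sigma_u^{-1} - \Sigma_u^{-1})\hat\alpha}{\sqrt{2N}} \le \frac{c\,\|\hat\alpha\|^2\,\|\hat\Sigma_u^{-1} - \Sigma_u^{-1}\|}{\sqrt{2N}} \neq o_p(1)$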
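The standardized statistic on this slide is straightforward to compute once its ingredients are given. The sketch below takes the scaling constant `c` as an input, since the slide does not define it, and the function name `standardized_wald` is hypothetical.

```python
import numpy as np

def standardized_wald(alpha_hat, Sigma_u_inv, c):
    """Compute (c * alpha' Sigma_u^{-1} alpha - N) / sqrt(2N).

    alpha_hat   : (N,) estimated intercepts.
    Sigma_u_inv : (N, N) inverse error covariance (true or estimated).
    c           : scaling constant of the Wald statistic (taken as given).
    """
    N = alpha_hat.shape[0]
    wald = c * (alpha_hat @ Sigma_u_inv @ alpha_hat)
    return (wald - N) / np.sqrt(2 * N)
```

Calling the function once with the true inverse and once with an estimated inverse gives two values whose difference is exactly the term the slide says is hard to control when N is large.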

15 Summary
Factor analysis: taking into account cross-sectional heteroskedasticity and correlation via $\Sigma_u^{-1}$.
Covariance estimation: estimating large error covariance is important for factor analysis and panel data models.
Effect of estimating covariance on inference is negligible, but technically non-trivial.
