Random Matrix Theory and its Applications to Econometrics


1 Random Matrix Theory and its Applications to Econometrics
Hyungsik Roger Moon, University of Southern California
Conference to Celebrate Peter Phillips' 40 Years at Yale, October 2018

2 Spectral Analysis of Large Dimensional Random Matrices
"The main goal of the Random Matrix Theory is to provide understanding of the diverse properties (most notably, statistics of matrix eigenvalues) of matrices with entries drawn randomly from various probability distributions, traditionally referred to as the random matrix ensembles." (from Scholarpedia)
Dates back to Wishart and James in statistics. Wigner studied the spectra of large dimensional random matrices in nuclear physics.

3 Empirical Spectral Distribution
Empirical spectral distribution (ESD): Let A be an N×N symmetric matrix and µ_r(A) be the r-th largest eigenvalue of A. The ESD of A is
F_A(x) = (1/N) Σ_{r=1}^N 1{µ_r(A) ≤ x}.
Finite-sample properties of the ESD are hard to derive when the dimension of A is large, so we look for the limiting properties of the ESD.
Suppose that e is an N×T matrix of iid N(0,1) entries. Let µ_r(e'e) be the r-th largest eigenvalue of e'e, and let κ := lim T/N. Then
F_{e'e}(x) := (1/T) Σ_{r=1}^T 1{µ_r(e'e) ≤ Nx} → F(x),
where F'(x) = f(x) = √((b−x)(x−a)) / (2πκx) on [a, b], with b = (1 + κ^{1/2})² and a = (1 − κ^{1/2})².
Wigner's semicircular law, Bai and Yin (1988), Marchenko and Pastur (1967), Pastur (1972, 1973), ...
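
The Marchenko–Pastur limit above is easy to check by simulation. A minimal sketch (the dimensions, seed, and edge tolerance are illustrative choices, not from the talk):

```python
import numpy as np

# ESD of e'e/N versus the Marchenko-Pastur support [a, b], with kappa = T/N.
rng = np.random.default_rng(0)
N, T = 1000, 500                          # kappa = 0.5
kappa = T / N
e = rng.standard_normal((N, T))
eigs = np.linalg.eigvalsh(e.T @ e / N)    # T eigenvalues of e'e scaled by N

a, b = (1 - kappa**0.5) ** 2, (1 + kappa**0.5) ** 2
# Marchenko-Pastur density on [a, b] (for reference)
f = lambda x: np.sqrt((b - x) * (x - a)) / (2 * np.pi * kappa * x)

# Nearly all eigenvalues fall inside the MP support [a, b]
inside = np.mean((eigs > a - 0.1) & (eigs < b + 0.1))
print(round(inside, 2))
```

At these dimensions essentially every eigenvalue already lies inside [a, b], and a histogram of `eigs` tracks `f` closely.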

4 The Largest Eigenvalue
Bounds on extreme eigenvalues. Suppose that e is an N×T matrix of iid N(0,1) entries. Let λ_1(e'e) be the largest eigenvalue of e'e, and let κ := lim T/N.
Starting from Geman (1980), many researchers derived the limit of the largest eigenvalue of the sample covariance matrix. For example, Geman (1980):
(1/N) λ_1(e'e) →_{a.s.} (1 + κ^{1/2})².
Johnstone (2001) and Soshnikov (2002): the properly normalized λ_1(e'e) converges to the Tracy–Widom law,
(λ_1(e'e) − µ)/σ ⇒ W_1,
where µ = (√(N−1) + √T)² and σ = (√(N−1) + √T)(1/√(N−1) + 1/√T)^{1/3}.
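
Both statements can be illustrated numerically; the sketch below (dimensions and seed are arbitrary choices) checks Geman's almost-sure limit and forms Johnstone's Tracy–Widom centering and scaling:

```python
import numpy as np

# Geman (1980): lambda_1(e'e)/N -> (1 + sqrt(kappa))^2, kappa = T/N.
rng = np.random.default_rng(1)
N, T = 2000, 500                               # kappa = 0.25
kappa = T / N
e = rng.standard_normal((N, T))
lam1 = np.linalg.eigvalsh(e.T @ e)[-1]         # largest eigenvalue of e'e

print(lam1 / N, (1 + kappa**0.5) ** 2)         # close for large N, T

# Johnstone (2001) centering/scaling for the Tracy-Widom approximation
mu = (np.sqrt(N - 1) + np.sqrt(T)) ** 2
sigma = (np.sqrt(N - 1) + np.sqrt(T)) * (1 / np.sqrt(N - 1) + 1 / np.sqrt(T)) ** (1 / 3)
z = (lam1 - mu) / sigma                        # approximately Tracy-Widom W_1
print(round(z, 2))
```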

5 Expectation of the Operator Norm of a Random Matrix
Suppose that u_it are independent across i, t, with E(u_it) = 0. Let U = [u_it] ∈ R^{N×T}, the N×T random matrix consisting of the u_it. Let ‖U‖ be the operator norm of U, that is, the largest singular value of U, or equivalently the square root of the largest eigenvalue of U'U:
‖U‖ = sup_{‖v‖=1} (v'U'Uv)^{1/2} = sup_{‖w‖=1, ‖v‖=1} w'Uv.
One of the well-known results in random matrix theory is
E(‖U‖) ≲ √max(N, T)  if sup_{i,t} E(u_it⁴) ≤ M
(e.g., Latala (2005)). This implies that ‖U‖ = O_p(√max(N, T)).
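
The √max(N, T) rate can be seen directly: the ratio ‖U‖/√max(N, T) stays bounded as the dimensions grow. A small sketch (the dimension sequence is an illustrative choice):

```python
import numpy as np

# ||U|| = O_p(sqrt(max(N, T))) for a mean-zero matrix with bounded fourth
# moments: the ratio below stabilizes (for Gaussians, near 1 + sqrt(T/N)).
rng = np.random.default_rng(2)
ratios = []
for N, T in [(100, 50), (400, 200), (1600, 800)]:
    U = rng.standard_normal((N, T))
    op = np.linalg.norm(U, 2)                 # operator norm = largest singular value
    ratios.append(op / np.sqrt(max(N, T)))
print([round(r, 2) for r in ratios])          # all of similar magnitude
```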

6 Expectation of the Operator Norm of a Random Matrix: Extension I
Suppose that ε_it are independent across i, t with Eε_it = 0, Eε_it² = 1, and Eε_it⁴ uniformly bounded.
(i) The u_it follow an MA(∞) process,
u_it = Σ_{τ=0}^∞ ψ_iτ ε_{i,t−τ}, for i = 1, ..., N, t = 1, ..., T,   (1)
and the coefficients ψ_iτ satisfy
Σ_{τ=0}^∞ τ max_{i=1,...,N} ψ_iτ² < B,  Σ_{τ=0}^∞ max_{i=1,...,N} |ψ_iτ| < B,   (2)
for a finite constant B which is independent of N and T.

7 Expectation of the Operator Norm of a Random Matrix: Extension II
(ii) The error matrix U is generated as U = σ^{1/2} ε Σ^{1/2}. Here σ is the N×N cross-sectional covariance matrix and Σ is the T×T time-series covariance matrix, and they satisfy
max_{j=1,...,N} Σ_{i=1}^N |σ_ij| < B,  max_{τ=1,...,T} Σ_{t=1}^T |Σ_tτ| < B,   (3)
for some finite constant B which is independent of N and T. In this example we have E(u_it u_jτ) = σ_ij Σ_tτ.
See Moon and Weidner (2017, Appendix).

8 Application 1: Factor Analysis
Suppose that Y = λf' + U, where λ ∈ R^{N×R}, f ∈ R^{T×R}, and plim λ'λ/N > 0, plim f'f/T > 0.
Estimation of R when N, T → ∞: Bai and Ng (2002), Onatski (2006), Ahn and Horenstein (2013), among others.
We can estimate the number of factors using the singular values of the observed sample Y.

9 Factor Analysis: Cont'd - I
Let s_i(A) be the i-th largest singular value of matrix A. Suppose that A and B are two matrices of dimension N×T. For i = 1, ..., min(N, T), we have
|s_i(A + B) − s_i(A)| ≤ s_1(B) (= ‖B‖). (Ky Fan's singular value inequality)
Y = λf' + U, and rank(λf') = R. So
s_i(λf') > 0 if i ≤ R, and s_i(λf') = 0 if i ≥ R + 1.
If U satisfies the previous conditions, ‖U‖ = O_p(√max(N, T)).

10 Factor Analysis: Cont'd - II
Therefore,
s_R(Y/√(NT)) ≥ s_R(λf'/√(NT)) − ‖U/√(NT)‖ = s_R(λf'/√(NT)) − O_p(1/√min(N, T)),
s_{R+1}(Y/√(NT)) ≤ s_{R+1}(λf'/√(NT)) + ‖U/√(NT)‖ = O_p(1/√min(N, T)).
Choose ψ → 0 with ψ√min(N, T) → ∞. Then,
R̂ = Σ_{i=1}^{min(N,T)} 1{ s_i(Y/√(NT)) ≥ ψ }.
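
The thresholding rule above can be sketched in a small simulation. The design (R = 3, standard normal loadings, factors, and errors, and the particular rate ψ = min(N, T)^{−1/4}) is an illustrative choice satisfying ψ → 0 and ψ√min(N, T) → ∞, not a recommendation from the talk:

```python
import numpy as np

# Estimate the number of factors by counting singular values of Y/sqrt(NT)
# above a vanishing threshold psi.
rng = np.random.default_rng(3)
N, T, R = 200, 200, 3
lam = rng.standard_normal((N, R))          # loadings
f = rng.standard_normal((T, R))            # factors
U = rng.standard_normal((N, T))            # idiosyncratic errors
Y = lam @ f.T + U

s = np.linalg.svd(Y / np.sqrt(N * T), compute_uv=False)
psi = min(N, T) ** -0.25                   # psi -> 0, psi*sqrt(min(N,T)) -> infty
R_hat = int(np.sum(s >= psi))
print(R_hat)                               # recovers R = 3
```

Here the top R singular values of Y/√(NT) are of order one while the rest are O_p(1/√min(N, T)) ≈ 0.14, well below ψ ≈ 0.27.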

11 Factor Analysis: Cont'd - III
This hints that we can distinguish weak factors or weak factor loadings (local factors) whose variation is larger than √max(N, T).

12 Application 2: Interactive Fixed Effects Panel Regression with Missing Observations
Moon and Weidner (2018). The latent model is
Y = β_0 X + Γ_0 + U,
where rank(Γ_0) = R_0 is fixed, that is, Γ_0 = λ_0 f_0'.
S_it: random selection dummy. Observe only (S_it Y_it, S_it X_it). Goal: estimate β_0.
Computational difficulty: min_{β,λ,f} (1/2) Σ_{i,t} S_it (Y_it − βX_it − λ_i'f_t)² is a non-convex optimization problem. Due to missing observations, we cannot use the principal component method of Bai (2009) and Moon and Weidner (2015, 2017): there is no closed-form profile objective function.

13 Nuclear Norm Regularization
Regularization:
β̂_ψ := argmin_β min_Γ (1/2) Σ_{i,t} S_it (Y_it − βX_it − Γ_it)² + ψ ‖Γ‖_*
     =: argmin_β Q_ψ(β).
Here ‖Γ‖_* = Σ_{r=1}^{min(N,T)} s_r(Γ) = sup_{‖A‖≤1} Tr(Γ'A) is the nuclear norm of Γ.
Q_ψ(β) is a convex function of β.
Estimation of the reduced-rank matrix Γ with missing observations is a matrix completion problem (e.g., the Netflix problem) in the CS and ML literature.
The parameter of interest is β_0. For simplicity, assume that X_it has mean zero, and assume that E(S_it) = b > 0 and E(X_it²) > 0.
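
The inner minimization over Γ is the matrix-completion step. With complete data, argmin_G (1/2)‖M − G‖_F² + ψ‖G‖_* is singular value soft-thresholding; with missing entries one can iterate that step (the soft-impute idea from the matrix completion literature). The sketch below is illustrative only, not the paper's algorithm, and the design (rank 2, 70% observed, ψ = 1) is made up:

```python
import numpy as np

def svt(M, psi):
    """Singular value soft-thresholding: shrink each singular value by psi."""
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    return u @ np.diag(np.maximum(s - psi, 0.0)) @ vt

def soft_impute(Y, S, psi, iters=100):
    """Iterate: fill missing entries with the current fit, then threshold."""
    G = np.zeros_like(Y)
    for _ in range(iters):
        G = svt(S * Y + (1 - S) * G, psi)
    return G

rng = np.random.default_rng(4)
N, T, R0 = 60, 60, 2
Gamma0 = rng.standard_normal((N, R0)) @ rng.standard_normal((R0, T))
S = (rng.random((N, T)) < 0.7).astype(float)     # 70% of entries observed
Y = Gamma0 + 0.1 * rng.standard_normal((N, T))
G_hat = soft_impute(Y, S, psi=1.0)
err = np.linalg.norm(G_hat - Gamma0) / np.linalg.norm(Gamma0)
print(round(err, 2))                             # small relative error
```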

14 Consistency of β̂_ψ - I
Choose ψ such that ψ → 0 and min(N, T)^{1/2} ψ → ∞. Let B_ψ := {β ∈ B : ‖β − β_0‖ = Mψ^{1/2}}. Since Q_ψ(β) is convex, if min_{β∈B_ψ} Q_ψ(β) − Q_ψ(β_0, Γ_0) > 0, then ‖β̂_ψ − β_0‖ ≤ Mψ^{1/2}.
A lower bound for Q_ψ(β): By the dual representation of the nuclear norm, for any A with ‖A‖ ≤ 1,
(1/2) Σ_{i,t} S_it (Y_it − βX_it − Γ_it)² + ψ ‖Γ‖_* ≥ (1/2) Σ_{i,t} S_it (U_it − (β − β_0)X_it − (Γ_it − Γ_{0,it}))² + ψ Tr(Γ'A).
Choose A = (1/ψ)[S_it (U_it − X_it(β − β_0))]. Then,
‖A‖ ≤ (1/ψ) ‖[S_it U_it]‖ + (‖β − β_0‖/ψ) ‖[S_it X_it]‖ = O_p(1/(min(N, T)^{1/2} ψ)) + O_p(M/(min(N, T)^{1/2} ψ^{1/2})) = o_p(1).

15 Consistency of β̂_ψ - II
Then,
Q_ψ(β) − Q_ψ(β_0, Γ_0)
= min_Γ [ (1/2) Σ_{i,t} S_it (Y_it − βX_it − Γ_it)² + ψ ‖Γ‖_* ] − (1/2) Σ_{i,t} S_it U_it² − ψ ‖Γ_0‖_*
≥ min_Γ [ (1/2) Σ_{i,t} S_it (U_it − (β − β_0)X_it − (Γ_it − Γ_{0,it}))² + ψ Tr(Γ'A) ] − (1/2) Σ_{i,t} S_it U_it² − ψ ‖Γ_0‖_*
= (1/2) Σ_{i,t} S_it (U_it − (β − β_0)X_it)² + ψ Tr(Γ_0'A) − (1/2) Σ_{i,t} S_it U_it² − ψ ‖Γ_0‖_*
≥ (1/2) Σ_{i,t} S_it (U_it − (β − β_0)X_it)² − (1/2) Σ_{i,t} S_it U_it² − 2ψ ‖Γ_0‖_*
≥ −Mψ^{1/2} |Σ_{i,t} S_it U_it X_it| + (M²ψ/2) Σ_{i,t} S_it X_it² − 2ψ ‖Γ_0‖_*
> 0 wp→1, by choosing M large.

16 Extension: Expectation of the Operator Norm of a Random Element Matrix
Let u_it(β) be a sequence of stochastic processes indexed by β ∈ B. Assume that the index set B is equipped with a pseudo-metric d_β(·,·). Assume that the u_it(β) ∈ l^∞(B) are independent across i, t. Let U(β) = [u_it(β)] ∈ l^∞(B)^{N×T}. Define
‖U(β)‖_B := sup_{β∈B} ‖U(β)‖ = sup_{β∈B, ‖w‖=1, ‖v‖=1} w'U(β)v.
The main contribution: establish a bound on E(‖U(β)‖_B). We use some known results: covering numbers, the chaining argument, concentration inequalities, ...

17 Assumptions
(i) There exists a finite constant σ² such that
sup_{β∈B} log E(exp(λ u_it(β))) ≤ λ²σ²/2 for all λ > 0,
that is, u_it(β) is a sub-Gaussian process.
(ii) For all (i, t) and (β_1, β_2) ∈ B×B,
P{ |u_it(β_1) − u_it(β_2)| > x } ≤ 2 exp( −x² / (2 d_β(β_1, β_2)²) ).
(iii) The index set B is totally bounded with respect to d_β(·,·).
Under Assumption (i), sup_{β∈B} E[max_{i,t} |u_it(β)|] ≤ √(2σ² log(NT)).

18 Main Result
Suppose that N_β(B, d_β, ɛ) is the covering number of (B, d_β).
Theorem.
E[‖U(β)‖_B] ≲ √(max(N, T) log(max(N, T))) + √max(N, T) ∫_0^{diam(B)} √(log N_β(B, d_β, ν)) dν + ∫_0^{diam(B)} log N_β(B, d_β, ν) dν.
If N_β(B, d_β, ɛ) ≲ ɛ^{−K} for some finite K, then
E[‖U(β)‖_B] ≲ √(max(N, T) log(max(N, T))).
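
In the polynomial-covering-number case the entropy integral is finite and explicit: with N_β(ν) = ν^{−K} on B of diameter one, ∫_0^1 √(log N_β(ν)) dν = √K · √π/2 = √(Kπ)/2, which is why only the √(max(N, T) log max(N, T)) term survives. A quick numerical check (K and the grid size are illustrative choices):

```python
import numpy as np

# Midpoint-rule approximation of the entropy integral
#   integral_0^1 sqrt(K * log(1/nu)) d nu  =  sqrt(K * pi) / 2
K = 2.0
m = 2_000_000
nu = (np.arange(1, m + 1) - 0.5) / m        # midpoints of [0, 1]
val = float(np.mean(np.sqrt(K * np.log(1.0 / nu))))
target = np.sqrt(K * np.pi) / 2
print(round(val, 3), round(target, 3))      # the two agree
```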

19 Sketch of Proof I
Let g_it be iid N(0,1), independent of u_it(β). Let G := [g_it]. Let Z(β) := U(β) ⊙ G, the element-by-element (Hadamard) product of U(β) and G.
Step 1: Using the symmetrization argument, we show that there exists a finite constant C such that
E‖U(β)‖_B ≤ C E‖Z(β)‖_B.
Let W := {x ∈ R^N : ‖x‖ = 1} and V := {x ∈ R^T : ‖x‖ = 1}. Let θ := (β, w', v')' ∈ Θ, where Θ := B×W×V. By definition we can express
‖Z(β)‖_B = sup_{β∈B} sup_{w∈W, v∈V} w'Z(β)v.
Let N_w and N_v be ɛ-nets of (W, d_w) and (V, d_v), respectively.

20 Sketch of Proof II
Step 2: From w'Z(β)v being bilinear in (w, v) and by definition, we show
sup_{β∈B} ‖Z(β)‖ ≤ 2ɛ sup_{β∈B} ‖Z(β)‖ + sup_{β∈B} max_{(w,v)∈N_w×N_v} w'Z(β)v.
From this, we deduce
E[ sup_{β∈B} ‖Z(β)‖ ] ≤ (1/(1 − 2ɛ)) E[ sup_{β∈B} max_{(w,v)∈N_w×N_v} w'Z(β)v ].
Apply the chaining argument (to control sup_{β∈B}) and the maximal inequality (to control max_{(w,v)∈N_w×N_v}). Also, we use the fact that
N_w(W, d_w, ɛ) ≤ (3/ɛ)^N,  N_v(V, d_v, ɛ) ≤ (3/ɛ)^T.
Step 3: We deduce the desired result.

21 Application: Least Operator Norm Estimation I
Suppose that ε_it(β) are moment functions such that E(ε_it(β_0)) = 0. For simplicity, assume that ε_it(β) ∈ R and β ∈ B ⊂ R. Let ε(β) = [ε_it(β)], the N×T matrix of moment functions. The conventional method of moments estimator solves
β̃ = argmin_{β∈B} | (1/(NT)) Σ_{i,t} ε_it(β) | = argmin_{β∈B} (1/(NT)) |1_N' ε(β) 1_T|,
where 1_N is the N-vector of ones. The least operator norm estimator minimizes the operator norm of the moment function matrix ε(β):
β̂ := argmin_{β∈B} ‖ε(β)‖ = argmin_{β∈B} max_{‖w‖=1, ‖v‖=1} |w' ε(β) v|.
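
A toy version of the least operator norm estimator can be computed by a grid search. The moment function ε_it(β) = y_it − β x_it and the design (β_0 = 1.5, standard normal data) below are made up for illustration, not taken from the talk:

```python
import numpy as np

# Least operator norm estimation: minimize ||eps(beta)||_op over a grid.
rng = np.random.default_rng(5)
N, T, beta0 = 200, 200, 1.5
x = rng.standard_normal((N, T))
y = beta0 * x + rng.standard_normal((N, T))    # eps_it(beta0) = u_it, mean zero

grid = np.linspace(0.0, 3.0, 301)
obj = [np.linalg.norm(y - b * x, 2) for b in grid]   # operator norm at each beta
beta_hat = grid[int(np.argmin(obj))]
print(beta_hat)                                      # close to beta0 = 1.5
```

Since ‖ε(β)‖ is a norm of a matrix that is affine in β, the grid objective is convex in β, so the grid minimizer is well behaved.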

22 Consistency I
For consistency of β̂, we need the following conditions.
(i) B is bounded.
(ii) ε_it(β) − E(ε_it(β)) satisfies the sub-Gaussian process assumption.
(iii) For any ɛ > 0, there exists δ > 0 such that inf_{‖β−β_0‖≥ɛ} ‖E(ε(β))‖ > 2δ.
The consistency of β̂ follows if for any ɛ > 0, there exists δ > 0 such that
inf_{‖β−β_0‖≥ɛ} ‖ε(β)‖ − ‖ε(β_0)‖ > δ wp→1.   (4)
Notice that, by the triangle inequality,
inf_{‖β−β_0‖≥ɛ} ‖ε(β)‖ ≥ inf_{‖β−β_0‖≥ɛ} ‖E(ε(β))‖ − sup_{β∈B} ‖ε(β) − E(ε(β))‖ = inf_{‖β−β_0‖≥ɛ} ‖E(ε(β))‖ + o_p(1) ≥ δ wp→1.
This shows that the least operator norm estimator β̂ is consistent.

23 Extensions I
If the ε_it(β) are iid, E(ε(β)) = E(ε_it(β)) 1_N 1_T', so ‖E(ε(β))‖ = √(NT) |E(ε_it(β))|. In this case, the identification condition (iii) becomes the usual identification condition: for any ɛ > 0, there exists δ > 0 such that inf_{|β−β_0|≥ɛ} |E(ε_it(β))| > 2δ.
Suppose that ε_it(β) = (ε_{1,it}(β), ..., ε_{L,it}(β))' ∈ R^L. For the least operator norm objective function, we may consider Σ_{l=1}^L ω_l ‖ε_l(β)‖.
Suppose that R → ∞ slowly, satisfying R/√min(N, T) → 0. Extend the objective function to
Σ_{r=1}^R s_r(ε(β)),
where s_r(A) is the r-th largest singular value of matrix A.

24 Extensions II
Since ‖ε(β) − E(ε(β))‖ = s_1(ε(β) − E(ε(β))), we have
sup_{β∈B} (1/√(NT)) Σ_{r=1}^R s_r(ε(β) − E(ε(β))) ≤ R sup_{β∈B} ‖ε(β) − E(ε(β))‖ / √(NT) = O_p(R/√min(N, T)) = o_p(1).

25 Conclusion
In today's talk, we derived a uniform bound on the expectation of the operator norm of a random element matrix.
As an application, we investigated a new estimator that minimizes the operator norm of the moment function matrix, and showed its consistency.
Future study: convergence rate and asymptotic distribution.


More information

GARCH Models Estimation and Inference

GARCH Models Estimation and Inference GARCH Models Estimation and Inference Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 1 Likelihood function The procedure most often used in estimating θ 0 in

More information

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects

Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects Linear Regression for Panel with Unnown Number of Factors as Interactive Fixed Effects Hyungsi Roger Moon Martin Weidne February 27, 22 Abstract In this paper we study the Gaussian quasi maximum lielihood

More information

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices

Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices by Jack W. Silverstein* Department of Mathematics Box 8205 North Carolina State University Raleigh,

More information

Eigenvalues and Singular Values of Random Matrices: A Tutorial Introduction

Eigenvalues and Singular Values of Random Matrices: A Tutorial Introduction Random Matrix Theory and its applications to Statistics and Wireless Communications Eigenvalues and Singular Values of Random Matrices: A Tutorial Introduction Sergio Verdú Princeton University National

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Quantile Processes for Semi and Nonparametric Regression

Quantile Processes for Semi and Nonparametric Regression Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response

More information

Eigenvalue variance bounds for Wigner and covariance random matrices

Eigenvalue variance bounds for Wigner and covariance random matrices Eigenvalue variance bounds for Wigner and covariance random matrices S. Dallaporta University of Toulouse, France Abstract. This work is concerned with finite range bounds on the variance of individual

More information

Supplementary Material. Dynamic Linear Panel Regression Models with Interactive Fixed Effects

Supplementary Material. Dynamic Linear Panel Regression Models with Interactive Fixed Effects Supplementary Material Dynamic Linear Panel Regression Models with Interactive Fixed Effects Hyungsik Roger Moon Martin Weidner August 3, 05 S. Proof of Identification Theorem. Proof of Theorem.. Let Qβ,

More information

Random Matrices: Invertibility, Structure, and Applications

Random Matrices: Invertibility, Structure, and Applications Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Freeness and the Transpose

Freeness and the Transpose Freeness and the Transpose Jamie Mingo (Queen s University) (joint work with Mihai Popa and Roland Speicher) ICM Satellite Conference on Operator Algebras and Applications Cheongpung, August 8, 04 / 6

More information

Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices

Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices Alex Wein MIT Joint work with: Amelia Perry (MIT), Afonso Bandeira (Courant NYU), Ankur Moitra (MIT) 1 / 22 Random

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Inference for High Dimensional Robust Regression

Inference for High Dimensional Robust Regression Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:

More information

Kernels to detect abrupt changes in time series

Kernels to detect abrupt changes in time series 1 UMR 8524 CNRS - Université Lille 1 2 Modal INRIA team-project 3 SSB group Paris joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot Computational and statistical trade-offs in learning IHES

More information

Binary matrix completion

Binary matrix completion Binary matrix completion Yaniv Plan University of Michigan SAMSI, LDHD workshop, 2013 Joint work with (a) Mark Davenport (b) Ewout van den Berg (c) Mary Wootters Yaniv Plan (U. Mich.) Binary matrix completion

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Consistent and equivariant estimation in errors-in-variables models with dependent errors

Consistent and equivariant estimation in errors-in-variables models with dependent errors Consistent and equivariant estimation in errors-in-variables models with dependent errors Michal Pešta Charles University in Prague Department of Probability and Mathematical Statistics ROBUST 2010 February

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Concentration inequalities: basics and some new challenges

Concentration inequalities: basics and some new challenges Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,

More information

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Rank minimization via the γ 2 norm

Rank minimization via the γ 2 norm Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises

More information

CENTER FOR LAW, ECONOMICS AND ORGANIZATION RESEARCH PAPER SERIES

CENTER FOR LAW, ECONOMICS AND ORGANIZATION RESEARCH PAPER SERIES Maximum Score Estimation of a Nonstationary Binary Choice Model Hyungsik Roger Moon USC Center for Law, Economics & Organization Research Paper No. C3-15 CENTER FOR LAW, ECONOMICS AND ORGANIZATION RESEARCH

More information

Small Ball Probability, Arithmetic Structure and Random Matrices

Small Ball Probability, Arithmetic Structure and Random Matrices Small Ball Probability, Arithmetic Structure and Random Matrices Roman Vershynin University of California, Davis April 23, 2008 Distance Problems How far is a random vector X from a given subspace H in

More information

Non white sample covariance matrices.

Non white sample covariance matrices. Non white sample covariance matrices. S. Péché, Université Grenoble 1, joint work with O. Ledoit, Uni. Zurich 17-21/05/2010, Université Marne la Vallée Workshop Probability and Geometry in High Dimensions

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

1 Tridiagonal matrices

1 Tridiagonal matrices Lecture Notes: β-ensembles Bálint Virág Notes with Diane Holcomb 1 Tridiagonal matrices Definition 1. Suppose you have a symmetric matrix A, we can define its spectral measure (at the first coordinate

More information