The largest eigenvalues of the sample covariance matrix in the heavy-tail case


The largest eigenvalues of the sample covariance matrix in the heavy-tail case. Thomas Mikosch, University of Copenhagen. Joint work with Richard A. Davis (Columbia NY) and Johannes Heiny (Aarhus University). Suzhou, June 2018.

Figure 1. Estimation of tail indices for S&P 500 (lower tail index vs. upper tail index).

1. Heavy-tailed matrices: preliminaries

We consider $p \times n$ matrices $\mathbf{X} = \mathbf{X}_n = (X_{it})_{i=1,\dots,p;\,t=1,\dots,n}$, where all entries have the same distribution with heavy tails; $X$ denotes a generic element with this distribution. We are interested in the asymptotic behavior of the largest eigenvalues and corresponding eigenvectors of the sample covariance matrices for $p = p_n$:
$$\mathbf{X}\mathbf{X}' = \Big(\sum_{t=1}^n X_{it} X_{jt}\Big)_{i,j=1,\dots,p}, \qquad n = 1, 2, \dots$$

1.1. An excursion to regularly varying random variables.

The random variable $X$ and its distribution are regularly varying with index $\alpha > 0$ if there exist a slowly varying function $L$ and constants $p_\pm$ such that $p_+ + p_- = 1$ and
$$P(X > x) \sim p_+\,\frac{L(x)}{x^{\alpha}} \quad\text{and}\quad P(X < -x) \sim p_-\,\frac{L(x)}{x^{\alpha}}, \qquad x \to \infty.$$
Then $P(X^2 > x) \sim L(\sqrt{x})\, x^{-\alpha/2}$. For iid copies $(X_i)$ of $X > 0$ and some slowly varying $\ell$, Embrechts and Goldie (1984):
$$P(X_1 X_2 > x) \sim \frac{\ell(x)}{x^{\alpha}}.$$
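The following minimal simulation sketch (not from the talk; parameters are illustrative) checks these tail-index relations with a Hill estimator: for Pareto($\alpha$) data, $X^2$ has tail index $\alpha/2$, while the product $X_1 X_2$ of independent copies keeps tail index $\alpha$ up to a slowly varying factor.

```python
import numpy as np

rng = np.random.default_rng(1)

def hill(sample, k):
    """Hill estimator of the tail index based on the k largest order statistics."""
    x = np.sort(sample)[-k:]
    return k / np.sum(np.log(x / x[0]))

alpha, n, k = 1.5, 200_000, 2_000
x1 = rng.pareto(alpha, n) + 1.0   # classical Pareto: P(X > x) = x**(-alpha), x >= 1
x2 = rng.pareto(alpha, n) + 1.0

print("tail index of X     :", round(hill(x1, k), 2))        # approx. alpha = 1.5
print("tail index of X^2   :", round(hill(x1 ** 2, k), 2))    # approx. alpha/2 = 0.75
print("tail index of X1*X2 :", round(hill(x1 * x2, k), 2))    # approx. alpha (slowly varying bias)
```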

If $X, \sigma > 0$ are independent and $E[\sigma^{\alpha+\delta}] < \infty$, then (Breiman (1965))
$$P(\sigma X > x) \sim E[\sigma^{\alpha}]\, P(X > x), \qquad x \to \infty.$$
A regularly varying random variable $X > 0$ and its distribution are subexponential: for some (equivalently, any) $n \ge 2$,
$$P(X_1 + \cdots + X_n > x) \sim n\, P(X > x), \qquad x \to \infty.$$
Other subexponential distributions: log-normal, Weibull with tail $\overline{F}(x) = \exp(-x^{\tau})$, $\tau \in (0,1)$.

Assume $X > 0$. There exists a sequence $x_n \to \infty$ such that
$$P(X_1 + \cdots + X_n > x_n) \sim n\, P(X > x_n), \qquad n \to \infty,$$
if and only if $X$ is subexponential. Cline, Hsing (1998)

1.2. Heavy-tail large deviations.

Define for iid $(X_i)$ with a generic element $X$,
$$S_n = X_1 + \cdots + X_n, \qquad n \ge 1.$$
Assume $E[X] = 0$ if the first moment is finite. If $x_n \to \infty$, $A$ is bounded away from zero, and $x_n^{-1} S_n \xrightarrow{P} 0$ as $n \to \infty$, we call $P(x_n^{-1} S_n \in A)$ a large deviation probability.

If $(X_i)$ is iid regularly varying with index $\alpha > 0$, then
$$\sup_{x \ge \gamma_n} \Big| \frac{P(\pm S_n > x)}{n\, P(|X| > x)} - p_\pm \Big| \to 0, \qquad n \to \infty,$$
where
$$\gamma_n / a_n \to \infty \ \text{ for } \alpha \in (0,2), \ n\, P(|X| > a_n) \to 1, \qquad \gamma_n = \sqrt{c\, n \log n} \ \text{ for } c > \alpha - 2, \ \alpha > 2.$$
A.V. Nagaev (1969), S.V. Nagaev (1979), C.C. Heyde (1969), Cline, Hsing (1998), ...

If $\alpha > 2$, $0 < c < \alpha - 2$ and $\operatorname{var}(X) = 1$, then uniformly for $x < \sqrt{c\, n \log n}$,
$$P(S_n > x) \sim \overline{\Phi}(x / \sqrt{n}), \qquad n \to \infty.$$
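A small Monte Carlo sketch (illustrative, not part of the talk) of the heavy-tail approximation $P(S_n > x) \approx p_+\, n\, P(|X| > x)$ far out in the tail, here for symmetric Pareto-type entries with $\alpha = 1.5$, so $p_+ = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n, reps = 1.5, 500, 20_000
x = 5 * n ** (1 / alpha)               # threshold well beyond a_n ~ n^{1/alpha}

# symmetric heavy-tailed entries with E[X] = 0 and P(|X| > t) = t^{-alpha}, t >= 1
signs = rng.choice([-1.0, 1.0], size=(reps, n))
X = signs * (rng.pareto(alpha, size=(reps, n)) + 1.0)
S = X.sum(axis=1)

mc = np.mean(S > x)                    # Monte Carlo estimate of P(S_n > x)
nagaev = 0.5 * n * x ** (-alpha)       # p_+ * n * P(|X| > x)
print(f"Monte Carlo   P(S_n > x) approx {mc:.4f}")
print(f"Nagaev proxy  p_+ n P(|X|>x) =  {nagaev:.4f}")
```

Both numbers should be of the same magnitude; the agreement improves as the threshold moves further into the tail.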

Separating sequences for the normal and subexponential approximations to large deviation probabilities exist for general subexponential distributions.

2. Heavy-tailed matrices with iid entries: finite p

We consider $p \times n$ matrices $\mathbf{X} = \mathbf{X}_n = (X_{it})_{i=1,\dots,p;\,t=1,\dots,n}$, where all entries have the same distribution with heavy tails
$$P(X > x) \sim p_+\, \frac{L(x)}{x^{\alpha}} \quad\text{and}\quad P(X < -x) \sim p_-\, \frac{L(x)}{x^{\alpha}}$$
for a slowly varying function $L$, some $\alpha > 0$, $p_+ + p_- = 1$. Assume $E[X] = 0$ if the expectation is finite. Therefore (Gnedenko and Kolmogorov (1949), Feller (1971), Resnick (2007))
$$a_n^{-2}\Big(\sum_{t=1}^n X_{it}^2 - c_n\Big) \xrightarrow{d} \xi_i^{(\alpha/2)}, \qquad \alpha \in (0,4),$$
where $P(|X| > a_n) \sim n^{-1}$ and $(\xi_i^{(\gamma)})$ are iid $\gamma$-stable, $\gamma \in (0,2]$,

and for $i \ne j$,
$$b_n^{-1} \sum_{t=1}^n X_{it} X_{jt} \xrightarrow{d} \xi_{i,j}^{(2 \wedge \alpha)}, \qquad \alpha \in (0,4),$$
where $P(|X_1 X_2| > b_n) \sim n^{-1}$ for $\alpha \in (0,2)$ and $b_n = \sqrt{n}$ for $\alpha \in (2,4)$. Since $a_n^2 \approx n^{2/\alpha}$ and $b_n \approx n^{1/(2 \wedge \alpha)}$,
$$\big(a_n^{-2}\,\big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2\big)^2 \le \frac{b_n^2}{a_n^4} \sum_{1 \le i \ne j \le p} \Big(b_n^{-1} \sum_{t=1}^n X_{it} X_{jt}\Big)^2 \xrightarrow{P} 0.$$
Therefore, by Hermann Weyl's inequality,
$$a_n^{-2} \max_{i=1,\dots,p} \big|\lambda_{(i)} - \lambda_{(i)}(\operatorname{diag}(\mathbf{X}\mathbf{X}'))\big| \le a_n^{-2} \big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2 \xrightarrow{P} 0,$$
where $\lambda_{(1)}(A) \ge \cdots \ge \lambda_{(p)}(A)$ for any symmetric matrix $A$ and $\lambda_{(i)} = \lambda_{(i)}(\mathbf{X}\mathbf{X}')$.

Hence, for $\alpha \in (0,4)$, writing $\xi_{(1)}^{(\alpha/2)} \ge \cdots \ge \xi_{(p)}^{(\alpha/2)}$ for the ordered values,
$$\big(a_n^{-2}(\lambda_{(i)} - c_n)\big)_{i=1,\dots,p} \xrightarrow{d} \big(\xi_{(i)}^{(\alpha/2)}\big)_{i=1,\dots,p}.$$
In particular,
$$a_n^{-2}\big(\lambda_{(1)} - n\, E[X^2]\, \mathbf{1}_{(2,4)}(\alpha)\big) \xrightarrow{d} \max_{i=1,\dots,p} \xi_i^{(\alpha/2)}.$$
The unit eigenvector $V_j$ associated with $\lambda_{(j)} = \lambda_{L_j}$ is localized: $\|V_j - e_{L_j}\| \xrightarrow{P} 0$.
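A quick numerical check, as a sketch with assumed parameters rather than code from the talk: for small fixed $p$ and very heavy tails, the ordered eigenvalues of $\mathbf{X}\mathbf{X}'$ are close to the ordered diagonal entries $D_i = \sum_t X_{it}^2$, and the leading eigenvector is localized at the row with the largest $D_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, p, n = 1.0, 5, 100_000
X = rng.choice([-1.0, 1.0], (p, n)) * (rng.pareto(alpha, (p, n)) + 1.0)

S = X @ X.T                                   # p x p (uncentered) sample covariance matrix
eigval, eigvec = np.linalg.eigh(S)            # ascending order
lam = eigval[::-1]                            # ordered eigenvalues, largest first
D = np.sort((X ** 2).sum(axis=1))[::-1]       # ordered diagonal entries D_(1) >= ... >= D_(p)

print("relative gaps |lambda_(i) - D_(i)| / lambda_(1):", np.abs(lam - D) / lam[0])
v1 = np.abs(eigvec[:, -1])                    # unit eigenvector of the largest eigenvalue
print("components of the leading eigenvector:", np.round(v1, 3))
```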

We have for $\alpha \in (2,4)$, with $c_n = n\, E[X^2]$,
$$\frac{\det(\mathbf{X}\mathbf{X}') - c_n^p}{a_n^2\, c_n^{p-1}} = \sum_{i=1}^p a_n^{-2}(\lambda_{(i)} - c_n) \prod_{j=1}^{i-1} \frac{\lambda_{(j)}}{c_n} \xrightarrow{d} \sum_{i=1}^p \xi_{(i)}^{(\alpha/2)} \stackrel{d}{=} \sum_{i=1}^p \xi_i^{(\alpha/2)} \stackrel{d}{=} p^{2/\alpha}\, \xi_1^{(\alpha/2)}.$$

3. Heavy-tailed matrices with iid entries: p increases with n

We consider $p \times n$ matrices $\mathbf{X} = \mathbf{X}_n = (X_{it})_{i=1,\dots,p;\,t=1,\dots,n}$, where the entries are iid and have a distribution with heavy tails
$$P(X > x) \sim p_+\, x^{-\alpha} \quad\text{and}\quad P(X < -x) \sim p_-\, x^{-\alpha}$$
for some $\alpha \in (0,4)$, $p_+ + p_- = 1$, and $p = p_n = n^{\beta}$ for some $\beta > 0$. Then for $a_{np} = n^{(1+\beta)/\alpha}$, i.e. $np\, P(|X| > a_{np}) \to 1$,
$$a_{np}^{-2}\, \big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2 \xrightarrow{P} 0, \qquad \beta \in (0,1],$$
$$a_{np}^{-2}\, \big\|\mathbf{X}'\mathbf{X} - \operatorname{diag}(\mathbf{X}'\mathbf{X})\big\|_2 \xrightarrow{P} 0, \qquad \beta \in (1,\infty).$$

One needs Nagaev-type large deviations: for $i \ne j$ and $\beta \le 1$,
$$p^2\, P\Big(a_{np}^{-2}\Big|\sum_{t=1}^n X_{it} X_{jt}\Big| > \varepsilon\Big) \approx p^2\, n\, P(|X_1 X_2| > \varepsilon\, a_{np}^2) \to 0.$$
Observe that $\mathbf{X}\mathbf{X}'$ and $\mathbf{X}'\mathbf{X}$ have the same positive eigenvalues. For $\beta > 1$, interchange the roles of $n$ and $p$, and observe that $n = p^{1/\beta}$. For the moment, we consider the case $\beta \le 1$ only.

Write $\lambda_1, \dots, \lambda_p$ for the eigenvalues of $\mathbf{X}\mathbf{X}'$ and $\lambda_{(p)} \le \cdots \le \lambda_{(1)}$ for their ordered values. We also write
$$D_i = \sum_{t=1}^n X_{it}^2 = \lambda_i(\operatorname{diag}(\mathbf{X}\mathbf{X}')), \qquad i = 1, \dots, p,$$
and $D_{(p)} \le \cdots \le D_{(1)}$ for the ordered values. By Weyl's inequality,
$$a_{np}^{-2} \max_{i=1,\dots,p} \big|\lambda_{(i)} - D_{(i)}\big| \le a_{np}^{-2} \big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2 \xrightarrow{P} 0, \qquad \beta \in (0,1].$$

Limit theory for the order statistics of $(D_i)$ can be handled by point process convergence towards a Poisson process: for example,
$$\sum_{i=1}^p \varepsilon_{a_{np}^{-2}(D_i - c_n)} \xrightarrow{d} \sum_{i=1}^\infty \varepsilon_{\Gamma_i^{-2/\alpha}} = N_\Gamma$$
for $\Gamma_i = E_1 + \cdots + E_i$, $(E_i)$ iid standard exponential, if and only if (Resnick (2007))
$$p\, P(a_{np}^{-2}(D_i - c_n) > x) \to x^{-\alpha/2}, \qquad x > 0,$$
$$p\, P(a_{np}^{-2}(D_i - c_n) < -x) \to 0, \qquad x > 0.$$
This follows by Nagaev-type large deviations.
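A Monte Carlo sketch (parameters are illustrative assumptions) of the tail condition above for $\alpha = 1$, where no centering is needed: $p\, P(a_{np}^{-2} D_i > x)$ should be roughly $x^{-\alpha/2}$.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n = 1.0, 0.7, 1_000
p = int(n ** beta)                        # about 125 rows
a_np = (n * p) ** (1 / alpha)             # np * P(|X| > a_np) -> 1 for Pareto tails

# many independent copies of D_1 = sum_t X_1t^2; signs are irrelevant for the squares
reps = 20_000
D = ((rng.pareto(alpha, (reps, n)) + 1.0) ** 2).sum(axis=1)

for x in (0.5, 1.0, 2.0):
    emp = p * np.mean(D > x * a_np ** 2)
    print(f"x={x}: p*P(a_np^-2 D > x) = {emp:.3f}   vs   x^(-alpha/2) = {x ** (-alpha / 2):.3f}")
```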

Centering with $c_n$ is only needed for $\alpha \in [2,4)$ and if $(n \wedge p)/a_{np}^2 \not\to 0$. Centering is not needed if, for example, $\min(\beta, \beta^{-1}) \in \big((\alpha/2 - 1)_+,\, 1\big]$, in particular when
$$p/n \to \gamma \in (0, \infty). \tag{3.1}$$
Under (3.1), this was proved by Soshnikov (2004, 2006) for $\alpha \in (0,2)$ and by Auffinger, Ben Arous, Péché (2009) for $\alpha \in (0,4)$. They do not make explicit use of large deviation probabilities but prove their results by approximating the largest eigenvalues $\lambda_{(i)}$ by the corresponding $i$th order statistics $X_{(i)}^2$ from $(X_{jt}^2)_{j=1,\dots,p;\, t=1,\dots,n}$.

Partial results (for particular choices of $p$) were proved in Davis, Pfaffel and Stelzer (2014) and Davis, Mikosch, Pfaffel (2016) for $(X_{it})$ satisfying some linear dependence conditions. If $\alpha < 1$, $(p_n)$ can be chosen to grow at an almost exponential rate.

In the light-tailed case, limit results for the eigenvalues of $\mathbf{X}\mathbf{X}'$ are very sensitive to the growth rate of $p$. For example, Johnstone (2001) showed for iid standard normal $X_{it}$ under (3.1) that
$$\frac{\sqrt{n} + \sqrt{p}}{\big(1/\sqrt{n} + 1/\sqrt{p}\big)^{1/3}} \Big(\frac{\lambda_{(1)}}{(\sqrt{n} + \sqrt{p})^2} - 1\Big) \xrightarrow{d} \text{Tracy--Widom distribution}.$$
This result remains valid for iid entries $X_{it}$ whose first four moments match those of the standard normal and all of whose moments are finite; Tao and Vu (2011).

In the iid regular variation case, using a.s. continuous mappings, folklore (Resnick (1987, 2007)) applies to prove

the joint convergence of the order statistics (possibly with centering $c_n$ for $\alpha \in [2,4)$)
$$a_{np}^{-2}\big(\lambda_{(1)}, \dots, \lambda_{(k)}\big) \xrightarrow{d} \big(\Gamma_1^{-2/\alpha}, \dots, \Gamma_k^{-2/\alpha}\big);$$
in particular, $P(\Gamma_1^{-2/\alpha} \le x) = \exp(-x^{-\alpha/2})$, $x > 0$, is the Fréchet distribution $\Phi_{\alpha/2}$;

the joint convergence of the ratios of successive order statistics
$$\big(\lambda_{(2)}/\lambda_{(1)}, \dots, \lambda_{(k)}/\lambda_{(k-1)}\big) \xrightarrow{d} \big((\Gamma_1/\Gamma_2)^{2/\alpha}, \dots, (\Gamma_{k-1}/\Gamma_k)^{2/\alpha}\big);$$

the convergence of the ratio of the largest eigenvalue to the trace of $\mathbf{X}\mathbf{X}'$ for $\alpha \in (0,2)$:
$$\frac{\lambda_{(1)}}{\lambda_1 + \cdots + \lambda_p} \xrightarrow{d} \frac{\Gamma_1^{-2/\alpha}}{\Gamma_1^{-2/\alpha} + \Gamma_2^{-2/\alpha} + \cdots}.$$
With these techniques one can handle results involving finitely many of the largest eigenvalues of $\mathbf{X}\mathbf{X}'$. One cannot handle the smallest eigenvalues, determinants, ....

The largest eigenvalues have very distinct sizes and the eigenvectors are localized: let $V_k$ be the unit eigenvector corresponding to $\lambda_{(k)} = \lambda_{L_k}$. Then $\|V_k - e_{L_k}\| \xrightarrow{P} 0$.
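The sketch below (parameter choices are assumptions, not from the talk) illustrates two of these limits for $\alpha = 1$: the Fréchet limit $\Phi_{\alpha/2}$ of $a_{np}^{-2}\lambda_{(1)}$, and the self-normalized ratio $\lambda_{(1)}/(\lambda_1 + \cdots + \lambda_p)$, whose limit can be simulated directly from the points $\Gamma_i^{-2/\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, p, n, reps = 1.0, 50, 2_000, 300
a_np = (n * p) ** (1 / alpha)

top, ratio = np.empty(reps), np.empty(reps)
for r in range(reps):
    X = rng.choice([-1.0, 1.0], (p, n)) * (rng.pareto(alpha, (p, n)) + 1.0)
    lam = np.linalg.eigvalsh(X @ X.T)          # eigenvalues in ascending order
    top[r] = lam[-1] / a_np ** 2               # normalized largest eigenvalue
    ratio[r] = lam[-1] / lam.sum()             # largest eigenvalue over the trace

# Frechet limit Phi_{alpha/2}(x) = exp(-x^{-alpha/2}) for the normalized largest eigenvalue
for x in (0.5, 1.0, 2.0, 5.0):
    print(f"x={x}: empirical {np.mean(top <= x):.3f}  vs  Frechet {np.exp(-x ** (-alpha / 2)):.3f}")

# limit of the trace ratio, simulated from Gamma_i = E_1 + ... + E_i
E = rng.exponential(size=(reps, 10_000))
G = np.cumsum(E, axis=1) ** (-2 / alpha)
print("mean eigenvalue ratio   :", round(ratio.mean(), 3))
print("mean of simulated limit :", round((G[:, 0] / G.sum(axis=1)).mean(), 3))
```

The two means should be of comparable size; finite-size corrections and slowly varying factors can make the agreement rough.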

Figure 2. The logarithms of the ratios $\lambda_{(i+1)}/\lambda_{(i)}$ for the S&P 500 series. We also show the 1, 50 and 99% quantiles (bottom, middle and top lines, respectively) of the variables $\log((\Gamma_i/\Gamma_{i+1})^{2})$.

Figure 3. Eigenvalues $\lambda_{(i)}$ and their ratios $\lambda_{(i+1)}/\lambda_{(i)}$ for S&P 2002-2014, $p = 442$ (panel title: "Eigenvalues and their ratios as in Lam & Yao").

Figure 4. The components of the eigenvector $V_1$ (indices of components vs. size of components). Left: the case of iid standard normal entries. Right: the case of iid Pareto(0.8) entries. We choose $p = 200$ and $n = 1{,}000$.
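A sketch reproducing the qualitative contrast of Figure 4 (same $p$, $n$ and entry distributions as in the caption; plotting replaced by printed summaries): for iid Pareto(0.8) entries the leading eigenvector of $\mathbf{X}\mathbf{X}'$ is localized, for iid standard normal entries it is spread over many coordinates.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 200, 1_000

def leading_eigvec(X):
    """Absolute components of the unit eigenvector of the largest eigenvalue of XX'."""
    _, vecs = np.linalg.eigh(X @ X.T)
    return np.abs(vecs[:, -1])

v_normal = leading_eigvec(rng.standard_normal((p, n)))
v_pareto = leading_eigvec(rng.pareto(0.8, (p, n)) + 1.0)   # iid Pareto(0.8) entries

print("normal data: largest |component| =", round(v_normal.max(), 3), "(delocalized)")
print("pareto data: largest |component| =", round(v_pareto.max(), 3), "(localized near one row)")
```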

4. Stochastic volatility models

The main idea in the iid case is to exploit the heavier tails of $X^2$ in comparison with those of $X_1 X_2$. Similar behavior can be observed for regularly varying stochastic volatility models
$$X_{it} = \sigma_{it} Z_{it}, \qquad i, t \in \mathbb{Z},$$
where $(\sigma_{it})$ is a strictly stationary random field with finite moments of order $\alpha + \delta$ (often, $(\log \sigma_{it})$ is assumed Gaussian), independent of the iid random field $(Z_{it})$, which is assumed regularly varying with index $\alpha \in (0,4)$.

By Breiman's lemma, $(X_{it})$ is (jointly) regularly varying and
$$P(X^2 > x) \sim E[\sigma^{\alpha}]\, P(Z^2 > x) \approx x^{-\alpha/2}, \qquad P(X_{it} X_{jt} > x) \sim E[(\sigma_{it}\sigma_{jt})^{\alpha}]\, P(Z_1 Z_2 > x) \approx x^{-\alpha}.$$
Nagaev-type large deviations for stochastic volatility models (Mikosch and Wintenberger (2016)) confirm that
$$a_{np}^{-2}\, \big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2 \xrightarrow{P} 0 \quad\text{and}\quad \sum_{i=1}^p \varepsilon_{a_{np}^{-2}(\lambda_i - c_n)} \xrightarrow{d} \sum_{i=1}^\infty \varepsilon_{\Gamma_i^{-2/\alpha}}.$$
Then, essentially, the same results as for iid $(X_{it})$ hold.
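A brief sketch of this claim (the model choices are illustrative: iid lognormal volatilities, a special case of the strictly stationary fields allowed here, and symmetric Pareto(1.2) noise): the off-diagonal part of $\mathbf{X}\mathbf{X}'$ is negligible relative to the diagonal, as in the iid case.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, p, n = 1.2, 100, 10_000
sigma = np.exp(0.5 * rng.standard_normal((p, n)))             # volatility field, all moments finite
Z = rng.choice([-1.0, 1.0], (p, n)) * (rng.pareto(alpha, (p, n)) + 1.0)
X = sigma * Z                                                  # stochastic volatility model X_it = sigma_it Z_it

S = X @ X.T
off = S - np.diag(np.diag(S))
print("max |off-diagonal| / max diagonal:", float(np.abs(off).max() / np.diag(S).max()))
```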

Thinning of X

Similar techniques apply when $\sigma_{it} = \sigma^{(n)}_{it}$ assumes zero with high probability, e.g. $(\sigma^{(n)}_{it})$ iid Bernoulli distributed for $n \ge 1$ with
$$q^{(n)}_0 = P(\sigma^{(n)} = 0) \to 1 \quad\text{and}\quad \lim_{n\to\infty} n\, P(\sigma^{(n)} = 1) = \lim_{n\to\infty} n\, q^{(n)}_1 > 0,$$
and independent of $(Z_{it})$; see Auffinger, Tang (2016), where $q^{(n)}_1 = n^{-v}$, $v \in (0,1)$.

New normalization $(b_n)$:
$$n\, p\, E[(\sigma^{(n)})^{\alpha}]\, P(|Z| > b_n) = n\, p\, q^{(n)}_1\, P(|Z| > b_n) \to 1, \qquad n \to \infty, \qquad\text{or}\qquad b_n = a_{[n\, p\, E[(\sigma^{(n)})^{\alpha}]]}.$$

Then, for $\beta \le 1$, using Nagaev-type large deviations if $\lim_n n\, q^{(n)}_1 = \infty$, or regular variation if $\lim_n n\, q^{(n)}_1$ is finite,
$$b_n^{-2} \sup_{i=1,\dots,p} \big|\lambda_{(i)} - \lambda_{(i)}\big(\operatorname{diag}(\mathbf{X}\mathbf{X}')\big)\big| \le b_n^{-2} \big\|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\big\|_2 \xrightarrow{P} 0$$
and
$$\sum_{i=1}^p \varepsilon_{b_n^{-2}(\lambda_i - c_n)} \xrightarrow{d} \sum_{i=1}^\infty \varepsilon_{\Gamma_i^{-2/\alpha}}.$$

5. Another structure where the squares dominate: linear processes

Davis, Pfaffel, Stelzer (2014), Davis, Mikosch, Pfaffel (2016)

Special case: $X_{it} = \theta_0 Z_{i,t} + \theta_1 Z_{i-1,t}$ for iid $(Z_{it})$ with $Z$ regularly varying with index $\alpha \in (0,4)$, real coefficients $\theta_i$, and $E[Z] = 0$ if the mean is finite. Choose $(a_n)$ such that $n\, P(|Z| > a_n) \to 1$. Observe that
$$\sum_{t=1}^n X_{i,t}^2 = \sum_{t=1}^n \big(\theta_0^2 Z_{i,t}^2 + \theta_1^2 Z_{i-1,t}^2 + 2\theta_0\theta_1 Z_{i,t} Z_{i-1,t}\big) = \theta_0^2 D_i + \theta_1^2 D_{i-1} + o_P(a_n^2).$$

Here we used Nagaev-type large deviations and the fact that $Z^2$ has tail index $\alpha/2$, while $Z_1 Z_2$ has tail index $\alpha$. Similarly,
$$\sum_{t=1}^n X_{i,t} X_{i+1,t} = \theta_0\theta_1 \sum_{t=1}^n Z_{i,t}^2 + o_P(a_n^2) = \theta_0\theta_1 D_i + o_P(a_n^2).$$
This leads to the approximation
$$\begin{pmatrix} \sum_{t=1}^n X_{i,t}^2 & \sum_{t=1}^n X_{i,t} X_{i+1,t} \\ \sum_{t=1}^n X_{i,t} X_{i+1,t} & \sum_{t=1}^n X_{i+1,t}^2 \end{pmatrix} \approx \begin{pmatrix} \theta_0^2 & \theta_0\theta_1 \\ \theta_0\theta_1 & \theta_1^2 \end{pmatrix} D_i + \begin{pmatrix} \theta_1^2 & 0 \\ 0 & 0 \end{pmatrix} D_{i-1} + \begin{pmatrix} 0 & 0 \\ 0 & \theta_0^2 \end{pmatrix} D_{i+1}.$$

The sample covariance matrix can be approximated by
$$\Big\|\mathbf{X}\mathbf{X}' - \sum_{i=1}^p D_i M_i\Big\|_2 = o_P(a_{np}^2),$$
where
$$M_1 = \begin{pmatrix} \theta_0^2 & \theta_0\theta_1 & 0 & \cdots & 0 \\ \theta_0\theta_1 & \theta_1^2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & \theta_0^2 & \theta_0\theta_1 & \cdots & 0 \\ 0 & \theta_0\theta_1 & \theta_1^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}, \quad \dots$$
Denote the order statistics of the $D_i$ by $D_{(1)} \ge \cdots \ge D_{(p)}$ and write $D_{L_i} = D_{(i)}$. Then
$$a_{np}^{-2}\, \Big\|\mathbf{X}\mathbf{X}' - \sum_{i=1}^p D_{L_i} M_{L_i}\Big\|_2 \xrightarrow{P} 0.$$

For $k = k_n \to \infty$ slowly,
$$a_{np}^{-2}\, \Big\|\mathbf{X}\mathbf{X}' - \sum_{i=1}^k D_{L_i} M_{L_i}\Big\|_2 \xrightarrow{P} 0.$$
Since $(D_i)$ is iid, $(L_1, \dots, L_p)$ is a random permutation of $(1, \dots, p)$, hence the event $A_k = \{|L_i - L_j| > 1,\ i \ne j = 1, \dots, k\}$ has probability close to one provided $k^2 = o(p)$. On the set $A_k$, the matrix $\sum_{i=1}^k D_{L_i} M_{L_i}$ is block-diagonal with non-zero eigenvalues $D_{L_i}(\theta_0^2 + \theta_1^2)$, $i = 1, \dots, k$. Here we used that $M_{L_i}$ has rank 1 with non-zero eigenvalue equal to $\theta_0^2 + \theta_1^2$.
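A numerical sketch of this special case (the coefficients $\theta_0, \theta_1$ and the Pareto(1) innovations are arbitrary choices; edge rows and lower-order terms are ignored): the largest eigenvalues of $\mathbf{X}\mathbf{X}'$ should be close to $(\theta_0^2 + \theta_1^2)$ times the largest $D_i = \sum_t Z_{it}^2$.

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, p, n = 1.0, 100, 5_000
theta0, theta1 = 1.0, 0.5

Z = rng.choice([-1.0, 1.0], (p + 1, n)) * (rng.pareto(alpha, (p + 1, n)) + 1.0)
X = theta0 * Z[1:, :] + theta1 * Z[:-1, :]        # X_it = theta0 Z_{i,t} + theta1 Z_{i-1,t}

lam = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]  # ordered eigenvalues of XX'
D = np.sort((Z[1:, :] ** 2).sum(axis=1))[::-1]    # ordered D_i over the rows i = 1,...,p

k = 3
print("lambda_(i) / ((theta0^2 + theta1^2) * D_(i)), i=1..3:",
      np.round(lam[:k] / ((theta0 ** 2 + theta1 ** 2) * D[:k]), 3))
```

The printed ratios should be close to 1 when the largest $D_i$ sit in well-separated, non-adjacent rows, which happens with high probability when $k^2 = o(p)$.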

By Weyl's inequality,
$$a_{np}^{-2} \max_{i=1,\dots,k} \big|\lambda_{(i)} - D_{L_i}(\theta_0^2 + \theta_1^2)\big| \le a_{np}^{-2}\, \Big\|\mathbf{X}\mathbf{X}' - \sum_{i=1}^k D_{L_i} M_{L_i}\Big\|_2 \xrightarrow{P} 0.$$

Extension to general linear structure: $X_{it} = \sum_{k,l} h_{kl}\, Z_{i-k,\, t-l}$. Use truncation of the coefficient matrix $H = (h_{kl})$ of the linear process. Then
$$a_{np}^{-2} \max_{i=1,\dots,p} \big|\lambda_{(i)} - \delta_{(i)}\big| \xrightarrow{P} 0, \qquad n \to \infty,$$
where $\delta_{(1)}, \dots, \delta_{(p)}$ are the $p$ ordered values (with respect to absolute value) of the set

$$\{(D_i - E[D_1])\, v_j,\ i = 1, \dots, k;\ j = 1, 2, \dots\}$$
(without the centering $E[D_1]$ for $\alpha \in (0,2)$, with it for $\alpha \in (2,4)$), where $(v_j)$ are the eigenvalues of
$$H H' = \Big(\sum_{l=0}^{\infty} h_{il} h_{jl}\Big)_{i,j=1,2,\dots}.$$
The mapping theorem implies, for suitable real or complex numbers $(v_j)$,
$$\sum_{j=1}^{\infty} \sum_{i=1}^p \varepsilon_{a_{np}^{-2}(D_i - E D_1)\, v_j} \xrightarrow{d} \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} \varepsilon_{\Gamma_i^{-2/\alpha} v_j}.$$
The limit is a Poisson cluster process.

An example, the separable case: we assume $h_{kl} = \theta_k c_l$. The matrix
$$H H' = \Big(\sum_{l=0}^{\infty} c_l^2\ \theta_i \theta_j\Big)_{i,j \ge 0}$$
has rank $r = 1$ and
$$v_{(1)} = \sum_{l=0}^{\infty} c_l^2\ \sum_{k=0}^{\infty} \theta_k^2.$$

The limit point process is Poisson as in the iid case. For $\alpha < 2$,
$$a_{np}^{-2}\big(\lambda_{(1)}, \dots, \lambda_{(k)}\big) \xrightarrow{d} v_{(1)}\big(\Gamma_1^{-2/\alpha}, \dots, \Gamma_k^{-2/\alpha}\big),$$
$$\frac{\lambda_{(1)}}{\lambda_{(1)} + \cdots + \lambda_{(k)}} \xrightarrow{d} \frac{\Gamma_1^{-2/\alpha}}{\Gamma_1^{-2/\alpha} + \cdots + \Gamma_k^{-2/\alpha}},$$
$$\frac{\lambda_{(1)}}{\lambda_1 + \lambda_2 + \cdots + \lambda_p} \xrightarrow{d} \frac{\Gamma_1^{-2/\alpha}}{\Gamma_1^{-2/\alpha} + \Gamma_2^{-2/\alpha} + \cdots}, \qquad k \to \infty.$$

6. Final comments

The asymptotic behavior of the largest eigenvalues of a heavy-tailed large sample covariance matrix $\mathbf{X}\mathbf{X}'$ is understood
* if the diagonal $\operatorname{diag}(\mathbf{X}\mathbf{X}') = \big(\sum_{t=1}^n X_{it}^2\big)_{i=1,\dots,p}$ dominates the asymptotic behavior of the eigenvalues,
* for linear random fields $X_{it} = \sum_{k,l} h_{k,l} Z_{i-k,\, t-l}$ with iid heavy-tailed $(Z_{it})$; Davis, Mikosch, Pfaffel (2016), Davis, Heiny, Mikosch, Xie (2017),
* if the rows are iid; Davis, Pfaffel, Stelzer (2015).

Nagaev-type large deviations are key techniques.

The eigenvalues of the sample covariance matrices of other types of heavy-tailed multivariate time series are less well understood.