
Extremal covariance matrices

by Youssouph Cissokho

Department of Mathematics and Statistics, Faculty of Science, University of Ottawa

Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the MSc degree in Mathematics and Statistics

© Youssouph Cissokho, Ottawa, Canada, 2018

The MSc program is a joint program with Carleton University, administered by the Ottawa-Carleton Institute of Mathematics and Statistics.

Abstract

The tail dependence coefficient (TDC) is a natural tool to describe extremal dependence. Estimation of the tail dependence coefficient can be performed via empirical process theory. In the case of extremal independence, however, the limit degenerates, and hence one cannot construct a test for extremal independence. In order to deal with this issue, we consider an analog of the covariance matrix, namely the extremogram matrix, whose entries depend only on extremal observations. We show that under the null hypothesis of extremal independence and for finite dimension d ≥ 2, the largest eigenvalue of the sample extremogram matrix converges to the maximum of d independent normal random variables. This allows us to conduct a hypothesis test for extremal independence by means of the asymptotic distribution of the largest eigenvalue. Simulation studies are performed to further illustrate this approach.

Declaration

I, the undersigned, hereby declare that the work contained in this essay is my original work, and that any work done by others or by myself previously has been acknowledged and referenced accordingly.

Acknowledgements

First and foremost, I would like to thank God for the grace and mercy he has given me to be able to finish this work. I would like to express my deep gratitude to Professor Rafal Kulik, my research supervisor, for his valuable and constructive suggestions during the planning and development of this research work. His willingness to give his time so generously has been really appreciated. My grateful thanks are also extended to all my family members.

Contents

List of Figures
List of Tables
1 Introduction
2 Preliminaries
  2.1 Regular variation
    2.1.1 Regularly varying functions
    2.1.2 Bounds and limits
    2.1.3 Regularly varying random variables
    2.1.4 Vague convergence
    2.1.5 Regularly varying random vectors
    2.1.6 Tail dependence coefficient (TDC)
  2.2 Weak convergence in metric spaces
    2.2.1 Lindeberg's condition
    2.2.2 Tightness via entropy
    2.2.3 Auxiliary results
3 Weak convergence of the tail empirical process
  3.1 Tail empirical process
    3.1.1 Proof of Theorem 3.1.3
4 Estimation and testing for extremal independence
  4.1 Estimation of the Tail Dependence Coefficient (TDC)
  4.2 Testing for Extremal Independence using TDC
  4.3 Extremogram matrix: bivariate case
    4.3.1 Asymptotics for eigenvalues under extremal independence
    4.3.2 Testing for Extremal Independence using random matrices
  4.4 Extremogram matrix: d-dimensional case
    4.4.1 Testing for Extremal Independence using random matrices
  4.5 Tail empirical process with random levels
  4.6 Implementation: Simulation studies
  4.7 Real data analysis
5 Conclusion and further work
References

List of Figures

2.1 Simulated samples of size 10000 of two i.i.d. Pareto random variables
2.2 Simulated samples of size 10000 of two dependent random variables
4.1 Asymptotic confidence intervals for the tail dependence coefficient for Example 4.1.3. Blue line: the estimator plotted against different choices of the threshold u_n; red dashed lines: confidence interval using (4.19); yellow line: confidence interval using the true value of the TDC
4.2 The density function of M(s = 1.2) (blue dashed line) and the density of the normal distribution with µ = E[M(s)] and σ² = Var[M(s)] (red line)
4.3 The scatter plot (left panel) and the estimator of the TDC (right panel) for X and Y drawn independently from a Pareto distribution
4.4 The scatter plot (left panel) and the estimator of the TDC (right panel) for the model Y = Z2 + Z3, X = Z1 + Z2, where Z_i, i = 1, 2, 3, are independent Pareto with the same index α
4.5 The bivariate t distribution: the scatter plot (left panel) and the estimator of the TDC (right panel) plotted against the order statistics
4.6 Time series plot of NASDAQ (left panel) and S&P500 (right panel)
4.7 QQ-plot of NASDAQ (left panel) and S&P500 (right panel)
4.8 The absolute log returns of daily stock prices for S&P500 vs. NASDAQ data set: the scatter plot (top left panel), Hill plot (top right panel) of NASDAQ (red dashed line) and S&P500 (blue dashed line) with CI, the densities (bottom left panel) and the estimator of the TDC (bottom right panel)
4.9 Time series plot of German Mark (left panel) and French Franc (right panel)
4.10 QQ-plot of German Mark (left panel) and French Franc (right panel)
4.11 The absolute log returns of daily exchange rates for German Mark and French Franc data set: the scatter plot (top left panel), Hill plot (top right panel) of German Mark (red line) and French Franc (blue line), the densities (bottom left panel) and the estimator of the TDC (bottom right panel)

List of Tables

4.1 Testing for extremal independence. The 95% asymptotic confidence intervals are CI(s) = (M_.025(s), M_.975(s)) for s = 1.2, 1.3, 1.5, with u_n = n^δ and δ = 0.1. The + sign indicates that T(s) belongs to the CI
4.2 Testing for extremal independence. The 95% asymptotic confidence interval for s = 1.2 is CI = (M_.025(1.2), M_.975(1.2)). The + sign indicates that T(1.2) belongs to the CI; the numbers in parentheses correspond to the true TDC
4.3 Testing for extremal independence. The 95% asymptotic confidence interval for s = 1.3 is CI = (M_.025(1.3), M_.975(1.3)). The + sign indicates that T(1.3) belongs to the CI; the numbers in parentheses correspond to the true TDC
4.4 Testing for extremal independence for the stock prices data set. The 95% asymptotic confidence interval is CI(1.2) = (M_.025(1.2), M_.975(1.2)). The − sign indicates that T(1.2) does not belong to the CI
4.5 Testing for extremal independence for the exchange rates data set. The 95% asymptotic confidence interval is CI(1.2) = (M_.025(1.2), M_.975(1.2)). The + sign indicates that T(1.2) belongs to the CI

Chapter 1

Introduction

The study of the behavior of extreme events has become a priority in recent years, because the occurrence of such events may cause damage in many sectors (e.g. finance, economics, insurance, loss of human lives, etc.). For univariate random variables, a good theoretical tool for the statistical modeling of such events is provided by Extreme Value Theory (EVT), with the famous Fisher-Tippett-Gnedenko theorem describing the limiting behavior of properly normalized maxima. In the multivariate case, one of the many tools used to study extreme events is the so-called tail dependence coefficient. If extremes in a multivariate vector do not occur together, then we have extremal independence; otherwise, we have extremal dependence. The main goal of this thesis is to test the null hypothesis of extremal independence by means of the tail dependence coefficient. To estimate the tail dependence coefficient, we use a nonparametric approach. It is advantageous in the sense that it avoids misspecification of the underlying distribution.

Related studies on the tail dependence coefficient. Asymptotic properties of estimators of the tail dependence coefficient are investigated and reviewed in [0]. The authors prove limit theorems for the proposed estimators under known and unknown marginal distributions. In our case we work under the assumption of regular variation and consider the estimator obtained from the definition of the TDC in (2.4) by replacing the cdfs with their empirical versions. As such, we provide limit theorems under weaker technical assumptions, but for a restricted class of bivariate vectors. More specifically, our Theorem 4.1.1 corresponds to Theorem 5 in [0]. Similarly, our theorem for the estimator with random levels (see Section 4.5) corresponds to Theorem 6 in [0]. The estimator used in [0] is based on rank order statistics, in contrast to our estimator with random levels, which is based on intermediate order statistics. Our contribution,

as compared to [0], is a much simpler proof of the asymptotic normality. However, we do not claim any novelty here.

Testing for extremal independence. A test for extremal independence is considered, where the null hypothesis is formulated as follows:
H_0: there is extremal independence vs. H_1: there is extremal dependence.
However, under the null hypothesis of extremal independence, the limit for the estimator of the tail dependence coefficient is degenerate, and as such the estimator cannot be used directly to construct a test for extremal independence. This problem is shared by virtually all estimators of extremal dependence: they degenerate under extremal independence. To overcome this problem, we introduce an analog of the covariance matrix, namely the extremogram matrix, whose entries depend only on the extremes. Random matrices, especially for high dimensional problems, have become very popular in the last several years; see []. We use random matrices, for the first time, in the novel context of extremal independence. We study the asymptotic properties of the largest eigenvalue of the sample extremogram matrix. We show that under the null hypothesis of extremal independence, the largest sample eigenvalue, properly normalized, converges to the maximum of a finite number of normal random variables. As such, for the first time, we can construct a proper test for extremal independence.

Organisation of the thesis. The purpose of Chapter 2 is to review some known results on regular variation and on vague and weak convergence. The chapter is organized as follows: we start in Section 2.1 with a survey of regular variation. Important results and bounds, such as Karamata's theorem, Potter's bounds, Breiman's lemma and the uniform convergence theorem, are presented. Thereafter, we introduce the concept of vague convergence and connect it to the notion of regularly varying random vectors. With the help of regular variation, we define the notion of the tail dependence coefficient. In Section 2.2 we discuss weak convergence in metric spaces. Some important results, such as the Lindeberg condition, are presented. We also introduce the notions of tightness and uniform equicontinuity using a sophisticated entropy method. The results in this chapter are not new and can be found in the literature, as mentioned in each section.

In Chapter 3, we discuss weak convergence of tail empirical processes of i.i.d. regularly varying random vectors, using the results developed in the previous chapter. The main result is Theorem 3.1.3. We do not claim originality of this result.

In Chapter 4, the nonparametric estimation of the tail dependence coefficient (TDC) and a hypothesis test for extremal independence are developed. The advantage of the nonparametric approach is that it avoids misspecification of the underlying distribution, in contrast to semiparametric or parametric approaches. First, in Section 4.1, we discuss the estimation of the TDC by the standard approach, i.e. the estimator is obtained from the definition of the TDC with the cdfs replaced by their empirical versions; see (4.1). We prove limit theorems for the estimator with deterministic levels. We do not claim originality of this result. The limiting result can be used to construct confidence intervals for the tail dependence coefficient. However, under the null hypothesis of extremal independence, the limit is degenerate (see Corollary 4.1.2 and Section 4.2), and as such the estimator cannot be used to construct a test for extremal independence. This problem is shared by virtually all estimators of extremal dependence: they degenerate under extremal independence. See also [0], Theorems 5 and 6. To avoid this drawback, we consider an analog of the covariance matrix, namely the extremogram matrix, whose entries depend only on the extremes. Its sample counterpart is obtained by plugging in the estimators of the tail dependence coefficient. We use random matrices, for the first time, in the novel context of extremal independence. We work in the finite dimensional case, first with d = 2 in Section 4.3, and then with an extension to arbitrary but finite dimension d ≥ 2 in Section 4.4. In both cases, we prove that the largest eigenvalue of the sample extremogram matrix, after a proper normalization, converges in distribution to the maximum of a finite number of independent Gaussian random variables with explicit mean and variance. Having that in hand, we are ready to conduct a hypothesis test by means of the distribution of the largest eigenvalue of the sample extremogram matrix.

Section 4.5 deals with the estimation of the tail dependence coefficient with data-based, random levels. We obtain the same limit theorems as in the deterministic-levels case. The transition to random levels follows the path of [7]. However, the transition to random matrices is based on the author's original contribution. Simulation studies are conducted in Section 4.6, while the real data analysis is presented in Section 4.7.

Chapter 2

Preliminaries

The purpose of this chapter is to review some known results on regular variation and on vague and weak convergence. The chapter is organized as follows: we start in Section 2.1 with a survey of regular variation. Important results and bounds, such as Karamata's theorem, Potter's bounds, Breiman's lemma and the uniform convergence theorem, are presented. Thereafter, we introduce the concept of vague convergence and connect it to the notion of regularly varying random vectors. With the help of regular variation, we define the notion of the tail dependence coefficient. In Section 2.2 we discuss weak convergence in metric spaces. Some important results, such as the Lindeberg condition, are presented. We also introduce the notions of tightness and uniform equicontinuity using a sophisticated entropy method. All the results in this chapter are known and can be found in the literature, as indicated in each section.

2.1 Regular variation

In this section we recall the concept of regular variation of functions and random vectors and connect it to vague convergence. With the help of regular variation, we will define the tail dependence coefficient. Several examples are given. The results presented in this section are known and can be found in e.g. [] and [9]. We provide some proofs for completeness.

2.1.1 Regularly varying functions

We begin this section with the basic definition of slowly varying functions, followed by regularly varying functions, and finally regularly varying random variables.

Let l be a positive measurable function defined on [0, ∞). Then l is called slowly varying at infinity, denoted by l ∈ SV, if
$$\lim_{x\to\infty} \frac{l(tx)}{l(x)} = 1, \qquad t > 0.$$
A positive measurable function f defined on [0, ∞) is said to be regularly varying at infinity with index α ∈ R if f(x) > 0 for large x and
$$\lim_{x\to\infty} \frac{f(tx)}{f(x)} = t^{\alpha}, \qquad t > 0. \tag{2.1.1}$$
We denote by RV_α the set of regularly varying functions at infinity with index α. In fact, every regularly varying function can be expressed as f(x) = x^α l(x), where l is slowly varying.

Lemma 2.1.1 The convergence in (2.1.1) is uniform. More specifically:
(i) if α > 0, the convergence is uniform on [0, b], 0 < b < ∞;
(ii) if α < 0, the convergence is uniform on [b, ∞), 0 < b < ∞;
(iii) if α = 0, the convergence is uniform on [a, b], 0 < a ≤ b < ∞.

Example 2.1.2 The function f(x) = x^δ, for δ ∈ R\{0}, is regularly varying at infinity with index δ. Indeed, for all x, t > 0,
$$\frac{f(tx)}{f(x)} = \frac{(tx)^{\delta}}{x^{\delta}} = t^{\delta}.$$
Similarly, functions like f(x) = x^δ log(x), f_1(x) = x^δ (log x)^β and f_2(x) = x^δ log(log x), with δ ≠ 0 and β ∈ R, are also regularly varying.

Powers of nonnegative regularly varying functions are again regularly varying; the product (resp. the composition) of two nonnegative regularly varying functions with indices α_1 and α_2 is regularly varying with index α_1 + α_2 (resp. α_1 α_2). Also, the sum of two nonnegative regularly varying functions is regularly varying with index max{α_1, α_2}. In other words, if f, f_1 and f_2 are positive measurable functions, then:
- if f_1 ∈ RV_{α_1} and f_2 ∈ RV_{α_2}, then f_1 f_2 ∈ RV_{α_1+α_2};
- if f_1 ∈ RV_{α_1} and f_2 ∈ RV_{α_2}, then f_1 ∘ f_2 ∈ RV_{α_1 α_2}, provided lim_{x→∞} f_2(x) = ∞;
- f_1 + f_2 ∈ RV_{max{α_1, α_2}};

14 if f RV α δ R, f δ RV αδ. if f RV α, then for α 0, as x fx if α > 0, fx 0 if α < Bounds and limits We briefly provide some useful and well known results such as Potter s bounds, or Karamata s theorem. Lemma 2..3 Suppose g RV α and l SV, locally bounded away fron zero and, α R \ {0}. Take δ 0,. Then there exists t 0 such that for all x > 0 and t t 0, δx α δ gtx gx + δxα+δ and δx δ ltx lx + δxδ. In what follows we write fx gx whenever lim x fx/gx =. Lemma 2..4 Karamata s theorem Let f be locally bounded on [0, +, positive and regularly varying function with index α R and let γ R. α + γ >, then t γ ft dt = and x t γ ft dt γ + α + x γ+ fx. 2 If α + γ <, then t γ ft dt < and x t γ ft dt γ + α + x γ+ fx. 3 If α + γ =, then lx = x tγ ft dt is slowly varying and x γ fx = olx. Proof: Note that the proof is not original, but we provide it in order to present the techniques. Since f RV α, then there exists a slowly varying function l such that fx = x α lx. Assume that α + γ >. Then x γ+ fx x t γ ft dt = /x s γ fsx fx ds = gsx /x gx ds, 6

15 where gx = x γ fx. The function g is regularly varying with index γ + α. If α + γ > 0, then by Lemma 2..i, gsx gx sγ+α, uniformly on [0, ]. Hence, by the uniform convergence theorem, the above integral converges to the desired result. If α + γ, 0], we cannot conclude the result directly, since the convergence gsx/gx is uniform on [a, ], for a > 0. There exists ɛ > 0 small enough so that α + γ + ɛ > 0, we have /x s γ fsx fx ds = s ɛ γ+ ɛ fsx s /x fx ds = ɛ gsx s /x gx ds, where gx = x γ+ ɛ fx RV α + γ + ɛ. Now we have gsx/gx s α+γ+ ɛ, uniformly on [0, ]. Therefore, is proven. The proof of 2 is similar Regularly varying random variables In this section we apply the concept of regular variation to random variables and their distributions. Definition 2..5 A cdf cumulative distribution function resp. right tail distribution of a random variable X denoted by F X resp. F X is defined as F X x = PX x, x R. F X x = F X x = PX > x, x R. Note that F X is also called the survival function. In what follows, we may use F or F X accordingly. Definition 2..6 A nonnegative random variable X is said to be regularly varying at infinity with index α if and only if its tail distribution function is regularly varying with index α α > 0, that is F X tx PX > tx lim = x F X x PX > x = t α. The parameter α is called the tail index. It measures the heaviness of the tail X. We will write X RV α 7

16 or F X RV α. We state important properties of regularly varying random variables. Some of these properties parallel the results in Section Lemma 2..7 Let X be a nonnegative random variable with distribution function F. If F RV α, α > 0. Then EX β < if β < α and EX β = if β > α. Moreover, lim x x 0 uβ df u x β F x = α α β if β < α lim x x 0 uβ df u x β F x = α β α if β > α. Lemma 2..8 Potter s bound Let X be a nonnegative random variable with distribution function F and RV α. Then ɛ > 0 C > : x 0, y 0, F y x F x = PyX > x PX > x C max, yɛ+α. Lemma 2..9 Breiman s Lemma Assume that X and Y are two independent nonnegative random variables. If X RV α, α > 0 and EY α+ɛ < for some ɛ > 0, then XY RV α and PY X > x lim = EY α. x PX > x In what follows, we define a quantile function. It will play an important role in the thesis. We first provide its definition, followed by the transfer of regular variation. Definition 2..0 We define the quantile function of a random variable X by F q = inf{x R : F x q}, q 0,, with the convention that the infimum of a empty set is. Let recall some of properties of the quantile functions: F is non-decreasing on 0, and left continuous; F F x x for any x R; F F q q for any q 0,. 8

17 The following function will be of interest: Qu = F, u. u For this function we have: F RV α Q RV α. We will work with random vectors with regularly varying marginals. Then, several situations can happen according to different behaviour of the marginals:. X = d Y, i.e. PX > x = PY > x = x α lx, for lx SV. 2. PX > x PY > x, i.e. PX > x = x α lx and PY > x = x α lx with lx, lx SV and lim x lx/ lx =. 3. PX > x = x α lx and PY > x = x α lx with lx, lx SV and lim x lx/ lx =. 4. PX > x = x α lx and PY > x = x β lx with α < β and lx, lx SV. We note that in the last two situations X has a heavier tail than Y, that is PX > x/py > x as x. We will not deal with such situations in the thesis, we will focus on the first two cases. We will write X d Y to indicate that the tails of X and Y are asymptotically the same Vague convergence The notion of vague convergence appears as a useful tool in studying regular variation, while weak convergence is a tool to study the limiting behavior of random variables. We summarize some useful definitions, results and properties of vague and weak convergence. Let E be a locally compact topological space with countable base; often it is safe to think of E as a finite dimensional Euclidean space or R. Let ν be a measure on E. If νk < for all relatively compact sets K E, then ν is called a Radon measure. Definition 2.. Vague convergence Let M + E be the set of all nonnegative Radon measures on E and C + K E be the set of all continuous functions f : E R + with compact support. Let ν and ν n be Radon measures on E. We say that the sequence ν n converges vaguely to ν, written ν n v ν, if f dν n f dν, for all f C + K E. 9

18 The following result gives a link between regular variation and vague convergence. Proposition 2..2 Assume that a nonnegative random variable X, with distribution function F, is regularly varying with index α. Then there exists a sequence of constants a n such that, as n, nf a n x x α and ν n := npa n X v ν, where νx, ] = x α and the convergence holds in M + 0, ], the set of all nonnegative Radon measures on 0, ] Regularly varying random vectors We begin this section by introducing the notion of multivariate regularly varying random vectors and some of their properties. Definition 2..3 An R d valued random vector X and its distribution are said to be regularly varying with index α > 0, written as X RV α, if there exists a Radon measure ν on the Borel σ-field BR d 0 of R d 0 = R d \{0} such that Px X P X > x v ν, as x, where is a vector norm in R d. The limiting measure is homogeneous, νxc = x α νc for any x > 0 and all Borel sets C R d 0 bounded away from zero. This definition is equivalent to the sequential definition of regular variation: there exists a scaling sequence c n such that npc n X v ν on R d \{0}. Example 2..4 If X and Y are i.i.d. nonnegative regularly varying random variables such that X d Y, then X = X, Y is a regularly varying vector. Indeed, if A is of the form A = [0, s] [0, t] c R 2 + \ {0, 0}, then the limiting measure is given up to a constant by νa = s α + t α. 0

Indeed,
$$\frac{P((X,Y) \in xA)}{P(X > x)} = \frac{P(X > sx \text{ or } Y > tx)}{P(X > x)} = \frac{P(X > sx)}{P(X > x)} + \frac{P(Y > tx)}{P(X > x)} - \frac{P(X > sx)P(Y > tx)}{P(X > x)}.$$
Since X and Y are RV_{−α}, we have, as x → ∞,
$$\frac{P(X > sx)}{P(X > x)} \to s^{-\alpha}, \qquad \frac{P(Y > tx)}{P(X > x)} \to t^{-\alpha}, \qquad \frac{P(X > sx)P(Y > tx)}{P(X > x)} \to 0.$$
Therefore
$$\frac{P((X,Y) \in xA)}{P(X > x)} \to \nu(A) = s^{-\alpha} + t^{-\alpha}, \qquad x \to \infty. \tag{2.1}$$
Hence, the result follows. The latter result means that the limiting measure is concentrated on the axes. In other words, X and Y cannot be large at the same time. Furthermore, for any norm |·|, we have P(|(X,Y)| > x)/P(X > x) → ℵ as x → ∞, where ℵ is a constant. Indeed, if e.g. |(X,Y)| = |(X,Y)|_∞ = max(X, Y), then
$$P(|(X,Y)| > x) = P(\max(X, Y) > x) = P(X > x \text{ or } Y > x).$$
Dividing both sides by P(X > x) gives
$$\frac{P(|(X,Y)| > x)}{P(X > x)} = \frac{P(X > x \text{ or } Y > x)}{P(X > x)} \to 2, \qquad x \to \infty,$$
using (2.1) with (s, t) = (1, 1).

2.1.6 Tail dependence coefficient (TDC)

When dealing with joint extreme events, one of the most important tools is the notion of tail dependence.

Definition 2.1.15 Let (X, Y) be a bivariate random vector with marginal distributions F_X and F_Y. We define the tail dependence coefficient (TDC) between X and Y by
$$\lambda_{TDC}^{X,Y} = \lim_{p\to 0} \frac{P\big(X > F_X^{\leftarrow}(1-p),\, Y > F_Y^{\leftarrow}(1-p)\big)}{p} \in [0, 1], \tag{2.2}$$

where F_X^← and F_Y^← denote the quantile functions of X and Y, respectively. If there is no risk of misinterpretation, we will write λ_TDC instead of λ_TDC^{X,Y}. Note that λ_TDC plays an important role since it quantifies the extremal relation between the variables: the bigger λ_TDC is, the stronger the extremal dependence. We say that:
- if λ_TDC = 0, then there is no extremal dependence between X and Y; we say that X and Y are asymptotically (extremally) independent;
- if λ_TDC ∈ (0, 1], then there is extremal dependence between X and Y; we say that X and Y are asymptotically (extremally) dependent.

We note that (2.2) is equivalent to
$$\lambda_{TDC} = \lim_{p\to 0} \frac{P(U > 1-p, V > 1-p)}{P(U > 1-p)}, \tag{2.3}$$
where U = F_X(X) and V = F_Y(Y) are standard uniform random variables. If we assume further that X and Y are such that X ≈_d Y, then (2.3) can be rewritten as
$$\lambda_{TDC} = \lim_{x\to\infty} \frac{P(X > x, Y > x)}{P(X > x)}. \tag{2.4}$$
Indeed, assume for simplicity that F_X and F_Y are continuous and strictly increasing, and write x_p = F_X^←(1−p). Since F_X^←(U) =_d X and F_Y^←(V) =_d Y, we have
$$\lambda_{TDC} = \lim_{p\to 0} \frac{P(U > 1-p, V > 1-p)}{P(U > 1-p)} = \lim_{p\to 0} \frac{P\big(F_X^{\leftarrow}(U) > x_p,\, F_Y^{\leftarrow}(V) > F_Y^{\leftarrow}(1-p)\big)}{P\big(F_X^{\leftarrow}(U) > x_p\big)} = \lim_{x\to\infty} \frac{P(X > x, Y > x)}{P(X > x)},$$
where in the last step the level F_Y^←(1−p) is replaced by x_p (using X ≈_d Y) and x_p → ∞ as p → 0. The formula (2.4) also makes sense when the tails of X and Y are only asymptotically the same. Therefore, in this thesis, we will work with (2.4).

Example 2.1.16 Assume that X and Y are i.i.d. nonnegative regularly varying random variables with X ≈_d Y. Then λ_TDC = 0. Indeed,

21 PX > x, Y > x λ TDC = lim = lim PY > x = 0. x PX > x x See the Figure 2. for an example where X and Y are independently drawn from Pareto distribution. We can see that the large observations do not occur jointly. Figure 2.: Simulated samples of size 0000 of two i.i.d. Pareto random variables Example 2..7 Assume that Z i, i =, 2, 3, are i.i.d. nonnegative regularly varying random variables with the same index α, i.e. PZ i > x = x α lx, i =, 2, 3, where lx is a slowly varying function. Define X = Z + Z 2 and Y = Z 2 + Z 3. Then λ TDC = 2. Note first that PX > x lim = 2, i =, 2, x PZ i > x In fact, observe first that {Z > x} {Z 2 > x} {Z + Z 2 > x} thus PZ + Z 2 > x PZ > x PZ > x or Z 2 > x PZ > x 2. 2, as x. On the other hand, for ɛ 0,, 2, we have 3

22 {Z + Z 2 > x} {Z > x ɛ} {Z 2 > x ɛ} {Z > xɛ, Z 2 > xɛ} therefore, PZ + Z 2 > x PZ > x PZ > x or Z 2 > x or {Z > xɛ, Z 2 > xɛ} PZ > x 2. 2, as x. Leading to the desired result. Coming back to our example, let δ 0,, we have PX > x, Y > x PX > x = PZ + Z 2 > x, Z 2 + Z 3 > x PX > x = PZ + Z 2 > x, Z 2 + Z 3 > x, Z 2 > δx PX > x + PZ + Z 2 > x, Z 2 + Z 3 > x, Z 2 δx PX > x PZ 2 > δx PX > x + PZ > x δ PZ 3 > x δ PX > x PZ 2 > δx PX > δx PX > δx PX > x + PZ > x δ PX > x δ PX > x δ PZ 3 > x δ PX > x 2 δ α + 2 δ α 0, as x. Hence lim lim PX > x, Y > x δ x PX > x On the other hand, PX > x, Y > x PX > x = PZ + Z 2 > x, Z 2 + Z 3 > x PX > x PZ 2 > x PX > x 2.5, as x Combining 2.6 and 2.7, we get λ TDC = 2. Figure 2.2 shows the graph of X and Y defined in this example, where Z i are drown independently from Pareto distribution. We can see that some of the large values X and Y coincide. Example 2..8 Let Z, Z 2 and V be independent nonnegative random variables such that Z, Z 2 4

23 Figure 2.2: Simulated samples of size 0000 of two dependent random variables RV α, Z d = Z2, EV α+ɛ <, for some ɛ > 0. Define X = V Z and Y = V Z 2. Then the tail dependence coefficient vanishes, λ TDC = 0, which just means that X and Y are dependent but asymptotically independent. In fact, note first that by Breiman s Lemma Theorem 2..9, X and Y are RV α: PX > x EV α PZ > x, PY > x EV α PZ 2 > x. 5

24 We have PX > x, Y > x PV Z > x, V Z 2 > x λ TDC = lim = lim x PX > x x PX > x PV Z > x, V Z 2 > x = lim x EV α PZ > x E PV Z > x, V Z 2 > x V = lim x EV α PZ > x F Z x V = lim x = EV α E F Z x F Z x V EV α E F Z x V lim x x F Z x lim F Z x V = EV α EV α 0 = 0. Hence λ TDC = 0. Note that the argument used in order to exchange the limit and the expectation comes from the fact that Z, Z 2 are regular varying, and by application of Potter s bound Theorem 2..8 and the dominated convergence theorem, using the assumption EV α+ɛ <. Example 2..9 Let Z, Z 2 and V be independent nonnegative random variables such that Z, Z 2 RV α, Z d = Z2, V RV β with β < α. This means that V has a heavier tail than Z and Z 2, that is PV > x PZ > x and EV α+ɛ =, for all ɛ > 0. We also note that EZ β+ɛ < for some ɛ > 0. Define X = V Z and Y = V Z 2. Then the tail dependence coefficient λ TDC 0. By Breiman s Lemma PX > x EZ β PV > x, PY > x EZβ 2 PV > x. 6

25 Thus, we have PX > x, Y > x PV Z > x, V Z 2 > x λ TDC = lim = lim x PX > x x PX > x E PV Z > x, V Z 2 > x Z, Z 2 E PV > x Z = lim x EZ α = lim, V > x Z 2 Z, Z 2 PV > x x EZ α PV > x E P V > x Z Z 2 Z, Z 2 = lim x EZ α PV > x = lim x F EZ α E V x Z Z 2 F V x = EZ α E F V x Z lim Z 2 x F V x C = EZ α E β = Z Z 2 EZ α E Z Z 2 β 0. Hence λ TDC 0, which implies asymptotic dependence between X and Y. Note that C comes from the fact that F V x Z Z 2 β, x, F V x Z Z 2 and by application of Potter s bound and the dominated convergence theorem we can exchange the expectation with the limit, on account of EZ β+ɛ <. Remark From these couple of examples, we have learned that independence or dependence does not always imply extremal independence or dependence. 2.2 Weak convergence in metric spaces In this section we discuss weak convergence in a metric space. Some important results like Lindeberg condition are presented. We also introduce the notion of tightness and uniform equicontinuity using a sophisticated entropy method. These results are known from the literature and can be found for example in [2], [2] and [9, p.57-00]. sets. Let S, d be a complete, separable metric space, equipped with the Borel σ-field F, generated by open Definition 2.2. Weak convergence Let P n, n 0, be a sequence of probability measures on S. We 7

26 say that P n converges weakly to P, written P n P, if P n f Pf, for every f CS, the set of bounded, continuous function f mapping from S to R. ξ is called a random element of S if it is a measurable function from a probability space Ω, B, P into a metric space S, F, d. Furthermore, if S equals R, C, R n, n > 0, then ξ is called a random variable, a random function and a random vector respectively. Definition Convergence in distribution Let ξ n be a sequence of random elements with values in a metric space S, F, defined on possibly different probability spaces Ω n, B n, P n. We say that ξ n converges in distribution to ξ on Ω, B, P, if Efξ n Efξ, for every f CS. We write ξ n d ξ. We note that the definitions above are virtually the same. The second one describes weak convergence of random elements ξ n with laws P n. The following theorem gives a characterization of weak convergence. Theorem Skorokhod representation theorem Let ζ, ζ 2,... be a random elements in S, d such that ζ n d ζ. Then there exists a random elements ξ d = ζ, ξn d = ζn, n N such that ξ n a.s ξ, on a suitable probability space. Note that this theorem is interesting since it allows one to replace the convergence in distribution with the convergence almost surely Lindeberg s condition Assume that for each n, the random elements ξ n,,..., ξ n,mn, is independent. Suppose that m n Eξ n,j = 0, Eξn,j 2 = σn,j 2 < +, and s 2 n = j= σ 2 n,j. If for any ɛ > 0 then lim m n n + s 2 j= n E ξ n,j 2 I { ξn,j >ɛs n} = 0, 2.8 m n S n /s n = ξ n,j /s n N0,. j= 8

27 There is a stronger condition than Lindeberg s, called the Lyapunov condition, which says that if the moments of order 2 + δ, for some δ > 0 exists, then lim n + j= m n E ξ n,j 2+δ = 0. Remark If Lyapunov s condition is true, then so is Lindeberg s condition. In fact for any ɛ, δ > 0 such that Y > ɛ, we have Y 2 I { Y >ɛ} Y 2+δ ɛ δ = EY 2 I { Y >ɛ} < ɛ δ E Y 2+δ. If E ξ n,j 2+δ < +, then the sum in 2.8 is at most ɛ δ s 2+δ n m n E ξ n,j 2+δ. j= Tightness via entropy Entropy is a useful tool to prove tightness of a sequence of independent random processes. The method is dimension free. Let G be a class of functions. Let ξ n, n be a sequence of random elements mapping from Ω, F, P to l G, the set of bounded functions indexed by G, endowed with the norm F G = sup g G F g, where F is a map defined in l G. Example Let X j, j be a standard uniform i.i.d. random variables. Define ξ n t = n n n I {Xj t} t. j= Then ξ n are random elements in l G, where G is the class of indicators of [0, t], t [0, ]. Definition Let ξ n be a sequence of random maps indexed by a class of function G, with values in l G endowed with a semi-metric ρ induced with a supremum norm: The sequence is asymptotically tight if for each ɛ > 0, there exists a compact set K l G such that lim sup Pξ n / K δ < ɛ, for every δ > 0 n 9

28 where K δ = {g l G : ρg, K < δ}. The sequence is asymptotically uniformly ρ-equicontinuous in probability if for every ɛ, η > 0, there exists δ > 0, such that lim sup P n sup g,f G: ρg,f<δ ξ n f ξ n g > ɛ < η. For the purpose of this thesis, these two concepts are in fact equivalent. Proposition A sequence ξ n is asymptotically tight if and only if ξ n g is tight in R for every g G and there exists a semi-metric ρ on G such that G, ρ is totally bounded and ξ n is asymptotically uniformly ρ-equicontinuous in probability. Let ξ n,j f, f G be independent stochastic processes indexed by a common semi-metric space G, ρ Define the random semi-metric by d n f, g = n j= ξ n,j f ξ n,j g 2. The following theorem is a simplified version of Theorem 2.. in [2]. Theorem For each n, let ξ n,,..., ξ n,mn be independent stochastic processes with finite second moments indexed by a totally bounded semi-metric space G, ρ. Assume that i The maps x,..., x mn sup ρf,g<δ m n e j ξn,j f ξ n,j g r, r =, 2, j= are measurable for every δ > 0, every vector e,..., e mn {, 0, } mn and every n N. ii For every ɛ > 0, iii It holds lim ɛ 0 lim m n n + j= lim sup n + sup E ξ n,j 2 GI { ξn,j G >ɛ} = m n f,g:ρf,g<ɛ j= 2 E ξ n,j f ξ n,j g = iv For every ɛ > 0, 20

29 lim lim P δ 0 n + δ 0 log Nɛ, G, d n dɛ > ɛ = 0, 2. where Nɛ, G, d n is the minimal number of balls {g : d n g, f < ɛ } of radius ɛ needed to cover the set G. Then the sequence m n ξ n,j Eξ n,j j= is asymptotically ρ-equicontinuous and hence tight. The latter condition condition iv is not easy to check due to the random metric d n. However, the spaces G we are going to consider are linearly ordered and then this condition is satisfied for free Auxiliary results The following result describes convergence of inverses. Lemma Vervaat lemma - convergence of inverses i Suppose that ξ 0 is a continuous function and ξ j, j are a decreasing functions on [a, b]. Moreover, let g be defined on [a, b] with a positive derivative g. Let γ n be a sequence of positive values such that γ n 0 as n goes to infinity and uniformly on [a, b]. Then ξ n s gs γ n ξ 0 s, n +, ξ n s g s γ n g sξ 0 g s, n +, uniformly on [ga, gb], where g, ξ n any way consistent with the monotonicity. are inverse functions right- or left-continuous or defined in ii If H n, n 0, are nondecreasing functions on R with range on [a, d] such that H n H, then H n H. 2

Chapter 3

Weak convergence of the tail empirical process

In this chapter we discuss weak convergence of tail empirical processes of i.i.d. regularly varying random vectors, using the results developed in the previous chapter. The main result is Theorem 3.1.3. Although the result has not been stated in the literature in this exact form, we do not claim any originality, since it deals with i.i.d. random vectors. The proof is provided for completeness.

3.1 Tail empirical process

Let (X_j, Y_j), j = 1, ..., n, be an i.i.d. sequence of random vectors sampled from (X, Y). We assume for simplicity that X ≈_d Y. Let F be the marginal distribution of X. Let (u_n) be a nondecreasing sequence such that u_n → +∞ and n F̄(u_n) → +∞. For s_0 ∈ (0, 1), define the tail empirical function
$$\tilde\beta_n^{X,Y}(s) = \frac{1}{n\bar F(u_n)} \sum_{j=1}^{n} \mathbb{1}\{X_j > s u_n, Y_j > s u_n\}, \qquad s \ge s_0.$$
Furthermore, define
$$\bar\beta_n^{X,Y}(s) = \mathbb{E}\big[\tilde\beta_n^{X,Y}(s)\big] = \frac{P(X > s u_n, Y > s u_n)}{P(X > u_n)} \tag{3.1}$$
and
$$\beta^{X,Y}(s) = \lim_{n\to\infty} \bar\beta_n^{X,Y}(s). \tag{3.2}$$
We assume that the limit in (3.2) exists, which is certainly true under bivariate regular variation.

Remark 3.1.1 Note that if we let s = 1 in (3.2), then β^{X,Y}(1) = λ_TDC is the tail dependence coefficient between X and Y defined in Section 2.1.6; cf. (2.4).

Definition 3.1.2 We define the tail empirical processes
$$G_n^{X,Y}(s) = \sqrt{n\bar F(u_n)}\,\big\{\tilde\beta_n^{X,Y}(s) - \bar\beta_n^{X,Y}(s)\big\}, \tag{3.3}$$
$$H_n^X(s) = G_n^{X,X}(s) = \sqrt{n\bar F(u_n)}\,\big(\tilde\delta_n^X(s) - \bar\delta_n^X(s)\big), \qquad s \ge s_0, \tag{3.4}$$
$$H_n^Y(s) = G_n^{Y,Y}(s) = \sqrt{n\bar F(u_n)}\,\big(\tilde\delta_n^Y(s) - \bar\delta_n^Y(s)\big), \qquad s \ge s_0,$$
where
$$\tilde\delta_n^X(s) = \frac{1}{n\bar F(u_n)}\sum_{j=1}^n \mathbb{1}\{X_j > s u_n\}, \qquad \tilde\delta_n^Y(s) = \frac{1}{n\bar F(u_n)}\sum_{j=1}^n \mathbb{1}\{Y_j > s u_n\}, \tag{3.5}$$
and
$$\delta^Y(s) = \delta^X(s) = \lim_{n\to\infty} \bar\delta_n^X(s) = \lim_{n\to\infty} \mathbb{E}\big[\tilde\delta_n^X(s)\big] = s^{-\alpha}. \tag{3.6}$$

The main result of this section is the following weak convergence result. The result is not new, but we provide a proof for completeness.

Theorem 3.1.3 Let (X_j, Y_j), j = 1, ..., n, be i.i.d. regularly varying random vectors such that X_j ≈_d Y_j. Let F be the distribution function of X. Let (u_n) be a nondecreasing sequence such that u_n → +∞ and n F̄(u_n) → +∞. Then
$$G_n^{X,Y}(s) \Rightarrow G^{X,Y}(s), \qquad n \to +\infty, \tag{3.7}$$
$$H_n^X(s) \Rightarrow H^X(s), \qquad H_n^Y(s) \Rightarrow H^Y(s), \qquad n \to +\infty, \tag{3.8}$$
in l^∞[s_0, +∞) endowed with the sup-norm. The convergences hold jointly, and G^{X,Y}, H^Y and H^X are Gaussian processes with covariance functions given respectively by
$$\mathrm{Cov}\big(G^{X,Y}(s_1), G^{X,Y}(s_2)\big) = \beta^{X,Y}(s_1 \vee s_2) \tag{3.9}$$
and
$$\mathrm{Cov}\big(H^Y(s), H^Y(t)\big) = \mathrm{Cov}\big(H^X(s), H^X(t)\big) = (s \vee t)^{-\alpha}. \tag{3.10}$$
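The following minimal simulation sketch (not part of the thesis; the Pareto margins, the threshold u_n = n^δ with δ = 0.15, and all function names are assumptions made for illustration) computes the quantities of Definition 3.1.2 and checks the marginal variance in (3.10) by Monte Carlo.

```python
import numpy as np

def tail_empirical_function(x, y, k, u, s):
    """beta_tilde_n(s) and delta_tilde_n(s) of Definition 3.1.2,
    normalised by k = n * Fbar(u_n) (assumed known here)."""
    beta = np.sum((x > s * u) & (y > s * u)) / k   # tail empirical function beta_tilde(s)
    dx   = np.sum(x > s * u) / k                   # delta_tilde^X(s)
    dy   = np.sum(y > s * u) / k                   # delta_tilde^Y(s)
    return beta, dx, dy

rng = np.random.default_rng(0)
n, alpha, s = 100_000, 4.0, 1.2
u = n ** 0.15                                      # deterministic threshold u_n = n^delta (illustrative)
k = n * u ** (-alpha)                              # n * Fbar(u_n); exact because the margins are Pareto(alpha)

h_vals = []
for _ in range(200):                               # Monte Carlo replications of H_n^X(s)
    x = rng.pareto(alpha, n) + 1.0                 # P(X > x) = x^{-alpha}, x >= 1
    y = rng.pareto(alpha, n) + 1.0                 # independent copy of the margin
    _, dx, _ = tail_empirical_function(x, y, k, u, s)
    h_vals.append(np.sqrt(k) * (dx - s ** (-alpha)))   # H_n^X(s), centred at delta^X(s) = s^{-alpha}

print("Monte Carlo Var of H_n^X(s):", round(float(np.var(h_vals)), 3),
      "  limit from (3.10):", round(s ** (-alpha), 3))
```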

32 3.. Proof of Theorem 3..3 Fix s 0 0,. Let s [s 0, +, define K n,j s = nf u n nf u n I {Xj >su n,y j >su n} PX j > su n, Y j > su n, j =,..., n. Then we have G X,Y n s = n K n,j s = j= βx,y nf u n n s β n X,Y s. The proof requires establishing the finite dimensional convergence and the tightness. For the former, we use the Lindeberg conditions. We first calculate the limiting covariance. Lemma 3..4 Under the conditions of Theorem 3..3, lim n + CovGX,Y n s, G X,Y n s 2 = β X,Y s s 2, 3. lim n + VarGX,Y n s = β X,Y s. 3.2 Proof: We have CovG X,Y n n s, G X,Y n s 2 = Cov K n,j s, = Cov nf u n n nf u n j= n j= j= n K n,j s 2 j= I {Xj >s u n,y j >s u n} PX j > s u n, Y j > s u n, I {Xj >s 2 u n,y j >s 2 u n} PX j > s 2 u n, Y j > s 2 u n = F u n E I {X >s u n,y >s u n}i {Xj >s 2 u n,y j >s 2 u n} o β X,Y s s 2, as n, where o stands for PX > s u n, Y > s u n PX > s 2 u n, Y > s 2 u n. F u n Hence 3. and 3.2 hold for all s, s, s 2 [s 0, +. The finite dimensional convergence is stated in Proposition

33 Proposition 3..5 Under the conditions of Theorem 3..3 we have for any finite set {s,, s k } R, G X,Y n s,..., G X,Y n s k d N 0, Σ, where the limiting covariance matrix is given by Σ k k = [ β X,Y s i s j ] k i,j=. This result also implies the convergence of the margins, in distribution, to a centered normal random variable with limiting cavariances defined in Lemma Proof: Note first that EK n,j s = 0. Next, we show that, as n +, n j= By Hölder s inequality for each j, E Kn,jsI 2 { Kn,j s >ɛ} 0, ɛ > 0. E Kn,jsI 2 { Kn,j s >ɛ} E /2 K 4 n,jse /2 I { Kn,j s >ɛ} E /2 K 4 n,jsp /2 K n,j s > ɛ. Since EK 4 n,js = 4 n 2 F 2 u n E I {Xj >su n,y j >su n} PX j > su n, Y j > su n C PX j > su n, Y j > su n n 2 F 2, for some C > 0. u n We have E /2 Kn,js 4 C /2 P/2 X j > su n, Y j > su n. nf u n Also, by Markov inequality, we have, This implies that P /2 K n,j s > ɛ E/2 Kn,j 4 s ɛ 2 ɛ 2 C/2 P/2 X j > su n, Y j > su n. nf u n n j= E Kn,jsI 2 {Xj >su n,y j >su n} C PX > su n, Y > su n. ɛ 2 nf u n F u n 25

34 Therefore, n j= E Kn,jsI 2 {Xj >su n,y j >su n} 0, as n +. proof. Consequently, the Lindeberg condition holds. This, together with the Cramer-Wold device, finishes the Next, we prove tightness. Proposition 3..6 Under the conditions of Theorem 3..3, the sequence {G X,Y n, n } is tight. Proof: By Theorem 2.2.8, it is enough to check 2.9 and 2.0. indicator functions G = {f s = I {s,+ }, s s 0 }. Consider the case of the class of the Define for every f s G, Z n,j f s = Z n,j s = I {Xj >su n,y j >su n}, K n,j s = Z n,j s EZ n,j s nf u n and the semi-metric ρ by ρf s, f t = ρs, t = 3 β X,X s β X,X t + 3 β Y,Y s β Y,Y t = 6 β X,X s β X,X t, where β X,X s and β Y,Y s are defined in 3.2. The convergence in 2.9 follows straightforwardly from the application of Lindeberg s condition as in the proof of Proposition 3..5, since Z n,j 2 G = I {X j >s 0 un,y j >s 0 un}. nf u n For 2.0, note first that for t > s s 0 we have I {Xj >su n,y j >su n} I {Xj >tu n,y j >tu n} I {sun<xj <tu n} + I {sun<yj <tu n}. By the above inequality, n j= E [Z n,j s Z n,j t] 2 F u n E [I {sun<xj <tu n} + I {sun<yj <tu n}] 2 { 3 β X,X n } s β n X,X t + β n Y,Y s β n Y,Y t. 26

35 Fix ɛ > 0, then since as n goes infinity, β n X,Y s β X,Y s, lim δ 0 lim sup n + sup ρs,t<δ j= n E[Z n,j s Z n,j t] 2 ɛ + lim sup δ 0 ρs,t<δ ρs, t < ɛ which leads to the desired result since ɛ is arbitrary. 27

Chapter 4

Estimation and testing for extremal independence

This chapter deals with the nonparametric estimation of the tail dependence coefficient (TDC) and a hypothesis test for extremal independence. The advantage of the nonparametric approach is that it avoids misspecification of the underlying distribution, in contrast to semiparametric or parametric approaches. In Section 4.1, we discuss the estimation of the TDC by the standard approach, i.e. the estimator is obtained from the definition of the TDC with the cdfs replaced by their empirical versions; see (4.1). We prove limit theorems for the estimator with deterministic levels. The limiting result can be used to construct confidence intervals for the tail dependence coefficient. However, under the null hypothesis of extremal independence, the limit is degenerate (see Corollary 4.1.2 and Section 4.2), and as such the estimator cannot be used to construct a test for extremal independence. This problem is shared by virtually all estimators of extremal dependence: they degenerate under extremal independence. See [0], Theorems 5 and 6. To avoid this drawback, we consider an analog of the covariance matrix, namely the extremogram matrix, whose entries depend only on the extremes, i.e. the tail dependence coefficients. Random matrices, especially for high dimensional problems, have become very popular in the last several years; see []. We use them, for the first time, in the novel context of extremal independence. We work in the finite dimensional case, first with d = 2 in Section 4.3, and then with an extension to arbitrary but finite dimension d ≥ 2 in Section 4.4. In both cases, we prove that the largest eigenvalue of the sample extremogram matrix converges in distribution to the maximum of finitely many independent Gaussian random variables with

explicit mean and variance. Having that in hand, we are ready to conduct a hypothesis test by means of the distribution of the largest eigenvalue of the sample extremogram matrix. Estimators of the tail dependence coefficient with data-based, random levels are presented in Section 4.5. We obtain the same limit theorems as in the deterministic-levels case. Simulation studies are conducted in Section 4.6. Note that the transition from deterministic levels to random levels follows the path of [7]. However, the extension to random matrices is a new idea and is the author's original contribution.

4.1 Estimation of the Tail Dependence Coefficient (TDC)

Let (X_j, Y_j), j = 1, ..., n, be an i.i.d. sequence of regularly varying nonnegative random vectors sampled from (X, Y). Assume that X ≈_d Y. Recall that (cf. (3.1))
$$\bar\beta_n^{X,Y}(s) = \frac{P(X > s u_n, Y > s u_n)}{P(X > u_n)} \quad\text{and}\quad \beta^{X,Y}(s) = \lim_{n\to\infty} \bar\beta_n^{X,Y}(s),$$
where u_n → ∞ as n → ∞. Our goal is to estimate β^{X,Y}(s), using the empirical estimator defined as
$$\hat\beta_n^{X,Y}(s) = \frac{\sum_{j=1}^n \mathbb{1}\{X_j > s u_n, Y_j > s u_n\}}{\sum_{j=1}^n \mathbb{1}\{X_j > u_n\}}. \tag{4.1}$$
From the theoretical point of view, the estimation result will be stated for s ∈ [s_0, +∞), s_0 ∈ (0, 1); however, the main interest is in estimation when s is in the neighbourhood of 1. We first need to find the asymptotic behavior of
$$\sqrt{n\bar F(u_n)}\,\big(\hat\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big). \tag{4.2}$$
To do so, the approach is to use the tail empirical process G_n^{X,Y} defined in (3.3) and express the process in (4.2) as a function of G_n^{X,Y}. The following theorem is an immediate consequence of Theorem 3.1.3.

Theorem 4.1.1 Assume that (X_j, Y_j), j = 1, ..., n, are i.i.d. regularly varying vectors of nonnegative random variables such that X_j ≈_d Y_j. Assume moreover that
$$\lim_{n\to\infty} \sqrt{n\bar F(u_n)}\, \sup_{s\ge s_0} \big|\bar\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big| = 0. \tag{4.3}$$
Then
$$\sqrt{n\bar F(u_n)}\,\big(\hat\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big) \Rightarrow \tilde G^{X,Y}(s) := G^{X,Y}(s) - \beta^{X,Y}(s)\,H^X(1) \tag{4.4}$$
in l^∞[s_0, ∞).
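As a quick numerical illustration of the estimator (4.1) and of Theorem 4.1.1 (a minimal sketch, not part of the thesis; the model, the threshold rule and the function names are assumptions made for the example), the following code estimates β^{X,Y}(1) = λ_TDC for the model X = Z_1 + Z_2, Y = Z_2 + Z_3 of Example 2.1.17, for which λ_TDC = 1/2, and forms an asymptotic confidence interval of the type (4.19) derived below.

```python
import numpy as np
from scipy.stats import norm

def tdc_estimator(x, y, u, s=1.0):
    """Empirical TDC estimator (4.1): ratio of joint to marginal exceedance counts."""
    k = np.sum(x > u)                                   # denominator: #{X_j > u_n}
    joint = np.sum((x > s * u) & (y > s * u))           # numerator: #{X_j > s u_n, Y_j > s u_n}
    return joint / k, k

def tdc_confidence_interval(beta_hat, k, level=0.95):
    """Asymptotic CI based on the N(0, beta(1-beta)) limit at s = 1,
    with n*Fbar(u_n) replaced by the exceedance count k (cf. (4.19))."""
    z = norm.ppf(1 - (1 - level) / 2)
    half = z * np.sqrt(max(beta_hat * (1 - beta_hat), 0.0)) / np.sqrt(k)
    return beta_hat - half, beta_hat + half

rng = np.random.default_rng(1)
n, alpha = 100_000, 4.0
z1, z2, z3 = (rng.pareto(alpha, n) + 1.0 for _ in range(3))   # independent Pareto(alpha)
x, y = z1 + z2, z2 + z3                                       # Example 2.1.17: true TDC = 1/2
u = np.quantile(x, 0.99)                                      # illustrative threshold choice
beta_hat, k = tdc_estimator(x, y, u)
print("beta_hat(1) =", round(beta_hat, 3), " 95% CI:", tdc_confidence_interval(beta_hat, k))
```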

38 Note that by Proposition 3..5, we have for any fixed s s 0, s 0 0,, the convergence in 4.4 holds in distribution and the limiting distribution is a centrered normal with variance defined in 4.3. This can be used to construct asymptotic confidence interval for the corresponding estimator. Proof: We have, n nf u n β n X,Y s β X,Y j= s = nf u n I {X j >su n,y j >su n} n j= I β X,Y s {X j >u n} n j= I{Xj >su n,y j >su n} PX > su n, Y > su n = nf u n n j= I + npx > su n, Y > su n n {X j >u n} j= I β X,Y s {X j >u n} = = nf u n n j= I {X j >u n} nf u n j= npx > su n, Y > su n + nf u n n j= I β X,Y s {X j >u n} n I{Xj >su n,y j >su n} PX > su n, Y > su n nf u n n j= I G X,Y npx > su n, Y > su n n s + nf u n n {X j >u n} j= I β X,Y s {X j >u n} = A n G X,Y n s + B n, where A n = nf u n n j= I {X j >u n} and B n = npx > su n, Y > su n nf u n n j= I β X,Y s. {X j >u n} Since G X,Y n s G X,Y s, we have nf u n n I{Xj >su n,y j >su n} PX > su n, Y > su n P 0, j= uniformly on compact subsets of [s 0, +, s 0 0, where P means convergence in probability. This also implies n j= I {X j >su n,y j >su n} nf u n β X,Y s, 4.5 in probability for each s. If we let s = and replace Y by X, then 4.5 gives n j= I {X j >u n} nf u n P. 30

39 By the continuous mapping theorem, the reciprocal A n = nf u n n j= I {X j >u n} P. 4.6 Also by Slutsky s theorem, convergence of G X,Y n and 4.6 give A n G X,Y n s G X,Y s. 4.7 Now let us have a look at B n. We have B n = = npx > su n, Y > su n nf u n n j= I β X,Y s {X j >u n} nf u n nf u n n j= I {X j >u n} PX > su n, Y > su n F u n PX > su n, Y > su n + nf u n β X,Y s F u n Since G X,Y n s G X,Y s, for all s s 0, then in particular, for s =, n Hn X := G X,X j= n = nf u n I {X j >u n} d G X,X =: H X, nf u n n Hn Y := G Y,Y n j= = nf u n I {Y j >u n} d G Y,Y =: H Y, nf u n jointly with the convergence of G X,Y n. to G X,Y. Application of the delta method gives nf u n nf u n n j= I {X j >u n} d H X. Therefore, B n β X,Y sh X. 4.8 Hence, combining 4.8 and 4.7, bearing in mind the joint convergence, we have βx,y nf u n n s β X,Y s G X,Y s β X,Y sh X

Properties of the limiting process G̃^{X,Y}(s). First, we recall from Lemma 3.1.4 that
$$\mathrm{Cov}\big(G^{X,Y}(s_1), G^{X,Y}(s_2)\big) = \beta^{X,Y}(s_1 \vee s_2).$$
Using the same method as in the proof of Lemma 3.1.4 and letting
$$\Theta_n(x, y) = \frac{P(X > x u_n,\, Y > y u_n)}{\bar F(u_n)}, \qquad \Theta(x, y) = \lim_{n\to\infty}\Theta_n(x, y),$$
we can further conclude that
$$\mathrm{Cov}\big(G^{X,Y}(s_1), G^{X,X}(s_2)\big) = \mathrm{Cov}\big(G^{X,Y}(s_1), H^X(s_2)\big) = \Theta(s_1 \vee s_2,\, s_1), \tag{4.10}$$
$$\mathrm{Cov}\big(G^{X,Y}(s_1), G^{Y,Y}(s_2)\big) = \mathrm{Cov}\big(G^{X,Y}(s_1), H^Y(s_2)\big) = \Theta(s_1,\, s_1 \vee s_2). \tag{4.11}$$
This, together with the fact that X ≈_d Y, implies that for s ∈ [s_0, +∞),
$$\mathrm{Cov}\big(\tilde G^{X,Y}(s), \tilde G^{X,Y}(s)\big) = \beta^{X,Y}(s)\big(1 - 2\,\Theta(s \vee 1, s) + \beta^{X,Y}(s)\big), \tag{4.12}$$
$$\mathrm{Cov}\big(\tilde G^{X,Y}(1), \tilde G^{X,Y}(1)\big) = \beta^{X,Y}(1)\big(1 - \beta^{X,Y}(1)\big). \tag{4.13}$$
Thus, we have a very important corollary, which shows that we cannot construct a test for extremal independence using the proposed estimator of the tail dependence coefficient (see also Section 4.2).

Corollary 4.1.2 Under extremal independence, G̃^{X,Y}(s) is degenerate, that is,
$$\mathrm{Var}\big(\tilde G^{X,Y}(s)\big) = 0. \tag{4.14}$$
Moreover, writing G̃^{X,X}(s) = H^X(s) − s^{-α}H^X(1) and G̃^{Y,Y}(s) = H^Y(s) − s^{-α}H^Y(1),
$$\mathrm{Cov}\big(\tilde G^{X,Y}(s), \tilde G^{X,X}(s)\big) = \beta^{X,Y}(s)\big(1 - (s\vee 1)^{-\alpha} + s^{-\alpha}\big) - s^{-\alpha}\,\Theta(s\vee 1, s), \tag{4.15}$$
$$\mathrm{Cov}\big(\tilde G^{X,Y}(s), \tilde G^{Y,Y}(s)\big) = \beta^{X,Y}(s)\big(1 - \Theta(1, s)\big) + s^{-\alpha}\big(\beta^{X,Y}(s)\,\beta^{X,Y}(1) - \Theta(s, s\vee 1)\big), \tag{4.16}$$
$$\mathrm{Cov}\big(\tilde G^{Y,Y}(s), \tilde G^{Y,Y}(s)\big) = \mathrm{Cov}\big(\tilde G^{X,X}(s), \tilde G^{X,X}(s)\big) = s^{-\alpha} - 2\,s^{-\alpha}(s\vee 1)^{-\alpha} + s^{-2\alpha}, \tag{4.17}$$
$$\mathrm{Cov}\big(\tilde G^{X,X}(s), \tilde G^{Y,Y}(s)\big) = \beta^{X,Y}(s) - s^{-\alpha}\big(\Theta(s, 1) + \Theta(1, s)\big) + s^{-2\alpha}\,\beta^{X,Y}(1). \tag{4.18}$$
Therefore, regardless of whether we have extremal dependence or independence,
$$\mathrm{Cov}\big(\tilde G^{X,Y}(1), \tilde G^{X,X}(1)\big) = \mathrm{Cov}\big(\tilde G^{X,Y}(1), \tilde G^{Y,Y}(1)\big) = 0.$$
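The degeneracy in Corollary 4.1.2 can also be seen numerically. The sketch below (not from the thesis; the model, the threshold u_n = n^δ with δ = 0.15, and the function names are illustrative assumptions) draws extremally independent Pareto pairs, so that β^{X,Y}(1) = 0, and shows that the normalized statistic sqrt(k)·β̂_n^{X,Y}(1) collapses to 0 as n grows instead of stabilizing at a nondegenerate normal limit.

```python
import numpy as np

def scaled_tdc_error(n, alpha, delta, rng):
    """sqrt(k) * beta_hat_n(1) for an extremally independent pair (true TDC = 0)."""
    x = rng.pareto(alpha, n) + 1.0
    y = rng.pareto(alpha, n) + 1.0              # independent of x, hence extremal independence
    u = n ** delta                              # deterministic threshold of the form u_n = n^delta
    k = np.sum(x > u)                           # #{X_j > u_n}
    beta_hat = np.sum((x > u) & (y > u)) / k    # estimator (4.1) at s = 1
    return np.sqrt(k) * beta_hat

rng = np.random.default_rng(2)
alpha, delta = 4.0, 0.15
for n in (10_000, 100_000, 1_000_000):
    draws = [scaled_tdc_error(n, alpha, delta, rng) for _ in range(200)]
    print(f"n={n:>8}  mean={np.mean(draws):.4f}  std={np.std(draws):.4f}")
# Both moments shrink as n grows: under extremal independence the limit in (4.4)
# degenerates, so the estimator alone cannot be turned into a test.
```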

Confidence intervals. From Theorem 4.1.1 we know that sqrt(nF̄(u_n))(β̂_n^{X,Y}(1) − β^{X,Y}(1)) converges in distribution to G̃^{X,Y}(1) ~ N(0, β^{X,Y}(1)(1 − β^{X,Y}(1))), on account of Proposition 3.1.5. Hence, we can construct a 100(1−θ)% asymptotic confidence interval for β^{X,Y}(1) as
$$\Big(\hat\beta_n^{X,Y}(1) - \frac{Z_{\theta/2}}{\sqrt{n\bar F(u_n)}};\ \hat\beta_n^{X,Y}(1) + \frac{Z_{\theta/2}}{\sqrt{n\bar F(u_n)}}\Big),$$
where Z_{θ/2} is the quantile of G̃^{X,Y}(1) = G^{X,Y}(1) − β^{X,Y}(1)H^X(1), i.e. P(G̃^{X,Y}(1) > Z_{θ/2}) = θ/2. Recalling that G̃^{X,Y}(1) is a centered normal random variable with the variance given in (4.13), and replacing therein β^{X,Y}(1) with β̂_n^{X,Y}(1), the confidence interval can be constructed as
$$\Big(\hat\beta_n^{X,Y}(1) - z_{\theta/2}\,\frac{\sqrt{\hat\beta_n^{X,Y}(1)\,(1-\hat\beta_n^{X,Y}(1))}}{\sqrt{n\bar F(u_n)}};\ \hat\beta_n^{X,Y}(1) + z_{\theta/2}\,\frac{\sqrt{\hat\beta_n^{X,Y}(1)\,(1-\hat\beta_n^{X,Y}(1))}}{\sqrt{n\bar F(u_n)}}\Big), \tag{4.19}$$
where z_{θ/2} is the standard normal percentile.

Example 4.1.3 [Example 2.1.17 continued] We simulate 1000 observations from the model given in Example 2.1.17, where the Z_j's are independent Pareto with α = 4, and X = Z_1 + Z_2, Y = Z_2 + Z_3. In this case, the tail dependence coefficient is β^{X,Y}(1) = 0.5. Figure 4.1 shows the plot of the confidence intervals against the threshold u_n = n^δ, which is such that n F̄(u_n) = n^{1−αδ} → +∞, that is, δ ∈ (0, 1/α). Note that the true confidence interval (with β^{X,Y}(1) = 0.5) and the estimated confidence interval are almost indistinguishable.

4.2 Testing for Extremal Independence using TDC

In this section we indicate problems related to testing for extremal independence between two random variables using their tail dependence coefficient.

Figure 4.1: Asymptotic confidence intervals for the tail dependence coefficient for Example 4.1.3. Blue line: the estimator β̂_n^{X,Y}(1) plotted against different choices of the threshold u_n; red dashed lines: confidence interval using (4.19); yellow line: confidence interval using the true value of β^{X,Y}(1).

The test for independence would have the form:
H_0: β^{X,Y}(1) = 0 vs. H_1: β^{X,Y}(1) ≠ 0,
where
$$\beta^{X,Y}(s) = \lim_{n\to\infty} \frac{P(X > s u_n, Y > s u_n)}{P(X > u_n)}.$$
To conduct the test, we could use the estimator β̂_n^{X,Y} defined in (4.1). However, under H_0 the limit of β̂_n^{X,Y} is degenerate, hence we cannot construct a formal test; see Corollary 4.1.2. On the other hand, we can test for specific values of the tail dependence coefficient:
H_0: β^{X,Y}(1) = β_0 vs. H_1: β^{X,Y}(1) ≠ β_0.
If β_0 ≠ 0, then to conduct the test we can use the confidence interval (4.19). But we have some issues with this approach, since:
- the distribution F is unknown and u_n is arbitrary;

- we cannot extend this procedure to higher dimensions: the problem we encounter there is multiple testing, which inflates the Type I error (false positive) rate, i.e. the probability of rejecting H_0 while it is true. For example, assume that we want to perform 50 tests at the significance level θ = 5%, so that for each test we have a 5% chance of making a Type I error. If all the null hypotheses are true, then the expected number of Type I errors is 2.5. Moreover, if all tests are assumed to be independent, then the number of Type I errors has a binomial distribution with n = 50 and p = 5%, therefore
P(at least one Type I error) = 1 − P(no Type I error) = 1 − (1 − 0.05)^{50} ≈ 92.3%.
So, as we can see, with 50 tests we have a 92.3% chance of having at least one Type I error.

The first issue is addressed using random levels, while the second issue is tackled using a random matrix approach. As we will see, the latter approach allows us to construct a formal test for extremal independence.

4.3 Extremogram matrix: bivariate case

The covariance matrix of a random vector is a standard object summarizing the dependence between its components. Since the tail dependence coefficient describes extremal dependence, we can think of introducing an extremal counterpart of the covariance matrix, namely the extremogram matrix (we borrow the extremogram terminology from [3]). The section is organized as follows. We first define the extremogram matrix and its sample counterpart. Then we prove, under the null hypothesis of extremal independence, that the distribution of the largest eigenvalue of the sample extremogram matrix converges to the maximum of two independent Gaussian random variables; see Theorem 4.3.4. Then, we apply the limiting result to construct a test for independence. The theory is illustrated by simulated data. The material presented in this section is the author's original contribution.

Let (X, Y) be a random vector.

Definition 4.3.1 (Extremogram matrix) We define the extremogram matrix by
$$\Lambda(s) = \begin{pmatrix} \beta^{X,X}(s) & \beta^{X,Y}(s) \\ \beta^{Y,X}(s) & \beta^{Y,Y}(s) \end{pmatrix}. \tag{4.20}$$

It is an analog of the covariance matrix, but it depends only on extreme values. If s = 1, then we have
$$\Lambda := \Lambda(1) = \begin{pmatrix} 1 & \beta^{X,Y}(1) \\ \beta^{Y,X}(1) & 1 \end{pmatrix}.$$
A sample counterpart of the covariance matrix is the sample covariance matrix. We extend this idea to extremal dependence.

Definition 4.3.2 (Sample extremogram matrix) We call the sample extremogram matrix, denoted by Λ̂_n(s), the matrix whose entries are estimators of the tail dependence coefficients, i.e.
$$\hat\Lambda_n(s) = \begin{pmatrix} \hat\beta_n^{X,X}(s) & \hat\beta_n^{X,Y}(s) \\ \hat\beta_n^{Y,X}(s) & \hat\beta_n^{Y,Y}(s) \end{pmatrix}, \tag{4.21}$$
where
$$\hat\beta_n^{U,V}(s) = \frac{\sum_{j=1}^n \mathbb{1}\{U_j > s u_n, V_j > s u_n\}}{\sum_{j=1}^n \mathbb{1}\{U_j > u_n\}},$$
with U, V equal to X or Y. We note that the extremogram matrix and the sample extremogram matrix are symmetric. The next goal is to establish the asymptotic distribution of the eigenvalues of the sample extremogram matrix.

Why are random matrices important? Before we continue, let us give a brief motivation. Much attention has been given to the spectral properties of large dimensional random matrices, known as Random Matrix Theory (RMT), because of their interesting properties and statistical applications. The use of limiting properties of eigenvalues originates in quantum mechanics, where they are utilized to describe energy levels of particles in a large system and also serve as finite dimensional approximations of infinite dimensional operators. From the statistical point of view, they may be used to correct traditional tests or estimators which fail in large dimensions. Furthermore, in principal component analysis (PCA), the first k principal components correspond to the k largest eigenvalues of the sample covariance matrix; see e.g. [6], [], [4].
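Before turning to the asymptotics, a small computational sketch (not part of the thesis; the data-generating model, the threshold rule and all names are assumptions made for the example) shows how the sample extremogram matrix (4.21) can be assembled from data; the same function also covers the d-dimensional version of Section 4.4.

```python
import numpy as np

def sample_extremogram_matrix(data, u, s=1.0):
    """Sample extremogram matrix (4.21) for an (n, d) data array.

    Entry (k, l) is  #{X_k > s*u, X_l > s*u} / #{X_k > u},
    i.e. the empirical TDC estimator with deterministic threshold u."""
    exceed_s = data > s * u                     # indicators {X_kj > s u_n}
    exceed_u = data > u                         # indicators {X_kj > u_n}
    d = data.shape[1]
    lam = np.empty((d, d))
    for k in range(d):
        denom = exceed_u[:, k].sum()
        for l in range(d):
            lam[k, l] = np.sum(exceed_s[:, k] & exceed_s[:, l]) / denom
    return lam

# toy data: three tail-equivalent coordinates, the first two extremally dependent
rng = np.random.default_rng(3)
n, alpha = 100_000, 4.0
z = rng.pareto(alpha, (n, 5)) + 1.0
data = np.column_stack([z[:, 0] + z[:, 1],      # X1 and X2 share the factor z[:, 1]
                        z[:, 1] + z[:, 2],
                        z[:, 3] + z[:, 4]])     # independent of the first two coordinates
u = n ** 0.15                                   # deterministic threshold u_n = n^delta
print(np.round(sample_extremogram_matrix(data, u, s=1.2), 3))
```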

Asymptotics for sample extremogram matrices. By (4.4), we have
$$\sqrt{n\bar F(u_n)}\,\big(\mathrm{vec}\,\hat\Lambda_n(s) - \mathrm{vec}\,\Lambda(s)\big) \Rightarrow \mathrm{vec}\,H(s), \tag{4.22}$$
where
$$H(s) = \begin{pmatrix} \tilde G^{X,X}(s) & \tilde G^{X,Y}(s) \\ \tilde G^{Y,X}(s) & \tilde G^{Y,Y}(s) \end{pmatrix}, \qquad \tilde G^{U,V}(s) = G^{U,V}(s) - \beta^{U,V}(s)\,H^U(1), \tag{4.23}$$
and vec(A) is the vector obtained by stacking the columns of the matrix A on top of one another.

Before we prove the result on the limiting distribution of the sample eigenvalues, we briefly introduce operator norms of matrices.

Operator norms. Let Σ = [σ_{ij}]_{i,j=1}^d ∈ R^{d×d} and let ‖·‖ be a norm on R^d. Matrix norms are defined as follows:
$$\|\Sigma\|_2 = \max\{\|\Sigma x\| : \|x\| = 1\} \quad\text{and}\quad \|\Sigma\|_1 = \max_{1\le i\le d} \sum_{j=1}^d |\sigma_{ij}|. \tag{4.24}$$
It holds that
$$\|\Sigma\|_2^2 = \max_{1\le i\le d} \delta_i \quad\text{and}\quad \|\Sigma\|_2 \le \|\Sigma\|_1, \tag{4.25}$$
where δ_i are the eigenvalues of Σ^⊤Σ. Recall the Weyl inequality for two matrices Σ, Γ: if δ_i and γ_i, i = 1, ..., d, are the ordered eigenvalues of Σ and Γ, respectively, then
$$\max_{1\le i\le d} |\delta_i - \gamma_i| \le \|\Sigma - \Gamma\|_2.$$

4.3.1 Asymptotics for eigenvalues under extremal independence

Again, the goal is to test for extremal independence between X and Y. Therefore, we test
H_0: β^{X,Y}(1) = 0 vs. H_1: β^{X,Y}(1) ≠ 0.

Limiting distribution of the sample eigenvalues under the null hypothesis. Under the null hypothesis we have
$$\hat\Lambda_n(s) = \begin{pmatrix} \hat\beta_n^{X,X}(s) & \hat\beta_n^{X,Y}(s) \\ \hat\beta_n^{Y,X}(s) & \hat\beta_n^{Y,Y}(s) \end{pmatrix}, \qquad \hat\Lambda_n(1) = \begin{pmatrix} 1 & \hat\beta_n^{X,Y}(1) \\ \hat\beta_n^{Y,X}(1) & 1 \end{pmatrix},$$
while

$$\Lambda(s) = \begin{pmatrix} s^{-\alpha} & 0 \\ 0 & s^{-\alpha} \end{pmatrix}, \qquad \Lambda(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
We recall that under H_0 the process G̃^{X,Y} is degenerate. Therefore, under H_0 the matrix H defined in (4.23) becomes
$$H(s) = \begin{pmatrix} \tilde G^{X,X}(s) & 0 \\ 0 & \tilde G^{Y,Y}(s) \end{pmatrix}.$$
We also recall that for s = 1 we have G̃^{X,X}(1) = G̃^{Y,Y}(1) = 0; hence it is important to keep s ≠ 1 above. In summary:

Corollary 4.3.3 Assume that the conditions of Theorem 4.1.1 are satisfied. Then, under H_0,
$$\sqrt{n\bar F(u_n)} \begin{pmatrix} \hat\beta_n^{X,X}(s) - s^{-\alpha} & \hat\beta_n^{X,Y}(s) \\ \hat\beta_n^{Y,X}(s) & \hat\beta_n^{Y,Y}(s) - s^{-\alpha} \end{pmatrix} \Rightarrow \begin{pmatrix} \tilde G^{X,X}(s) & 0 \\ 0 & \tilde G^{Y,Y}(s) \end{pmatrix}.$$
We recall that s = 1 leads to the tail dependence coefficient. However, if s = 1 then
$$\sqrt{n\bar F(u_n)} \begin{pmatrix} 0 & \hat\beta_n^{X,Y}(1) \\ \hat\beta_n^{Y,X}(1) & 0 \end{pmatrix} \Rightarrow \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Hence, under H_0, the limiting distribution at s = 1 does not provide any valuable information for hypothesis testing. To avoid this drawback, we have to consider s ≠ 1, or the supremum over a given set. Let I be a compact set, I ⊂ R_+, such that 1 ∉ I. Then we have the following result.

Theorem 4.3.4 Assume that s ≠ 1. Let λ_{n1}(s) ≥ λ_{n2}(s) be the ordered eigenvalues of Λ̂_n(s), and let λ_1(s) = λ_2(s) = s^{-α} be the eigenvalues of Λ(s). Then, under H_0 and if the conditions of Theorem 4.1.1 hold, we have
$$\sqrt{n\bar F(u_n)}\,\big(\lambda_{n1}(s) - \lambda_1(s)\big) \Rightarrow \max\big(\tilde G^{X,X}(s), \tilde G^{Y,Y}(s)\big).$$

Proof:

47 Define a diagonal matrix ns whose diagonal elements are those of n s, that is ns = β X,X n s 0 0 βy,y n s. Then, it holds by Corollary that nf u n vec n s ns = nf u n 0 β n X,Y s β n Y,X s 0 = Therefore, we should expect that the norm for n s ns goes to zero. In fact, Lemma Under the conditions of Theorem 4.3.4: Proof: We have by 4.24 that sup s I sup s I sup nf u n n s ns = o p. s I nf u n n s ns = sup nf u n β X,Y n s + sup s I s I nf u n max nf u n β Y,X n s = o p, β n X,Y s, Y,X β n s on account of Corollary By 4.25 and Lemma we have nf u n n s ns 2 = o p, 4.26 that is nf u n λ ni s λ ni s = o p, where λ n s λ n2 s are the ordered eigenvalues of ns. That is λ X,X Y,Y n s = max β n s, β n s and λ s = maxs α, s α = s α. λ X,X Y,Y n2 s = min β n s, β n s and λ 2 s = s α. 39

We continue with the proof of Theorem 4.3.4. By application of Corollary 4.3.3 to Λ̃_n(s) and Λ(s), we get
$$\sqrt{n\bar F(u_n)}\,\big(\hat\beta_n^{X,X}(s) - s^{-\alpha}\big) \Rightarrow \tilde G^{X,X}(s) \tag{4.27}$$
and
$$\sqrt{n\bar F(u_n)}\,\big(\hat\beta_n^{Y,Y}(s) - s^{-\alpha}\big) \Rightarrow \tilde G^{Y,Y}(s). \tag{4.28}$$
We now want to obtain the limiting distribution of sqrt(nF̄(u_n))(λ̃_{n1}(s) − λ_1(s)). From (4.27) and (4.28) we deduce that
$$\sqrt{n\bar F(u_n)}\,\big(\tilde\lambda_{n1}(s) - \lambda_1(s)\big) = \sqrt{n\bar F(u_n)}\,\Big(\max\big(\hat\beta_n^{X,X}(s), \hat\beta_n^{Y,Y}(s)\big) - s^{-\alpha}\Big) \Rightarrow \max\big(\tilde G^{X,X}(s), \tilde G^{Y,Y}(s)\big). \tag{4.29}$$
Hence, combining (4.26) and (4.29), it holds that
$$\sqrt{n\bar F(u_n)}\,\big(\lambda_{n1}(s) - \lambda_1(s)\big) \Rightarrow \max\big(\tilde G^{X,X}(s), \tilde G^{Y,Y}(s)\big),$$
which leads to the desired result.

4.3.2 Testing for Extremal Independence using random matrices

Define the test statistic, for s ∈ [s_0, +∞), s_0 ∈ (0, 1), as
$$T(s) = \sqrt{n\bar F(u_n)}\,\big(\lambda_{n1}(s) - \lambda_1(s)\big).$$
Then, under H_0 and for any fixed s, T(s) has asymptotically the same distribution as
$$M(s) = \max\big(\tilde G^{X,X}(s), \tilde G^{Y,Y}(s)\big).$$
We know that G̃^{X,X}(s) and G̃^{Y,Y}(s) are jointly normal random variables. By (4.18), under H_0, these random variables are uncorrelated for s ≥ s_0. Hence, they are independent. By [8], the distribution function of M(s) can be written as
$$F_{M(s)}(x) = \Phi^2\Big(\frac{x}{\sigma(s)}\Big), \qquad x \in \mathbb{R}.$$
Hence,
$$f_{M(s)}(x) = \frac{2}{\sigma(s)}\,\phi\Big(\frac{x}{\sigma(s)}\Big)\,\Phi\Big(\frac{x}{\sigma(s)}\Big), \qquad x \in \mathbb{R},$$

The mean and the variance of $M(s)$ are
$$E[M(s)] = \frac{\sigma(s)}{\sqrt{\pi}}, \qquad \operatorname{Var}(M(s)) = \sigma^2(s)\,\frac{\pi - 1}{\pi}.$$
Therefore, the $0.025$ and $0.975$ quantiles of $M(s)$ can be calculated as
$$M_{.025}(s) = \sigma(s)\,\Phi^{-1}\big(\sqrt{0.025}\big), \qquad M_{.975}(s) = \sigma(s)\,\Phi^{-1}\big(\sqrt{0.975}\big).$$
Note that the distribution of $M(s)$ is not symmetric, see Figure 4.2. The hypothesis testing can be performed as follows: if $T(s) \in \big(M_{.025}(s), M_{.975}(s)\big)$, then we fail to reject $H_0$; otherwise, we reject $H_0$.

Figure 4.2: The density function of $M(s)$, $s = 1.2$ (blue dashed line), and the density of the normal distribution with $\mu = E[M(s)]$ and $\sigma^2 = \operatorname{Var}(M(s))$ (red line).

We apply these results to the models defined in Examples 2.1.6 and 2.1.7 and to the model $Y = \varphi X + \sigma Z$, where $\varphi \in (0, 1)$, $X$ is Pareto and $Z$ is standard normal, independent of $X$. We simulate 1000 independent observations of $(X, Y)$ from each of these models; a minimal sketch of the testing procedure is given below.
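As a quick illustration of this decision rule (a minimal sketch, not the code used for Table 4.1), the following Python snippet simulates a pair of independent Pareto samples, computes the test statistic $T(s)$, and compares it with the asymptotic interval $(M_{.025}(s), M_{.975}(s))$. It assumes known Pareto($\alpha = 4$) margins, so that $\bar F(u_n) = u_n^{-\alpha}$, together with the variance formula $\sigma^2(s) = s^{-\alpha}(1 - s^{-\alpha})$ and the quantile formula $M_q(s) = \sigma(s)\Phi^{-1}(\sqrt{q})$ used above.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha, delta, s = 1000, 4.0, 0.10, 1.2
x = 1 + rng.pareto(alpha, n)              # standard Pareto(4): P(X > t) = t**(-4) for t >= 1
y = 1 + rng.pareto(alpha, n)              # independent copy, so extremal independence holds

u_n = n ** delta                          # deterministic threshold u_n = n**delta

def beta_hat(a, b):
    # TDC estimator at level s: joint exceedances of s*u_n over exceedances of u_n by a
    return np.sum((a > s * u_n) & (b > s * u_n)) / np.sum(a > u_n)

Sigma_n = np.array([[beta_hat(x, x), beta_hat(x, y)],
                    [beta_hat(y, x), beta_hat(y, y)]])
lam_n1 = np.max(np.linalg.eigvals(Sigma_n).real)             # largest sample eigenvalue
T = np.sqrt(n * u_n ** (-alpha)) * (lam_n1 - s ** (-alpha))  # T(s), using the known Pareto tail

sigma_s = np.sqrt(s ** (-alpha) * (1 - s ** (-alpha)))       # sigma(s) under the assumed variance formula
ci = (sigma_s * norm.ppf(np.sqrt(0.025)), sigma_s * norm.ppf(np.sqrt(0.975)))
print("T(s) =", round(float(T), 3), "CI =", ci, "reject H0:", not (ci[0] < T < ci[1]))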

[Table 4.1 reports, for each model (independent Pareto, dependent Pareto, third model) and for $s = 1.2, 1.3, 1.5$, the value of the test statistic $T(s)$ and whether it falls in the corresponding confidence interval.]

Table 4.1: Testing for extremal independence. The 95% asymptotic confidence intervals are CI(1.2) $= (M_{.025}(1.2), M_{.975}(1.2)) = (-0.25, 0.56)$, CI(1.3) $= (M_{.025}(1.3), M_{.975}(1.3)) = (-0.23, 0.51)$ and CI(1.5) $= (M_{.025}(1.5), M_{.975}(1.5)) = (-0.16, 0.35)$ for $s = 1.2, 1.3, 1.5$, $u_n = n^{\delta}$ and $\delta = 0.10$. The $+$ sign indicates that $T$ belongs to the CI.

The random variables $X$ and $Y$ are Pareto with the parameter $\alpha = 4$. For the third model, we chose $\varphi = 0.8$ and $\sigma = 0.1$. We calculate the test statistic $T(s)$ with $u_n = n^{\delta}$, $\delta = 0.10$, and $s = 1.2, 1.3, 1.5$. The simulation results are summarized in Table 4.1. As we can see, the simulation results confirm the theoretical results obtained in the previous section. Indeed, for the independent Pareto model we fail to reject $H_0$ for any $s \in \{1.2, 1.3, 1.5\}$, while for the dependent Pareto model (Example 2.1.7) and the third model mentioned above we reject the null hypothesis. A comprehensive simulation study will be done using the practical estimators, based on the order statistics rather than on the deterministic threshold $u_n$.

A good parameter choice. For a good asymptotic confidence set, the parameter $s$ has to be chosen roughly in the range $[0.7, 1.5]$. However, large values of $s$ produce narrower asymptotic confidence sets and small values of $s$ generate wider asymptotic confidence sets.

4.4 Extremogram matrix: d-dimensional case

This section contains an extension of the two-dimensional case to a higher, but finite, dimension, say $d \ge 2$. To this end, we conveniently introduce the following notation.

Definition 4.4.1. Let $X_j = (X_{1j}, \dots, X_{dj})$, $j = 0, \dots, n$, be an i.i.d. sequence of regularly varying nonnegative random vectors in $\mathbb{R}^d$. Assume that $X_{1j} \stackrel{d}{=} \dots \stackrel{d}{=} X_{dj}$. Define the tail dependence coefficient between the $k$-th and $l$-th components of $X_0$ as
$$\beta^{k,l}(s) = \lim_{n\to+\infty} \beta_n^{k,l}(s) = \lim_{n\to+\infty} \frac{P(X_{k0} > s u_n,\, X_{l0} > s u_n)}{P(X_{10} > u_n)}, \qquad k, l = 1, \dots, d,$$
and its empirical estimate by
$$\hat\beta_n^{k,l}(s) = \frac{\sum_{j=1}^n I\{X_{kj} > s u_n,\, X_{lj} > s u_n\}}{\sum_{j=1}^n I\{X_{kj} > u_n\}}.$$
We also define the extremogram matrix and its sample estimate as follows:
$$\Sigma(s) = \big[\beta^{k,l}(s)\big]_{k,l=1}^d, \qquad \Sigma_n(s) = \big[\hat\beta_n^{k,l}(s)\big]_{k,l=1}^d.$$

4.4.1 Testing for Extremal Independence using random matrices

Again, the goal is to test for extremal independence between the components of the vector $X_0$. Therefore, we test $H_0: \beta^{k,l}(1) = 0$ for all $k \neq l$ vs $H_1: \beta^{k,l}(1) \neq 0$ for some $k \neq l$.

Limiting distribution of sample eigenvalues in higher, but finite, dimension $d \ge 2$. Note first that under the null hypothesis $\beta^{k,l}(s) = 0$ for $k \neq l$. We have the following theorem, which is an extension of the two-dimensional case (see Theorem 4.3.4).

Theorem. Let $(X_j)$ be a random sequence as in Definition 4.4.1. Let $\lambda_{n1}(s) \ge \dots \ge \lambda_{nd}(s)$ be the ordered eigenvalues of $\Sigma_n(s)$ and let $\lambda_1(s) = \dots = \lambda_d(s) = s^{-\alpha}$ be the eigenvalues of $\Sigma(s)$. Assume that for all $k, l = 1, \dots, d$, $k \neq l$, we have
$$\lim_{n\to+\infty} \sqrt{n\bar F(u_n)}\,\sup_{s > \epsilon > 0} \big|\hat\beta_n^{k,l}(s) - \beta^{k,l}(s)\big| = 0.$$
Then, under $H_0$ we have
$$\sqrt{n\bar F(u_n)}\,\big(\lambda_{n1}(s) - \lambda_1(s)\big) \Rightarrow \max_{1\le k\le d} \mathbb{G}^{k,k}(s), \qquad (4.32)$$
where $\mathbb{G}^{k,k}(s)$, $k = 1, \dots, d$, are independent Gaussian processes with the covariance
$$\operatorname{Cov}\big(\mathbb{G}^{k,k}(s), \mathbb{G}^{k,k}(s')\big) = (s \vee s')^{-\alpha} - 2(ss')^{-\alpha} + (ss')^{-\alpha}.$$
Proof: The same as in the two-dimensional case.
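To make the $d$-dimensional construction concrete, the following Python sketch builds the sample extremogram matrix $\Sigma_n(s)$ and the centred largest eigenvalue for a data matrix with $n$ rows and $d$ columns. The function names are hypothetical, $\bar F(u_n)$ is replaced by its empirical counterpart, and the marginal index $\alpha$ is assumed known only for centring; it is an illustration, not the code used in the thesis.

import numpy as np

def extremogram_matrix(X, u, s):
    # Sample extremogram matrix Sigma_n(s) for an (n, d) data array X and threshold u
    E = (X > s * u).astype(float)              # joint exceedance indicators of s*u
    counts = np.sum(X > u, axis=0)             # marginal exceedance counts of u (normalisation)
    joint = E.T @ E                            # (d, d) matrix of joint exceedance counts
    return joint / counts[:, None]             # entry (k, l) = joint count / count of margin k

def largest_eigenvalue_stat(X, u, s, alpha):
    # sqrt(n * Fbar(u)) * (lambda_n1(s) - s**(-alpha)), with Fbar(u) replaced empirically
    n = X.shape[0]
    lam1 = np.max(np.linalg.eigvals(extremogram_matrix(X, u, s)).real)
    fbar_u = np.mean(X[:, 0] > u)              # empirical stand-in for the unknown Fbar(u_n)
    return np.sqrt(n * fbar_u) * (lam1 - s ** (-alpha))

# Example: d = 3 independent Pareto(4) margins, so extremal independence (H0) holds
rng = np.random.default_rng(0)
X = 1 + rng.pareto(4.0, size=(2000, 3))
print(largest_eigenvalue_stat(X, u=2000 ** 0.1, s=1.3, alpha=4.0))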

4.5 Tail empirical process with random levels

In this section, we replace the deterministic levels $u_n$ with random, data-based levels and obtain the same limit theorems as in the deterministic-levels case. The transition from deterministic to random levels follows the same path as in [7]. Then we apply these results again to hypothesis testing using random matrices. This approach is again the author's original contribution.

The process considered in (3.3) depends on the deterministic levels $u_n$, which are chosen such that $u_n \to +\infty$ and $n\bar F(u_n) \to +\infty$. In other words, the choice of $u_n$ depends on the index of regular variation $\alpha$, which is unknown. To tackle this drawback, empirical processes with random levels are considered.

Assume for a moment that $F$ is known, continuous and strictly increasing. We consider $X_{n:1} \le \dots \le X_{n:n}$, the order statistics of the sample $X_j$, $j = 1, \dots, n$. For the given $u_n$, choose a sequence of integers $k = n\bar F(u_n)$ depending on $n$ (the dependence on $n$ is omitted for notational convenience). Consequently, $k = k_n \to \infty$ and $k/n \to 0$. Then $F^{\leftarrow}(1 - k/n) = u_n$, where $F^{\leftarrow}$ is the inverse of $F$. Furthermore, let $F_n^{\leftarrow}(u) = \inf\{y : F_n(y) > u\}$. The empirical estimator of $u_n = F^{\leftarrow}(1 - k/n)$ is given by $F_n^{\leftarrow}(1 - k/n) = X_{n:n-k}$. We call $X_{n:n-k}$ the intermediate order statistic. In conclusion, $u_n$ can be approximated by $X_{n:n-k}$. This motivates the following data-driven estimator of the tail dependence coefficient between $X$ and $Y$:
$$\tilde\beta_n^{X,Y}(s) = \frac{1}{k} \sum_{j=1}^n I\{X_j > s X_{n:n-k},\, Y_j > s X_{n:n-k}\}. \qquad (4.33)$$
We note that $\tilde\beta_n^{X,Y}(s)$ is just $\hat\beta_n^{X,Y}(s)$ defined in (4.1), where $u_n$ is replaced with $X_{n:n-k}$. The goal is to obtain the limiting theory for $\tilde\beta_n^{X,Y}(s)$. In order to do so, we need to obtain the joint convergence of the intermediate order statistic and the tail empirical process $G_n^{X,Y}(s)$.

Recall from (3.5) and (3.6) that
$$\bar\delta_n^X(s) = \frac{1}{n\bar F(u_n)} \sum_{j=1}^n I\{X_j > s u_n\}, \qquad \bar\delta_n^Y(s) = \frac{1}{n\bar F(u_n)} \sum_{j=1}^n I\{Y_j > s u_n\},$$
and
$$\delta(s) := \delta^Y(s) = \delta^X(s) = \lim_{n\to+\infty} \bar\delta_n^X(s) = \lim_{n\to+\infty} E\big[\bar\delta_n^X(s)\big] = \lim_{n\to+\infty} \bar F(s u_n)/\bar F(u_n) = s^{-\alpha}.$$
Write $\delta_n(s) = E[\bar\delta_n^X(s)] = \bar F(s u_n)/\bar F(u_n)$. We make the following assumption:
$$\lim_{n\to+\infty} \frac{d\delta_n}{ds}(s) = \frac{d\delta}{ds}(s) = -\alpha s^{-\alpha-1}, \qquad (4.34)$$
uniformly in a neighborhood of $1$.

Proposition 4.5.1. Assume that the conditions of Theorem 4.1.1 hold. If moreover condition (4.34) is satisfied, then
$$\Big\{\sqrt{k}\Big(\frac{X_{n:n-k}}{u_n} - 1\Big),\; G_n^{X,Y}\Big\} \Rightarrow \Big\{\frac{1}{\alpha}\,H^X(1),\; G^{X,Y}\Big\}$$
in $\mathbb{R} \times \ell^\infty([s_0, +\infty))$.

Proof: Recall from (3.3) and (3.4) that
$$G_n^{X,Y}(s) = \sqrt{n\bar F(u_n)}\,\big(\bar\beta_n^{X,Y}(s) - \beta_n^{X,Y}(s)\big), \qquad H_n^X(s) = \sqrt{n\bar F(u_n)}\,\big(\bar\delta_n^X(s) - \delta_n(s)\big), \qquad s \ge s_0.$$
By Theorem 3.1.3 and the Skorokhod representation theorem (Theorem 2.2.3), there exist sequences of processes $\zeta_n^{X,Y} \stackrel{d}{=} G_n^{X,Y}$, $\zeta^{X,Y} \stackrel{d}{=} G^{X,Y}$ and $\zeta_n \stackrel{d}{=} H_n^X$, $\zeta \stackrel{d}{=} H^X$ on some probability space such that
$$\zeta_n^{X,Y} \to \zeta^{X,Y}, \qquad \zeta_n \to \zeta, \qquad (4.35)$$
almost surely, uniformly on compact subsets of $[s_0, +\infty)$, $s_0 \in (0, 1)$. Then (4.35) and the composition mapping theorem yield
$$\zeta_n\big(\delta_n^{\leftarrow}(s)\big) = \sqrt{k}\,\Big(\bar\delta_n\big(\delta_n^{\leftarrow}(s)\big) - s\Big) \to \zeta\big(\delta^{\leftarrow}(s)\big), \quad \text{a.s.}, \qquad (4.36)$$
uniformly on compact subsets of $[s_0, +\infty)$. By application of Vervaat's Lemma 2.2.9, with $1/\gamma_n = k^{-1/2}$, $\xi_n(s) = \bar\delta_n(\delta_n^{\leftarrow}(s))$, $g(s) = s$ and $\xi_0(s) = \zeta(\delta^{\leftarrow}(s))$, we have
$$\sqrt{k}\,\big(\xi_n^{\leftarrow}(s) - s\big) \to -\zeta\big(\delta^{\leftarrow}(s)\big), \quad \text{a.s.},$$
uniformly on compact subsets of $[s_0, +\infty)$.

That is,
$$\sqrt{k}\,\Big(\delta_n\big(\bar\delta_n^{\leftarrow}(s)\big) - s\Big) \to -\zeta\big(\delta^{\leftarrow}(s)\big), \quad \text{a.s.} \qquad (4.37)$$
Observe now that $\bar\delta_n^{\leftarrow}(1) = X_{n:n-k}/u_n$ and $\delta_n^{\leftarrow}(1) = 1$. Then, by a Taylor expansion around $1$ of $\delta_n(\bar\delta_n^{\leftarrow}(1))$, we have
$$\delta_n\big(\bar\delta_n^{\leftarrow}(1)\big) - 1 = \delta_n\big(\bar\delta_n^{\leftarrow}(1)\big) - \delta_n\big(\delta_n^{\leftarrow}(1)\big) = \frac{d\delta_n}{ds}(1)\,\big(\bar\delta_n^{\leftarrow}(1) - \delta_n^{\leftarrow}(1)\big) + o\big(\big|\bar\delta_n^{\leftarrow}(1) - 1\big|\big) = \frac{d\delta_n}{ds}(1)\,\big(X_{n:n-k}/u_n - 1\big) + o\big(\big|X_{n:n-k}/u_n - 1\big|\big). \qquad (4.38)$$
Hence, by (4.37), (4.38) and (4.34), we have
$$\sqrt{k}\,\Big(\delta_n\big(\bar\delta_n^{\leftarrow}(1)\big) - 1\Big) = -\alpha\,\sqrt{k}\,\big(X_{n:n-k}/u_n - 1\big) + o_p(1),$$
i.e.
$$\sqrt{k}\,\big(X_{n:n-k}/u_n - 1\big) \to \frac{1}{\alpha}\,\zeta(1), \quad \text{a.s.} \qquad (4.39)$$
Since (4.39) and (4.35) hold almost surely, they hold jointly. Therefore, the convergence holds also in the original probability space.

Remark. If (4.39) holds, then $X_{n:n-k}/u_n - 1 \to 0$ in probability.

Recall from (4.33) that
$$\tilde\beta_n^{X,Y}(s) = \frac{1}{k} \sum_{j=1}^n I\{X_j > s X_{n:n-k},\, Y_j > s X_{n:n-k}\}, \qquad \hat\delta_n(s) = \frac{1}{k} \sum_{j=1}^n I\{X_j > s X_{n:n-k}\}.$$
Consider the following empirical processes:
$$\hat G_n^{X,Y}(s) = \sqrt{k}\,\big(\tilde\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big), \qquad \hat H_n^X(s) = \sqrt{k}\,\big(\hat\delta_n(s) - \delta(s)\big), \qquad s \ge s_0, \qquad (4.41)$$
and $\hat H_n^Y(s)$ is defined similarly. We have the weak convergence of the tail empirical processes defined in (4.41).

Theorem 4.5.3. Suppose that the conditions of Theorem 4.1.1 hold, where $n\bar F(u_n)$ is replaced by $k$. Furthermore, assume that (4.34) holds and that $\frac{d}{ds}\beta^{X,Y}(s)$ exists and is finite in a neighborhood of $1$. Then,
$$\hat G_n^{X,Y}(s) \Rightarrow G^{X,Y}(s) - \beta^{X,Y}(s)\,H^X(1), \qquad (4.42)$$
as $n \to +\infty$, in $\ell^\infty([s_0, +\infty))$.

Moreover,
$$\hat H_n^X(s) \Rightarrow H^X(s) - s^{-\alpha}\,H^X(1), \qquad \hat H_n^Y(s) \Rightarrow H^Y(s) - s^{-\alpha}\,H^Y(1), \qquad (4.43)$$
and the convergences hold jointly.

Proof: Recall that
$$G_n^{X,Y}(s) = \sqrt{n\bar F(u_n)}\,\big(\bar\beta_n^{X,Y}(s) - \beta_n^{X,Y}(s)\big), \qquad s \ge s_0,$$
where
$$\bar\beta_n^{X,Y}(s) = \frac{1}{n\bar F(u_n)} \sum_{j=1}^n I\{X_j > s u_n,\, Y_j > s u_n\}.$$
We deduce that $\tilde\beta_n^{X,Y}(s) = \bar\beta_n^{X,Y}(s X_{n:n-k}/u_n)$. We have
$$\hat G_n^{X,Y}(s) = \sqrt{k}\,\big(\tilde\beta_n^{X,Y}(s) - \beta_n^{X,Y}(s X_{n:n-k}/u_n)\big) + \sqrt{k}\,\big(\beta_n^{X,Y}(s X_{n:n-k}/u_n) - \beta^{X,Y}(s X_{n:n-k}/u_n)\big) + \sqrt{k}\,\big(\beta^{X,Y}(s X_{n:n-k}/u_n) - \beta^{X,Y}(s)\big) = G_n^{X,Y}(s X_{n:n-k}/u_n) + J_1(s) + J_2(s).$$
By Theorem 3.1.3, the Remark above and bearing in mind the joint convergence, the first term converges weakly to $G^{X,Y}$. The second term, $J_1(s)$, vanishes by assumption (4.3). For $J_2(s)$, we apply the delta method to (4.39) to obtain
$$\sqrt{k}\,\big(\beta^{X,Y}(s X_{n:n-k}/u_n) - \beta^{X,Y}(s)\big) \Rightarrow -\beta^{X,Y}(s)\,H^X(1).$$
This leads to the desired result.

Remark (comparison between Theorem 4.1.1 and Theorem 4.5.3). For Theorem 4.1.1:
$$\sqrt{n\bar F(u_n)}\,\big(\hat\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big) \Rightarrow \mathbb{G}^{X,Y}(s) := G^{X,Y}(s) - \beta^{X,Y}(s)\,H^X(1).$$
For Theorem 4.5.3:
$$\sqrt{k}\,\big(\tilde\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big) \Rightarrow \mathbb{G}^{X,Y}(s) := G^{X,Y}(s) - \beta^{X,Y}(s)\,H^X(1).$$

The limits are the same. As a consequence, the estimator of the tail dependence coefficient with deterministic levels and random normalization, $\hat\beta_n^{X,Y}$, has the same limit as the estimator with random levels and deterministic normalization, $\tilde\beta_n^{X,Y}$.

Now we want to use the above result to derive the asymptotic distribution of the corresponding sample extremogram matrix. Recall the matrix $\Sigma_n(s)$ from (4.2). Consider the following matrices:
$$\tilde\Sigma_n(s) = \begin{pmatrix} \tilde\beta_n^{X,X}(s) & \tilde\beta_n^{X,Y}(s) \\ \tilde\beta_n^{Y,X}(s) & \tilde\beta_n^{Y,Y}(s) \end{pmatrix}, \qquad \tilde\Sigma_n(1) = \begin{pmatrix} 1 & \tilde\beta_n^{X,Y}(1) \\ \tilde\beta_n^{Y,X}(1) & 1 \end{pmatrix}.$$
Note that $\tilde\Sigma_n(s)$ is just $\Sigma_n(s)$, where $\hat\beta_n^{X,Y}$ is replaced with $\tilde\beta_n^{X,Y}$. By Theorem 4.5.3, we have
$$\sqrt{k}\,\Big\{\operatorname{vec}\tilde\Sigma_n(s) - \operatorname{vec}\Sigma(s)\Big\} \Rightarrow \operatorname{vec} H(s),$$
where
$$H(s) = \begin{pmatrix} \mathbb{G}^{X,X}(s) & \mathbb{G}^{X,Y}(s) \\ \mathbb{G}^{Y,X}(s) & \mathbb{G}^{Y,Y}(s) \end{pmatrix}$$
and $\mathbb{G}^{U,V}(s) = G^{U,V}(s) - \beta^{U,V}(s)\,H^U(1)$, as defined in (4.23). Recall that $H_0: \beta^{X,Y}(1) = 0$ vs $H_1: \beta^{X,Y}(1) \neq 0$. Under the null hypothesis, keeping in mind that under $H_0$ the limiting process $\mathbb{G}^{X,Y}$ is degenerate, we have
$$\Sigma_0(s) = \begin{pmatrix} s^{-\alpha} & 0 \\ 0 & s^{-\alpha} \end{pmatrix}, \qquad H_0(s) = \begin{pmatrix} \mathbb{G}^{X,X}(s) & 0 \\ 0 & \mathbb{G}^{Y,Y}(s) \end{pmatrix}.$$
Now we can extend the theorems of Sections 4.3 and 4.4 to random levels.

Theorem. Assume that the conditions of Theorem 4.5.3 are satisfied. Let $\tilde\lambda_{n1}(s) \ge \tilde\lambda_{n2}(s)$ be the ordered eigenvalues of $\tilde\Sigma_n(s)$ and let $\lambda_1(s) = \lambda_2(s) = s^{-\alpha}$ be the eigenvalues of $\Sigma(s)$. Assume that
$$\lim_{n\to+\infty} \sqrt{k}\,\sup_{s > \epsilon > 0} \big|\tilde\beta_n^{X,Y}(s) - \beta^{X,Y}(s)\big| = 0.$$

Then, under $H_0$, we have
$$\sqrt{k}\,\big(\tilde\lambda_{n1}(s) - \lambda_1(s)\big) \Rightarrow \max\big(\mathbb{G}^{X,X}(s), \mathbb{G}^{Y,Y}(s)\big),$$
where $\mathbb{G}^{X,X}(s) = H^X(s) - s^{-\alpha}\,H^X(1)$ and $\mathbb{G}^{Y,Y}(s) = H^Y(s) - s^{-\alpha}\,H^Y(1)$.

Theorem. Let $(X_j)$ be a random sequence as in Definition 4.4.1. Let $\tilde\lambda_{n1}(s) \ge \dots \ge \tilde\lambda_{nd}(s)$ be the ordered eigenvalues of $\tilde\Sigma_n(s)$ and let $\lambda_1(s) = \dots = \lambda_d(s) = s^{-\alpha}$ be the eigenvalues of $\Sigma(s)$. Assume that for all $m, l = 1, \dots, d$, $m \neq l$, we have
$$\lim_{n\to+\infty} \sqrt{k}\,\sup_{s > \epsilon > 0} \big|\tilde\beta_n^{m,l}(s) - \beta^{m,l}(s)\big| = 0.$$
Then, under $H_0$, we have
$$\sqrt{k}\,\big(\tilde\lambda_{n1}(s) - \lambda_1(s)\big) \Rightarrow \max_{1\le l\le d} \mathbb{G}^{l,l}(s),$$
where $\mathbb{G}^{l,l}(s)$, $l = 1, \dots, d$, are as in the theorem of Section 4.4.

Hypothesis testing with random levels. We run the same simulations as in Table 4.1, using the test statistic obtained with random levels, for the models used therein. Indeed, we calculate the test statistic, for $s \in [s_0, +\infty)$, $s_0 \in (0, 1)$,
$$T_2(s) = \sqrt{k}\,\big(\tilde\lambda_{n1}(s) - \lambda_1(s)\big).$$
The value of $k$ ranges from $0.01n$ to $0.1n$, with $s = 1.2$ (see Table 4.2) and $s = 1.3$ (see Table 4.3). As we can see, the simulated results confirm the theoretical results obtained in this section. Indeed, for the independent Pareto model we fail to reject $H_0$, while for the dependent Pareto model (Example 2.1.7) and the third model mentioned above we reject the null hypothesis.
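Since the random-level statistic only requires the intermediate order statistic $X_{n:n-k}$, it is straightforward to evaluate $T_2(s)$ over a grid of $k$ values. The following minimal Python sketch does this for the bivariate case; as in (4.33), both coordinates are thresholded by $s\,X_{n:n-k}$ computed from the first coordinate, and $\alpha = 4$ is assumed known only to centre the eigenvalue.

import numpy as np

def T2(x, y, k, s, alpha):
    # Random-levels test statistic T_2(s) = sqrt(k) * (largest eigenvalue - s**(-alpha))
    n = len(x)
    thr = s * np.sort(x)[n - k - 1]            # s times the intermediate order statistic X_{n:n-k}
    def beta_tilde(a, b):
        # random levels, deterministic normalisation by k, cf. (4.33)
        return np.sum((a > thr) & (b > thr)) / k
    Sigma = np.array([[beta_tilde(x, x), beta_tilde(x, y)],
                      [beta_tilde(y, x), beta_tilde(y, y)]])
    lam1 = np.max(np.linalg.eigvals(Sigma).real)
    return np.sqrt(k) * (lam1 - s ** (-alpha))

rng = np.random.default_rng(2)
n, alpha, s = 1000, 4.0, 1.2
x, y = 1 + rng.pareto(alpha, n), 1 + rng.pareto(alpha, n)   # independent Pareto(4) margins
for k in (int(0.01 * n), int(0.05 * n), int(0.10 * n)):
    print(k, round(T2(x, y, k, s, alpha), 3))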

[Tables 4.2 and 4.3 report, for each model (independent Pareto, dependent Pareto, third model) and for $k = 0.01n, 0.02n, \dots, 0.1n$, the value of the test statistic $T_2(s)$ and whether it falls in the corresponding confidence interval.]

Table 4.2: Testing for extremal independence. The 95% asymptotic confidence interval for $s = 1.2$ is CI $= (M_{.025}(1.2), M_{.975}(1.2)) = (-0.25, 0.56)$. The $+$ sign indicates that $T_2$ belongs to the CI and the numbers in parentheses correspond to the true TDC.

Table 4.3: Testing for extremal independence. The 95% asymptotic confidence interval for $s = 1.3$ is CI $= (M_{.025}(1.3), M_{.975}(1.3)) = (-0.23, 0.51)$. The $+$ sign indicates that $T_2$ belongs to the CI and the numbers in parentheses correspond to the true TDC.

4.6 Implementation: Simulation studies

In this section, we perform some simulation studies to support our theoretical results. We deal with the estimation of the tail dependence coefficient defined in (2.4) as
$$\lambda_{TDC} = \lim_{x\to+\infty} \frac{P(X > x,\, Y > x)}{\bar F(x)},$$
where $X$ and $Y$ have the same distribution. We make use of the estimator $\tilde\beta_n^{X,Y}$ defined in (4.33). The estimator is computed for different values of $k$, where $k$ is the number of order statistics used, and plotted against the order statistics $X_{n:1}, \dots, X_{n:n}$, arranged in increasing order. The choice of the threshold $k$ is not addressed here but can be found in the literature, e.g. [10]. The estimator captures the dependence structure.
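A minimal sketch of this computation is given below (Python; the helper name tdc_estimate is hypothetical): the estimator (4.33) at $s = 1$ is evaluated over a grid of values of $k$, in the spirit of the TDC plots shown in Figures 4.3-4.5.

import numpy as np

def tdc_estimate(x, y, k):
    # Estimator (4.33) at s = 1: (1/k) * #{j : x_j > X_{n:n-k} and y_j > X_{n:n-k}}
    thr = np.sort(x)[len(x) - k - 1]          # intermediate order statistic X_{n:n-k}
    return np.sum((x > thr) & (y > thr)) / k

rng = np.random.default_rng(3)
n = 1000
x, y = 1 + rng.pareto(4.0, n), 1 + rng.pareto(4.0, n)   # independent Pareto(4), true TDC = 0

ks = np.arange(10, 201, 10)
estimates = [tdc_estimate(x, y, k) for k in ks]
for k, e in zip(ks, estimates):
    print(k, round(e, 3))
# import matplotlib.pyplot as plt; plt.plot(ks, estimates)   # optional plot against k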

Example 4.6.1 (Independent Pareto). We simulate 1000 independent observations of $(X, Y)$, where $X$ and $Y$ are independent Pareto-distributed with $\alpha = 4$. In this case the tail dependence coefficient is just $0$. Figure 4.3 shows the estimated values of the tail dependence coefficient computed using the estimator (4.33) for different values of $k$, where $k$ is the number of order statistics used.

Figure 4.3: The scatter plot (left panel) and the estimator of the TDC (right panel) for $X$ and $Y$ drawn independently from the Pareto distribution with $\alpha = 4$.

Example 4.6.2 (Dependent Pareto). We simulate 1000 independent observations from the model $(X, Y)$, where $Y = Z_2 + Z_3$, $X = Z_1 + Z_2$, and $Z_i$, $i = 1, 2, 3$, are independent Pareto with $\alpha = 4$. In this case the tail dependence coefficient is $0.5$. The results are displayed in Figure 4.4. This dependence is captured by the estimator.

Example 4.6.3 (Bivariate t). Figure 4.5 shows the estimate of the tail dependence coefficient for the bivariate t distribution, i.e. $(X, Y) = \sqrt{Z}\,(U_1, U_2)$, where $\alpha/Z$ is chi-square distributed with $\alpha = 4$ degrees of freedom and $U_1, U_2$ are standard normal with correlation $\rho = 0.9$. The tail dependence coefficient in this case is $0.63$, see [5]. The scatter plot indicates strong dependence in the upper and lower tail, which is confirmed by the estimate of the tail dependence coefficient.
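For completeness, the three example models can be simulated as follows; this is a sketch based on the constructions described above (the empirical TDC is computed with $k = 50$ order statistics, an arbitrary illustrative choice).

import numpy as np

rng = np.random.default_rng(4)
n, alpha, k = 1000, 4.0, 50

def tdc(x, y, k):
    # empirical TDC at s = 1 based on the top k order statistics of x
    thr = np.sort(x)[len(x) - k - 1]
    return np.sum((x > thr) & (y > thr)) / k

# Example 4.6.1: independent Pareto(4), true TDC = 0
x1, y1 = 1 + rng.pareto(alpha, n), 1 + rng.pareto(alpha, n)

# Example 4.6.2: X = Z1 + Z2, Y = Z2 + Z3 with independent Pareto(4) Z's, true TDC = 0.5
z1, z2, z3 = 1 + rng.pareto(alpha, (3, n))
x2, y2 = z1 + z2, z2 + z3

# Example 4.6.3: bivariate t with 4 degrees of freedom and rho = 0.9, true TDC about 0.63
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=n)
w = rng.chisquare(alpha, n)                    # alpha / Z ~ chi-square(4), so Z = alpha / W
x3, y3 = np.sqrt(alpha / w) * u[:, 0], np.sqrt(alpha / w) * u[:, 1]

for name, (x, y) in {"ind. Pareto": (x1, y1), "dep. Pareto": (x2, y2), "bivariate t": (x3, y3)}.items():
    print(name, round(tdc(x, y, k), 3))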

Figure 4.4: The scatter plot (left panel) and the estimator of the TDC (right panel) for the model $Y = Z_2 + Z_3$, $X = Z_1 + Z_2$, where $Z_i$, $i = 1, 2, 3$, are independent Pareto with $\alpha = 4$.

Figure 4.5: The bivariate t distribution: the scatter plot (left panel) and the estimator of the TDC (right panel) plotted against the order statistics.

4.7 Real data analysis

In this section we apply our method to two financial data sets.

Example 4.7.1 (Stock prices data set). The first data set contains the absolute log returns of daily stock prices.
