Overview of Extreme Value Theory
Dr. Sawsan Hilal
Maths Department - University of Bahrain
November 2010
Outline
Part-1: Univariate Extremes
- Motivation
- Threshold Exceedances
Part-2: Bivariate (Multivariate) Extremes
- Point Process Representation
- Forms of Extremal Dependence
Part-1: Motivation
Part-1: Extreme Quantile Estimation
Let $X \sim F$ (unknown). The extreme quantile of interest is
$$\mathrm{VaR}_q = F^{-1}(q), \quad \text{i.e.} \quad \Pr(X > \mathrm{VaR}_q) = 1 - q.$$
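As a minimal illustration of the quantile relation above, the sketch below computes $\mathrm{VaR}_q = F^{-1}(q)$ for a fully known stand-in distribution (an exponential loss model, chosen here only for illustration; in practice $F$ is unknown and must be estimated):

```python
import math

def var_exponential(q, rate=1.0):
    """VaR_q = F^{-1}(q) when losses follow an Exponential(rate) distribution.

    Illustration only: here F is known in closed form, so the quantile is exact.
    """
    return -math.log(1.0 - q) / rate

# by construction Pr(X > VaR_q) = exp(-rate * VaR_q) = 1 - q
q = 0.99
v = var_exponential(q)
assert abs(math.exp(-v) - (1 - q)) < 1e-12
```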
Part-1: General Considerations
Main Concern: estimating the upper tail of the distribution.
Practical Difficulties:
- There are very few observations in the tail region.
- The central observations are irrelevant for inference about the tail.
- There is usually a need for extrapolation beyond the range of the observed data.
Asymptotic Results: EVT characterizes the limiting behaviour of extreme (tail) observations; this both provides a definition of what counts as extreme and a basis for modelling.
Part-1: Threshold Exceedances
Part-1: Excess Distribution
Let $X_1, \dots, X_n$ be independent and identically distributed (iid) random variables with $X_i \sim F$ (unknown). Let $u$ be a high threshold and consider all threshold exceedances $X : X > u$. Define the conditional distribution of $X$ given $X > u$ (the excess distribution) by
$$1 - F_u(x) = \Pr[X > x \mid X > u] = \frac{1 - F(x)}{1 - F(u)}, \qquad u < x < x^F,$$
where $x^F = \sup\{x : F(x) < 1\}$ is the upper endpoint of $F$.
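A quick empirical check of the formula above, under the assumption of an exponential parent (for which the excess distribution is again exponential, by memorylessness) and an illustrative threshold $u = 2$:

```python
import random, math

random.seed(1)
u = 2.0  # illustrative threshold
x = [random.expovariate(1.0) for _ in range(200_000)]

# excesses X - u for the observations exceeding u
excess = [xi - u for xi in x if xi > u]

# For Exp(1): 1 - F_u(u + y) = (1 - F(u + y)) / (1 - F(u)) = e^{-y},
# so the excesses should again look Exp(1), with mean close to 1.
mean_excess = sum(excess) / len(excess)
```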
Part-1: GPD Specification
Theorem (Balkema & de Haan 1974; Pickands 1975)
As $u \to x^F$, we have
$$F_u(x) \approx 1 - \left[1 + \xi \frac{x - u}{\phi}\right]_+^{-1/\xi} = \mathrm{GPD}_{\phi,\xi}(x - u),$$
the Generalized Pareto Distribution, with $\phi > 0$ (scale parameter) and $\xi \in \mathbb{R}$ (shape parameter).
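A direct transcription of the GPD distribution function, written as a sketch (the $\xi = 0$ branch is the exponential limit obtained as $\xi \to 0$):

```python
import math

def gpd_cdf(y, phi, xi):
    """GPD distribution function evaluated at the excess y = x - u >= 0.

    phi > 0 is the scale, xi the shape; xi -> 0 gives the exponential limit.
    """
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / phi)
    t = 1.0 + xi * y / phi
    if t <= 0:            # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

# a small xi should be numerically close to the xi = 0 (exponential) case
assert abs(gpd_cdf(1.5, 1.0, 1e-8) - gpd_cdf(1.5, 1.0, 0.0)) < 1e-6
```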
Part-1: GPD Sub-Families
- $\xi < 0$: Weibull type (short-tailed)
- $\xi = 0$: Gumbel type (light-tailed)
- $\xi > 0$: Fréchet type (heavy-tailed)
Part-1: GPD Parameter Estimation
Parameter estimates can be obtained by maximizing the likelihood function given by
$$L(\phi, \xi) = \prod_{i:\,X_i > u} \frac{1}{\phi}\left[1 + \xi \frac{X_i - u}{\phi}\right]_+^{-1/\xi - 1},$$
where $n_u = \#\{X_i > u\}$ is the number of exceedances (Smith 1985).
Standard errors / confidence intervals are calculated using the approximate normality of the MLE, valid provided $\xi > -0.5$:
$$\begin{pmatrix}\hat\phi \\ \hat\xi\end{pmatrix} \sim \mathrm{BVN}\!\left(\begin{pmatrix}\phi \\ \xi\end{pmatrix},\, I_E^{-1}\right), \qquad I_E = -E\begin{bmatrix}\dfrac{\partial^2 \log L}{\partial \phi^2} & \dfrac{\partial^2 \log L}{\partial \phi\,\partial \xi} \\[1ex] \dfrac{\partial^2 \log L}{\partial \xi\,\partial \phi} & \dfrac{\partial^2 \log L}{\partial \xi^2}\end{bmatrix}$$
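The likelihood above can be sketched numerically. The example below codes the GPD negative log-likelihood and maximizes it by a deliberately crude grid search (a real analysis would use a proper numerical optimizer); the simulated excesses are Exp(2), which is the GPD with $\phi = 0.5$, $\xi = 0$:

```python
import math, random

def gpd_nll(phi, xi, excesses):
    """Negative log-likelihood of GPD(phi, xi) evaluated on excesses y_i = x_i - u."""
    if phi <= 0:
        return float("inf")
    total = len(excesses) * math.log(phi)
    for y in excesses:
        if abs(xi) < 1e-12:           # exponential (xi = 0) limit
            total += y / phi
            continue
        t = 1.0 + xi * y / phi
        if t <= 0:                    # observation outside the GPD support
            return float("inf")
        total += (1.0 / xi + 1.0) * math.log(t)
    return total

# crude grid-search MLE, for illustration only
random.seed(7)
excesses = [random.expovariate(2.0) for _ in range(2000)]   # true phi = 0.5, xi = 0
grid = [(p / 100, x / 100)
        for p in range(30, 72, 2)       # phi in 0.30 .. 0.70
        for x in range(-20, 22, 2)]     # xi  in -0.20 .. 0.20
phi_hat, xi_hat = min(grid, key=lambda g: gpd_nll(g[0], g[1], excesses))
```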
Part-1: GPD Threshold Selection
Threshold selection is a trade-off between bias (too low a $u$: the GPD approximation is poor) and variance (too high a $u$: few exceedances remain). Select $u$ as low as possible, subject to adequate model diagnostics.
Part-1: GPD Application - Value-at-Risk
$\mathrm{VaR}_q$ for $q \in (0, 1)$ is the loss level that is exceeded on average once every $1/(1 - q)$ days. Mathematically,
$$\Pr(X > \mathrm{VaR}_q) = 1 - q. \qquad (1)$$
Now for $\mathrm{VaR}_q > u$ we have
$$\Pr(X > \mathrm{VaR}_q) = \Pr[X > \mathrm{VaR}_q \mid X > u]\,\Pr(X > u) = \left[1 + \xi\,(\mathrm{VaR}_q - u)/\phi\right]^{-1/\xi}\,(n_u/n). \qquad (2)$$
By equating Eq.(1) & Eq.(2) we obtain
$$\mathrm{VaR}_q = u + \frac{\phi}{\xi}\left[\left(\frac{1 - q}{n_u/n}\right)^{-\xi} - 1\right].$$
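The closed-form VaR expression above translates directly into code. All numbers in the example (threshold, fitted parameters, exceedance counts) are hypothetical, chosen only to exercise the formula:

```python
import math

def gpd_var(q, u, phi, xi, n_u, n):
    """VaR_q from a GPD fitted above threshold u (the formula on this slide).

    Valid for VaR_q > u, i.e. q > 1 - n_u / n.
    """
    zeta = n_u / n                     # empirical estimate of Pr(X > u)
    if abs(xi) < 1e-12:                # exponential (xi = 0) limit
        return u - phi * math.log((1.0 - q) / zeta)
    return u + (phi / xi) * (((1.0 - q) / zeta) ** (-xi) - 1.0)

# hypothetical fit: phi = 0.5, xi = 0.2 above u = 2, with 300 of 10000 exceedances
v = gpd_var(0.99, 2.0, 0.5, 0.2, 300, 10_000)

# round trip against Eq.(2): (n_u/n) * [1 + xi (v - u)/phi]^{-1/xi} should be 1 - q
p = 0.03 * (1.0 + 0.2 * (v - 2.0) / 0.5) ** (-1.0 / 0.2)
assert abs(p - 0.01) < 1e-9
```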
Part-2: Bivariate Extremes
Part-2: Marginal Standardization
Given a continuous bivariate random vector $(X, Y) \sim F$ where $X \sim F_X$ and $Y \sim F_Y$, standardize the margins to focus purely on the dependence structure. The typical choice in EVT is the unit Fréchet distribution, that is
$$\Pr(Z \le z) = \exp(-1/z), \qquad z > 0.$$
The transformation uses the probability integral transform:
$\tilde X = -1/\log F_X(X)$ and $\tilde Y = -1/\log F_Y(Y)$ both have unit Fréchet margins.
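A small sanity check of the transform, assuming a standard-uniform margin as the simplest case (so $F_X(x) = x$ on $(0,1)$): after transforming, $\Pr(Z \le z_0)$ should match the unit Fréchet distribution function $\exp(-1/z_0)$:

```python
import math, random

def to_unit_frechet(x, cdf):
    """Probability integral transform to unit Frechet: -1 / log F_X(x)."""
    return -1.0 / math.log(cdf(x))

random.seed(3)
sample = [random.random() for _ in range(100_000)]
z = [to_unit_frechet(x, lambda v: v) for x in sample]

# Pr(Z <= z0) should be close to exp(-1/z0); check at z0 = 1
z0 = 1.0
empirical = sum(1 for zi in z if zi <= z0) / len(z)
```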
Part-2: Copula Function
Theorem (Sklar 1959)
$$F(x, y) = C\{F_X(x), F_Y(y)\}$$
Consequently,
- $C$ is unique (for continuous margins) and is termed the copula function
- $C$ captures the dependence structure of $(X, Y)$
- $C$ is the joint distribution function of $(F_X(X), F_Y(Y))$, which have uniform margins
Part-2: Point Process Representation
Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be iid random vectors with unit Fréchet margins but unknown copula $C$. Construct a sequence of point processes $P_n$ as follows:
$$P_n = \left\{\left(\frac{X_i}{n}, \frac{Y_i}{n}\right) : i = 1, \dots, n\right\}.$$
On a region $A$ bounded away from the origin we then have $P_n \to$ Poisson process as $n \to \infty$. That is, $N(A) \sim \mathrm{Poisson}(\Lambda(A))$ with $\Lambda(A) = E\{N(A)\}$.
Part-2: Point Process Representation
Change to pseudo-polar coordinates: the radial distance $R = X + Y$ and the angular spread $W = X/(X + Y)$ are independent components in the limit.
Theorem (de Haan 1985)
$$\Lambda(A) = \int_A \frac{dr}{r^2}\, 2\,dH(w),$$
where $H$ (the dependence measure) is a distribution on $[0, 1]$ with $\int_0^1 w\,dH(w) = 1/2$.
Part-2: Point Process Representation
Statistical Inference:
- Specify $A$ (bounded away from the origin); for some large $r_{\min}$, a popular choice is $A = \{(R, W) : R > r_{\min}\}$
- Estimate $H$ parametrically or non-parametrically
A popular parametric choice is the logistic model, for which the measure $H$ has density
$$h(w) = \tfrac{1}{2}(\alpha^{-1} - 1)\{w(1 - w)\}^{-1 - 1/\alpha}\{w^{-1/\alpha} + (1 - w)^{-1/\alpha}\}^{\alpha - 2}, \qquad 0 < \alpha < 1.$$
- $\alpha \to 1$ (independence): $h(w)$ is large close to $w = 0$ and $w = 1$
- $\alpha \to 0$ (perfect dependence): $h(w)$ is large close to $w = 0.5$
Maximize the likelihood function $L(\alpha) = \prod\{h(w_i) : (r_i, w_i) \in A\}$.
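The logistic density above can be coded directly. The checks below (symmetry in $w \leftrightarrow 1 - w$, and unit total mass, verified by a simple midpoint rule at the illustrative value $\alpha = 0.5$) are numerical sanity checks, not part of the inference procedure:

```python
import math

def logistic_h(w, alpha):
    """Spectral density h(w) of the logistic dependence model, 0 < alpha < 1."""
    a = 1.0 / alpha
    return 0.5 * (a - 1.0) * (w * (1.0 - w)) ** (-1.0 - a) \
           * (w ** (-a) + (1.0 - w) ** (-a)) ** (alpha - 2.0)

alpha = 0.5
# h should be symmetric about w = 0.5
assert abs(logistic_h(0.3, alpha) - logistic_h(0.7, alpha)) < 1e-12

# midpoint-rule integral of h over (0, 1); H has total mass 1 for this model
n, eps = 200_000, 1e-6
step = (1.0 - 2.0 * eps) / n
total = sum(logistic_h(eps + (i + 0.5) * step, alpha) for i in range(n)) * step
```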
Part-2: Forms of Extremal Dependence
Define the tail dependence coefficient (Sibuya 1960)
$$\chi = \lim_{z \to \infty} \Pr\{Y > z \mid X > z\}.$$
It measures the tendency of one variable to be extreme given that the other is extreme. Then:
- $\chi > 0$: $X, Y$ are asymptotically dependent; the larger $\chi$, the stronger the dependence
- $\chi = 0$: $X, Y$ are asymptotically independent; in this case $\chi$ is not informative, as it tells nothing about the strength of the dependence
Part-2: Forms of Extremal Dependence
The classical bivariate EVT is based on distributions with
$$\chi = \begin{cases} 0 & \text{if } X \perp Y, \\ 2\int_0^1 \min(w, 1 - w)\,dH(w) > 0 & \text{otherwise.} \end{cases}$$
The presented theory therefore supports the asymptotic dependence class only! This limitation has been overcome only recently; see Ledford & Tawn 1996 and Heffernan & Tawn 2004.
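A finite-level empirical version of $\chi$ can illustrate the two regimes. The sketch below (all simulation choices hypothetical) estimates $\Pr(Y > z \mid X > z)$ at a level $z$ near the upper endpoint for perfectly dependent pairs ($Y = X$) versus independent uniform pairs:

```python
import random

def chi_empirical(pairs, z):
    """Empirical Pr(Y > z | X > z) at a finite level z (a sample version of chi)."""
    exceed_x = [(x, y) for x, y in pairs if x > z]
    if not exceed_x:
        return float("nan")
    return sum(1 for _, y in exceed_x if y > z) / len(exceed_x)

random.seed(11)
n, z = 100_000, 0.99                 # uniform margins, z near the upper endpoint
indep = [(random.random(), random.random()) for _ in range(n)]
dep = [(u, u) for u, _ in indep]     # perfectly dependent: Y = X

chi_ind = chi_empirical(indep, z)    # should be small (-> 0 as z -> 1)
chi_dep = chi_empirical(dep, z)      # exactly 1 by construction
```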
References
- Coles, S. An Introduction to Statistical Modelling of Extreme Values. Springer, New York, 2001.
- Embrechts, P., Klüppelberg, C. and Mikosch, T. Modelling Extremal Events for Insurance and Finance. Springer, Berlin, 1997.
THANK YOU