Bivariate generalized Pareto distribution

Similar documents
Models and estimation.

Bivariate generalized Pareto distribution in practice: models and estimation

Overview of Extreme Value Theory. Dr. Sawsan Hilal space

Extreme Value Analysis and Spatial Extremes

A Conditional Approach to Modeling Multivariate Extremes

On the Estimation and Application of Max-Stable Processes

Some conditional extremes of a Markov chain

Classical Extreme Value Theory - An Introduction

Bayesian Modelling of Extreme Rainfall Data

CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS

Nonparametric Estimation of the Dependence Function for a Multivariate Extreme Value Distribution

PREPRINT 2005:38. Multivariate Generalized Pareto Distributions HOLGER ROOTZÉN NADER TAJVIDI

Overview of Extreme Value Analysis (EVA)

Statistics of Extremes

New Classes of Multivariate Survival Functions

The extremal elliptical model: Theoretical properties and statistical inference

MULTIVARIATE EXTREMES AND RISK

Financial Econometrics and Volatility Models Extreme Value Theory

Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets

Multivariate generalized Pareto distributions

Sharp statistical tools Statistics for extremes

MFM Practitioner Module: Quantitiative Risk Management. John Dodson. October 14, 2015

Multivariate Pareto distributions: properties and examples

Extreme value statistics: from one dimension to many. Lecture 1: one dimension Lecture 2: many dimensions

Bayesian Inference for Clustered Extremes

Bayesian Model Averaging for Multivariate Extreme Values

Extreme Value Theory and Applications

Modelling Multivariate Peaks-over-Thresholds using Generalized Pareto Distributions

HIERARCHICAL MODELS IN EXTREME VALUE THEORY

EXTREMAL QUANTILES OF MAXIMUMS FOR STATIONARY SEQUENCES WITH PSEUDO-STATIONARY TREND WITH APPLICATIONS IN ELECTRICITY CONSUMPTION ALEXANDR V.

Estimating Bivariate Tail: a copula based approach

Zwiers FW and Kharin VV Changes in the extremes of the climate simulated by CCC GCM2 under CO 2 doubling. J. Climate 11:

Estimation of spatial max-stable models using threshold exceedances

Bayesian nonparametrics for multivariate extremes including censored data. EVT 2013, Vimeiro. Anne Sabourin. September 10, 2013

Assessing Dependence in Extreme Values

Math 576: Quantitative Risk Management

ESTIMATING BIVARIATE TAIL

Investigation of an Automated Approach to Threshold Selection for Generalized Pareto

Tail negative dependence and its applications for aggregate loss modeling

Multivariate generalized Pareto distributions

Estimation of the Angular Density in Multivariate Generalized Pareto Models

Technische Universität München Fakultät für Mathematik. Properties of extreme-value copulas

Modelação de valores extremos e sua importância na

Statistical Methods for Clusters of Extreme Values

Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, NC

Generalized Logistic Distribution in Extreme Value Modeling

Prediction for Max-Stable Processes via an Approximated Conditional Density

APPLICATION OF EXTREMAL THEORY TO THE PRECIPITATION SERIES IN NORTHERN MORAVIA

Analysis of Bivariate Extreme Values

Extreme Precipitation: An Application Modeling N-Year Return Levels at the Station Level

Simulation of Max Stable Processes

Modelling extreme-value dependence in high dimensions using threshold exceedances

Introduction A basic result from classical univariate extreme value theory is expressed by the Fisher-Tippett theorem. It states that the limit distri

Max-stable Processes for Threshold Exceedances in Spatial Extremes

A conditional approach for multivariate extreme values

arxiv: v2 [stat.me] 25 Sep 2012

Statistics of Extremes

Financial Econometrics and Volatility Models Copulas

The Behavior of Multivariate Maxima of Moving Maxima Processes

A Brief Introduction to Copulas

Semi-parametric estimation of non-stationary Pickands functions

COPULA FITTING TO AUTOCORRELATED DATA, WITH APPLICATIONS TO WIND SPEED MODELLING

Physically-Based Statistical Models of Extremes arising from Extratropical Cyclones

Approximating the Conditional Density Given Large Observed Values via a Multivariate Extremes Framework, with Application to Environmental Data

Peaks-Over-Threshold Modelling of Environmental Data

Construction and estimation of high dimensional copulas

Extreme Value Theory as a Theoretical Background for Power Law Behavior

On the occurrence times of componentwise maxima and bias in likelihood inference for multivariate max-stable distributions

On the Application of the Generalized Pareto Distribution for Statistical Extrapolation in the Assessment of Dynamic Stability in Irregular Waves

Generalized additive modelling of hydrological sample extremes

Accounting for Choice of Measurement Scale in Extreme Value Modeling

Emma Simpson. 6 September 2013

Extreme Value Analysis of Multivariate High Frequency Wind Speed Data

EXTREMAL MODELS AND ENVIRONMENTAL APPLICATIONS. Rick Katz

Max-stable processes: Theory and Inference

Limit Distributions of Extreme Order Statistics under Power Normalization and Random Index

What Can We Infer From Beyond The Data? The Statistics Behind The Analysis Of Risk Events In The Context Of Environmental Studies

Tail Dependence Functions and Vine Copulas

On the Estimation and Application of Max-Stable Processes

Nonlinear Time Series Modeling

Probability Distributions and Estimation of Ali-Mikhail-Haq Copula

Vine Copulas. Spatial Copula Workshop 2014 September 22, Institute for Geoinformatics University of Münster.

Shape of the return probability density function and extreme value statistics

Models for Spatial Extremes. Dan Cooley Department of Statistics Colorado State University. Work supported in part by NSF-DMS

Quantitative Modeling of Operational Risk: Between g-and-h and EVT

A Note on Tail Behaviour of Distributions. the max domain of attraction of the Frechét / Weibull law under power normalization

arxiv: v4 [math.pr] 7 Feb 2012

Tail dependence in bivariate skew-normal and skew-t distributions

Max stable Processes & Random Fields: Representations, Models, and Prediction

Contributions for the study of high levels that persist over a xed period of time

DETERMINATION OF COMBINED PROBABILITIES OF OCCURRENCE OF WATER LEVELS AND WAVE HEIGHTS FOR THE SAFTEY VALIDATION OF DIKES AT THE BALTIC SEA

Bayesian Multivariate Extreme Value Thresholding for Environmental Hazards

The estimation of M4 processes with geometric moving patterns

Accounting for extreme-value dependence in multivariate data

Extremogram and ex-periodogram for heavy-tailed time series

R&D Research Project: Scaling analysis of hydrometeorological time series data

IT S TIME FOR AN UPDATE EXTREME WAVES AND DIRECTIONAL DISTRIBUTIONS ALONG THE NEW SOUTH WALES COASTLINE

Modulation of symmetric densities

Two practical tools for rainfall weather generators

Introduction to Algorithmic Trading Strategies Lecture 10

Transcription:

Bivariate generalized Pareto distribution in practice Eötvös Loránd University, Budapest, Hungary Minisymposium on Uncertainty Modelling 27 September 2011, CSASC 2011, Krems, Austria

Outline Short summary of extreme value theory (EVT), including: univariate/multivariate case representation of dependence structures within EVT multivariate generalized Pareto distribution of type II Construction of new asymmetric models Properties and comparison with well-known models Applications on environmental data (wind speed / flood peak)

Introduction There is often interest in understanding how the extremes of two different processes are related to each other One possible solution is using asymptotic approach from EVT EVT provides limits for the distribution of extremes for a single/multivariate stationary process

Limit for Maximum Theorem (Fisher and Tippet, 1928) If there exist {a n } and {b n } > 0 sequences such that ( Mn a n P b n ) z G(z) as n for some non-degenerate distribution G, then { ( G(x) = exp 1+ξ x µ σ ) 1 ξ where 1+ξ x µ σ > 0 and in such a case random variable belongs to the max-domain of attraction of a GEV distribution. Often applied for block maxima (monthly, yearly,...) },

Limit for Threshold Exceedances How to model large values over a given high threshold? Theorem (Balkema and de Haan, 1974) i.i.d. Suppose, that X 1,...,X n F, where F belongs to the max-domain of attraction of GEV for some ξ,µ and σ > 0. Then ( P(X i u z X i > u) H(z) = 1 1+ ξz σ ) 1 ξ where σ = σ +ξ(u µ). as u z +, Often applied for threshold exceedances over a high quantile of the obs.

Remarks The limit distributions are not Gaussian Strong link between GEV and GPD See examples below: GEV GPD density 0.0 0.1 0.2 0.3 0.4 Gumbel Frechet Weibull density ξ= 0 ξ= 0.5 ξ= 0.3 2 0 2 4 6 8 10 2 4 6 8 10 domain domain

Limit for Multivariate Maxima Component-wise maxima of X i = (X i,1,...,x i,d ) is ( n M n = X i,1,..., i=1 n X i,d ). i=1 If X F and there exist a n and b n > 0 sequences of normalizing vectors, such that ( Mn a n P b n ) z = F n (b n z+a n ) G(z), where G i margins are non-degenerate, then G is said to be a multivariate extreme value distribution (MEVD).

Characterization I. - Resnick (1987) MEVD with unit Fréchet margins can be written as ( ) G Fréchet (t 1,...,t d ) = exp V(t 1,...,t d ) V is called exponent measure V(t 1,...,t d ) = ν(([0,t 1 ] [0,t d ]) c ) = d S d i=1 ( wi t i ) S(dw), S spectral measure is a finite measure on the d-dimensional unit simplex satisfying the equations S d w i S(dw) = 1 for i = 1,...,d.

Some properties of the characterization I. Total mass of S is always S(S d ) = d Density not only on the interior of S d but also on each of the lower-dimensional subspaces of S d For independent margins S({e i }) = 1 for all e i vertices of the simplex For completely dependent margins S((1/d,..., 1/d)) = d

Characterization II. (bivariate case) - Pickands (1981) The dependence function A(t) must satisfy the following three properties: (P) 1. A(t) is convex 2. max{(1 t),t} A(t) t 3. A(0) = A(1) = 1 ( 1 V(t 1,t 2 ) = + 1 ) ( t1 ) A t 1 t 2 t 1 +t 2 A(t) (0,1) (1,1) CONVEX (1/2,1/2) t

For those who like uniform margins better The copula of a MEVD G is the distribution function of the random vector (G 1 (Y 1 ),...,G d (Y d )) and is given by ( ( 1 1 )) C(u) = exp V,...,,u [0,1] d. logu 1 logu d Such a copula necessarily satisfies the stability property C s (u) = C(u s 1,...,u s d ) and conversely if this property is satisfied than we get a copula of a MEVD.

v Introduction Extreme value models Applications Logistic Dependence (Gumbel copula) A(t) = (t α +(1 t) α ) 1/α C(u,v) = exp( (( logu) α +( logv) α ) 1/α ) Copula distribution function Spectral density Dependence function α = 2.5 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 (α 1)( t(1 t) (α 2) (t α + (1 t) α (1 α 2) ) 0 1 2 3 4 5 6 7 α = 1.5 α = 2 α = 2.5 α = 3 α = 3.5 (t α + (1 t) α (1 α) ) 0.4 0.5 0.6 0.7 0.8 0.9 1.0 α = 1 α = 2 α = 2 α = 3 α = 3 u t t

y y y y Logistic BEVD and its asymmetric extension A(t) = (1 Φ 1 )t +(1 Φ 2 )(1 t)+((φ 1 t) α +(Φ 2 (1 t)) α ) 1/α 0 Φ 1,Φ 2 1 S({0,1}) = 1 Φ 2,S({1,0}) = 1 Φ 1 2 0 2 4 6 symmetric, dep=2.5 0.02 0.16 0.08 0.12 0.1 0.14 0.06 0.04 0.25 0.20 0.15 0.10 0.05 0.00 2 0 2 4 6 asy = c(0.5,0.9) 0.1 0.02 0.14 0.08 0.12 0.04 0.06 0.15 0.10 0.05 0.00 2 0 2 4 6 2 0 2 4 6 x 2 0 2 4 6 asy = c(0.5,0.5) 0.04 0.12 0.02 0.08 0.1 0.06 0.15 0.10 0.05 0.00 2 0 2 4 6 asy = c(0.5,0.1) 0.1 0.08 0.01 0.02 0.04 0.07 0.11 0.05 0.09 0.03 0.06 0.12 0.10 0.08 0.06 0.04 0.02 0.00 2 0 2 4 6 2 0 2 4 6

MGPD of type II Problem: taking maxima hides the time structure within the period.. Might be better investigating all observations over a given high threshold Let Y = (Y 1,...,Y d ) denote a random vector, u = (u 1,...,u d ) a high threshold vector and so X = Y u the vector of exceedances. The distribution of X 0 exceedances can be well approximated by the multivariate generalized Pareto distribution (Rootzén and Tajvidi, 2006) ( ) 1 H(x 1,...,x d ) = logg ( 0,...,0 ) log G x1,...,x d G ( x 1 0,...,x d 0 ), where 0 < G(0,...,0) < 1.

What is type I. then? If we claim exceedance in all of the coordinates (BGPD I), we usually get simpler models, with nice properties (marginals are univariate GPD etc.) But we may use less data Logistic bgpd by evd Logistic bgpd by mgpd density curves density curves Hannover m/s 0 5 10 15 20 25 30 No Inference! No Inference! No Inference! Hannover m/s 0 5 10 15 20 25 30 No Inference! 0 5 10 15 20 25 Fehmarn m/s 0 5 10 15 20 25 Fehmarn m/s

Remarks H 1 (x) = H(x, ) is not a one dimensional GPD, only the conditional distribution of X 1 X 1 > 0 is GPD. The ξ i,µ i,σ i parameters cannot be considered as the ith marginal parameters any more Motivating problem: asymmetric logistic model does not lead to an absolutely continuous MGPD.. Theorem Let H be a MGPD represented by an absolutely continuous MEVD G with spectral measure S. The MGPD H is absolutely continuous if and only if S(int(S d )) = d holds, i.e all mass is put on the interior of the simplex.

Overview of parametric dependence models Model Asym. Density Complications Sym. Logistic Asym. Logistic (Tawn, 1988) Ψ-logistic convexity constr. Sym. Neg. Logistic Asym. Neg. Logistic (Joe, 1990) Ψ-negative logistic convexity constr. Bilogistic (Smith, 1990) not explicit Neg. Bilogistic (C. & T., 1994) not explicit Tajvidi s (Tajvidi, 1996) C-T (C. & T., 1991) only s(w) is given Mixed (Tawn, 1988) if ψ = 1 Asym. Mixed

Construction of absolutely continuous BGPDs 1. take an absolutely continuous baseline dependence model; 2. take a strictly monotonic transformation Ψ(x) : [0,1] [0,1], such that Ψ(0) = 0, Ψ(1) = 1; 3. construct a new dependence model from the baseline model A Ψ (t) = A(Ψ(t)); 4. check the constraints: A Ψ (0) = 1, A Ψ (1) = 1 and A Ψ 0 (convexity).

An example for Ψ(x) Feasible transformation if Ψ(t) = t +f(t) (A(Ψ(t))) = A (Ψ(t)) [1+f (t)] so choosing f such that f (0) = f (1) = 0, the A Ψ (0) = 1 and A Ψ (1) = 1 are fulfilled. f ψ1,ψ 2 (t) = ψ 1 (t(1 t)) ψ 2, for t [0,1], where ψ 1 R and ψ 2 1 are asymmetry parameters.

Graphical examples δ 2 ta αlog=2(ψ(t) δ 2 ta αneglog=1.2(ψ(t) 0.0 1.0 2.0 3.0 Psi Logistic ψ 1 = 0.3 ψ 1 = 0.15 ψ 1 = 0 ψ 1 = 0.15 ψ 1 = 0.3 0.0 1.0 2.0 3.0 Psi NegLog ψ 1 = 0.3 ψ 1 = 0.15 ψ 1 = 0 ψ 1 = 0.15 ψ 1 = 0.3 t t Constraints of convexity for psi logistic models Constraints of convexity for psi negative logistic models ψ 1 0.4 0.2 0.0 0.2 0.4 ψ 2 = 1.8 ψ 2 = 2 ψ 2 = 2.2 ψ 1 4 2 0 2 4 ψ 2 = 1.8 ψ 2 = 2 ψ 2 = 2.2 1.4 1.6 1.8 2.0 2.2 0.9 1.0 1.1 1.2 1.3 1.4 1.5 α α

Remarks Of course f and or the baseline dependence function can be chosen differently This transformation has particular impact beyond the BGPD models as well: applicable for BEVDs, BGPDs of type I or for copulas of any bivariate distributions Although the transformations Ψ was defined now for the bivariate case only, it can be extended for the higher dimensional cases as well as e.g. in the trivariate case it can have a form of Ψ(t 1,t 2 ) = (t 1 +ψ 1,1 (t 1 ([1 t 2 ] t 1 )) ψ 2,1,t 2 +ψ 2,1 (t 2 ([1 t 1 ] t 2 )) ψ 2,2 ).

Further models - polinomial transformations f a,p (t) = 64a(p 5 p 4 2p 3 2p 2 +5p 2)p 5 (p +1) 2 (p 1) 2 ( 1+2p)(1+2p 2 3p) t6 + 32a(5p6 2p 5 13p 4 12p 3 +15p 2 +6p 6)p 4 (p +1) 2 (p 1) 2 ( 1+2p)(1+2p 2 t 5 3p) + 32a(4p7 +3p 6 14p 5 15p 4 +23p 2 8p 2)p 3 (p +1) 2 (p 1) 2 ( 1+2p)(1+2p 2 t 4 3p) + 32a(p7 +4p 6 5p 5 10p 4 5p 3 +12p 2 +2p 4)p 3 (1+2p 2 3p)(p +1) 2 (4p 1+2p 3 5p 2 t 3 ) 32a(p 6 3p 4 p 2 +4p 2)p 3 + (1+2p 2 3p)(p +1) 2 (4p 1+2p 3 5p 2 ) t2.

Further models - polinomial transformations 4 2 0 2 4 0.30 0.25 0.20 0.15 0.10 0.05 0.00 4 2 0 2 4 0.30 0.25 0.20 0.15 0.10 0.05 0.00 4 2 0 2 4 0.30 0.25 0.20 0.15 0.10 0.05 0.00 4 2 0 2 4 4 2 0 2 4 4 2 0 2 4

BEVD case 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

Wind data Flood data Software Wind data Daily windspeed values at two locations in northern Germany (Bremerhaven and Schleswig) over the past five decades (1957-2007) Originally hourly windspeed was recorded, but in order to reduce the serial correlation within the series, we calculated the maxima of the original observations for each day. The data for the above cities are rather strongly correlated, due to the relatively short distance between them (Kendall s correlation is 0.57) and also can be considered as asymptotically dependent 95% quantiles have been used as threshold level (13.8m/s, 10.3m/s) Observations are considered as being stationary, non-stationarity can also be handled e.g. by choosing time-dependent model parameters

Wind data Flood data Software Wind data Exceedances (m/s) Copula sample Schleswig 5 10 15 20 Schleswig 10 15 20 25 Bremerhaven Bremerhaven

Wind data Flood data Software Parameter estimation We found that maximum likelihood methods perform well in general However there is strong pairwise correlation between the parameters µ i,σ i, i = 1,2 This can be extinguished in practice by keeping e.g. one location parameter fixed during the optimization process Estimation within this subclass gives stable convergence results Prediction regions (Hall and Tajvidi,2000) have been computed - the smallest region containing an observation with a given (usually high) probability..

Wind data Flood data Software Prediction regions for the 4 best models (having the smallest χ 2 -statistics) The effect of asymmetry parameters - relevant torsion in the regions regions: NegLog Psi NegLog Schleswig(m/s) 5 10 15 20 25 Regions γ= 0.99 γ= 0.95 γ= 0.9 Schleswig(m/s) 5 10 15 20 25 Regions γ= 0.99 γ= 0.95 γ= 0.9 5 10 15 20 25 30 5 10 15 20 25 30 Bremerhaven(m/s) Bremerhaven(m/s) Coles Tawn Tajvidi s Schleswig(m/s) 5 10 15 20 25 Regions γ= 0.99 γ= 0.95 γ= 0.9 Schleswig(m/s) 5 10 15 20 25 Regions γ= 0.99 γ= 0.95 γ= 0.9 5 10 15 20 25 30 5 10 15 20 25 30 Bremerhaven(m/s) Bremerhaven(m/s)

Wind data Flood data Software Goodness-of-fit: Bremerhaven és Schleswig γ 0.99 0.95-0.99 0.9-0.95 0.75-0.9 0.5-0.75 0.5 χ 2 Exp. 12.6 50.3 62.8 188.6 314.2 628.5 - Log 7 30 66 216 350 588 21.5 Ψ-L 9 33 60 215 349 591 16.9 NL 11 31 66 215 334 600 14.0 Ψ-NL 11 34 63 210 337 602 10.7 Mix 12 46 71 247 331 550 30.3 C-T 12 34 65 209 330 607 9.1 Tajv. 11 35 62 212 340 597 11.5 BL 6 29 62 220 354 586 25.6 NBL 13 32 61 220 337 594 15.5

Wind data Flood data Software Results C-T (χ 2 CT = 9.1) and Ψ-Neglog models (χ2 Ψ-NegLog = 10.7) perform the best In general the new asymmetric (Ψ) models are better than their symmetric baseline versions, χ 2 Log = 21.5 reduces to χ 2 Ψ-Log = 16.9 and χ2 NegLog = 14 reduces to χ2 Ψ-NegLog = 10.7 The difference between the log-likelihood values of the Neglog and Ψ-NegLog models is around 10, which shows a significant improvement in the likelihood ratio as well The Tajvidi s generalized symmetric logistic model (χ 2 Tajvidi s = 11.5) is the best among the symmetric ones.

Wind data Flood data Software Danube water level data - 100 years of daily flood peaks Fitted BGPD models for flood data from Budapest and Tass (threshold was chosen at about the 95% quantiles. Black region covers 95%, green: 90%, blue: 75%). Logistic Psi Logistic 400 200 0 200 400 400 200 0 200 400 400 200 0 200 400 400 200 0 200 400

Wind data Flood data Software R package available Graphs and computations have been made by the add-on package mgpd of R Wind data are also availabe Updated version is coming up very soon (mgpd v2.0) Acknowledgements: The European Union and the European Social Fund have provided financial support to the project under the grant agreement no. TÁMOP 4.2.1./B-09/KMR-2010-0003.

Wind data Flood data Software Thank you for your attention!