Extreme value statistics: from one dimension to many. Lecture 1: one dimension Lecture 2: many dimensions

Extreme value statistics: from one dimension to many Lecture 1: one dimension Lecture 2: many dimensions

The challenge for extreme value statistics right now: to go from 1 or 2 dimensions to 50 or more The challenge for computation: To maximize fifty-dimensional likelihoods with hundreds of parameters The challenge for probability: To construct parametric models which makes this possible and to understand the models (and then model validation, asymptotics, ) Extreme rainfall at many catchments, flooding of many dykes, storm insurance, landslide risk assessment, consistent risk estimation for financial portfolios, extreme influenza epidemics,

Notation Bold symbols are d-variate vectors. For instance, γγ = (γγ 1,..., γγ dd ) and 00 = (0,..., 0) RR dd. Operations and relations are componentwise, with shorter vectors recycled. For instance aaaa + bb = (aa 1 xx 1 + bb 1,..., aa dd xx dd + bb dd ), xx yy iiii xx jj yy jj for jj = 1,..., dd, and tt γγ = (tt γγ 1,, tt γγ dd)

dd-dimensional generalized extreme value (GEV) distributions No finite-dimensional parametrization Marginal distributions GEV: GG ii xx = ee 1+xx μμ ii σσ ii + Can be expressed through spectral representations. 1/γγ ii Can be expressed in terms of point processes, see later slides

Let ηη, dd be the vector of lower endpoints of the marginal distributions of GG The GEV distributions are the limit distributions of maxima: Let XX 1, XX 2, be an i.i.d. sequence of dd-dimensional vectors with cdf FF. If for some scaling and location sequences aa nn > 00 and bb nn PPPP aa 1 nn nn ii=1 XX ii bb n ηη xx dd GG xx, as nn where GG has non-degenerate margins, then GG is a GEV distribution. Conversely all GEV distributions can be obtained in this way.

The GEV distributions are the max-stable distributions: Let XX 1, XX 2, be an i.i.d. sequence with nondegenerate marginal cdf-s GG ii. If for some scaling and location sequences αα nn > 0 and ββ nn 1 nn Pr αα nn XX ii ββ nn xx = GG xx, for nn 1, 2, ii=1 then GG is a GEV distribution. Conversely all GEV distributions can be obtained in this way.

Think of XX 1, XX 2,, XX n as points in RR dd. Subtract aa nn and divide by bb nn to get a point process with points XX ii aa nn bb nn ηη; ii = 1, nn. It converges as nn iff XX DD(GG), to a limit point process xx 2 xx 1 {WW ii RR ii γγ } where (in sequel think of γγ > 00) WW ii are i.i.d. copies of some ( any ) positive stochastic dd-dimensional vector and the RR ii are points of a unit rate Poisson process on RR + WW 1 /RR 1 γγ WW 2 /RR 2 γγ WW 3 /RR 3 γγ 1 2 3

All d-dimensional GEV distributions may be obtained as γγ G xx = PPPP(max{WW ii RR ii ii } xx) Often the vector WW = WW (dd) is thought of as obtained by sampling from a process WW(ss) with ss RR dd (as in figure on previous slide) Often G is defined on a standardized scale, say γγ = 11, and model then is transformed to the real scale Smith process: WW is a non-random Gaussian density function Brown-Resnick process: WW(ss) = exp{xx ss σσ 2 /2}, with XX a Gaussian process with variance σσ 2 Schlater model

The block maxima method As in one dimension: Get observations XX 1,, XX nn of block maxima; assume the observations are i.i.d and follow a GEV distribution; use XX 1,, XX nn to estimate the parameters of the GEV distribution Likelihood estimation: Distribution functions often are possible to calculate, but differentiation with respect to the dd component variables often lead to combinatorial explosion of number of terms (because often GG xx = exp{ l(xx)}) Composite likelihood: Use sum of likelihoods of pairs, or triplets, or Stephenson-Tawn likelihoods: in between block maxima and PoT A. C. Davison, S. A. Padoan and M. Ribatet Statistical Modeling of Spatial Extremes, Statistical Science 2012 R. Huser, A. C. Davison, M. G. Genton Likelihood estimators for multivariate extremes, Extremes 2016

Copula modelling (i.e. modeling after transforming marginal distributions to uniformity) is often used Popular copulas: logistic = Gumbel; asymmetric logistic; inverted logistic; Gaussian; Husler-Reiss; bilogistic; Dirichlet; distributions with same copula, but different margins: Uniform; Fréchet; Gumbel; and Laplace

pointwise 25-year return levels for rainfall (mm) obtained from latent variable and max-stable models. Top and bottom rows: lower and upper bounds of 95% pointwise credible/confidence intervals. Middle row: predictive pointwise posterior mean and pointwise estimates. Left column: latent variable model. Middle column: Another latent variable model. Right column: Extremal t copula model. Copied from Davison, Padoan & Ribatet paper

Multivariate peaks over thresholds modelling and likelihood inference Or: News from the exciting exploration of GP-territory Joint with Anna Kiriliouk, Johan Segers, Maud Thomas, Jennifer Wadsworth

The philosophy is simple Extreme episodes are often quite different from ordinary everyday behavior, and ordinary behavior then has little to say about extremes, so that only other extreme events give useful information about future extreme events Operationally: choose thresholds uu 1,, uu dd, say that an extreme episode occurs if at least one of the components of the observation YY = (YY 1,, Y dd ) exceeds its threshold, only model times of occurence and undershoots and overshoots, XX = YY uu, of extreme episodes Theory: times of occurrence follows a Poisson process, XX is a generalized Pareto (GP) distribution

Generalized Pareto(GP) distributions HH xx = 1 log GG(00) log GG(xx) GG(xx 00), where GG is a GEV cdf with GG 00 > 0

The GP distributions are the limit distributions threshold excesses: Let XX~ FF. If there exist continuous threshold and scaling functions uu tt and ss tt > 0, with FF uu tt < 1 and FF uu tt 1 as tt, such that PPPP ss tt 1 XX uu tt xx XX uu tt dd HH xx, as nn, where HH has non-degenerate margins, then HH is a GP distribution. Conversely all GP distributions can be obtained in this way. maxima of FF are in the domain of attraction of a GEV cdf if and only if threshold excesses of FF are in the threshold domain of attraction of a GP cdf HHH

The GP distributions are the threshold-stable distribution: Let XX~HH, with HH nondegenerate. If there exist continuous threshold and scaling functions uu tt, ss tt 00 and with FF uu tt < 1 and FF uu tt 1 as tt, such that PPPP ss tt 1 XX uu tt xx XX > uu tt = HH xx, for all tt 1, then HH is a GP distribution. Conversely all GP distributions has this property.

Closed under Properties of the class of GP distributions Scaling Increase of level Taking limits Mixing Going to conditional margins Not closed under Location changes Going to (unconditional) margins Conditional 1-d marginals GP: HH jj + xx = 1 1 + γγ jj σσ jj xx (= 1 ee xx/σσ jj if γγ jj = 0) 1/γγ jj

Multivariate PoT modelling Should ideally contain three components: 1) A model for the behavior of extreme episodes in the physical world 2) Understanding of how conditioning on threshold exceedance changes the distribution obtained in 1) 3) The possibility to incorporate trends in the models ( likelihood inference) Inference should be for the original observations, and not after (approximate) transformation of margins of observations to some standard form, say standard Pareto or standard GP

A typical point (positive shape) RR TT γγ RR an arbitrary dd-dimensional random vector, subject to EE RR jj <, and RR jj 0 if γγ jj > 0 TT improper random variable, has density = 1 with respect to Lebesgue measure on RR + RR and TT are mutually independent If γγ jj > 0 then TT γγ jj has density xx 1 γγ jj 1 with respect to Lebesgue measure on RR + If γγ jj = 0 then RR jj TT γγ jj is defined to mean RR jj log TT and log TT has density ee xx with respect to Lebesgue measure on RR

Conditioning on threshold exceedance FF d.f. of RR in typical point RR TT γγ. XX uu = XX uu, PPPP XX uu xx XX uu 00 = PPrr XX uu xx PPrr XX uu xx 00 PPrr XX uu 00 Set XX = WW RR γγ and uu = σσ γγ and condition (formally) on TT uu 2 xx 2 1 not here 1 uu 1 xx 1 (R) HH xx = 0 FF tt γγ xx+ σσ ddtt FF tt γγ xx 00+ σσ γγ γγ 0 FF(tt γγσσ γγ )ddtt is a GP and all GP-s can be obtained this way dddd

(R) HH rr xx = Three representations 0 FFRR tt γγ xx+ σσ ddtt FF γγ RR tt γγ xx 00+ σσ dddd γγ 0 FF RR (tt γγσσ γγ )ddtt is a GP and all GP-s can be obtained this way (R ss ) HH ss xx = 0 FFSS 1 γγ log xx+σσ γγ +log tt ddtt FF SS( 1 γγ log xx 00+σσ γγ 0 FF SS (log tt)ddtt is a GP and all GP-s can be obtained this way +log tt) dddd (R ss ) HH SS xx = 0 1 FF SS = FFSS 0 1 γγ log xx + σσ γγ + log tt dddd 1 γγ log xx + σσ γγ tt ee tt dddd for SS = SS max SS jj and all GP-s can be obtained this way Ferreira de Haan (2014) representation

Densities Explicit formulas for densities for all three representations Formula for (R ss ) density simplest!! Censored likelihood contributions computable

Estimation Likelihood for (R): NN dd 0 tt jj=1 γγ jj ff tt γγ xx ii + γγ σσ ii=1 0 FF(tt γγγγ σσ )ddtt dddd One-dimensional integrals, so relatively easy to compute if ff, FF are easy to compute. In some cases integrals can be computed analytically Censoring makes computation harder Parenthesis: Coefficient of asymptotic dependence: Pr(FF 1 (XX 1 ) > qq,, FF dd (XX dd ) > qq) χχ dd : = lim. qq 1 1 qq GEV and GP modelling aimed at asymptotic dependence, i.e. χχ > 0 for some subset of components. Substantial literature (Heffernan, Tawn, Resnick, Wadsworth, ) on modelling under asymptotic independence, when χχ = 0

Parametrization and identifiability Understanding is on the (R) scale Choice between (R), (R ss ), and (R ss ) a choice of parametrization σσ and γγ parameters of conditional margins Replacing RR by ZZ γγ RR doesn t change HH rr. Similar properties of HH ss and HH ss (R ss ) and (R ss ) continuous at γγ ii = 0. Not so for (R) Much left to explore

Probabilities and conditional probabilities Probabilities of general events given by same formula as HH rr Conditional densities same type of formulas Conditional probabilities same type of formulas (prediction) if γγ 1 = = γγ dd =: γγ then weighted sums of components, conditioned to be positive, are also GP

If γγ 1 = = γγ dd γγ then weighted sums of components, conditioned to be positive, are also GP RR Pf. If σσ = TT γγ γγ (RR 1 σσ 1, RR 2 σσ 2 ) then the sum of the TT γγ γγ TT γγ γγ components equals RR 1+RR 2 σσ 1+σσ 2 =: RR σσ, and by the TT γγ γγ TT γγ γγ onedimensional (R) representation, the conditional distribution of RR σσ given that it is positive is a one-dimensional GP TT γγ γγ distribution with parameters σσ and γγ

Simulation 1. Direct for (R ss ) 2. Change of measure + rejection sampling for (R) and (R ss ) 3. Change of measure + MCMC for (R) and (R ss ) 4. Approximate method for (R) and (R ss )

Before Gudrun After Gudrun SEK 2,8 billion loss (LF) Windstorm Gudrun, January 2005 55% of total loss 1982-2005 Remeber: analysis based on data 1982-1993: 1% chance biggest damage next 15-year period will be more than SEK 2,5 billion??? Bivariate logistic PoT analysis basered on data 1982-2005: 10% chance biggest damage next 15-year period will be more than SEK 7 billion (this time we didn t miss damage to forrest but did we miss something else?) Brodin & Rootzén, Insurance 2009

Landslides Landslide caused by one day with very extreme rainfall, or by two consequtive days with extreme rainfall. Simplest model: XX 1, XX 2, independent positive variables, γγ 1 = γγ 2 = γγ RR TT γγ = XX 1 XX 2 TT γγ, XX 1 + XX 2 TT γγ = XX 1 XX 2 TT γγ, XX 1 XX 2 + XX 1 XX 2 TT γγ = rainiest day, rainest day + rainest adjacent day

Slide: Anna Kiriliouk

Slides: Jennifer Wadsworth

Modelling strategy

Prediction of influenza epidemics in France yy 1 =#cases week 3,, yy 8 =#cases week 11 γγ = 00 (from data), RR ii NN(mm ii, ss ii 2 ), mm 1 = 0 Model: conditional distribution of RR σσ log TT given that RR 1 > σσ log TT Total #cases GP with γγ = 0, σσ = σσ 1 + σσ 8 LR-test of parabolic form for the mm ii, equality of the ss ii 2 #cases per week normal epidemics extreme epidemics

Challenges Model construction Parametrization Time series Prediction Asymptotics H. Rootzén, J. Segers, and J.L. Wadsworth Multivariate peaks over thresholds models, ArXive 2016