Bayesian Model Averaging for Multivariate Extreme Values

Philippe Naveau (naveau@lsce.ipsl.fr)
Laboratoire des Sciences du Climat et de l'Environnement (LSCE), Gif-sur-Yvette, France
Joint work with A. Sabourin and A.-L. Fougères
FP7-ACQWA, GIS-PEPER, MIRACLE & ANR-McSim, MOPERA
14 November 2011

Environmental Data Analysis Extreme Value Theory Bayesian Model Averaging

Air pollutants (Leeds, UK, winter 1994-1998, daily maxima)
NO vs. PM10 (left), SO2 vs. PM10 (center), and SO2 vs. NO (right)
Heffernan & Tawn, 2004; Boldi & Davison, 2007; Cooley, Davis & Naveau, 2010
[Figure: three pairwise scatter plots of the daily maxima of PM10, NO and SO2.]

Typical question: what is the probability of observing data in the blue box?

100 largest extremes
[Figure: scatter plots of the 100 largest extremes; axes NO, SO2 and PM10.]

Multivariate Extreme Value Theory (de Haan, Resnick and others)
Maxima; max-stability, $G^t(tz) = G(z)$; regularly varying; high quantiles; scaling property, $\Lambda(tA_z) = t^{-1}\Lambda(A_z)$; tail behavior; counting exceedances.

Siméon Denis Poisson (1781-1840)
[Figure: time series over 1900-2000 with a threshold.]
Counting excesses: as a sum of random binary events, the variable $N_n$ that counts the number of events above the threshold $u_n$ has mean $n \Pr(X > u_n)$.
Poisson's theorem (1837): if $u_n$ is such that $\lim_{n \to \infty} n \Pr(X > u_n) = \lambda \in (0, \infty)$, then $N_n$ is approximately distributed as a Poisson variable $N$ with mean $\lambda$.
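As a quick numerical illustration of Poisson's theorem, the sketch below counts threshold exceedances in simulated samples. It assumes, purely for illustration, that X is Uniform(0, 1) and that u_n = 1 - lambda/n, so that n Pr(X > u_n) = lambda exactly; the function name `exceedance_counts` is hypothetical and not part of the talk.

```python
import math
import random


def exceedance_counts(n, lam, n_rep, seed=0):
    """Simulate N_n, the number of X_i above u_n among n iid draws, n_rep times.

    Assumption (illustration only): X ~ Uniform(0, 1) and u_n = 1 - lam / n,
    so that n * Pr(X > u_n) = lam.
    """
    rng = random.Random(seed)
    u_n = 1.0 - lam / n
    return [sum(rng.random() > u_n for _ in range(n)) for _ in range(n_rep)]


lam = 2.0
counts = exceedance_counts(n=500, lam=lam, n_rep=2000)
mean_count = sum(counts) / len(counts)      # should be close to lam
p_zero = counts.count(0) / len(counts)      # should be close to exp(-lam)
print(mean_count, p_zero, math.exp(-lam))
```

The empirical mean of the counts and the empirical frequency of "no exceedance" can be compared with lambda and exp(-lambda), the two Poisson quantities used in the following slides.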

Still counting
$P(\max(X_1, \dots, X_n)/a_n \le x, \; \max(Y_1, \dots, Y_n)/a_n \le y) = P(N_n(A) = 0)$
[Figure: rescaled points $(X_i/a_n, Y_i/a_n)$; $A$ is the region outside the rectangle $[0, x] \times [0, y]$.]

Still counting
$P(\max(X_1, \dots, X_n)/a_n \le x, \; \max(Y_1, \dots, Y_n)/a_n \le y) = P(N_n(A) = 0)$
Poisson again: if $\lim_{n \to \infty} E(N_n(A)) = \Lambda(A)$, then $\lim_{n \to \infty} P(N_n(A) = 0) = P(N(A) = 0) = \exp(-\Lambda(A))$.
Two questions: What is the sequence $a_n$? What are the properties of $\Lambda(A)$?

Back to the univariate case: Fréchet margins
For the univariate heavy-tailed GEV case, we know that $\lim_{n \to \infty} P(\max(X_1, \dots, X_n)/a_n \le x) = \exp(-x^{-\alpha})$, with $a_n$ such that $P(X > a_n) = 1/n$.
Poisson condition: $\lim_{n \to \infty} n P(X/a_n \in A_x) = \Lambda_x(A_x)$, with $\Lambda_x(A_x) = x^{-\alpha}$ for $A_x = [x, \infty)$.
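The normalisation by a_n can be checked by simulation. The sketch below assumes, for illustration only, Pareto variables with Pr(X > x) = x^(-alpha) for x >= 1, for which a_n = n^(1/alpha) solves Pr(X > a_n) = 1/n; the helper names are hypothetical.

```python
import math
import random


def rescaled_maxima(alpha, n, n_rep, seed=0):
    """Draw n_rep values of max(X_1, ..., X_n) / a_n.

    Assumption (illustration only): Pr(X > x) = x**(-alpha) for x >= 1,
    so a_n = n**(1/alpha) satisfies Pr(X > a_n) = 1/n.
    """
    rng = random.Random(seed)
    a_n = n ** (1.0 / alpha)
    return [max(rng.paretovariate(alpha) for _ in range(n)) / a_n
            for _ in range(n_rep)]


def empirical_cdf(sample, x):
    """Fraction of the sample that is <= x."""
    return sum(v <= x for v in sample) / len(sample)


def frechet_cdf(x, alpha):
    """Limiting Frechet distribution function exp(-x**(-alpha))."""
    return math.exp(-x ** (-alpha))


maxima = rescaled_maxima(alpha=2.0, n=200, n_rep=3000)
for x in (0.8, 1.0, 2.0):
    print(x, empirical_cdf(maxima, x), frechet_cdf(x, 2.0))
```

For each x, the empirical frequency of rescaled maxima below x should be close to exp(-x^(-alpha)).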

Scaling property
Univariate case: with $\Lambda_x(A_x) = x^{-\alpha}$, we have $\Lambda_x(tA_x) = t^{-\alpha}\Lambda_x(A_x)$.
Multivariate case: $\Lambda(tA) = t^{-\alpha}\Lambda(A)$.

Scaling property: an essential property for inference
$\Lambda(tA) = t^{-\alpha}\Lambda(A)$
[Figure: a region $A$ containing data points and its dilation $tA$, with $\Lambda(tA) = t^{-\alpha}\Lambda(A)$.]
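The homogeneity can be verified in closed form on a toy case. The sketch below assumes two independent unit Fréchet variables (so alpha = 1 and a_n = n) and takes A to be the complement of the rectangle [0, x] x [0, y]; in that case the limit measure is Lambda(tA) = (1/x + 1/y)/t. Independence and the function name `lambda_n` are assumptions made only for this illustration.

```python
import math


def lambda_n(t, n, x=1.0, y=2.0):
    """n * Pr((X, Y)/a_n in tA), where A is the complement of [0, x] x [0, y].

    Assumptions (illustration only): X and Y are independent unit Frechet,
    Pr(X <= z) = exp(-1/z), and a_n = n.
    """
    # Pr(X/n <= t*x, Y/n <= t*y) = exp(-1/(n*t*x) - 1/(n*t*y))
    exponent = 1.0 / (n * t * x) + 1.0 / (n * t * y)
    # 1 - exp(-exponent), computed stably for tiny exponents
    return n * -math.expm1(-exponent)


n = 10 ** 6
for t in (1.0, 2.0, 5.0):
    # Second column: finite-n value; third column: limit (1/x + 1/y) / t
    print(t, lambda_n(t, n), 1.5 / t)
```

Doubling t halves the measure, which is the scaling property with alpha = 1.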

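Interpreting the scaling property
$\Lambda(tA) = t^{-\alpha}\Lambda(A)$ with $\alpha = 1$ and $\|y\| = y_1 + y_2$
[Figure: the lines $y_1 + y_2 = 1$ and $y_1 + y_2 = t$, a set $B$ on the unit simplex, and the region $\{y : y/\|y\| \in B \text{ and } \|y\| > t\}$.]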

Interpreting the scaling property
$\Lambda(tA) = t^{-1}\Lambda(A)$
A special case: $A = \{x : x/r \in B \text{ and } r > 1\}$, where $r = \|x\|$ and $B$ is any set on the unit sphere.
A surprising property: $tA = \{tx : x/r \in B \text{ and } r > 1\} = \{u : u/\|u\| \in B \text{ and } \|u\| > t\}$, with $u = tx$.
This implies $\Lambda(\{u : u/\|u\| \in B \text{ and } \|u\| > t\}) = t^{-1} H(B)$, where $H(\cdot)$ is the spectral measure, restricted to the unit sphere.
The radius $r = \|x\|$ is independent of the angular component $x/r$, whose distribution is given by the spectral measure.
The dependence among extremes is captured only by the spectral measure.

Polar coordinates in 3D
Radius: $r = x_1 + x_2 + x_3$. Angle vector: $w_1 = x_1/r$, $w_2 = x_2/r$, $w_3 = x_3/r$.
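The change of variables is short enough to write out. The sketch below computes the radius and angle vector of a point with non-negative coordinates, and keeps the k points with the largest radii; the function names are hypothetical, and the data are assumed to be already on a common marginal scale.

```python
def to_polar(x):
    """Return (r, w) with r = x_1 + ... + x_d and w_i = x_i / r.

    Assumes non-negative coordinates with a strictly positive sum.
    """
    r = sum(x)
    return r, [xi / r for xi in x]


def largest_extremes(points, k):
    """Polar coordinates of the k points with the largest radius."""
    polar = [to_polar(p) for p in points]
    polar.sort(key=lambda rw: rw[0], reverse=True)
    return polar[:k]


# Toy data, not the Leeds measurements
points = [[1.0, 2.0, 3.0], [10.0, 0.0, 0.0], [2.0, 2.0, 1.0]]
for r, w in largest_extremes(points, k=2):
    print(r, w)
```

Each angle vector lies on the unit simplex (non-negative entries summing to one), which is where the spectral measure lives.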

100 largest extremes
[Figure: scatter plots of the 100 largest extremes; axes NO, SO2 and PM10.]

Dependence among the 100 angles $W = (W_1, W_2, W_3)$
[Figure: the 100 angles plotted on the simplex; vertices NO, SO2 and PM10.]

Our main problems
How to find appropriate models to describe the dependence over the simplex?
How to infer the parameters of our models?
How to combine competing models?

A unique moment constraint on the spectral measure $H$
$\int_{\text{Simplex}} w_i \, dH(w) = \frac{1}{d}$
Non-parametric versus parametric
"In theory, there is no difference between theory and practice. But, in practice, there is." (Jan L. A. van de Snepscheut or Yogi Berra)

Bayesian model averaging (BMA)
Model 1: likelihood $\{h_1(\cdot \mid \theta_1), \theta_1 \in \Theta_1\}$
Model 2: likelihood $\{h_2(\cdot \mid \theta_2), \theta_2 \in \Theta_2\}$
(… average consists in adding a prior layer to the m…)
Objective: compute the posterior predictive density of the quantity of interest,
$h(w \mid \text{data}) = p(\text{model 1} \mid \text{data}) \, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data}) \, h_2(w \mid \text{data})$

Model 1: priors; likelihood $\{h_1(\cdot \mid \theta_1), \theta_1 \in \Theta_1\}$
Model 2: priors; likelihood $\{h_2(\cdot \mid \theta_2), \theta_2 \in \Theta_2\}$
Averaged model: priors; likelihood $h(w \mid (\theta_1, \theta_2)) = p_1 h_1(w \mid \theta_1) + p_2 h_2(w \mid \theta_2)$

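Model 1
Marginal likelihood (computationally hard): $p(\text{data} \mid \text{model 1}) = \int h_1(\text{data} \mid \theta_1) \, \text{prior}(\theta_1) \, d\theta_1$
Posterior weights (computationally easy): $p(\text{model 1} \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model 1}) \, p(\text{model 1})}{p(\text{data})}$
Averaging Models 1 and 2 (computationally easy): $h(w \mid \text{data}) = p(\text{model 1} \mid \text{data}) \, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data}) \, h_2(w \mid \text{data})$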
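Once the marginal likelihoods are available, the "easy" steps can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it works with log marginal likelihoods and a log-sum-exp normalisation, because marginal likelihoods can differ by many orders of magnitude. The function names, the default of equal prior model probabilities, and the stand-in density values are assumptions.

```python
import math


def posterior_model_weights(log_marginals, prior_probs=None):
    """p(model j | data) from log p(data | model j) and prior model probabilities.

    Uses a log-sum-exp normalisation to avoid overflow and underflow.
    Assumption: equal prior probabilities when prior_probs is None.
    """
    k = len(log_marginals)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k
    log_terms = [lm + math.log(p) for lm, p in zip(log_marginals, prior_probs)]
    m = max(log_terms)
    log_norm = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return [math.exp(t - log_norm) for t in log_terms]


def bma_density(weights, density_values):
    """Weighted average of the per-model posterior predictive densities at one w."""
    return sum(p * h for p, h in zip(weights, density_values))


# Two marginal likelihoods of comparable size (1.85e18 and 2.27e18)
weights = posterior_model_weights([math.log(1.85e18), math.log(2.27e18)])
print(weights)
# Hypothetical values h_1(w | data) = 0.8 and h_2(w | data) = 1.4 at some w
print(bma_density(weights, [0.8, 1.4]))
```

Working on the log scale also covers the case where one model dominates completely, since the smaller weight is then computed as a tiny but finite number rather than as a ratio of overflowing quantities.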

Multivariate Extreme Value Theory and BMA
A mixture of max-stable distributions is not max-stable.
A mixture of spectral measures is still a valid spectral measure, i.e. $\int_{\text{Simplex}} w_i \, dH(w) = \frac{1}{d}$ for $dH = \sum_j p_j \, dH_j$.
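The second claim follows from the linearity of the integral. The short derivation below assumes that each $H_j$ is a probability measure on the simplex (written $S_d$ here, a notation introduced only for this derivation) satisfying the moment constraint, and that the weights $p_j$ are non-negative and sum to one.

```latex
\int_{S_d} w_i \, dH(w)
  = \int_{S_d} w_i \sum_{j=1}^{J} p_j \, dH_j(w)
  = \sum_{j=1}^{J} p_j \int_{S_d} w_i \, dH_j(w)
  = \sum_{j=1}^{J} p_j \cdot \frac{1}{d}
  = \frac{1}{d},
  \qquad i = 1, \dots, d.
```

The same argument with $w_i$ replaced by 1 shows that $H$ has total mass one, so the mixture is again a probability measure on the simplex.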

Multivariate Extreme Value Theory and BMA
Proposition 3.1. Let $H_1, \dots, H_J$ be $J$ angular spectral measures associated with $J$ max-stable measures $\nu_j(\cdot)$. Let $(p_1, \dots, p_J)$ be a vector of positive weights summing to one. If $H$ is their weighted average, $H = \sum_{j=1}^{J} p_j H_j$, then
(i) $H$ is a valid spectral measure for a multivariate max-stable random vector $M$ with unit Fréchet margins and exponent measure $\nu([0, x]^c) = \sum_{j=1}^{J} p_j \nu_j([0, x]^c)$;
(ii) $M$ has the max-combination representation $M \overset{d}{=} \bigvee_{j=1}^{J} p_j M_j$.
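Part (ii) can be sanity-checked on a single margin. The sketch below assumes that the M_j are independent and looks at one coordinate only, so each M_j is a scalar unit Fréchet variable; the maximum of the p_j M_j should then again be unit Fréchet, since the product of the exp(-p_j/x) equals exp(-1/x). This checks the margins only, not the dependence structure, and the function names are hypothetical.

```python
import math
import random


def unit_frechet(rng):
    """One unit Frechet draw: Pr(Z <= z) = exp(-1/z), obtained as 1 / Exp(1)."""
    return 1.0 / rng.expovariate(1.0)


def max_combination(weights, rng):
    """max_j p_j * M_j for independent unit Frechet M_j (one margin only)."""
    return max(p * unit_frechet(rng) for p in weights)


def empirical_cdf(sample, x):
    """Fraction of the sample that is <= x."""
    return sum(v <= x for v in sample) / len(sample)


rng = random.Random(1)
draws = [max_combination([0.2, 0.3, 0.5], rng) for _ in range(20000)]
for x in (0.5, 1.0, 2.0):
    print(x, empirical_cdf(draws, x), math.exp(-1.0 / x))
```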

Choosing two parametric spectral densities
Model 1 (PB): Pairwise Beta (Cooley, Davis, Naveau, 2010)
Model 2 (NL): Nested Asymmetric Logistic (Gumbel, 1960; Tawn, 1990)

Simulation from two parametric spectral densities
[Figure: contour plots on the simplex (axes w1, w2, w3) for Model 1 PB, with alpha = 0.9, beta[1] = 15, beta[2] = 8, beta[3] = 0.5, and for Model 2 NL.]

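Our two parametric spectral densities
Model 1 (PB):
$h_{PB}(w \mid \alpha, \beta) = \sum_{1 \le i < j \le d} h_{i,j}(w \mid \alpha, \beta_{i,j})$
with
$h_{i,j}(w \mid \alpha, \beta_{i,j}) = K_d(\alpha) \, w_{ij}^{2\alpha - 1} (1 - w_{ij})^{(d-2)\alpha - d + 2} \, \dfrac{\Gamma(2\beta_{i,j})}{\Gamma^2(\beta_{i,j})} \, w_{i/ij}^{\beta_{i,j} - 1} \, w_{j/ij}^{\beta_{i,j} - 1}$
where $w_{ij} = w_i + w_j$ and $w_{i/ij} = \dfrac{w_i}{w_i + w_j}$.
Model 2 (NL):
$h_{NL}(w_1, w_2) = \dfrac{1 - \alpha}{3\alpha} \, (1 - w_{12})^{-\frac{1}{\alpha} - 1} \, (w_1 w_2)^{-\frac{1}{\alpha\alpha_{12}} - 1} \left[ \dfrac{2 - \alpha}{\alpha} \, u^{2(\alpha_{12} - 1)} v^{\alpha - 3} + \dfrac{1 - \alpha_{12}}{\alpha_{12}\alpha} \, u^{\alpha_{12} - 2} v^{\alpha - 2} \right]$
where $w_{12} = w_1 + w_2$, $u = w_1^{-\frac{1}{\alpha\alpha_{12}}} + w_2^{-\frac{1}{\alpha\alpha_{12}}}$ and $v = u^{\alpha_{12}} + (1 - (w_1 + w_2))^{-\frac{1}{\alpha}}$.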

Simulation from two parametric spectral densities
Algorithm 1 (Model 1, PB).
(i) Choose a pair $(i < j)$ uniformly.
(ii) Generate independently $R_{ij} \sim \text{Beta}(2\alpha + 1, (d-2)\alpha)$, $\Theta_{ij} \sim \text{Beta}(\beta_{i,j}, \beta_{i,j})$ and $S \sim \text{Dirichlet}_{d-2}(1, \dots, 1)$.
(iii) Change variables back to define $W$ via $W_i = R_{ij}\Theta_{ij}$, $W_j = R_{ij}(1 - \Theta_{ij})$, $W_{[-(i,j)]} = (1 - R_{ij}) S$.
Algorithm 2 (Model 2, NL; Stephenson, 2003).
(i) Generate independently $S \sim PS(\alpha)$ and $S_{12} \sim PS(\alpha_{12})$ (positive stable variables).
(ii) Simulate three independent standard exponentials $E_1, E_2, E_3$.
(iii) Set, for $i \in \{1, 2\}$, $X_i = \left( S_{12} S^{1/\alpha_{12}} / E_i \right)^{\alpha\alpha_{12}}$ and $X_3 = (S / E_3)^{\alpha}$.
Then $X = (X_1, X_2, X_3)$ has the desired distribution.
Proof. If X is generated according to the above algorithm, the cond…
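Both algorithms can be sketched with the Python standard library. This is an illustrative reading of the slide, not the authors' code. Assumptions: the Dirichlet(1, ..., 1) vector is obtained by normalising standard exponentials; the positive stable variable PS(a), with Laplace transform exp(-t^a), is drawn with Kanter's representation (a technique swapped in here, since the slide does not say how PS is simulated); step (iii) of Algorithm 2 is read as X_i = (S_12 S^(1/alpha_12) / E_i)^(alpha alpha_12) and X_3 = (S / E_3)^alpha; and the assignment of the three beta values to pairs is arbitrary.

```python
import math
import random


def sample_pairwise_beta(alpha, beta, d, rng):
    """One angular draw W on the simplex (Algorithm 1); requires d >= 3.

    beta maps each pair (i, j) with i < j to its parameter beta_ij.
    """
    pairs = [(i, j) for i in range(d) for j in range(i + 1, d)]
    i, j = rng.choice(pairs)                                # step (i)
    r = rng.betavariate(2 * alpha + 1, (d - 2) * alpha)     # step (ii)
    theta = rng.betavariate(beta[(i, j)], beta[(i, j)])
    e = [rng.expovariate(1.0) for _ in range(d - 2)]
    total = sum(e)
    s = iter([v / total for v in e])        # Dirichlet(1, ..., 1) on d-2 cells
    w = []
    for k in range(d):                                      # step (iii)
        if k == i:
            w.append(r * theta)
        elif k == j:
            w.append(r * (1.0 - theta))
        else:
            w.append((1.0 - r) * next(s))
    return w


def sample_positive_stable(a, rng):
    """Positive stable draw with E[exp(-t S)] = exp(-t**a), 0 < a < 1 (Kanter)."""
    u = rng.uniform(0.0, math.pi)
    e = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a))


def sample_nested_logistic(alpha, alpha12, rng):
    """One draw X = (X1, X2, X3) following Algorithm 2 as read above."""
    s = sample_positive_stable(alpha, rng)
    s12 = sample_positive_stable(alpha12, rng)
    e1, e2, e3 = (rng.expovariate(1.0) for _ in range(3))
    scale = s12 * s ** (1.0 / alpha12)
    x1 = (scale / e1) ** (alpha * alpha12)
    x2 = (scale / e2) ** (alpha * alpha12)
    x3 = (s / e3) ** alpha
    return [x1, x2, x3]


rng = random.Random(0)
# Hypothetical assignment of beta values to the three pairs
beta = {(0, 1): 15.0, (0, 2): 8.0, (1, 2): 0.5}
print(sample_pairwise_beta(0.9, beta, 3, rng))
# Hypothetical NL parameter values
print(sample_nested_logistic(0.7, 0.6, rng))
```

Two quick checks are available: the PB draws should average to 1/d in each coordinate (the moment constraint), and each coordinate of the NL draws should be unit Fréchet. To obtain angular observations from the NL draws, one would still need to keep the draws with the largest radii and normalise them.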

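Metropolis-Hastings at work
Model 1
Marginal likelihood (computationally hard): $p(\text{data} \mid \text{model 1}) = \int h_1(\text{data} \mid \theta_1) \, \text{prior}(\theta_1) \, d\theta_1$
Posterior weights (computationally easy): $p(\text{model 1} \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model 1}) \, p(\text{model 1})}{p(\text{data})}$
Averaging Models 1 and 2 (computationally easy): $h(w \mid \text{data}) = p(\text{model 1} \mid \text{data}) \, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data}) \, h_2(w \mid \text{data})$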
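For readers unfamiliar with the sampler, here is a minimal random-walk Metropolis-Hastings sketch on a deliberately simple toy problem, not on the PB or NL likelihoods. Assumptions: the data are draws from a symmetric Beta(b, b) distribution, the unknown parameter is log(b), its prior is standard normal, and the proposal is a Gaussian random walk; all names and tuning values are hypothetical.

```python
import math
import random


def log_posterior(log_b, n, s):
    """Unnormalised log posterior of log(b) for n observations from Beta(b, b).

    s is the sum of log(theta * (1 - theta)) over the data.
    Assumption (illustration only): standard normal prior on log(b).
    """
    b = math.exp(log_b)
    log_lik = n * (math.lgamma(2 * b) - 2 * math.lgamma(b)) + (b - 1) * s
    return log_lik - 0.5 * log_b ** 2


def metropolis_hastings(data, n_iter, step, rng, start=0.0):
    """Random-walk Metropolis-Hastings chain for log(b)."""
    n = len(data)
    s = sum(math.log(t * (1.0 - t)) for t in data)
    current, current_lp = start, log_posterior(start, n, s)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, n, s)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


rng = random.Random(42)
data = [rng.betavariate(3.0, 3.0) for _ in range(200)]   # true b = 3
chain = metropolis_hastings(data, n_iter=5000, step=0.15, rng=rng)
kept = [math.exp(v) for v in chain[1000:]]               # drop burn-in
print(sum(kept) / len(kept))                             # posterior mean of b
```

The chain delivers posterior draws of the parameters of one model; turning such draws into the marginal likelihood p(data | model) is the computationally hard step and is not covered by this sketch.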

A few simulation results (with parameters tuned to our Leeds data)

True model = PB
Model   Nsim        p(data|model)   stdev(p(data|model))   p(model|data)
PB      50 × 10^3   1.25 × 10^39    5 × 10^37              1
NL      50 × 10^3   4.76 × 10^17    6.5 × 10^16            6 × 10^-22

True model = NL
Model   Nsim        p(data|model)   stdev(p(data|model))   p(model|data)
PB      50 × 10^3   1.19 × 10^37    7 × 10^35              2 × 10^-7
NL      50 × 10^3   6.23 × 10^43    3 × 10^42              1

A few simulation results (with parameters tuned to our Leeds data)

True model = Mixture (PB + NL)/2
Model   Nsim         p(data|model)   stdev(p(data|model))   p(model|data)
PB      100 × 10^3   1.85 × 10^18    8 × 10^16              0.45
NL      100 × 10^3   2.27 × 10^18    4 × 10^16              0.55

Kullback-Leibler divergence (small = good)
KL(h_true, h_PB) = 0.084
KL(h_true, h_NL) = 0.187
KL(h_true, h_BMA) = 0.014
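The Kullback-Leibler comparison can be mimicked on a toy example. The sketch below estimates KL(h_true, h) = E_true[log h_true(W) - log h(W)] by Monte Carlo. Assumptions: the two competing densities are stand-ins on (0, 1), namely Beta(2, 5) and Beta(5, 2), not the PB and NL densities; the true density is their equal-weight mixture; and the averaged density uses weights 0.45 and 0.55. All names are hypothetical.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def h_a(x):
    return beta_pdf(x, 2.0, 5.0)    # stand-in for the first model


def h_b(x):
    return beta_pdf(x, 5.0, 2.0)    # stand-in for the second model


def mixture(p):
    """Density p * h_a + (1 - p) * h_b."""
    return lambda x: p * h_a(x) + (1.0 - p) * h_b(x)


def draw_true(n, rng):
    """Sample from the equal-weight mixture, used here as the true density."""
    return [rng.betavariate(2.0, 5.0) if rng.random() < 0.5
            else rng.betavariate(5.0, 2.0) for _ in range(n)]


def kl_monte_carlo(sample, h_true, h_model):
    """Monte Carlo estimate of KL(h_true, h_model) from draws of h_true."""
    return sum(math.log(h_true(x) / h_model(x)) for x in sample) / len(sample)


rng = random.Random(7)
sample = draw_true(20000, rng)
h_true = mixture(0.5)
kl_a = kl_monte_carlo(sample, h_true, h_a)
kl_b = kl_monte_carlo(sample, h_true, h_b)
kl_avg = kl_monte_carlo(sample, h_true, mixture(0.45))
print(kl_a, kl_b, kl_avg)
```

In this toy setting the averaged density is much closer to the truth than either single model, which is the qualitative pattern reported in the table.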

Back to our example

Back to our example: priors (dotted) and posteriors (histograms) for each parameter
[Figure: Model PB panels for log(α), log(β[1]), log(β[2]) and log(β[3]); Model NL panels for alpha and alpha12.]

Back to our example: posterior spectral densities
[Figure: contour plots on the simplex (axes w1, w2, w3) for Model PB and Model NL.]

Back to our example
[Figure: Model PB and Model NL panels.]

Back to our example: BMA verdict
Model   Nsim         p(data|model)   stdev(p(data|model))   p(model|data)
PB      300 × 10^3   1.05 × 10^38    6.7 × 10^36            1.11 × 10^-13
NL      300 × 10^3   9.53 × 10^50    1.6 × 10^49            1

Take-home messages
Implementing BMA for multivariate extremes is feasible (in low dimensions).
Computations can quickly become intensive.
The choice and number of parametric models are important.
The asymmetric nested logistic model is well tailored to represent bridges.
The pairwise beta model is flexible and can be generalized (see Ballani and Schlather's extensions).
More research is needed to extend BMA to mixtures.
Going fully Bayesian non-parametric (Segers and colleagues, Boldi and Davison).