Bayesian Model Averaging for Multivariate Extreme Values


Philippe Naveau (naveau@lsce.ipsl.fr)
Laboratoire des Sciences du Climat et de l'Environnement (LSCE), Gif-sur-Yvette, France
Joint work with A. Sabourin and A.-L. Fougères
FP7-ACQWA, GIS-PEPER, MIRACLE & ANR-McSim, MOPERA
14 November 2011

Outline: Environmental Data Analysis; Extreme Value Theory; Bayesian Model Averaging

Air pollutants (Leeds, UK, winters 94-98, daily maxima)
[Figure: scatter plots of NO vs. PM10 (left), SO2 vs. PM10 (center), and SO2 vs. NO (right).]
Heffernan & Tawn, 2004; Boldi & Davison, 2007; Cooley, Davis, Naveau, 2010

Typical question: what is the probability of observing data in the blue box?

100 largest extremes
[Figure: pairwise scatter plots of the 100 largest observations of NO, SO2 and PM10.]

Multivariate Extreme Value Theory (de Haan, Resnick and others)
Maxima
Max-stability: $G^t(tz) = G(z)$
Regularly varying
High quantiles
Scaling property: $\Lambda(tA_z) = t^{-1}\Lambda(A_z)$
Tail behavior
Counting exceedances

Siméon Denis Poisson (1781-1840)
[Figure: time series, 1900 to 2000, with excesses over a threshold.]
Counting excesses: as a sum of random binary events, the variable $N_n$ that counts the number of events above the threshold $u_n$ has mean $n \Pr(X > u_n)$.
Poisson's theorem (1837): if $u_n$ is such that $\lim_{n \to \infty} n \Pr(X > u_n) = \lambda \in (0, \infty)$, then $N_n$ approximately follows a Poisson variable $N$.
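The counting argument above can be checked numerically. The following is a minimal sketch, assuming uniform draws and the illustrative values n = 1000 and λ = 2 (neither comes from the slides): it counts exceedances of a threshold chosen so that n Pr(X > u_n) = λ and compares the counts with the Poisson limit.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000       # sample size per experiment (illustrative choice)
lam = 2.0      # target value of n * Pr(X > u_n) (illustrative choice)
n_rep = 5000   # number of repeated experiments

# For X ~ Uniform(0, 1), Pr(X > u) = 1 - u, so this threshold gives n * Pr(X > u_n) = lam.
u_n = 1.0 - lam / n

# N_n = number of exceedances of u_n among n i.i.d. draws, repeated n_rep times.
draws = rng.random((n_rep, n))
counts = (draws > u_n).sum(axis=1)

mean_count = counts.mean()            # should be close to lam
p_zero = (counts == 0).mean()         # should be close to exp(-lam)
poisson_p_zero = np.exp(-lam)

print(f"mean of N_n: {mean_count:.3f} (Poisson mean {lam})")
print(f"P(N_n = 0): {p_zero:.3f} (Poisson value {poisson_p_zero:.3f})")
```

The probability of zero exceedances is the quantity that links counting to maxima, since no exceedance of a level means the maximum stays below it.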

Still counting
$P(\max(X_1,\dots,X_n)/a_n \le x,\ \max(Y_1,\dots,Y_n)/a_n \le y) = P(N_n(A) = 0)$
[Figure: scaled points $(X_i/a_n, Y_i/a_n)$ and the set $A$ determined by $x$ and $y$.]
Poisson again: if $\lim_{n \to \infty} E(N_n(A)) = \Lambda(A)$, then $\lim_{n \to \infty} P(N_n(A) = 0) = P(N(A) = 0) = \exp(-\Lambda(A))$.
Two questions: What is the sequence $a_n$? What are the properties of $\Lambda(A)$?

Back to the univariate case: Fréchet margins
We know, for the univariate GEV case with heavy tails, that $\lim_{n \to \infty} P(\max(X_1,\dots,X_n)/a_n \le x) = \exp(-x^{-\alpha})$, with $a_n$ such that $P(X > a_n) = 1/n$.
Poisson condition: $\lim_{n \to \infty} n P(X/a_n \in A_x) = \Lambda_x(A_x)$, with $\Lambda_x(A_x) = x^{-\alpha}$ for $A_x = [x, \infty)$.
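The Fréchet limit can be illustrated by simulation. This sketch assumes a Pareto variable with Pr(X > x) = x^(-α) for x ≥ 1, so that a_n = n^(1/α) solves P(X > a_n) = 1/n; the values of α, n and the evaluation grid are illustrative and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha = 2.0    # tail index (illustrative)
n = 500        # block size (illustrative)
n_rep = 4000   # number of simulated maxima

# Pareto draws with Pr(X > x) = x**(-alpha), x >= 1, by inverse transform.
# 1 - U lies in (0, 1], which avoids raising zero to a negative power.
x_samples = (1.0 - rng.random((n_rep, n))) ** (-1.0 / alpha)

a_n = n ** (1.0 / alpha)                 # solves Pr(X > a_n) = 1/n
scaled_max = x_samples.max(axis=1) / a_n

x_grid = np.array([0.5, 1.0, 2.0])
empirical = np.array([(scaled_max <= x).mean() for x in x_grid])
frechet = np.exp(-x_grid ** (-alpha))    # limit exp(-x^{-alpha})

for x, e, f in zip(x_grid, empirical, frechet):
    print(f"x = {x}: empirical {e:.3f}, Frechet limit {f:.3f}")
```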

Scaling property
Univariate case, with $\Lambda_x(A_x) = x^{-\alpha}$: $\Lambda_x(tA_x) = t^{-\alpha}\Lambda_x(A_x)$
Multivariate case: $\Lambda(tA) = t^{-\alpha}\Lambda(A)$

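Scaling property: an essential property for inference
$\Lambda(tA) = t^{-\alpha}\Lambda(A)$
[Figure: a set $A$ in an area with data points and its scaled version $tA$, whose measure is $t^{-\alpha}\Lambda(A)$.]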

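Interpreting the scaling property
$\Lambda(tA) = t^{-\alpha}\Lambda(A)$ with $\alpha = 1$ and $\|y\| = y_1 + y_2$
$tA = \{y : y/\|y\| \in B \text{ and } \|y\| > t\}$
[Figure: the lines $y_1 + y_2 = 1$ and $y_1 + y_2 = t$, with the angular set $B$ marked on the first.]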

Interpreting the scaling property $\Lambda(tA) = t^{-1}\Lambda(A)$
A special case: $A = \{x : x/r \in B \text{ and } r > 1\}$, where $r = \|x\|$ and $B$ is any set belonging to the unit sphere.
A surprising property: $tA = \{tx : x/r \in B \text{ and } r > 1\} = \{u : u/\|u\| \in B \text{ and } \|u\| > t\}$, with $u = tx$.
This implies $\Lambda(\{u : u/\|u\| \in B \text{ and } \|u\| > t\}) = t^{-1} H(B)$, where $H(\cdot)$ is the spectral measure restricted to the unit sphere.
Independence between the radius $r = \|x\|$ and the angular component, which is governed by the spectral measure.
The dependence among extremes is captured only by the spectral measure.

Polar coordinates in 3D
Radius: $r = x_1 + x_2 + x_3$
Angle vector: $w_1 = x_1/r$, $w_2 = x_2/r$, $w_3 = x_3/r$
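The change of coordinates above is short to write down in code. The sketch below assumes the three margins are already on a common nonnegative scale, and it uses synthetic heavy-tailed data in place of the Leeds measurements; the function names `to_polar` and `largest_by_radius` are hypothetical.

```python
import numpy as np

def to_polar(x):
    """Return the radius (sum of coordinates) and the angle vector w = x / r for each row."""
    x = np.asarray(x, dtype=float)
    r = x.sum(axis=1)
    w = x / r[:, None]
    return r, w

def largest_by_radius(x, k):
    """Keep the k observations with the largest radius, sorted from largest to smallest."""
    r, w = to_polar(x)
    idx = np.argsort(r)[::-1][:k]
    return r[idx], w[idx]

# Synthetic stand-in for the data (an assumption, not the Leeds series).
rng = np.random.default_rng(4)
x_demo = rng.pareto(1.0, size=(1000, 3)) + 1.0

r_top, w_top = largest_by_radius(x_demo, k=100)
print(r_top[:3])
print(w_top[:3])   # each row lies on the unit simplex
```

Keeping only the angles of the largest radii is what turns the dependence question into a density estimation problem on the simplex.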

100 largest extremes
[Figure: pairwise scatter plots of the 100 largest observations of NO, SO2 and PM10.]

Dependence among the 100 angles $W = (W_1, W_2, W_3)$
[Figure: the 100 angles plotted on the simplex with vertices NO, SO2 and PM10.]

Our main problems
How to find appropriate models to describe the dependence over the simplex?
How to infer the parameters of our models?
How to combine competing models?

A unique moment constraint on the spectral measure $H$
$\int_{\text{Simplex}} w_i \, dH(w) = \frac{1}{d}$
Non-parametric versus parametric
"In theory, there is no difference between theory and practice. But, in practice, there is." (Jan L. A. van de Snepscheut or Yogi Berra)
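For a candidate angular distribution that can be sampled, the moment constraint can be checked by Monte Carlo. The sketch below is only an illustration: the helper name `moment_constraint_gap` and the two Dirichlet examples are assumptions, chosen because a symmetric Dirichlet has mean 1/d in every coordinate while an asymmetric one does not.

```python
import numpy as np

def moment_constraint_gap(w):
    """Largest absolute deviation of the coordinate-wise sample means from 1/d."""
    w = np.asarray(w, dtype=float)
    d = w.shape[1]
    return np.abs(w.mean(axis=0) - 1.0 / d).max()

rng = np.random.default_rng(2)

# Symmetric Dirichlet: every coordinate has mean 1/3, so the constraint holds.
w_ok = rng.dirichlet([2.0, 2.0, 2.0], size=50000)

# Asymmetric Dirichlet: means are (5/7, 1/7, 1/7), so the constraint fails.
w_bad = rng.dirichlet([5.0, 1.0, 1.0], size=50000)

gap_ok = moment_constraint_gap(w_ok)
gap_bad = moment_constraint_gap(w_bad)
print(f"symmetric gap: {gap_ok:.4f}, asymmetric gap: {gap_bad:.4f}")
```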

Bayesian model averaging (BMA)
Model 1: likelihood $\{h_1(\cdot \mid \theta_1),\ \theta_1 \in \Theta_1\}$
Model 2: likelihood $\{h_2(\cdot \mid \theta_2),\ \theta_2 \in \Theta_2\}$
[Diagram: both likelihood families sit inside $S$, a collection of distribution functions.]
Objective: compute the posterior predictive density of the quantity of interest,
$h(w \mid \text{data}) = p(\text{model 1} \mid \text{data})\, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data})\, h_2(w \mid \text{data})$
where
$h_1(w \mid \text{data}) = \int h_1(w \mid \text{data}, \theta_1)\, [\text{posterior of } \theta_1 \mid \text{data}]\, d\theta_1$
and
$p(\text{model 1} \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model 1})\, p(\text{model 1})}{p(\text{data})}$
with $p(\text{data} \mid \text{model 1})$ the marginal likelihood with respect to $\theta_1$:
$p(\text{data} \mid \text{model 1}) = \int h_1(\text{data} \mid \theta_1)\, \text{prior}(\theta_1)\, d\theta_1$
BMA: Sloughter et al. (2010), Raftery et al. (2005), Hoeting et al. (1999)
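Once marginal likelihoods are available, the posterior model weights and the averaged density follow from Bayes' rule. Here is a minimal sketch working on the log scale for numerical stability; the function names, the equal prior model probabilities and the example numbers are all hypothetical.

```python
import numpy as np

def posterior_model_weights(log_marginal_likelihoods, prior_probs=None):
    """p(model j | data) from log p(data | model j) and prior model probabilities."""
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    if prior_probs is None:
        # Assumption: equal prior probability for every model.
        prior_probs = np.full(log_ml.shape, 1.0 / log_ml.size)
    log_post = log_ml + np.log(np.asarray(prior_probs, dtype=float))
    log_post -= log_post.max()        # subtract the max before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()    # division by the sum plays the role of p(data)

def bma_density(weights, model_densities):
    """h(w | data) = sum_j p(model j | data) * h_j(w | data), evaluated pointwise."""
    return sum(p * h for p, h in zip(weights, model_densities))

# Hypothetical marginal likelihoods 1e40 and 3e40 for two models.
weights = posterior_model_weights([np.log(1e40), np.log(3e40)])
print(weights)

# Hypothetical predictive densities of each model evaluated at two points w.
h1 = np.array([1.0, 2.0])
h2 = np.array([3.0, 2.0])
averaged = bma_density(weights, [h1, h2])
print(averaged)
```

Only ratios of marginal likelihoods matter here, which is why very large or very small values cause no difficulty once logarithms are used.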

Averaged model
[Diagram: Model 1 and Model 2, each with its own priors, combined into an averaged model with a prior on the weights.]
Likelihood of the averaged model: $h(w \mid (\theta_1, \theta_2)) = p_1 h_1(w \mid \theta_1) + p_2 h_2(w \mid \theta_2)$

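Model 1
Marginal likelihood (computationally hard): $p(\text{data} \mid \text{model 1}) = \int h_1(\text{data} \mid \theta_1)\, \text{prior}(\theta_1)\, d\theta_1$
Posterior weights (computationally easy): $p(\text{model 1} \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model 1})\, p(\text{model 1})}{p(\text{data})}$
Averaging Model 1 and 2 (computationally easy): $h(w \mid \text{data}) = p(\text{model 1} \mid \text{data})\, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data})\, h_2(w \mid \text{data})$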
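One simple way to approximate the hard integral is to average the likelihood over draws from the prior. The slides do not state which estimator was used, so the following is a sketch of that generic Monte Carlo estimator only, checked on a toy Bernoulli model with a uniform prior where the exact answer is a Beta function. The function name and the toy data are hypothetical.

```python
import numpy as np
from math import lgamma

def log_marginal_likelihood_mc(log_lik, prior_draws):
    """log of the average of exp(log_lik(theta_i)) over prior draws theta_i.

    log_lik must accept a NumPy array of parameter values (vectorised).
    """
    ll = log_lik(prior_draws)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))   # log-mean-exp for stability

# Toy model (an assumption, unrelated to the PB and NL models):
# k successes in n_trials Bernoulli trials, Uniform(0, 1) prior on theta.
n_trials, k = 20, 14

def log_lik(theta):
    return k * np.log(theta) + (n_trials - k) * np.log1p(-theta)

rng = np.random.default_rng(3)
# Draws in (0, 1], so log(theta) stays finite; log1p(-1) = -inf is handled by exp().
prior_draws = 1.0 - rng.random(50000)
with np.errstate(divide="ignore"):
    estimate = log_marginal_likelihood_mc(log_lik, prior_draws)

# Exact value: log Beta(k + 1, n_trials - k + 1).
exact = lgamma(k + 1) + lgamma(n_trials - k + 1) - lgamma(n_trials + 2)
print(f"Monte Carlo: {estimate:.3f}, exact: {exact:.3f}")
```

The estimator becomes inefficient when the posterior is much more concentrated than the prior, which is one reason the number of simulations needs to be large.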

Multivariate Extreme Value Theory and BMA
A mixture of max-stable distributions is not max-stable.
A mixture of spectral measures is still a valid spectral measure, i.e. $\int_{\text{Simplex}} w_i \, dH(w) = \frac{1}{d}$ for $dH = \sum_j p_j \, dH_j$.
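The second claim follows from linearity of the integral. As a short worked step, assuming each $H_j$ satisfies the moment constraint and the weights $p_j$ are nonnegative and sum to one:

```latex
\int_{\text{Simplex}} w_i \, dH(w)
  = \sum_{j} p_j \int_{\text{Simplex}} w_i \, dH_j(w)
  = \sum_{j} p_j \cdot \frac{1}{d}
  = \frac{1}{d},
\qquad i = 1, \dots, d .
```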

Multivariate Extreme Value Theory and BMA
Proposition 3.1. Let $H_1, \dots, H_J$ be $J$ angular spectral measures associated to $J$ max-stable measures $\nu_j(\cdot)$. Let $(p_1, \dots, p_J)$ be a vector of positive weights summing to one. If $H$ corresponds to their weighted average $H = \sum_{j=1}^{J} p_j H_j$, then
(i) $H$ is a valid spectral measure for a multivariate max-stable random vector $M$ with unit-Fréchet margins and exponent measure $\nu([0, x]^c) = \sum_{j=1}^{J} p_j \, \nu_j([0, x]^c)$;
(ii) $M$ has the max-combination representation $M \overset{d}{=} \bigvee_{j=1}^{J} p_j M_j$.

Choosing two spectral parametric densities
Model 1 (PB): Pairwise Beta (Cooley, Davis, Naveau, 2010)
Model 2 (NL): Nested Asymmetric Logistic (Gumbel, 1960; Tawn, 1990)

Simulation from two spectral parametric densities
[Figure: contour plots on the simplex (axes w1, w2, w3) for Model 1 (PB), with alpha = 0.9, beta[1] = 15, beta[2] = 8, beta[3] = 0.5, and for Model 2 (NL).]

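Our two spectral parametric densities
Model 1 (PB):
$h_{PB}(w \mid \alpha, \beta) = \sum_{1 \le i < j \le d} h_{i,j}(w \mid \alpha, \beta_{i,j})$
with
$h_{i,j}(w \mid \alpha, \beta_{i,j}) = K_d(\alpha)\, w_{ij}^{2\alpha - 1} (1 - w_{ij})^{(d-2)\alpha - d + 2}\, \dfrac{\Gamma(2\beta_{i,j})}{\Gamma^2(\beta_{i,j})}\, w_{i/ij}^{\beta_{i,j} - 1}\, w_{j/ij}^{\beta_{i,j} - 1}$
where $w_{ij} = w_i + w_j$ and $w_{i/ij} = \dfrac{w_i}{w_i + w_j}$.
Model 2 (NL):
hnl(w1w2) = 1 α 3α u v 2 α α (1 w12) 1 α 1 (w1w2) 1 α 12 α 1 u2(α12 1) v α 3 + 1 α12 α12α uα12 2 v α 2 1 α where = w 12 α 1 + w2 1 α 12 α = u α12 +(1 (w1 + w2)) 1 α.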

Simulation from two spectral parametric densities
Algorithm 1. Model 1 (PB)
(i) Choose uniformly a pair $(i < j)$.
(ii) Generate independently $R_{ij} \sim \text{Beta}(2\alpha + 1, (d-2)\alpha)$, $\Theta_{ij} \sim \text{Beta}(\beta_{i,j}, \beta_{i,j})$ and $S \sim \text{Dirichlet}_{d-2}(1, \dots, 1)$.
(iii) Change variables back to define $W$ via $W_i = R_{ij}\Theta_{ij}$, $W_j = R_{ij}(1 - \Theta_{ij})$, $W_{[-(i,j)]} = (1 - R_{ij})\, S$.
Algorithm 2. Model 2 (NL) (Stephenson, 2003)
(i) Generate independently $S \sim PS(\alpha)$ and $S_{12} \sim PS(\alpha_{12})$.
(ii) Simulate three independent standard exponentials $E_1, E_2, E_3$.
(iii) Set, for $i \in \{1, 2\}$, $X_i = \left( S_{12}\, S^{1/\alpha_{12}} / E_i \right)^{\alpha \alpha_{12}}$ and $X_3 = (S / E_3)^{\alpha}$.
Then $X = (X_1, X_2, X_3)$ has the desired distribution.
Proof. If X is generated according to the above algorithm, the cond
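Algorithm 1 translates almost line by line into code. The sketch below is a hedged illustration rather than the authors' implementation: the function name `simulate_pairwise_beta` is hypothetical, pairs are indexed from zero, and the assignment of beta[1], beta[2], beta[3] to the pairs (0, 1), (0, 2), (1, 2) is an assumption. The Dirichlet(1, ..., 1) draw is obtained by normalising standard exponentials.

```python
import numpy as np
from itertools import combinations

def simulate_pairwise_beta(n, alpha, beta, d=3, rng=None):
    """Draw n angles on the unit simplex following the three steps of Algorithm 1.

    beta maps each zero-based pair (i, j), i < j, to its parameter beta_ij.
    """
    rng = np.random.default_rng() if rng is None else rng
    pairs = list(combinations(range(d), 2))
    w = np.empty((n, d))
    for row in range(n):
        # (i) choose a pair uniformly
        i, j = pairs[rng.integers(len(pairs))]
        # (ii) independent radial part, within-pair split, and remaining coordinates
        r = rng.beta(2.0 * alpha + 1.0, (d - 2) * alpha)
        theta = rng.beta(beta[(i, j)], beta[(i, j)])
        g = rng.exponential(size=d - 2)
        s = g / g.sum()          # Dirichlet_{d-2}(1, ..., 1); equals [1.0] when d = 3
        # (iii) change variables back to W
        others = [k for k in range(d) if k not in (i, j)]
        w[row, i] = r * theta
        w[row, j] = r * (1.0 - theta)
        w[row, others] = (1.0 - r) * s
    return w

# Parameter values shown on the simulation slide; the pair ordering is assumed.
beta_demo = {(0, 1): 15.0, (0, 2): 8.0, (1, 2): 0.5}
w_sim = simulate_pairwise_beta(5000, alpha=0.9, beta=beta_demo,
                               rng=np.random.default_rng(5))
print(w_sim[:3])
print(w_sim.mean(axis=0))   # each coordinate mean should be near 1/3
```

The coordinate means approach 1/d whatever the parameter values, because Beta(β, β) has mean 1/2 and each coordinate belongs to the chosen pair with probability 2/d.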

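Metropolis-Hastings at work
Model 1
Marginal likelihood (computationally hard): $p(\text{data} \mid \text{model 1}) = \int h_1(\text{data} \mid \theta_1)\, \text{prior}(\theta_1)\, d\theta_1$
Posterior weights (computationally easy): $p(\text{model 1} \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model 1})\, p(\text{model 1})}{p(\text{data})}$
Averaging Model 1 and 2 (computationally easy): $h(w \mid \text{data}) = p(\text{model 1} \mid \text{data})\, h_1(w \mid \text{data}) + p(\text{model 2} \mid \text{data})\, h_2(w \mid \text{data})$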
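For readers unfamiliar with the sampler, here is a minimal random-walk Metropolis-Hastings sketch on a one-parameter toy problem. It is not the authors' sampler: the toy likelihood (i.i.d. Beta(β, β) observations), the normal prior on log β, the step size and all function names are assumptions made for illustration.

```python
import numpy as np
from math import lgamma

def log_posterior(log_beta, data, prior_sd=1.0):
    """Unnormalised log posterior of log(beta) for i.i.d. Beta(beta, beta) data."""
    beta = float(np.exp(log_beta))
    log_lik = data.size * (lgamma(2.0 * beta) - 2.0 * lgamma(beta)) \
        + (beta - 1.0) * np.sum(np.log(data) + np.log1p(-data))
    log_prior = -0.5 * (log_beta / prior_sd) ** 2   # N(0, prior_sd^2) on log(beta)
    return log_lik + log_prior

def metropolis_hastings(log_target, start, n_iter, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    chain = np.empty(n_iter)
    current, current_lp = start, log_target(start)
    for t in range(n_iter):
        proposal = current + step * rng.normal()
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[t] = current
    return chain

rng = np.random.default_rng(6)
data = rng.beta(3.0, 3.0, size=200)   # synthetic data with true beta = 3

chain = metropolis_hastings(lambda lb: log_posterior(lb, data),
                            start=0.0, n_iter=6000, step=0.2, rng=rng)
beta_draws = np.exp(chain[1000:])     # drop the first 1000 iterations as burn-in
print(f"posterior mean of beta: {beta_draws.mean():.2f}")
```

Working on the log scale keeps the parameter positive without needing a constrained proposal.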

A few simulation results (with parameters tuned to our Leeds data)

True model = PB
Model  Nsim       p(data | model)  stdev(p(data | model))  p(model | data)
PB     50 × 10^3  1.25 × 10^39     5 × 10^37               1
NL     50 × 10^3  4.76 × 10^17     6.5 × 10^16             6 × 10^-22

True model = NL
Model  Nsim       p(data | model)  stdev(p(data | model))  p(model | data)
PB     50 × 10^3  1.19 × 10^37     7 × 10^35               2 × 10^-7
NL     50 × 10^3  6.23 × 10^43     3 × 10^42               1

A few simulation results (with parameters tuned to our Leeds data)

True model = Mixture (PB + NL)/2
Model  Nsim        p(data | model)  stdev(p(data | model))  p(model | data)
PB     100 × 10^3  1.85 × 10^18     8 × 10^16               0.45
NL     100 × 10^3  2.27 × 10^18     4 × 10^16               0.55

Kullback-Leibler divergence (small = good)
KL(h_true, h_PB): 0.084
KL(h_true, h_NL): 0.187
KL(h_true, h_BMA): 0.014

Back to our example

Back to our example: priors (dotted) and posteriors (histograms) for each parameter
[Figure: one panel per parameter. Model PB: log(α), log(β[1]), log(β[2]), log(β[3]). Model NL: alpha, alpha12 and the remaining NL parameters.]

Back to our example: posterior spectral densities
[Figure: contour plots on the simplex (axes w1, w2, w3) for Model PB and Model NL.]

Back to our example
[Figure: side-by-side panels for Model PB and Model NL.]

Back to our example: BMA verdict
Model  Nsim        p(data | model)  stdev(p(data | model))  p(model | data)
PB     300 × 10^3  1.05 × 10^38     6.7 × 10^36             1.11 × 10^-13
NL     300 × 10^3  9.53 × 10^50     1.6 × 10^49             1

Take home messages
Implementing BMA for multivariate extremes is feasible (in low dimensions).
Computations can quickly become intensive.
The choice and number of parametric models are important.
The asymmetric nested logistic model is well tailored to represent bridges.
The pairwise beta model is flexible and can be generalized (see Ballani and Schlather's extensions).
More research is needed to extend BMA to mixtures.
Going fully Bayesian non-parametric (Segers and colleagues, Boldi and Davison).