Bayesian nonparametrics for multivariate extremes including censored data. EVT 2013, Vimeiro. Anne Sabourin. September 10, 2013

Bayesian nonparametrics for multivariate extremes including censored data
Anne Sabourin
PhD advisors: Anne-Laure Fougères (Lyon 1), Philippe Naveau (LSCE, Saclay).
Joint work with Benjamin Renard, IRSTEA, France.
EVT 2013, Vimeiro, September 10, 2013

Censored multivariate extremes: why bother?
Example: daily streamflow at 4 neighbouring sites (Gardons, Cévennes, France).
Return levels for jointly extreme events?
Recent, `clean' series are very short.
Censored historical data are available from archives, with variable `perception thresholds' for floods. Earliest: 1604.
How to use all the different kinds of data?

Censored data: pairwise plots
[Figure: univariate series and bivariate plots of Mialet, Anduze and Ales, each against St Jean.]

Purposes
Take into account as many data as possible: censored likelihood, integration problems.
Estimate marginal GPD parameters and dependence structure jointly: additional Gibbs step in an MCMC algorithm.
Probability of failure regions (return periods for jointly extreme events): model for excesses (POT-Poisson).

Approach: use Dirichlet consistency properties
Angular measure model on the simplex.
Dirichlet distributions: nice marginalization and conditioning properties.
Dirichlet mixture model for angular measures (Boldi & Davison, 2007; Sabourin & Naveau, 2013): flexible, non parametric. Reversible-jump MCMC algorithm available.
Adaptation needed for censored likelihood and variable threshold: data augmentation.

Excess data: polar decomposition
Vectorial observations $Y_t = (Y_{1,t}, \dots, Y_{d,t})$, i.i.d.
Margins $Y_{j,t} \sim F_j$: Pareto above threshold $v_j$.
Dependence defined among the $X_{j,t} = -1/\log F_j(Y_{j,t})$: standard Fréchet.
Polar coordinates: $R = \|X\|_1 = \sum_j X_j$: `radius' (norm); $W = X/R$: `angle'.
$W \in$ simplex $S_d = \{(w_1, \dots, w_d) : \sum_j w_j = 1,\ w_j \ge 0\}$.
[Figure: polar decomposition of a point $X$ (axes: 1st component, 2nd component), showing the radius $R$, the angle $W$ and a region $B$ of the simplex $S_d$.]
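The polar decomposition above can be sketched in a few lines of Python. This is only an illustration on synthetic data: the rank-based marginal transform `rank_transform` is a hypothetical stand-in for the marginal fit used in the talk, and the 95% radial quantile is an arbitrary choice.

```python
import numpy as np

def to_frechet(u):
    """Map marginal probabilities u = F_j(Y_j) in (0, 1) to the standard Frechet scale."""
    u = np.asarray(u, dtype=float)
    return -1.0 / np.log(u)

def polar_decomposition(x):
    """Return the radius R = sum_j X_j and the angle W = X / R for each row of x."""
    x = np.asarray(x, dtype=float)
    r = x.sum(axis=1)
    w = x / r[:, None]
    return r, w

def rank_transform(y):
    """Hypothetical stand-in for the marginal model: empirical CDF values rank / (n + 1)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ranks = y.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1.0)

rng = np.random.default_rng(0)
y = rng.lognormal(size=(1000, 4))        # synthetic observations, d = 4
x = to_frechet(rank_transform(y))        # approximately standard Frechet margins
r, w = polar_decomposition(x)
extreme = r > np.quantile(r, 0.95)       # keep the largest 5% of radii (arbitrary choice)
w_extreme = w[extreme]                   # angles of the retained points
```

The rows of `w_extreme` are the angles on which an angular measure model would be fitted.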

Angular measure models
Joint distribution of excesses: product measure in polar coordinates,
$P(R > r, W \in B) \underset{r \to \infty}{\sim} \frac{1}{r}\, H(B)$.
$H$: angular measure, distribution of the angles.
Non parametric. Valid iff $\int_{S_d} w_i \, dH(w) = \frac{1}{d}$ $(1 \le i \le d)$.
Inference:
Empirically (Einmahl et al., 2001; Einmahl & Segers, 2009; Guillotte et al., 2011): no explicit expression of the asymptotic variance, Bayesian inference with $d = 2$ only.
Restriction to a parametric subclass (Gumbel; Coles & Tawn; Cooley et al.; ...): logistic family, Pairwise-Beta, ...
Preferred here: Dirichlet mixture model, a compromise. Flexibility (weakly dense family); uncertainty quantification through parameters; Bayesian implementation.
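A short derivation of where the validity condition comes from, under two assumptions taken from the rest of the talk: the exponent measure $\lambda$ has density $\frac{d}{r^2}\,dr\,dH(w)$ in polar coordinates (as on the Poisson-model slide), and standard Fréchet margins mean $\lambda\{x : x_i > s\} = 1/s$.

```latex
\begin{align*}
\lambda\{x : x_i > s\}
  &= \int_{S_d}\int_{s/w_i}^{\infty} \frac{d}{r^{2}}\,dr\,dH(w)
   = \frac{d}{s}\int_{S_d} w_i\,dH(w), \\
\lambda\{x : x_i > s\} = \frac{1}{s}
  &\iff \int_{S_d} w_i\,dH(w) = \frac{1}{d}, \qquad 1 \le i \le d.
\end{align*}
```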

Dirichlet mixture model for angular measure
Dirichlet distribution: generalization of the Beta distribution to higher dimensions. One location parameter (a point on the simplex, the `center') plus one concentration parameter.
[Figure: two Dirichlet densities on the simplex with vertices $w_1$, $w_2$, $w_3$.]
Dirichlet mixture model with $k$ components (Boldi & Davison, 2007): $h(w) = \sum_{m=1}^{k} p_m \, \mathrm{diri}_m(w)$.
$H$ valid $\iff$ center of mass of the location parameters = center of the simplex.
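A minimal sketch of the mixture density and of the validity check, assuming the usual link between the (location, concentration) parametrization and the Dirichlet shape vector, `alpha = nu * mu`. The function names and the parameter values are illustrative, not taken from the talk.

```python
import math
import numpy as np

def dirichlet_logpdf(w, mu, nu):
    """Log-density of a Dirichlet with location mu (on the simplex) and
    concentration nu > 0, i.e. shape vector alpha = nu * mu (assumed convention)."""
    alpha = nu * np.asarray(mu, dtype=float)
    w = np.asarray(w, dtype=float)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + float(np.sum((alpha - 1.0) * np.log(w)))

def mixture_density(w, p, mus, nus):
    """h(w) = sum_m p_m * diri_m(w)."""
    return sum(pm * math.exp(dirichlet_logpdf(w, mu, nu))
               for pm, mu, nu in zip(p, mus, nus))

def satisfies_moment_constraint(p, mus, tol=1e-10):
    """Check that the weighted mean of the locations is the simplex center (1/d, ..., 1/d)."""
    p = np.asarray(p, dtype=float)
    mus = np.asarray(mus, dtype=float)
    d = mus.shape[1]
    return bool(np.allclose(p @ mus, np.full(d, 1.0 / d), atol=tol))

# Illustrative parameters: d = 3, k = 2; the second location is chosen
# so that the center of mass is (1/3, 1/3, 1/3).
p = [0.5, 0.5]
mu1 = np.array([0.6, 0.3, 0.1])
mu2 = 2.0 / 3.0 - mu1
mus = [mu1, mu2]
nus = [10.0, 5.0]
h_center = mixture_density([1 / 3, 1 / 3, 1 / 3], p, mus, nus)
valid = satisfies_moment_constraint(p, mus)
```

Since the mean of each Dirichlet component is its location parameter, the mean of the mixture is the weighted mean of the locations, which is what the check compares with the simplex center.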

Issues: moments constraints and censored data
(i) Sampling the posterior distribution with MCMC methods.
Constraints lead to sampling issues. Re-parametrization: no more constraint, sampling is manageable for $d = 5$ (Sabourin & Naveau, 2013).
(ii) Inference with censored data.
Variable threshold: changing normalizing constants.
Censored data = segments or boxes. How to integrate the likelihood (density $\frac{dr}{r^2}\, dH(w)$) over rectangles?
(Sabourin, under review; Sabourin & Renard, in preparation)

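Poisson model for variable threshold
$\{(\frac{t}{n}, \frac{X_t}{n}),\ 1 \le t \le n\} \sim$ Poisson process with intensity $\mathrm{Leb} \times \lambda$ on $[0, 1] \times A_{u,n}$.
$\lambda$: `exponent measure', with Dirichlet mixture angular component: $\frac{d\lambda}{dr\,dw}(r, w) = \frac{d}{r^2}\, h(w)$.
Data $A_i$ overlapping the threshold: included in the Poisson likelihood as $P\{N([\frac{t_1}{n}, \frac{t_2}{n}] \times \frac{1}{n} A_i) = 0\}$.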
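The zero-count term reduces to an exponential of the exponent measure. The derivation below is a sketch under two assumptions: the time window is $[\frac{t_1}{n}, \frac{t_2}{n}]$ with rescaled region $\frac{1}{n} A_i$, and $\lambda$ is homogeneous of order $-1$, which follows from the density $\frac{d}{r^2}\, h(w)$.

```latex
\begin{align*}
\lambda(sA) &= s^{-1}\lambda(A), \quad s > 0
  && \text{(homogeneity, from } \tfrac{d\lambda}{dr\,dw} = \tfrac{d}{r^{2}}\,h(w)\text{)} \\
P\Big\{N\Big(\big[\tfrac{t_1}{n},\tfrac{t_2}{n}\big]\times\tfrac{1}{n}A_i\Big)=0\Big\}
  &= \exp\Big\{-\tfrac{t_2-t_1}{n}\,\lambda\big(\tfrac{1}{n}A_i\big)\Big\} \\
  &= \exp\{-(t_2-t_1)\,\lambda(A_i)\}.
\end{align*}
```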

`Censored' likelihood: model density integrated over boxes
Ledford & Tawn, 1996: GEV parametric model, partially extreme data censored at threshold.
Here: more general censoring framework, Poisson likelihood with density $\frac{d\lambda}{dx}$, non parametric, no explicit expression.
Two terms without closed form:
Censored regions $A_i$ overlapping the threshold: $\exp\{-(t_2 - t_1)\,\lambda(A_i)\}$.
Classical censoring above the threshold: $\int_{\text{censored region}} \frac{d\lambda}{dx}\, dx$.
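To see why such terms call for numerical work, here is a Monte Carlo sketch for the simplest region, the complement of a box $[0, u]$. It assumes the density $\frac{d}{r^2}\, h(w)$, for which integrating over $r$ gives $\lambda([0,u]^c) = d \int \max_j (w_j / u_j)\, dH(w)$. This is plain Monte Carlo integration, not the data-augmentation scheme of the talk, and the function names and parameter values are illustrative.

```python
import numpy as np

def sample_dirichlet_mixture(rng, n, p, mus, nus):
    """Draw n angles from a Dirichlet mixture with weights p, locations mus,
    concentrations nus (shape vectors alpha_m = nu_m * mu_m, assumed convention)."""
    p = np.asarray(p, dtype=float)
    mus = np.asarray(mus, dtype=float)
    nus = np.asarray(nus, dtype=float)
    comp = rng.choice(len(p), size=n, p=p)          # mixture component of each draw
    alphas = nus[:, None] * mus
    w = np.empty((n, mus.shape[1]))
    for m in range(len(p)):
        idx = comp == m
        w[idx] = rng.dirichlet(alphas[m], size=int(idx.sum()))
    return w

def exponent_measure_outside_box(w, u):
    """Monte Carlo estimate of lambda([0, u]^c) = d * E_H[max_j W_j / u_j]."""
    d = w.shape[1]
    return d * float(np.mean(np.max(w / np.asarray(u, dtype=float), axis=1)))

rng = np.random.default_rng(0)
mu1 = np.array([0.6, 0.3, 0.1])
mus = np.array([mu1, 2.0 / 3.0 - mu1])              # center of mass = (1/3, 1/3, 1/3)
w = sample_dirichlet_mixture(rng, 50_000, [0.5, 0.5], mus, [10.0, 5.0])
u = np.full(3, 4.0)
v_u = exponent_measure_outside_box(w, u)
v_2u = exponent_measure_outside_box(w, 2.0 * u)     # homogeneity: should be v_u / 2
```

For a common threshold $c$ in every coordinate the estimate lies between $1/c$ and $d/c$, since $\max_j w_j$ is always between $1/d$ and $1$ on the simplex. General censored boxes have no such one-line reduction, which is the motivation for the augmentation step.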

Data augmentation
One more Gibbs step, no more numerical integration.
Objective: sample $[\theta \mid \text{Obs}]$ (parameter space: $\Theta$).
Additional variables (replacing the missing data components): $Z$.
Full conditionals $[Z_i \mid (Z_j)_{j \neq i}, \theta, \text{Obs}]$ are explicit (thanks to the Dirichlet): Gibbs sampling.
Sample $[z, \theta \mid \text{Obs}]_+$ (augmented distribution) on $\Theta \times \mathcal{Z}$.
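The idea of trading an integral for one more Gibbs step can be illustrated on a deliberately simple toy model, which is not the model of the talk: exponential observations, some of them only known to exceed a level `c`, with a Gamma prior on the rate. Both full conditionals are explicit, so no censored likelihood is ever integrated numerically. All names and values below are illustrative.

```python
import numpy as np

def gibbs_censored_exponential(obs, n_censored, c, n_iter=2000, a=1.0, b=1.0, seed=0):
    """Toy data-augmentation Gibbs sampler.

    Model: X_i ~ Exponential(rate=theta), prior theta ~ Gamma(shape=a, rate=b).
    obs: fully observed values; n_censored values are only known to exceed c.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    n = obs.size + n_censored
    theta = 1.0
    draws = np.empty(n_iter)
    for it in range(n_iter):
        # Augmentation step: [Z | theta, Obs] is explicit (memoryless property),
        # Z = c + Exponential(rate=theta) for each censored value.
        z = c + rng.exponential(scale=1.0 / theta, size=n_censored)
        # Parameter step: conjugate update given the completed data,
        # theta | Z, Obs ~ Gamma(shape=a + n, rate=b + sum of all values).
        theta = rng.gamma(shape=a + n, scale=1.0 / (b + obs.sum() + z.sum()))
        draws[it] = theta
    return draws

# Synthetic data with true rate 2.0, censored above c = 0.6.
data_rng = np.random.default_rng(1)
x = data_rng.exponential(scale=1.0 / 2.0, size=400)
c = 0.6
obs = x[x <= c]
n_censored = int((x > c).sum())
draws = gibbs_censored_exponential(obs, n_censored, c)
post_mean = float(draws[500:].mean())   # discard the first 500 draws as burn-in
```

In the talk the same pattern is applied with censored components of multivariate extremes, where the explicit conditionals come from the Dirichlet mixture structure.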

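Censored regions above threshold
$\int_{\text{censored region}} \frac{d\lambda}{dx}\, dx_{j_1:j_r}$: generate the missing components under univariate conditional distributions, $Z_{j_{1:r}} \sim [X_{\text{missing}} \mid X_{\text{obs}}, \theta]$.
[Figure: in the $(x_1, x_2)$ plane, the extremal region beyond the thresholds $u_1/n$ and $u_2/n$, a censored interval, and the augmentation data $Z_j = [X_{\text{censored}} \mid X_{\text{observed}}, \theta]$.]
Dirichlet $\Rightarrow$ explicit univariate conditionals $\Rightarrow$ exact sampling of censored data on the censored interval.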

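Censored regions overlapping threshold
$e^{-(t_{2,i} - t_{1,i})\,\lambda(A_i)}$: augmentation by a Poisson process $N_i$ on $E_i \supset A_i$, plus a functional $\varphi(N_i)$.
[Figure: in the $(X_1, X_2)$ plane, the censored region $A_i$ inside $E_i$, with thresholds $u_1/n$, $u_2/n$ and bounds $U'_1$, $U'_2$; augmentation $Z_i = \mathrm{PP}(\tau \cdot \lambda)$ on $E_i$, $\varphi(\#\{\text{points in } A_i\})$.]
$[z, \theta \mid \text{Obs}]_+ \propto \underbrace{\cdots}_{\text{density terms, prior, augmented missing components}} \prod_i [N_i]\, \varphi(N_i)$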

Results on simulated data
Angular measure: Dirichlet, $d = 4$, $k = 3$ mixture components.
Censoring: same pattern as real data.
[Figure: pair (S3, S4). Panels: data (S3 against S4, discharge); posterior predictive of the angular measure, against X3/(X3 + X4); quantile curve for S3, against return period (years, log scale). Legend: dependent, independent, true.]

Conclusion
Accounting for all types of censored data is manageable using Dirichlet consistency properties (marginalization, conditioning).
High dimension (GCM grid, spatial fields)? Impose a reasonable (sparse) structure on the Dirichlet parameters.
Dirichlet process? Challenge: discrete random measure $\neq$ continuous framework for GPP's.

References I
Boldi, M.-O. and Davison, A. C. (2007). A mixture model for multivariate extremes. JRSS: Series B (Statistical Methodology), 69(2):217-229.
Sabourin, A. and Naveau, P. (2013). Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization. CSDA.
Ledford, A. and Tawn, J. (1996). Statistics for near independence in multivariate extreme values. Biometrika, 83(1):169-187.
Schnedler, W. (2005). Likelihood estimation for censored random vectors. Econometric Reviews, 24(2):195-217.
Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528-540.
Van Dyk, D. and Meng, X. (2001). The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1-50.
Gómez, G., Calle, M. L., and Oller, R. (2004). Frequentist and Bayesian approaches for interval-censored data. Statistical Papers, 45(2):139-173.

References II
Hosking, J. R. M. and Wallis, J. R. (2005). Regional frequency analysis: an approach based on L-moments. Cambridge University Press.
Neppel, L., Renard, B., Lang, M., Ayral, P., Coeur, D., Gaume, E., Jacob, N., Payrastre, O., Pobanz, K., and Vinet, F. (2010). Flood frequency analysis using historical data: accounting for random and systematic errors. Hydrological Sciences Journal - Journal des Sciences Hydrologiques, 55(2):192-208.

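Details: augmentation Poisson process
How to handle $e^{-(t_{2,i} - t_{1,i})\,\lambda(A_i)}$?
Proposal: $Z_i \sim \mathrm{PP}(\lambda_i)$; define $f_i$ such that
$E\{e^{-\sum_j f_i(Z_i^j)}\} = \mathrm{Lapl}_{Z_i}(f_i) = e^{-\int (1 - e^{-f_i})\, d\lambda_i} = e^{-n_i \lambda(A_i)}$.
$[Z, \theta \mid O]_+ \propto [\theta] \prod_j \{[Z_{j_{1:r}} \mid O, \theta]\} \prod_i \{[Z_i \mid \theta]\, e^{-\sum_j f_i(Z_i^j)}\}$,
with $\int [Z_i \mid \theta]\, e^{-\sum_j f_i(Z_i^j)}\, dz_i = e^{-n_i \lambda(A_i)}$.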
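The Laplace-functional identity used above can be checked numerically on a toy case. The sketch assumes a homogeneous Poisson process of rate `tau` on $[0, 1]$, a region $A = [0, a]$ and a function $f = c \cdot 1_A$; none of these choices come from the talk. The identity then reads $E\{e^{-c\, \#\{\text{points in } A\}}\} = e^{-\tau a (1 - e^{-c})}$.

```python
import numpy as np

def laplace_functional_mc(tau, a, c, n_sim=200_000, seed=0):
    """Monte Carlo estimate of E[exp(-sum_j f(Z_j))] for f = c * 1_[0, a],
    where Z is a homogeneous Poisson process of rate tau on [0, 1]."""
    rng = np.random.default_rng(seed)
    n_points = rng.poisson(tau, size=n_sim)     # total number of points per replicate
    # Given the total, points are i.i.d. uniform on [0, 1], so the count in
    # A = [0, a] is Binomial(n_points, a).
    in_a = rng.binomial(n_points, a)
    return float(np.mean(np.exp(-c * in_a)))

def laplace_functional_exact(tau, a, c):
    """Closed form exp(-integral of (1 - exp(-f)) d(intensity)) for the same toy case."""
    return float(np.exp(-tau * a * (1.0 - np.exp(-c))))

mc = laplace_functional_mc(tau=5.0, a=0.3, c=1.0)
exact = laplace_functional_exact(tau=5.0, a=0.3, c=1.0)
```

Under the further assumption that the proposal intensity is $\tau \lambda$ on $E_i$ and that $f_i$ is constant on $A_i$ and zero elsewhere, matching the exponent to $n_i \lambda(A_i)$ would require $1 - e^{-c} = n_i / \tau$. This is one possible reading of the slide, not a statement of the construction actually used.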