Generalized additive modelling of hydrological sample extremes

Valérie Chavez-Demoulin, Faculty of Business and Economics, University of Lausanne, Switzerland
Joint work with A.C. Davison (EPFL) and Marius Hofert (ETHZ)
October 31, 2013, MFEW01, Isaac Newton Institute

Hydrological time series
Hydrological time series are the result of complex dynamical processes (precipitation, snow accumulation and melt, evapotranspiration, ...).
[Figure: daily maximum flow (1923-2008) against day of year.]

Point changes such as the relocation of a station, changes in measuring instruments, and hydro-electric installations may lead to discontinuities.
[Figure: daily maximum flow against year.]

Presence of seasonality, trends, ...
Daily streamflow records and their extremes are often dependent and not identically distributed.
Non-stationarity occurs within one year or over longer periods.
$\Rightarrow$ In this case EVT is not directly applicable!
Variation due to the specifications of the station may be summarized parametrically.
Changes in time do not need to have a specific parametric form.

Aim
We combine the point process for exceedances with smoothing methods to give a flexible exploratory approach to modelling changes in the high-flow exceedances.
The data are not declustered, as we aim to model both long-term and short-term dependence.
Uncertainty is assessed through appropriate confidence intervals.

The block maxima approach I
Consider $X_1, \dots, X_q$ iid from $F(x)$. Suppose there exists a point $x_0$ (perhaps $+\infty$) such that $\lim_{x \to x_0} F(x) = 1$. For any fixed $x < x_0$ we have
$$P(\max\{X_1, \dots, X_q\} \le x) = P(X_i \le x,\ i = 1, \dots, q) = F^q(x),$$
which tends to 0 as $q \to \infty$.
Given suitable sequences $\{a_q\}$ and $\{b_q\}$ of normalizing constants leading to $W_q = a_q^{-1}\{\max(X_1, \dots, X_q) - b_q\}$, the non-degenerate limiting distribution must be a generalized extreme value (GEV) distribution
$$H_{\mu,\psi,\xi}(w) = \exp\left\{-\left(1 + \xi\,\frac{w - \mu}{\psi}\right)_+^{-1/\xi}\right\}, \quad -\infty < \mu < \infty,\ \psi > 0,\ -\infty < \xi < \infty.$$
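A minimal numerical sketch of the limit on this slide, under an assumed example that is not in the talk: for iid standard exponential variables, the choice $a_q = 1$, $b_q = \log q$ gives a normalized maximum whose distribution function approaches the Gumbel case ($\xi = 0$, $\mu = 0$, $\psi = 1$) of the GEV. The function names are illustrative.

```python
import math

def cdf_normalized_max_exponential(w, q):
    """P(max(X_1..X_q) - log(q) <= w) for iid standard exponential X_i (assumed example)."""
    x = w + math.log(q)
    if x <= 0.0:
        return 0.0
    return (1.0 - math.exp(-x)) ** q  # F^q(x) with F(x) = 1 - exp(-x)

def gev_cdf(w, mu=0.0, psi=1.0, xi=0.0):
    """GEV distribution function H_{mu,psi,xi}(w); xi = 0 is the Gumbel limit."""
    z = (w - mu) / psi
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below it when xi > 0, above it when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Largest gap between the exact distribution of the normalized maximum
# and the Gumbel limit, on a few evaluation points, for growing block size q.
max_abs_error = {}
for q in (10, 100, 10000):
    gaps = [abs(cdf_normalized_max_exponential(w, q) - gev_cdf(w))
            for w in (-1.0, 0.0, 1.0, 3.0)]
    max_abs_error[q] = max(gaps)
    print(q, max_abs_error[q])
```

The printed gap shrinks as the block size q grows, which is the convergence the slide states in general form.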

The block maxima approach II
We fit the GEV distribution to the series of (typically) annual maximum data (the blocks).
From $m$ blocks of size $q$: $W = (M_q^{(1)}, \dots, M_q^{(m)})$.
Construct a log-likelihood by assuming we have independent observations from a GEV with density $h_{\mu,\psi,\xi}(w)$:
$$l(\mu, \psi, \xi; W) = \log\left(\prod_{i=1}^{m} h_{\mu,\psi,\xi}\big(M_q^{(i)}\big)\, 1_{\{1 + \xi (M_q^{(i)} - \mu)/\psi > 0\}}\right).$$
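The GEV log-likelihood for block maxima can be sketched as follows. This is an illustrative implementation rather than the authors' code; it assumes the standard GEV density $h_{\mu,\psi,\xi}(w) = \psi^{-1} t^{-1/\xi - 1} \exp(-t^{-1/\xi})$ with $t = 1 + \xi (w - \mu)/\psi$, and it returns $-\infty$ when an observation falls outside the support, which plays the role of the indicator in the likelihood. The sample values are made up.

```python
import math

def gev_loglik(mu, psi, xi, maxima):
    """Log-likelihood of independent GEV(mu, psi, xi) block maxima."""
    if psi <= 0.0:
        return -math.inf
    total = 0.0
    for m in maxima:
        z = (m - mu) / psi
        if xi == 0.0:
            # Gumbel case: density exp(-z) * exp(-exp(-z)) / psi
            total += -math.log(psi) - z - math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0.0:
                # Observation outside the support: likelihood is zero.
                return -math.inf
            total += (-math.log(psi)
                      - (1.0 + 1.0 / xi) * math.log(t)
                      - t ** (-1.0 / xi))
    return total

# Hypothetical annual maxima, for illustration only.
annual_maxima = [0.5, 1.2, 2.0]
print(gev_loglik(0.0, 1.0, 0.1, annual_maxima))
```

In practice this function would be handed to a numerical optimizer over $(\mu, \psi, \xi)$; the sketch only evaluates it.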

Peaks-over-Threshold Method (I)
Let $X_{t_1}, \dots, X_{t_n}$ denote the exceedances over a high threshold $u$, with corresponding excesses $Y_{t_i} = X_{t_i} - u$, $i \in \{1, \dots, n\}$.
1) The number of exceedances $N_t$ approximately follows a Poisson process with intensity $\lambda$, that is, $N_t \sim \mathrm{Poi}(\Lambda(t))$ with integrated rate function $\Lambda(t) = \lambda t$.
2) The excesses $Y_{t_1}, \dots, Y_{t_n}$ over $u$ approximately follow (independently of $N_t$) a generalized Pareto distribution (GPD), denoted by $\mathrm{GPD}(\xi, \beta)$ for $\xi \in \mathbb{R}$, $\beta > 0$, with distribution function
$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \text{if } \xi \ne 0, \\ 1 - \exp(-x/\beta), & \text{if } \xi = 0, \end{cases}$$
for $x \ge 0$ if $\xi \ge 0$, and $x \in [0, -\beta/\xi]$ if $\xi < 0$.
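As an illustration, the GPD distribution function $G_{\xi,\beta}$ and its inverse can be written directly from the definition, including the bounded support $[0, -\beta/\xi]$ for $\xi < 0$. The quantile function is obtained by solving $G_{\xi,\beta}(x) = p$ for $x$; it is not stated on the slide, and the function names are illustrative.

```python
import math

def gpd_cdf(x, xi, beta):
    """Distribution function G_{xi,beta}(x) of the generalized Pareto distribution."""
    if x < 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)
    t = 1.0 + xi * x / beta
    if t <= 0.0:
        # Only possible for xi < 0: x is at or beyond the upper endpoint -beta/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_quantile(p, xi, beta):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    if xi == 0.0:
        return -beta * math.log(1.0 - p)
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)

# Round trip: the 90% quantile maps back to probability 0.9.
x90 = gpd_quantile(0.9, xi=0.3, beta=2.0)
print(x90, gpd_cdf(x90, xi=0.3, beta=2.0))
```

Feeding uniform random numbers through `gpd_quantile` is one simple way to simulate excesses for experiments.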

Peaks-over-Threshold Method (II)
Asymptotic independence between Poisson exceedance times and GPD excesses gives the likelihood
$$L(\lambda, \xi, \beta; Y) = \frac{(\lambda T)^n}{n!} \exp(-\lambda T) \prod_{i=1}^{n} g_{\xi,\beta}(Y_{t_i}),$$
where $Y = (Y_{t_1}, \dots, Y_{t_n})$ and $g_{\xi,\beta}$ is the density of $G_{\xi,\beta}$.
$\Rightarrow$ $l(\lambda, \xi, \beta; Y) = l(\lambda; Y) + l(\xi, \beta; Y)$, with
$$l(\lambda; Y) = -\lambda T + n \log(\lambda) + \log(T^n / n!) \quad \text{and} \quad l(\xi, \beta; Y) = \sum_{i=1}^{n} l(\xi, \beta; Y_{t_i}),$$
where
$$l(\xi, \beta; y) = \begin{cases} -\log(\beta) - (1 + 1/\xi) \log(1 + \xi y/\beta), & \text{if } \xi \ne 0, \\ -\log(\beta) - y/\beta, & \text{if } \xi = 0. \end{cases}$$
$\Rightarrow$ Maximization can be carried out separately.
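The factorization can be checked numerically. The sketch below evaluates the joint log-likelihood from $L(\lambda, \xi, \beta; Y) = \frac{(\lambda T)^n}{n!} e^{-\lambda T} \prod_i g_{\xi,\beta}(Y_{t_i})$, compares it with the sum of the Poisson part $l(\lambda; Y) = -\lambda T + n \log \lambda + \log(T^n/n!)$ and the GPD part, and uses the closed-form maximizer $\hat\lambda = n/T$ of the Poisson part (obtained by setting its derivative to zero). The excesses and the period length `T` are made-up values, and the GPD part is only evaluated here, not maximized.

```python
import math

def gpd_density(y, xi, beta):
    """Density g_{xi,beta}(y) of the GPD for y >= 0 inside the support."""
    if xi == 0.0:
        return math.exp(-y / beta) / beta
    return (1.0 + xi * y / beta) ** (-1.0 / xi - 1.0) / beta

def loglik_poisson(lam, n, T):
    """l(lambda; Y) = -lambda*T + n*log(lambda) + log(T^n / n!)."""
    return -lam * T + n * math.log(lam) + n * math.log(T) - math.lgamma(n + 1)

def loglik_gpd(xi, beta, excesses):
    """l(xi, beta; Y) as the sum of the per-excess contributions."""
    total = 0.0
    for y in excesses:
        if xi == 0.0:
            total += -math.log(beta) - y / beta
        else:
            total += -math.log(beta) - (1.0 + 1.0 / xi) * math.log(1.0 + xi * y / beta)
    return total

def loglik_joint(lam, xi, beta, excesses, T):
    """log L(lambda, xi, beta; Y), computed directly from the product form."""
    n = len(excesses)
    value = n * math.log(lam * T) - math.lgamma(n + 1) - lam * T
    for y in excesses:
        value += math.log(gpd_density(y, xi, beta))
    return value

# Hypothetical data: 5 excesses observed over a period of length T = 10.
excesses = [0.3, 1.1, 0.2, 2.5, 0.7]
T = 10.0
lam_hat = len(excesses) / T  # closed-form maximizer of the Poisson part

joint = loglik_joint(lam_hat, 0.2, 1.0, excesses, T)
parts = loglik_poisson(lam_hat, len(excesses), T) + loglik_gpd(0.2, 1.0, excesses)
print(lam_hat, joint, parts)
```

Because the two parts share no parameters, a GPD fit to the excesses and the rate estimate can be computed independently of each other.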

A dynamic EVT approach
Let $\theta \in \mathbb{R}^r$ be the vector of $r$ EVT model parameters ($r = 3$ in both the GEV model and the POT representation).
Let $x$ be the vector of covariates and $t$ the time. A general model is then
$$\theta_i = g_i\{x^\top \eta_i + h_i(t)\}, \quad i = 1, \dots, r,$$
where $g_i$ is a link function, $\eta_i \in \mathbb{R}^p$ are the parameter vectors, and $h_i$ is a smooth nonparametric function of $t \in A$, with $A \subset \mathbb{R}$ the subset on which $t$ is observed.
$\theta \in \mathbb{R}^r$ can be estimated by maximizing the penalized log-likelihood
$$l(\theta; y) - \sum_{i=1}^{r} \left[\gamma_i \int_A h_i''(t)^2 \, dt\right],$$
where $l(\theta; y)$ is the log-likelihood based on the EVT model (either block maxima or POT).

Dynamic POT Approach: Non-homogeneous Poisson model for number of exceedances
For the number of exceedances, we use a non-homogeneous Poisson process with rate
$$\lambda = \lambda(x, t) = \exp(x^\top \eta_\lambda + h_\lambda(t)).$$
This model is a standard generalized additive model (GAM).
Embedded models are compared using likelihood ratio statistics (LRS) for the parametric part.
Degrees of freedom for the nonparametric part (smoothing spline) are chosen using AIC (cf. below).

Dynamic POT Approach: GAM GPD for the exceedance sizes
For the GPD parameters, we replace $\beta$ by
$$\nu = \log((1 + \xi)\beta),$$
which is orthogonal to $\xi$.
The corresponding reparameterized log-likelihood $l^r$ for the excesses is thus
$$l^r(\xi, \nu; Y) = l(\xi, \exp(\nu)/(1 + \xi); Y).$$
We assume that $\xi$ and $\nu$ are of the form
$$\xi = \xi(x, t) = x^\top \eta_\xi + h_\xi(t), \qquad \nu = \nu(x, t) = x^\top \eta_\nu + h_\nu(t).$$
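A small sketch of the reparameterization $\nu = \log((1 + \xi)\beta)$ and its inverse $\beta = \exp(\nu)/(1 + \xi)$, together with the reparameterized log-likelihood $l^r(\xi, \nu; y) = l(\xi, \exp(\nu)/(1 + \xi); y)$. It assumes $\xi > -1$ so that the logarithm is defined; the function names and the numerical values are illustrative.

```python
import math

def gpd_loglik(xi, beta, y):
    """Per-excess GPD log-likelihood l(xi, beta; y)."""
    if xi == 0.0:
        return -math.log(beta) - y / beta
    return -math.log(beta) - (1.0 + 1.0 / xi) * math.log(1.0 + xi * y / beta)

def nu_from_beta(xi, beta):
    """nu = log((1 + xi) * beta); requires xi > -1 and beta > 0."""
    return math.log((1.0 + xi) * beta)

def beta_from_nu(xi, nu):
    """Inverse map: beta = exp(nu) / (1 + xi)."""
    return math.exp(nu) / (1.0 + xi)

def gpd_loglik_reparam(xi, nu, y):
    """l^r(xi, nu; y) = l(xi, exp(nu) / (1 + xi); y)."""
    return gpd_loglik(xi, beta_from_nu(xi, nu), y)

xi, beta, y = 0.25, 3.0, 1.7
nu = nu_from_beta(xi, beta)
print(nu, beta_from_nu(xi, nu), gpd_loglik_reparam(xi, nu, y))
```

Since $\nu$ is unconstrained on the real line, an additive predictor for $\nu$ needs no further link function to keep $\beta$ positive.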

Penalized maximum likelihood estimator (I)
In order to fit reasonably smooth functions $h_\xi$, $h_\nu$ we use the penalized likelihood
$$l^p(\eta_\xi, h_\xi, \eta_\nu, h_\nu; z_1, \dots, z_n) = l^r(\xi, \nu; y) - \gamma_\xi \int_0^T h_\xi''(t)^2 \, dt - \gamma_\nu \int_0^T h_\nu''(t)^2 \, dt,$$
where $\gamma_\xi, \gamma_\nu \ge 0$ denote smoothing parameters, $y = (y_{t_1}, \dots, y_{t_n})$, and
$$l^r(\xi, \nu; y) = \sum_{i=1}^{n} l^r(\xi_i, \nu_i; y_{t_i}) \quad \text{for} \quad l^r(\xi_i, \nu_i; y_{t_i}) = l(\xi_i, \exp(\nu_i)/(1 + \xi_i); y_{t_i}).$$
Larger values of the smoothing parameters lead to smoother fitted curves. A related quantity is the equivalent degrees of freedom.

Penalized maximum likelihood estimator (II)
Let $0 = s_0 < s_1 < \dots < s_m < s_{m+1} = T$ denote the ordered and distinct values among $\{t_1, \dots, t_n\}$. For a natural cubic spline $h$ with knots $s_1, \dots, s_m$,
$$\int_0^T h''(t)^2 \, dt = h^\top K h,$$
where $h = (h_{s_1}, \dots, h_{s_m}) = (h(s_1), \dots, h(s_m))$ and $K$ is a symmetric $(m, m)$-matrix of rank $m - 2$ depending only on the knots $s_1, \dots, s_m$.
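The slide does not say how $K$ is built. One standard construction (Green and Silverman's, used here as an assumption about what is meant) is $K = Q R^{-1} Q^\top$, where $Q$ is an $m \times (m-2)$ band matrix of second divided-difference weights and $R$ is an $(m-2) \times (m-2)$ symmetric tridiagonal matrix, both depending only on the knot spacings. The sketch below evaluates $h^\top K h$ without forming $K$, by solving the tridiagonal system $R \gamma = Q^\top h$; the helper names are illustrative.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (assumes no pivoting is needed)."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0] if n > 1 else 0.0
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i - 1] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

def roughness_penalty(knots, values):
    """h^T K h: integral of h''(t)^2 for the natural cubic spline through the points."""
    m = len(knots)
    if m < 3:
        return 0.0  # a natural spline through fewer than 3 points is a straight line
    gaps = [knots[i + 1] - knots[i] for i in range(m - 1)]
    # c = Q^T h: differences of neighbouring slopes at the interior knots
    c = [(values[j + 1] - values[j]) / gaps[j] - (values[j] - values[j - 1]) / gaps[j - 1]
         for j in range(1, m - 1)]
    # R: diagonal (gap_left + gap_right) / 3, off-diagonal shared gap / 6
    diag = [(gaps[j - 1] + gaps[j]) / 3.0 for j in range(1, m - 1)]
    off = [gaps[j] / 6.0 for j in range(1, m - 2)]
    gamma = solve_tridiagonal(off, diag, off, c)  # spline second derivatives at interior knots
    return sum(ci * gi for ci, gi in zip(c, gamma))

print(roughness_penalty([0.0, 1.0, 2.0], [0.0, 1.0, 0.0]))
print(roughness_penalty([0.0, 1.0, 3.0, 4.5], [1.0, 3.0, 7.0, 10.0]))
```

Constant and linear value vectors give a zero penalty, which is why $K$ has rank $m - 2$: its null space is spanned by those two directions.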

Penalized maximum likelihood estimator (III)
The penalized log-likelihood can thus be written as
$$l^p(\eta_\xi, h_\xi, \eta_\nu, h_\nu; z_1, \dots, z_n) = l^r(\xi, \nu; y) - \gamma_\xi h_\xi^\top K h_\xi - \gamma_\nu h_\nu^\top K h_\nu,$$
with $h_\xi = (h_\xi(s_1), \dots, h_\xi(s_m))$ and $h_\nu = (h_\nu(s_1), \dots, h_\nu(s_m))$.
A backfitting algorithm is used to estimate $\xi$ and $\nu$ (and thus $\beta$) simultaneously.
Confidence intervals are calculated using a post-blackened bootstrap.

Return level estimation
Based on the fitted values $\hat\lambda$, $\hat\xi$, and $\hat\beta$ for a fixed covariate vector $x$ and time point $t$, one can compute (depending on $x$ and $t$) estimates of the $1/p$-year return level
$$\hat R_{1/p} = u + \frac{\hat\beta}{\hat\xi}\left(\left(\frac{p}{\hat\lambda}\right)^{-\hat\xi} - 1\right).$$
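A sketch of the return level computation, under the assumption that the $1/p$-year return level $r$ solves $\lambda\,(1 - G_{\xi,\beta}(r - u)) = p$ with $\lambda$ the expected number of exceedances of $u$ per year, which gives $r = u + \frac{\beta}{\xi}\big((p/\lambda)^{-\xi} - 1\big)$ and, in the limit $\xi \to 0$, $r = u + \beta \log(\lambda/p)$. The parameter values below are made up; in the talk's setting they would be the fitted values at a given covariate vector and time point.

```python
import math

def gpd_survival(y, xi, beta):
    """1 - G_{xi,beta}(y) for an excess y >= 0."""
    if xi == 0.0:
        return math.exp(-y / beta)
    t = 1.0 + xi * y / beta
    # t <= 0 can only happen for xi < 0, beyond the upper endpoint -beta/xi.
    return t ** (-1.0 / xi) if t > 0.0 else 0.0

def return_level(u, lam, xi, beta, p):
    """Level exceeded on average p times per year (assumes lam is per year and p < lam)."""
    if xi == 0.0:
        return u + beta * math.log(lam / p)
    return u + beta / xi * ((p / lam) ** (-xi) - 1.0)

# Hypothetical fitted values: threshold 50, 12 exceedances per year on average.
u, lam, xi, beta = 50.0, 12.0, 0.15, 20.0
r20 = return_level(u, lam, xi, beta, p=1.0 / 20.0)

# Consistency check: the yearly rate of exceeding r20 should be 1/20.
print(r20, lam * gpd_survival(r20 - u, xi, beta))
```

With covariate- and time-dependent fits, the same function is simply evaluated at each $(x, t)$ to trace the return level curve.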

Demonstration for simulated data (I)
Details are provided as a demo in the R package QRM (with Marius Hofert, ETHZ).
We generate a data set of exceedances over a time period of 10 years for two groups (Group A and Group B).
The simulated losses are drawn from a (non-stationary) generalized Pareto distribution depending on the covariates year and group.
We then fit the (Poisson process) intensity $\lambda$ and the two parameters $\xi$ and $\beta$ of the generalized Pareto distribution depending on year and group.
We then calculate a 99.9% VaR (equivalent to the return level).

Demonstration for simulated data (II)
[Figure: $\hat\lambda$ with pointwise asymptotic two-sided 95% confidence intervals, against year, in separate panels for groups A and B. Legend: $\hat\lambda$, 0.95 CI, true number of losses.]

Demonstration for simulated data (III)
[Figure: $\hat\xi$ with bootstrapped pointwise two-sided 95% confidence intervals, against year, in separate panels for groups A and B. Legend: $\hat\xi$, 0.95 CI, true $\xi$.]

Demonstration for simulated data (IV)
[Figure: $\hat\beta$ with bootstrapped pointwise two-sided 95% confidence intervals, against year, in separate panels for groups A and B. Legend: $\hat\beta$, 0.95 CI, true $\beta$.]

Demonstration for simulated data (V)
[Figure: estimated $\mathrm{VaR}_{0.999}$ with bootstrapped pointwise two-sided 95% confidence intervals, against year (logarithmic vertical axis), in separate panels for groups A and B. Legend: estimated $\mathrm{VaR}_{0.999}$, 0.95 CI, true $\mathrm{VaR}_{0.999}$.]

Application to Muota-Ingenbohl data: AIC
Exploratory purpose: use high degrees of freedom.
Nonparametric estimate of the Poisson intensity for the dependence on the day of year (degrees of freedom = 20) and for the dependence on year (degrees of freedom = 10).
[Figure: two panels of the estimated intensity lambda, one against year and one against day of year. Marked points: 10 Jun (day 161), 19 Jul (day 200), 27 Oct (day 300), and the years 1970 and 2008.]

Application to Muota-Ingenbohl data: AIC
Automatic procedure for selecting the smoothing parameter via AIC.
[Figure: four panels of AIC against degrees of freedom (Df, 0 to 10), for xi~s(dayofyear,df), nu~s(dayofyear,df), xi~s(year,df), and nu~s(year,df).]

Application to Muota-Ingenbohl data: Model and parameter estimates
The selected model:
$$\log \lambda(d, t) = h_\lambda^{(3)}(t) + g_\lambda^{(3)}(d),$$
$$\hat\xi(d, t) = h_\xi^{(2)}(t) + g_\xi^{(2)}(d),$$
$$\hat\nu(d, t) = h_\nu^{(2)}(t) + g_\nu^{(2)}(d).$$
[Figure: three panels showing the fitted xi, nu, and lambda against year.]

Application to Muota-Ingenbohl data: Goodness-of-fit
If the excesses $Y_{t_1}, \dots, Y_{t_n}$ (approximately) follow a $\mathrm{GPD}(\xi_i, \beta_i)$, then $R_i \sim \mathrm{Exp}(1)$ (approximately), $i \in \{1, \dots, n\}$, where
$$R_i = \log(1 + Y_{t_i} \xi_i / \beta_i)/\xi_i, \quad i \in \{1, \dots, n\}.$$
We can thus check graphically whether
$$r_{t_i} = \log(1 + y_{t_i} \hat\xi_i / \hat\beta_i)/\hat\xi_i, \quad i \in \{1, \dots, n\},$$
are (approximately) distributed as independent standard exponential variables.
[Figure: Q-Q plot of the residuals against quantiles of the standard exponential distribution.]
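The residual check can be sketched as follows. The code computes $r_{t_i} = \log(1 + y_{t_i} \hat\xi_i / \hat\beta_i)/\hat\xi_i$ (with the limit $y/\beta$ when $\xi = 0$) and pairs the ordered residuals with standard exponential quantiles for a Q-Q plot. The plotting positions $(i - 0.5)/n$ and all numerical values are assumptions for illustration. To make the example verifiable, the excesses are chosen as exact GPD quantiles, so each residual must equal the matching Exp(1) quantile $-\log(1 - p)$.

```python
import math

def gpd_residuals(excesses, xis, betas):
    """Residuals log(1 + y*xi/beta)/xi, one per excess, with per-observation parameters."""
    residuals = []
    for y, xi, beta in zip(excesses, xis, betas):
        if xi == 0.0:
            residuals.append(y / beta)  # limit of the formula as xi -> 0
        else:
            residuals.append(math.log1p(xi * y / beta) / xi)
    return residuals

def exponential_qq_pairs(residuals):
    """(theoretical Exp(1) quantile, ordered residual) pairs for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    return [(-math.log(1.0 - (i + 0.5) / n), r) for i, r in enumerate(ordered)]

def gpd_quantile(p, xi, beta):
    """GPD quantile, used here only to build a verifiable example."""
    if xi == 0.0:
        return -beta * math.log(1.0 - p)
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)

# Hypothetical per-observation parameters and probability levels.
probs = [0.1, 0.5, 0.9, 0.99]
xis = [0.1, 0.2, -0.1, 0.0]
betas = [30.0, 35.0, 40.0, 45.0]
excesses = [gpd_quantile(p, xi, b) for p, xi, b in zip(probs, xis, betas)]

residuals = gpd_residuals(excesses, xis, betas)
for p, r in zip(probs, residuals):
    print(p, r, -math.log(1.0 - p))
```

With real data, points from `exponential_qq_pairs` lying close to the diagonal support the fitted GPD model.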

Application to Muota-Ingenbohl data: 20-year return level
[Figure: estimated 20-year return level against year.]

Thank you!

References
Chavez-Demoulin, V. and Davison, A.C. (2005). Generalized additive models for sample extremes. Applied Statistics, 54(1), 207-222.
Chavez-Demoulin, V. and Embrechts, P. (2004). Smooth extremal models in finance and insurance. Journal of Risk and Insurance, 71(2), 183-199.
Chavez-Demoulin, V., Embrechts, P. and Hofert, M. (2013). An extreme value approach for modeling Operational Risk losses depending on covariates. Submitted.
Yee, T.W. and Stephenson, A.G. (2007). Vector generalized linear and additive extreme value models. Extremes, 9, 1-19.
Laurini, F. and Pauli, F. (2009). Smoothing sample extremes: The mixed model approach. Comp. Statist. Data Anal., 53, 3842-3854.
Pauli, F. and Coles, S.G. (2001). Penalized likelihood inference in extreme value analyses. J. Appl. Statist., 28, 547-560.