Generalized additive modelling of hydrological sample extremes

Valérie Chavez-Demoulin (1)
Joint work with A.C. Davison (EPFL) and Marius Hofert (ETHZ)
(1) Faculty of Business and Economics, University of Lausanne, Switzerland

October 31, 2013, MFEW01, Isaac Newton Institute

Hydrological time series

Hydrological time series are the result of complex dynamical processes (precipitation, snow accumulation and melt, evapotranspiration, ...).

[Figure: daily maximum flow, 1923-2008, plotted against day of year.]

Point changes such as the relocation of a station, changes in measuring instruments and hydro-electric installations may lead to discontinuities.

[Figure: daily maximum flow plotted against year.]

Presence of seasonality, trends, ...

Daily streamflow records and their extremes are often dependent and not identically distributed.
Non-stationarity within one year or over longer periods
⇒ In this case extreme value theory (EVT) is not directly applicable!
Variation due to the specifications of the station may be summarized parametrically.
Changes in time do not need to have a specific parametric form.

Aim

We combine the point process for exceedances with smoothing methods to give a flexible exploratory approach to modelling changes in the high-flow exceedances.
The data are not declustered, as we aim to model both long-term and short-term dependence.
Uncertainty is assessed through appropriate confidence intervals.

The block maxima approach I

Consider $X_1, \dots, X_q$ iid from $F(x)$.
Suppose there exists a point $x_0$ (perhaps $+\infty$) such that $\lim_{x \to x_0} F(x) = 1$.
For any fixed $x < x_0$ we have
\[ P(\max\{X_1, \dots, X_q\} \le x) = P(X_i \le x,\ i = 1, \dots, q) = F^q(x), \]
which tends to 0 as $q \to \infty$.
Given suitable sequences $\{a_q\}$ and $\{b_q\}$ of normalizing constants leading to $W_q = a_q^{-1}\{\max(X_1, \dots, X_q) - b_q\}$, the non-degenerate limiting distribution must be a generalized extreme value (GEV) distribution
\[ H_{\mu,\psi,\xi}(w) = \exp\left\{ -\left(1 + \xi\,\frac{w - \mu}{\psi}\right)_+^{-1/\xi} \right\}, \qquad -\infty < \mu < \infty,\ \psi > 0,\ -\infty < \xi < \infty. \]

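The GEV distribution function on this slide can be evaluated directly. The following is a minimal sketch, not part of the talk: the function name `gev_cdf` and the handling of $\xi = 0$ through the Gumbel limit $\exp\{-\exp(-(w-\mu)/\psi)\}$ are assumptions made for illustration.

```python
import math

def gev_cdf(w, mu=0.0, psi=1.0, xi=0.0):
    """GEV distribution function H_{mu,psi,xi}(w) (hypothetical helper)."""
    if psi <= 0:
        raise ValueError("psi must be positive")
    z = (w - mu) / psi
    if xi == 0.0:
        # Gumbel limit of the GEV family as xi -> 0.
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-s ** (-1.0 / xi))
```

At $w = \mu$ the value is $\exp(-1)$ for every $\xi$, which gives a quick sanity check of an implementation.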
The block maxima approach II

We fit the GEV distribution to the series of (typically) annual maximum data (the blocks).
From $m$ blocks of size $q$ we obtain $W = (M_q^{(1)}, \dots, M_q^{(m)})$.
Construct a log-likelihood by assuming we have independent observations from a GEV with density $h_{\mu,\psi,\xi}(w)$:
\[ l(\mu, \psi, \xi; W) = \sum_{i=1}^{m} \log\left( h_{\mu,\psi,\xi}(M_q^{(i)})\, 1_{\{1 + \xi (M_q^{(i)} - \mu)/\psi > 0\}} \right) \]
⇒ estimates of $\xi$, $\mu$, $\psi$.

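As an illustration of the block-maxima log-likelihood, here is a minimal sketch under stated assumptions: the helper names `gev_logpdf` and `gev_loglik` are hypothetical, the density is obtained by differentiating the GEV distribution function, and the indicator in the formula is represented by returning $-\infty$ when an observation falls outside the support.

```python
import math

def gev_logpdf(w, mu, psi, xi):
    """Log-density of the GEV at w; -inf outside the support."""
    if psi <= 0:
        return -math.inf
    z = (w - mu) / psi
    if xi == 0.0:
        # Gumbel case, the limit xi -> 0.
        return -math.log(psi) - z - math.exp(-z)
    s = 1.0 + xi * z
    if s <= 0.0:
        # The indicator 1{1 + xi (w - mu)/psi > 0} is zero here.
        return -math.inf
    return -math.log(psi) - (1.0 / xi + 1.0) * math.log(s) - s ** (-1.0 / xi)

def gev_loglik(mu, psi, xi, maxima):
    """Sum of log-densities over the block maxima, assumed independent."""
    return sum(gev_logpdf(w, mu, psi, xi) for w in maxima)
```

In practice the negative of `gev_loglik` would be passed to a numerical optimizer to obtain the estimates; the choice of optimizer is left open here.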
Peaks-over-Threshold Method (I)

Let $X_{t_1}, \dots, X_{t_n}$ denote the exceedances over a high threshold $u$, with corresponding excesses $Y_{t_i} = X_{t_i} - u$, $i \in \{1, \dots, n\}$.
1) The number of exceedances $N_t$ approximately follows a Poisson process with intensity $\lambda$, that is, $N_t \sim \mathrm{Poi}(\Lambda(t))$ with integrated rate function $\Lambda(t) = \lambda t$.
2) The excesses $Y_{t_1}, \dots, Y_{t_n}$ over $u$ approximately follow (independently of $N_t$) a generalized Pareto distribution (GPD), denoted by $\mathrm{GPD}(\xi, \beta)$ for $\xi \in \mathbb{R}$, $\beta > 0$, with distribution function
\[ G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \text{if } \xi \ne 0, \\ 1 - \exp(-x/\beta), & \text{if } \xi = 0, \end{cases} \]
for $x \ge 0$ if $\xi \ge 0$, and $x \in [0, -\beta/\xi]$ if $\xi < 0$.

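The GPD distribution function above can be inverted in closed form, which gives a simple way to simulate excesses. The sketch below is illustrative only: the names `gpd_cdf`, `gpd_quantile` and `simulate_excesses` are hypothetical, and sampling by inversion is a choice made here, not something stated in the talk.

```python
import math
import random

def gpd_cdf(x, xi, beta):
    """Distribution function G_{xi,beta}(x) of the GPD."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)
    s = 1.0 + xi * x / beta
    if s <= 0.0:
        # Only possible for xi < 0: x lies beyond the upper endpoint -beta/xi.
        return 1.0
    return 1.0 - s ** (-1.0 / xi)

def gpd_quantile(p, xi, beta):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if xi == 0.0:
        return -beta * math.log(1.0 - p)
    return beta * ((1.0 - p) ** (-xi) - 1.0) / xi

def simulate_excesses(n, xi, beta, rng):
    """Draw n GPD excesses by inversion of uniform variates."""
    return [gpd_quantile(rng.random(), xi, beta) for _ in range(n)]
```

For example, `simulate_excesses(100, 0.3, 2.0, random.Random(1))` returns 100 non-negative excesses from $\mathrm{GPD}(0.3, 2)$.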
Peaks-over-Threshold Method (II)

Asymptotic independence between the Poisson exceedance times and the GPD excesses gives the likelihood
\[ L(\lambda, \xi, \beta; Y) = \frac{(\lambda t)^n}{n!} \exp(-\lambda t) \prod_{i=1}^{n} g_{\xi,\beta}(Y_{t_i}), \]
where $Y = (Y_{t_1}, \dots, Y_{t_n})$ and $g_{\xi,\beta}$ is the density of $G_{\xi,\beta}$
⇒ $l(\lambda, \xi, \beta; Y) = l(\lambda; Y) + l(\xi, \beta; Y)$, where
\[ l(\lambda; Y) = -\lambda t + n \log(\lambda) + \log(t^n/n!) \quad \text{and} \quad l(\xi, \beta; Y) = \sum_{i=1}^{n} l(\xi, \beta; Y_{t_i}), \]
with
\[ l(\xi, \beta; y) = \begin{cases} -\log(\beta) - (1 + 1/\xi) \log(1 + \xi y/\beta), & \text{if } \xi \ne 0, \\ -\log(\beta) - y/\beta, & \text{if } \xi = 0 \end{cases} \]
⇒ Maximization can be carried out separately.

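To make the factorization concrete, the following sketch codes the two log-likelihood terms and the closed-form maximizer of the Poisson part, $\hat\lambda = n/t$, which follows from setting $\partial l(\lambda; Y)/\partial\lambda = -t + n/\lambda$ to zero. All function names are hypothetical and the data in the usage note are made up for illustration.

```python
import math

def poisson_loglik(lam, n, t):
    """l(lambda; Y) = -lambda t + n log(lambda) + log(t^n / n!)."""
    return -lam * t + n * math.log(lam) + n * math.log(t) - math.lgamma(n + 1)

def gpd_loglik(xi, beta, excesses):
    """l(xi, beta; Y): sum of GPD log-densities; -inf outside the support."""
    total = 0.0
    for y in excesses:
        if xi == 0.0:
            total += -math.log(beta) - y / beta
        else:
            s = 1.0 + xi * y / beta
            if s <= 0.0:
                return -math.inf
            total += -math.log(beta) - (1.0 + 1.0 / xi) * math.log(s)
    return total

def pot_loglik(lam, xi, beta, excesses, t):
    """Joint log-likelihood: the two parts share no parameters."""
    return poisson_loglik(lam, len(excesses), t) + gpd_loglik(xi, beta, excesses)

def lambda_mle(n, t):
    """Closed-form maximizer of the Poisson part."""
    return n / t
```

With four excesses observed over $t = 10$, `lambda_mle(4, 10.0)` gives 0.4 whatever the values of $\xi$ and $\beta$, so the GPD part can be maximized on its own with a numerical optimizer.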
A dynamic EVT approach

Let $\theta \in \mathbb{R}^r$ be the vector of $r$ EVT model parameters ($r = 3$ in both the GEV model and the POT representation).
Let $x$ be the vector of covariates and $t$ the time; then a general model is
\[ \theta_i = g_i\{x^\top \eta_i + h_i(t)\}, \qquad i = 1, \dots, r, \]
where $g_i$ is a link function, $\eta_i \in \mathbb{R}^p$ are the parameter vectors, and $h_i$ is a smooth nonparametric function of $t \in A$, where $A \subseteq \mathbb{R}$ is the subset on which $t$ is observed.
$\theta \in \mathbb{R}^r$ can be estimated by maximizing the penalized log-likelihood
\[ l(\theta; y) - \sum_{i=1}^{r} \left[ \gamma_i \int_A h_i''(t)^2 \, dt \right], \]
where $l(\theta; y)$ is the log-likelihood based on the EVT model (either block maxima or POT).

Dynamic POT Approach: Non-homogeneous Poisson model for the number of exceedances

For the number of exceedances, we use a non-homogeneous Poisson process with rate
\[ \lambda = \lambda(x, t) = \exp(x^\top \eta_\lambda + h_\lambda(t)). \]
This is a standard generalized additive model (GAM).
Nested models are compared using likelihood ratio statistics (LRS) for the parametric part.
Degrees of freedom for the nonparametric part (smoothing spline) are chosen using AIC (see below).

Dynamic POT Approach: GAM GPD for the exceedance sizes

For the GPD parameters, we replace $\beta$ by
\[ \nu = \log((1 + \xi)\beta), \]
which is orthogonal to $\xi$.
The corresponding reparameterized log-likelihood $l^r$ for the excesses is thus
\[ l^r(\xi, \nu; Y) = l(\xi, \exp(\nu)/(1 + \xi); Y). \]
We assume that $\xi$ and $\nu$ are of the form
\[ \xi = \xi(x, t) = x^\top \eta_\xi + h_\xi(t), \qquad \nu = \nu(x, t) = x^\top \eta_\nu + h_\nu(t). \]

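The reparameterization can be sketched in a few lines. The helper names below are hypothetical, and the code assumes $\xi > -1$ so that $\log((1+\xi)\beta)$ is defined; it simply maps between $(\xi, \beta)$ and $(\xi, \nu)$ and evaluates the single-observation log-likelihood in the new parameters.

```python
import math

def nu_from_beta(xi, beta):
    """nu = log((1 + xi) beta); requires xi > -1 and beta > 0."""
    return math.log((1.0 + xi) * beta)

def beta_from_nu(xi, nu):
    """Inverse map: beta = exp(nu) / (1 + xi)."""
    return math.exp(nu) / (1.0 + xi)

def gpd_loglik_single(xi, beta, y):
    """l(xi, beta; y) for one excess y; -inf outside the support."""
    if xi == 0.0:
        return -math.log(beta) - y / beta
    s = 1.0 + xi * y / beta
    if s <= 0.0:
        return -math.inf
    return -math.log(beta) - (1.0 + 1.0 / xi) * math.log(s)

def gpd_loglik_reparam(xi, nu, y):
    """l^r(xi, nu; y) = l(xi, exp(nu) / (1 + xi); y)."""
    return gpd_loglik_single(xi, beta_from_nu(xi, nu), y)
```

Fitted values of $\hat\xi$ and $\hat\nu$ are converted back to $\hat\beta$ with `beta_from_nu` when a scale estimate is needed.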
Penalized maximum likelihood estimator (I)

In order to fit reasonably smooth functions $h_\xi$, $h_\nu$, we use the penalized likelihood
\[ l^p(\eta_\xi, h_\xi, \eta_\nu, h_\nu; z_1, \dots, z_n) = l^r(\xi, \nu; y) - \gamma_\xi \int_0^T h_\xi''(t)^2 \, dt - \gamma_\nu \int_0^T h_\nu''(t)^2 \, dt, \]
where $\gamma_\xi, \gamma_\nu \ge 0$ denote smoothing parameters, $y = (y_{t_1}, \dots, y_{t_n})$, and
\[ l^r(\xi, \nu; y) = \sum_{i=1}^{n} l^r(\xi_i, \nu_i; y_{t_i}) \quad \text{for} \quad l^r(\xi_i, \nu_i; y_{t_i}) = l(\xi_i, \exp(\nu_i)/(1 + \xi_i); y_{t_i}). \]
Larger values of the smoothing parameters lead to smoother fitted curves.
A related quantity is the equivalent degrees of freedom.

Penalized maximum likelihood estimator (II)

Let $0 = s_0 < s_1 < \dots < s_m < s_{m+1} = T$ denote the ordered and distinct values among $\{t_1, \dots, t_n\}$.
For a natural cubic spline $h$ with knots $s_1, \dots, s_m$,
\[ \int_0^T h''(t)^2 \, dt = h^\top K h, \]
where $h = (h_{s_1}, \dots, h_{s_m})^\top = (h(s_1), \dots, h(s_m))^\top$ and $K$ is a symmetric $(m, m)$-matrix of rank $m - 2$ depending only on the knots $s_1, \dots, s_m$.

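The slide does not give the form of $K$. One standard construction for natural cubic splines (as in Green and Silverman's treatment) writes $K = Q R^{-1} Q^\top$ with a banded $m \times (m-2)$ matrix $Q$ and a tridiagonal $(m-2) \times (m-2)$ matrix $R$ built from the knot spacings. Assuming that construction, the sketch below evaluates $h^\top K h$ without forming $K$, by solving the tridiagonal system $R g = Q^\top h$. The function name is hypothetical.

```python
def roughness_penalty(knots, values):
    """Return h^T K h for the natural cubic spline through (knots, values)."""
    m = len(knots)
    if m != len(values):
        raise ValueError("knots and values must have the same length")
    if m < 3:
        return 0.0  # a natural spline through fewer than 3 knots is linear
    d = [knots[i + 1] - knots[i] for i in range(m - 1)]
    if any(step <= 0 for step in d):
        raise ValueError("knots must be strictly increasing")
    n = m - 2
    # b = Q^T h: differences of adjacent slopes at the interior knots.
    b = [(values[j + 1] - values[j]) / d[j] - (values[j] - values[j - 1]) / d[j - 1]
         for j in range(1, m - 1)]
    # R is symmetric tridiagonal with these diagonal and off-diagonal entries.
    diag = [(d[j - 1] + d[j]) / 3.0 for j in range(1, m - 1)]
    off = [d[j] / 6.0 for j in range(1, m - 2)]
    # Thomas algorithm: solve R g = b.
    cp = [0.0] * n
    bp = [0.0] * n
    cp[0] = off[0] / diag[0] if n > 1 else 0.0
    bp[0] = b[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - off[k - 1] * cp[k - 1]
        if k < n - 1:
            cp[k] = off[k] / denom
        bp[k] = (b[k] - off[k - 1] * bp[k - 1]) / denom
    g = [0.0] * n
    g[-1] = bp[-1]
    for k in range(n - 2, -1, -1):
        g[k] = bp[k] - cp[k] * g[k + 1]
    # h^T K h = (Q^T h)^T R^{-1} (Q^T h) = b^T g.
    return sum(bk * gk for bk, gk in zip(b, g))
```

Values lying on a straight line give a penalty of zero, consistent with $K$ having rank $m - 2$: linear functions span its null space.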
Penalized maximum likelihood estimator (III)

The penalized log-likelihood can thus be written as
\[ l^p(\eta_\xi, h_\xi, \eta_\nu, h_\nu; z_1, \dots, z_n) = l^r(\xi, \nu; y) - \gamma_\xi h_\xi^\top K h_\xi - \gamma_\nu h_\nu^\top K h_\nu, \]
with $h_\xi = (h_\xi(s_1), \dots, h_\xi(s_m))^\top$ and $h_\nu = (h_\nu(s_1), \dots, h_\nu(s_m))^\top$.
A backfitting algorithm is used to estimate $\xi$ and $\nu$ (and thus $\beta$) simultaneously.
Confidence intervals are calculated using a post-blackened bootstrap.

Return level estimation

Based on the fitted values $\hat\lambda$, $\hat\xi$ and $\hat\beta$ for a fixed covariate vector $x$ and time point $t$, one can compute (depending on $x$ and $t$) estimates of the $1/p$-year return level
\[ R_{1/p} = u + \frac{\hat\beta}{\hat\xi} \left( \left( \frac{p}{\hat\lambda} \right)^{-\hat\xi} - 1 \right). \]

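A return level of this kind can be computed as follows. This is a sketch under explicit assumptions, since the formula on the slide is only partly recoverable: it takes the $1/p$-year return level to be the level $r$ solving $\hat\lambda \{1 - G_{\hat\xi,\hat\beta}(r - u)\} = p$, with $\hat\lambda$ expressed as the expected number of exceedances of $u$ per year and $p < \hat\lambda$. The function names are hypothetical.

```python
import math

def return_level(p, u, lam, xi, beta):
    """Level r > u solving lam * (1 - G(r - u)) = p (assumed definition)."""
    if not 0.0 < p < lam:
        raise ValueError("need 0 < p < lam so that the level exceeds u")
    if xi == 0.0:
        return u + beta * math.log(lam / p)
    return u + beta / xi * ((p / lam) ** (-xi) - 1.0)

def exceedance_rate(r, u, lam, xi, beta):
    """Yearly rate of exceeding r >= u under the Poisson-GPD model."""
    if xi == 0.0:
        return lam * math.exp(-(r - u) / beta)
    return lam * (1.0 + xi * (r - u) / beta) ** (-1.0 / xi)
```

For instance, with hypothetical values $u = 30$, $\hat\lambda = 5$, $\hat\xi = 0.1$, $\hat\beta = 10$, `return_level(0.05, 30.0, 5.0, 0.1, 10.0)` gives the level exceeded on average once every 20 years, and `exceedance_rate` at that level recovers 0.05.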
Demonstration for simulated data (I)

Details are provided as a demo in the R package QRM (with Marius Hofert, ETHZ).
We generate a data set of exceedances over a time period of 10 years for two groups (Group A and Group B).
The simulated losses are drawn from a (non-stationary) generalized Pareto distribution depending on the covariates year and group.
We then fit the (Poisson process) intensity $\lambda$ and the two parameters $\xi$ and $\beta$ of the generalized Pareto distribution depending on year and group.
We then calculate a 99.9% VaR (equivalent to the return level).

Demonstration for simulated data (II)

[Figure: $\hat\lambda$ with pointwise asymptotic two-sided 0.95 confidence intervals, plotted against year, one panel each for groups A and B. Legend: $\hat\lambda$, 0.95 CI, true number of losses.]

Demonstration for simulated data (III)

[Figure: $\hat\xi$ with bootstrapped pointwise two-sided 0.95 confidence intervals, plotted against year, one panel each for groups A and B. Legend: $\hat\xi$, 0.95 CI, true $\xi$.]

Demonstration for simulated data (IV)

[Figure: $\hat\beta$ with bootstrapped pointwise two-sided 0.95 confidence intervals, plotted against year, one panel each for groups A and B. Legend: $\hat\beta$, 0.95 CI, true $\beta$.]

Demonstration for simulated data (V)

[Figure: $\mathrm{VaR}_{0.999}$ with bootstrapped pointwise two-sided 0.95 confidence intervals, plotted against year on a logarithmic scale, one panel each for groups A and B. Legend: $\mathrm{VaR}_{0.999}$ estimate, 0.95 CI, true $\mathrm{VaR}_{0.999}$.]

Application to Muota-Ingenbohl data: AIC

Exploratory purpose: use high degrees of freedom.
Nonparametric estimate of the Poisson intensity for the dependence on the day of year (degrees of freedom = 20) and for the dependence on year (degrees of freedom = 10).

[Figure: two panels showing the estimated intensity lambda. Day-of-year panel with 10 Jun (day 161), 19 Jul (day 200) and 27 Oct (day 300) marked; year panel with 1970 and 2008 marked.]

Application to Muota-Ingenbohl data: AIC

Automatic procedure for selecting the smoothing parameter via AIC.

[Figure: four panels showing AIC against degrees of freedom (Df, 0 to 10) for the models xi~s(dayofyear,df), nu~s(dayofyear,df), xi~s(year,df) and nu~s(year,df).]

Application to Muota-Ingenbohl data: Model and parameter estimates

The selected model:
\[ \log \hat\lambda(d, t) = h_\lambda^{(3)}(t) + g_\lambda^{(3)}(d), \]
\[ \hat\xi(d, t) = h_\xi^{(2)}(t) + g_\xi^{(2)}(d), \]
\[ \hat\nu(d, t) = h_\nu^{(2)}(t) + g_\nu^{(2)}(d). \]

[Figure: three panels showing the estimates of xi, nu and lambda plotted against year.]

Application to Muota-Ingenbohl data: Goodness-of-fit

If the excesses $Y_{t_1}, \dots, Y_{t_n}$ (approximately) follow a $\mathrm{GPD}(\xi_i, \beta_i)$, then $R_i \sim \mathrm{Exp}(1)$, $i \in \{1, \dots, n\}$, where
\[ R_i = \log(1 + Y_{t_i} \xi_i/\beta_i)/\xi_i, \qquad i \in \{1, \dots, n\}. \]
We can thus check graphically whether
\[ r_{t_i} = \log(1 + y_{t_i} \hat\xi_i/\hat\beta_i)/\hat\xi_i, \qquad i \in \{1, \dots, n\}, \]
are (approximately) distributed as independent standard exponential variables.

[Figure: Q-Q plot of the residuals against quantiles of the standard exponential distribution.]

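The residual check can be sketched as follows. The function names are hypothetical, and the plotting positions $(i - 0.5)/n$ used for the theoretical exponential quantiles are an assumption; the talk does not say which positions were used.

```python
import math

def gpd_residuals(excesses, xis, betas):
    """r_i = log(1 + y_i xi_i / beta_i) / xi_i, with y_i / beta_i when xi_i = 0."""
    residuals = []
    for y, xi, beta in zip(excesses, xis, betas):
        if xi == 0.0:
            residuals.append(y / beta)
        else:
            residuals.append(math.log1p(xi * y / beta) / xi)
    return residuals

def exponential_qq_points(residuals):
    """Pairs (theoretical Exp(1) quantile, ordered residual) for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting positions (i - 0.5) / n for i = 1, ..., n.
    theoretical = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))
```

If the fitted model is adequate, the points returned by `exponential_qq_points` should lie close to the line of unit slope through the origin.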
Application to Muota-Ingenbohl data: 20-year return level

[Figure: estimated 20-year return level plotted against year.]

Thank you!

References

Chavez-Demoulin, V. and Davison, A.C. (2005). Generalized additive models for sample extremes. Applied Statistics, 54(1), 207-222.
Chavez-Demoulin, V. and Embrechts, P. (2004). Smooth extremal models in finance and insurance. Journal of Risk and Insurance, 71(2), 183-199.
Chavez-Demoulin, V., Embrechts, P. and Hofert, M. (2013). An extreme value approach for modeling operational risk losses depending on covariates. Submitted.
Yee, T.W. and Stephenson, A.G. (2007). Vector generalized linear and additive extreme value models. Extremes, 9, 1-19.
Laurini, F. and Pauli, F. (2009). Smoothing sample extremes: The mixed model approach. Comp. Statist. Data Anal., 53, 3842-3854.
Pauli, F. and Coles, S.G. (2001). Penalized likelihood inference in extreme value analyses. J. Appl. Statist., 28, 547-560.