Investigation of an Automated Approach to Threshold Selection for Generalized Pareto


Kate R. Saunders
Supervisors: Peter Taylor & David Karoly
University of Melbourne
April 8, 2015

Outline 1 Extreme Value Theory 2

Problem
What climate processes drive extreme rainfall (e.g. El Niño-Southern Oscillation, Interdecadal Pacific Oscillation)? How do these drivers differ across timescales: sub-daily, daily, and consecutive-day totals?

Data

Extreme Value Theory

Block Maxima

Block Maxima
Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with distribution function $F$, and define $M_n = \max\{X_1, X_2, \ldots, X_n\}$. (For example, $X_i$ might be daily rainfall observations and $M_{365}$ the annual maximum rainfall.) Then
$$\Pr(M_n \le x) = \Pr(X_1 \le x, \ldots, X_n \le x) = \Pr(X_1 \le x) \cdots \Pr(X_n \le x) = F(x)^n.$$
As $n \to \infty$, the distribution of $M_n$, suitably normalized, converges to a generalized extreme value distribution.
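A quick numerical check of $\Pr(M_n \le x) = F(x)^n$: simulate block maxima of i.i.d. variables and compare their empirical distribution function with $F(x)^n$. The exponential distribution for $X_i$, the block length $n = 365$ and the number of blocks are illustrative assumptions, not values from the talk.

```python
import numpy as np

def block_maxima(x, n):
    """Split x into consecutive blocks of length n (dropping any remainder)
    and return the maximum of each block."""
    x = np.asarray(x, dtype=float)
    m = x.size // n
    return x[: m * n].reshape(m, n).max(axis=1)

rng = np.random.default_rng(1)
n = 365           # block length, e.g. one year of daily values (assumption)
n_blocks = 5_000  # number of simulated blocks (assumption)
x = rng.exponential(scale=1.0, size=n * n_blocks)  # X_i ~ Exp(1), F(x) = 1 - exp(-x)
maxima = block_maxima(x, n)

for level in (5.0, 6.0, 7.0, 8.0):
    empirical = np.mean(maxima <= level)
    theoretical = (1.0 - np.exp(-level)) ** n  # F(x)^n
    print(f"x={level}: empirical {empirical:.4f}, F(x)^n {theoretical:.4f}")
```

The two columns agree up to Monte Carlo error; note that $F(x)^n \to 0$ for every fixed $x$ as $n$ grows, which is why the theorem below needs normalizing constants.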

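Generalized Extreme Value Theorem (Fisher-Tippett-Gnedenko)
If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that
$$\Pr\left(\frac{M_n - b_n}{a_n} \le z\right) \to G(z) \quad \text{as } n \to \infty$$
for a non-degenerate distribution function $G$, then $G$ is a member of the generalized extreme value family
$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$
defined on $\{z : 1 + \xi(z - \mu)/\sigma > 0\}$, where $-\infty < \mu < \infty$, $\sigma > 0$ and $-\infty < \xi < \infty$.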
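The GEV distribution function can be coded directly, taking the $\xi = 0$ case as its Gumbel limit $\exp\{-\exp[-(z - \mu)/\sigma]\}$. As an illustration of the theorem (an assumed example, not from the slides): for i.i.d. Exp(1) variables, $\Pr(M_n - \log n \le z) = (1 - e^{-z}/n)^n$ exactly, so with $a_n = 1$ and $b_n = \log n$ the limit is the Gumbel member ($\mu = 0$, $\sigma = 1$, $\xi = 0$).

```python
import numpy as np

def gev_cdf(z, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function G(z).

    Outside the support {1 + xi (z - mu) / sigma > 0} the CDF is 0 (xi > 0,
    below the lower endpoint) or 1 (xi < 0, above the upper endpoint).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (np.asarray(z, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-s))  # Gumbel limit
    t = 1.0 + xi * s
    with np.errstate(divide="ignore", invalid="ignore"):
        inner = np.exp(-np.maximum(t, 0.0) ** (-1.0 / xi))
    boundary = 0.0 if xi > 0 else 1.0
    return np.where(t > 0, inner, boundary)

# Exact distribution of normalized Exp(1) maxima versus the Gumbel limit
z = np.linspace(-1.0, 4.0, 6)
for n in (10, 100, 10_000):
    exact = (1.0 - np.exp(-z) / n) ** n
    print(n, np.max(np.abs(exact - gev_cdf(z))))
```

The printed maximum discrepancy shrinks as $n$ grows, which is the convergence the theorem describes.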

Leveraging more data

Generalized Pareto Distribution
Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with marginal distribution function $F$. Then
$$\Pr\{X > u + y \mid X > u\} = \frac{1 - F(u + y)}{1 - F(u)}, \quad y > 0.$$
If $F$ satisfies the Generalized Extreme Value Theorem, then for a large enough threshold $u$, the distribution function of $(X - u)$, conditional on $X > u$, is approximately the generalized Pareto distribution (GPD; Pickands, 1975):
$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi},$$
defined on $\{y : y > 0 \text{ and } (1 + \xi y/\tilde\sigma) > 0\}$, where $\tilde\sigma = \sigma + \xi(u - \mu)$.
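The GPD distribution function translates directly into code. To check the threshold-exceedance result in a case where it holds exactly, the sketch assumes $X$ is Pareto with $1 - F(x) = x^{-\alpha}$ for $x \ge 1$ (an illustrative choice). Then $\Pr\{X > u + y \mid X > u\} = (1 + y/u)^{-\alpha}$, which is a GPD with $\xi = 1/\alpha$ and $\tilde\sigma = u/\alpha$ for every threshold $u \ge 1$.

```python
import numpy as np

def gpd_cdf(y, sigma_tilde, xi):
    """GPD distribution function H(y) for y > 0.

    Uses the exponential limit when xi == 0 and returns 1 beyond the
    upper endpoint -sigma_tilde / xi when xi < 0.
    """
    y = np.asarray(y, dtype=float)
    if xi == 0.0:
        return 1.0 - np.exp(-y / sigma_tilde)
    t = 1.0 + xi * y / sigma_tilde
    with np.errstate(divide="ignore", invalid="ignore"):
        tail = np.maximum(t, 0.0) ** (-1.0 / xi)
    return np.where(t > 0, 1.0 - tail, 1.0)

def pareto_survival(x, alpha):
    """1 - F(x) = x**(-alpha) for a Pareto variable on [1, inf)."""
    return np.asarray(x, dtype=float) ** (-alpha)

def pareto_excess_cdf(y, u, alpha):
    """Pr{X - u <= y | X > u}, computed from 1 - F as on the slide."""
    return 1.0 - pareto_survival(u + y, alpha) / pareto_survival(u, alpha)

alpha = 2.5
y = np.linspace(0.1, 20.0, 50)
for u in (2.0, 10.0, 50.0):
    diff = pareto_excess_cdf(y, u, alpha) - gpd_cdf(y, sigma_tilde=u / alpha, xi=1 / alpha)
    print(u, np.max(np.abs(diff)))  # agreement up to floating-point rounding
```

The shape $\xi$ stays fixed as $u$ changes while $\tilde\sigma$ grows linearly in $u$, matching $\tilde\sigma = \sigma + \xi(u - \mu)$.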

Dependence
Rainfall observations are dependent: heavy rainfall yesterday affects the probability of heavy rain today, but heavy rainfall a year ago does not. Extreme value theory extends to stationary series with weak long-range dependence. However, for processes with short-range dependence, extremes occur in clusters.

Clusters

Dependent Series
Let $\{X_i\}_{i \ge 1}$ be a stationary series and $\{X_i^*\}_{i \ge 1}$ an independent series of variables with the same marginal distribution. Define $M_n = \max\{X_1, \ldots, X_n\}$ and $M_n^* = \max\{X_1^*, \ldots, X_n^*\}$. Under suitable regularity conditions,
$$\Pr\left\{\frac{M_n^* - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty$$
for normalizing sequences $\{a_n > 0\}$ and $\{b_n\}$, where $G$ is a non-degenerate distribution function, if and only if
$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G^\theta(z)$$
for a constant $\theta$ such that $0 < \theta \le 1$.
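A consequence worth checking numerically: if $G$ is GEV with parameters $(\mu, \sigma, \xi)$ and $\xi \ne 0$, then $G^\theta$ is again GEV with the same $\xi$, scale $\sigma_\theta = \sigma\theta^{\xi}$ and location $\mu_\theta = \mu - \sigma(1 - \theta^{\xi})/\xi$ (in the Gumbel limit $\xi = 0$, $\mu_\theta = \mu + \sigma\log\theta$). Dependence therefore shifts the location and scale of the limit but not its shape. The sketch verifies the identity at a few points; the parameter values are arbitrary.

```python
import numpy as np

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function for xi != 0 (0 or 1 outside the support)."""
    t = 1.0 + xi * (np.asarray(z, dtype=float) - mu) / sigma
    with np.errstate(divide="ignore", invalid="ignore"):
        inner = np.exp(-np.maximum(t, 0.0) ** (-1.0 / xi))
    return np.where(t > 0, inner, 0.0 if xi > 0 else 1.0)

def gev_power_params(mu, sigma, xi, theta):
    """GEV parameters of G**theta when G is GEV(mu, sigma, xi) with xi != 0."""
    sigma_theta = sigma * theta ** xi
    mu_theta = mu - sigma * (1.0 - theta ** xi) / xi
    return mu_theta, sigma_theta, xi

mu, sigma, xi, theta = 10.0, 2.0, 0.2, 0.5
mu_t, sigma_t, xi_t = gev_power_params(mu, sigma, xi, theta)
z = np.linspace(5.0, 40.0, 8)
print(np.max(np.abs(gev_cdf(z, mu, sigma, xi) ** theta - gev_cdf(z, mu_t, sigma_t, xi_t))))
```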

Extremal Index
$\theta = (\text{limiting mean cluster size})^{-1} \in (0, 1]$; for example, $\theta = 0.5$ means 2 observations per cluster on average.

1. Select a threshold.
2. Decluster the data to obtain independent observations.

Declustering
Blocks: partition the observation sequence into blocks of length $b$, and assume that extreme observations within the same block belong to the same cluster.
Runs: specify a run length $K$, and assume that extreme observations with an inter-exceedance time of less than $K$ belong to the same cluster.
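Both schemes are short to implement. The sketch follows the definitions above literally (a gap of at least $K$ starts a new run cluster; blocks are consecutive windows of length $b$ starting at time 0) and also computes the simple estimate $\hat\theta$ = number of clusters / number of exceedances, i.e. the reciprocal of the mean cluster size. Function names and the toy series are illustrative.

```python
import numpy as np

def exceedance_times(x, u):
    """0-based indices at which the series exceeds the threshold u."""
    return np.flatnonzero(np.asarray(x, dtype=float) > u)

def runs_clusters(x, u, K):
    """Runs declustering: an exceedance whose inter-exceedance time from the
    previous exceedance is less than K joins that exceedance's cluster."""
    idx = exceedance_times(x, u)
    if idx.size == 0:
        return []
    # A new cluster starts wherever the gap to the previous exceedance is >= K.
    breaks = np.flatnonzero(np.diff(idx) >= K) + 1
    return np.split(idx, breaks)

def blocks_clusters(x, u, b):
    """Blocks declustering: exceedances in the same block of length b form one cluster."""
    idx = exceedance_times(x, u)
    block_id = idx // b
    return [idx[block_id == k] for k in np.unique(block_id)]

def theta_hat(clusters):
    """Extremal index estimate: number of clusters / number of exceedances."""
    n_exceed = sum(len(c) for c in clusters)
    if n_exceed == 0:
        raise ValueError("no exceedances above the threshold")
    return len(clusters) / n_exceed

x = [0, 5, 6, 0, 0, 0, 7, 0, 8, 0, 0, 0, 0, 9]
print([c.tolist() for c in runs_clusters(x, u=4, K=3)])  # [[1, 2], [6, 8], [13]]
print(theta_hat(runs_clusters(x, u=4, K=3)))             # 0.6
```

Increasing $K$ merges clusters (here $K = 5$ leaves two clusters and $\hat\theta = 0.4$), which is why the choice of run parameter matters as much as the threshold.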

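Intervals
For stationary series, the limiting point process of exceedance times is compound Poisson (Hsing et al., 1988). Ferro and Segers (2003) showed that the limiting distribution of the normalized inter-exceedance times is a mixture with weight $\theta$:
$$T_\theta(t) = (1 - \theta)\,\epsilon_0 + \theta^2 \exp(-\theta t), \quad t \ge 0,$$
that is, a point mass $1 - \theta$ at zero and, with weight $\theta$, an exponential distribution with rate $\theta$. Here $\epsilon_0$ is the degenerate distribution at zero and $T_\theta$ is the distribution of the times between exceedances of the threshold $u$. Equating moments gives a non-parametric estimator of $\theta$. With $N$ exceedances there are $N - 1$ inter-exceedance times, and the largest $\hat\theta(N - 1)$ of them can be interpreted as times between clusters.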
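A sketch of the moment estimator: the limiting distribution has $E(T) = 1$ and $E(T^2) = 2/\theta$, so $\theta = 2E(T)^2/E(T^2)$. The code uses this ratio when all inter-exceedance times are at most 2 and the bias-adjusted form $2\{\sum(T_i - 1)\}^2 / \{(N-1)\sum(T_i - 1)(T_i - 2)\}$ otherwise, as in Ferro and Segers (2003), capped at 1. The declustering helper rounds $\hat\theta(N-1)$ to an integer and breaks ties between equal gaps in time order; both choices are assumptions.

```python
import numpy as np

def intervals_estimator(exceed_idx):
    """Intervals estimator of the extremal index (needs at least 2 exceedances)."""
    T = np.diff(np.sort(np.asarray(exceed_idx))).astype(float)
    m = T.size  # N - 1 inter-exceedance times
    if T.max() <= 2:
        est = 2.0 * T.sum() ** 2 / (m * np.sum(T ** 2))
    else:
        est = 2.0 * np.sum(T - 1.0) ** 2 / (m * np.sum((T - 1.0) * (T - 2.0)))
    return min(1.0, est)

def intervals_decluster(exceed_idx, theta):
    """Treat the largest round(theta * (N - 1)) inter-exceedance times as
    between-cluster gaps and split the exceedances there."""
    idx = np.sort(np.asarray(exceed_idx))
    T = np.diff(idx)
    n_between = int(round(theta * T.size))
    if n_between == 0:
        return [idx]
    # Positions of the largest gaps; the stable sort keeps ties in time order.
    cut = np.sort(np.argsort(-T, kind="stable")[:n_between]) + 1
    return np.split(idx, cut)

# For i.i.d. data the estimate should be close to 1 (no clustering).
rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)
exc = np.flatnonzero(x > np.quantile(x, 0.98))
print(intervals_estimator(exc))
```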


Mean Residual Life Plots
For sufficiently high thresholds $u$, the mean excess $E(X - u \mid X > u)$ should be a linear function of $u$ (provided $\xi < 1$, so that the mean exists).
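A minimal sketch of the mean residual life computation, assuming the data above a base threshold are exactly GPD (simulated by inverse-CDF sampling with illustrative $\sigma = 1$, $\xi = 0.2$). For a GPD, the excess over a higher threshold $v$ is again GPD with scale $\sigma + \xi v$, so the mean excess $(\sigma + \xi v)/(1 - \xi)$ is linear in $v$; plotting `empirical` against `v` gives the mean residual life plot.

```python
import numpy as np

def mean_excess(x, thresholds):
    """Empirical mean residual life: mean of (x - v) over x > v, for each v."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x[x > v] - v) for v in thresholds])

def sample_gpd(rng, size, sigma, xi):
    """Inverse-CDF draws from a GPD with scale sigma and shape xi (xi != 0)."""
    return sigma / xi * ((1.0 - rng.random(size)) ** (-xi) - 1.0)

rng = np.random.default_rng(42)
sigma, xi = 1.0, 0.2                     # illustrative parameter values
y = sample_gpd(rng, 200_000, sigma, xi)  # excesses above a base threshold of 0
v = np.array([0.0, 0.5, 1.0, 1.5])       # candidate higher thresholds
empirical = mean_excess(y, v)
theoretical = (sigma + xi * v) / (1.0 - xi)  # linear in v
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```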

Parameter Stability Plots
Estimates of the modified scale parameter $\tilde\sigma - \xi u$ and the shape parameter $\xi$ should be approximately constant over the range of valid thresholds.
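The stability check can be sketched by refitting a GPD at several thresholds $v$ and tracking the modified scale $\tilde\sigma_v - \xi v$ and the shape. For brevity and to avoid an optimizer dependency, this sketch swaps in a method-of-moments fit (valid for $\xi < 1/2$) for the maximum likelihood fit normally used; the simulated data and parameter values are illustrative assumptions.

```python
import numpy as np

def gpd_moments_fit(excess):
    """Method-of-moments GPD fit, valid for xi < 1/2.

    Uses mean = sigma / (1 - xi) and variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    """
    excess = np.asarray(excess, dtype=float)
    m = excess.mean()
    ratio = m * m / excess.var(ddof=1)  # estimates 1 - 2 xi
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return sigma, xi

rng = np.random.default_rng(7)
sigma0, xi0 = 1.0, 0.1
# Illustrative data: exact GPD(sigma0, xi0) excesses above a base threshold of 0
data = sigma0 / xi0 * ((1.0 - rng.random(200_000)) ** (-xi0) - 1.0)

estimates = []
for v in (0.0, 0.5, 1.0, 2.0):
    sigma_v, xi_v = gpd_moments_fit(data[data > v] - v)
    modified_scale = sigma_v - xi_v * v  # should stay near sigma0 for valid thresholds
    estimates.append((v, modified_scale, xi_v))
    print(f"v={v}: modified scale {modified_scale:.3f}, shape {xi_v:.3f}")
```

With real data the estimates drift once $v$ falls below the level at which the GPD approximation holds; the lowest $v$ above which both columns are stable is a candidate threshold.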

Alternative
Set the threshold at a high quantile of the non-zero observations, e.g. the 90th percentile. Is this an appropriate threshold? Is our model misspecified? The approach suggested by Süveges and Davison (2010) is to test each pair of threshold $u$ and run parameter $K$ for model misspecification.
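A small sketch of the quantile rule and of a grid of candidate $(u, K)$ pairs to screen. The particular quantiles and run lengths in `candidate_pairs` are hypothetical choices for illustration, not values from the talk.

```python
import numpy as np

def wet_day_quantile_threshold(rain, q=0.90):
    """Threshold u at the q-th quantile of the non-zero observations."""
    rain = np.asarray(rain, dtype=float)
    return np.quantile(rain[rain > 0], q)

def candidate_pairs(rain, quantiles=(0.90, 0.95, 0.98), run_lengths=(1, 2, 3, 5)):
    """Hypothetical grid of (u, K) pairs to test for misspecification."""
    return [(wet_day_quantile_threshold(rain, q), K)
            for q in quantiles for K in run_lengths]

rain = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(wet_day_quantile_threshold(rain))  # 90th percentile of the wet values 1..10
print(len(candidate_pairs(rain)))
```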

Log-Likelihood
Limiting distribution of inter-exceedance times:
$$T_\theta(t) = (1 - \theta)\,\epsilon_0 + \theta^2 \exp(-\theta t).$$
When all inter-exceedance times are strictly positive, the log-likelihood is
$$\sum_{i=1}^{N-1} \log\left\{(1 - \theta)^{I(t_i = 0)} \left(\theta^2 \exp(-\theta t_i)\right)^{I(t_i > 0)}\right\} = \sum_{i=1}^{N-1} \left[2I(t_i > 0)\log\theta - \theta t_i\right],$$
where $t_i = N T_i / n$, $n$ is the total number of observations and $N$ is the number of exceedances. However, as $n$ gets large, the estimate $\hat\theta$ tends to 1, suggesting independence.
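A short derivation (not on the original slide) shows why this happens when every $t_i > 0$. The log-likelihood reduces to $2(N-1)\log\theta - \theta\sum t_i$, and because the exceedances span fewer than $n$ time steps, its unconstrained maximizer always lies above 1:

```latex
\begin{aligned}
\ell(\theta) &= 2(N-1)\log\theta - \theta \sum_{i=1}^{N-1} t_i, \\
\ell'(\theta) &= \frac{2(N-1)}{\theta} - \sum_{i=1}^{N-1} t_i = 0
  \;\Longrightarrow\; \tilde\theta = \frac{2(N-1)}{\sum_{i=1}^{N-1} t_i}, \\
\sum_{i=1}^{N-1} t_i &= \frac{N}{n} \sum_{i=1}^{N-1} T_i < \frac{N}{n}\, n = N
  \;\Longrightarrow\; \tilde\theta > \frac{2(N-1)}{N} \ge 1 \quad (N \ge 2).
\end{aligned}
```

Since $\ell'(\theta) > 0$ on $(0, 1]$, the maximum over the parameter space is at the boundary, $\hat\theta = 1$: within-cluster gaps are small but never exactly zero, so the point mass at zero is never used. Subtracting a run parameter $K$ from the gaps is what allows some of them to become exactly zero.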

Log-Likelihood
Adjust the inter-exceedance times using the run parameter $K$, $c_i = \max\{t_i - K, 0\}$, giving the log-likelihood
$$\ell(\theta; c) = \sum_{i=1}^{N-1} \left[I(c_i = 0)\log(1 - \theta) + 2I(c_i > 0)\log\theta - \theta c_i\right].$$
This approach is used by Fukutome et al. (2014) and Süveges and Davison (2010). Test combinations of the threshold $u$ and run parameter $K$ for misspecification of the likelihood function, and, among the pairs that pass the test, select the $(u, K)$ pair that maximizes the number of independent clusters.
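A minimal sketch of this K-gaps likelihood and its closed-form maximizer. Two assumptions: $K$ is measured in raw time steps, and the gaps are formed on the raw times as $c_i = (N/n)\max\{T_i - K, 0\}$, which is how we read Süveges and Davison (2010); the slide writes $c_i = \max\{t_i - K, 0\}$. Setting $\ell'(\theta) = 0$ gives $S\theta^2 - (S + N_0 + 2N_1)\theta + 2N_1 = 0$, where $N_0$ and $N_1$ count zero and positive gaps and $S = \sum c_i$; its smaller root is the maximizer on $[0, 1]$.

```python
import numpy as np

def kgaps(exceed_idx, n, K):
    """Normalized K-gaps c_i = (N / n) * max(T_i - K, 0) from raw exceedance indices."""
    idx = np.sort(np.asarray(exceed_idx))
    T = np.diff(idx)
    return (idx.size / n) * np.maximum(T - K, 0)

def kgaps_loglik(theta, c):
    """sum[I(c=0) log(1 - theta) + 2 I(c>0) log(theta) - theta c]."""
    c = np.asarray(c, dtype=float)
    n_zero, n_pos = np.sum(c == 0), np.sum(c > 0)
    ll = 2 * n_pos * np.log(theta) - theta * c.sum()
    if n_zero:
        ll += n_zero * np.log1p(-theta)
    return ll

def kgaps_mle(c):
    """Smaller root of S theta^2 - (S + N0 + 2 N1) theta + 2 N1 = 0."""
    c = np.asarray(c, dtype=float)
    n0, n1, S = np.sum(c == 0), np.sum(c > 0), c.sum()
    if n1 == 0:
        return 0.0  # every gap falls within a cluster
    b = S + n0 + 2 * n1
    # Numerically stable form of (b - sqrt(b^2 - 8 S N1)) / (2 S)
    return 4 * n1 / (b + np.sqrt(b * b - 8 * S * n1))

c = kgaps([0, 1, 2, 3, 12], n=20, K=2)
print(c, kgaps_mle(c))
```

When no gap is zero the root reduces to $\min\{2N_1/S, 1\}$, recovering the boundary behaviour of the unadjusted likelihood.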

Model Misspecification
If a parametric model is misspecified, there is no $\theta$ such that $g = f(\cdot\,; \theta)$, where $g$ is the true model and $f$ is the misspecified parametric model. For a well-specified model, the Fisher information matrix $I(\theta) = -E\{\ell''(\theta; c_j)\}$ is equal to the variance of the score vector, $J(\theta) = \operatorname{Var}\{\ell'(\theta; c_j)\}$. Test the hypothesis $H_0: D(\theta) = 0$ against $H_1: D(\theta) \ne 0$, where $D(\theta) = J(\theta) - I(\theta)$.

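Empirically:
$$I_{N-1}(\hat\theta) = -\frac{1}{N-1} \sum_{j=1}^{N-1} \ell''(\hat\theta; c_j), \qquad J_{N-1}(\hat\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \ell'(\hat\theta; c_j)^2,$$
$$D_{N-1}(\hat\theta) = J_{N-1}(\hat\theta) - I_{N-1}(\hat\theta),$$
$$V_{N-1}(\hat\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \left[d_j(\hat\theta; c_j) - D_{N-1}'(\hat\theta)\, I_{N-1}(\hat\theta)^{-1}\, \ell'(\hat\theta; c_j)\right]^2,$$
where $d_j(\hat\theta; c_j) = \ell'(\hat\theta; c_j)^2 + \ell''(\hat\theta; c_j)$ is the contribution of gap $j$ to $D_{N-1}(\hat\theta)$, $D_{N-1}'$ is the derivative of $D_{N-1}$ with respect to $\theta$, and $V_{N-1}(\hat\theta)$ estimates the variance of $\sqrt{N-1}\,D_{N-1}(\hat\theta)$.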
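For the K-gaps log-likelihood, these quantities follow from the per-gap derivatives: a gap with $c_j = 0$ contributes $\log(1 - \theta)$ and a gap with $c_j > 0$ contributes $2\log\theta - \theta c_j$. A sketch, assuming $\hat\theta$ has already been computed and is strictly below 1 whenever some $c_j = 0$; function names are illustrative.

```python
import numpy as np

def kgaps_derivatives(theta, c):
    """Per-gap first, second and third derivatives in theta of
    l(theta; c_j) = I(c_j=0) log(1-theta) + I(c_j>0) [2 log(theta) - theta c_j]."""
    c = np.asarray(c, dtype=float)
    zero = c == 0
    d1, d2, d3 = np.empty_like(c), np.empty_like(c), np.empty_like(c)
    if zero.any():  # requires theta < 1
        d1[zero] = -1.0 / (1.0 - theta)
        d2[zero] = -1.0 / (1.0 - theta) ** 2
        d3[zero] = -2.0 / (1.0 - theta) ** 3
    pos = ~zero
    d1[pos] = 2.0 / theta - c[pos]
    d2[pos] = -2.0 / theta ** 2
    d3[pos] = 4.0 / theta ** 3
    return d1, d2, d3

def information_matrix_quantities(theta_hat, c):
    """Empirical I, J, D = J - I and V for a scalar parameter."""
    l1, l2, l3 = kgaps_derivatives(theta_hat, c)
    I = -np.mean(l2)                       # observed information
    J = np.mean(l1 ** 2)                   # mean squared score (mean score is 0 at the MLE)
    d = l1 ** 2 + l2                       # d_j, whose mean is D
    D = np.mean(d)                         # equals J - I
    D_prime = np.mean(2.0 * l1 * l2 + l3)  # derivative of D with respect to theta
    V = np.mean((d - D_prime / I * l1) ** 2)
    return I, J, D, V
```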

Model Misspecification
Theorem (Information Matrix Test; White, 1982). If the assumed model $\ell(\theta; c_i)$ contains the true model for some $\theta = \theta_0$, then as $n \to \infty$:
(i) $\sqrt{N-1}\,D_{N-1}(\hat\theta) \xrightarrow{w} N(0, V(\theta_0))$;
(ii) $V_{N-1}(\hat\theta) \xrightarrow{a.s.} V(\theta_0)$, and $V_{N-1}(\hat\theta)$ is non-singular for sufficiently large $N$;
(iii) hence the Information Matrix Test statistic $(N-1)\,D_{N-1}(\hat\theta)\,V_{N-1}(\hat\theta)^{-1}\,D_{N-1}(\hat\theta)$ is asymptotically $\chi^2_1$ distributed.
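Given $D_{N-1}(\hat\theta)$, $V_{N-1}(\hat\theta)$ and the number of gaps $N - 1$, the statistic and its asymptotic p-value need only the standard library, because $\Pr(\chi^2_1 > x) = \Pr(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$ for standard normal $Z$. A sketch with illustrative function names.

```python
import math

def imt_statistic(D, V, m):
    """Information Matrix Test statistic (N-1) D V^{-1} D for a scalar
    parameter, where m = N - 1 is the number of inter-exceedance times."""
    return m * D * D / V

def chi2_1_pvalue(stat):
    """Upper-tail probability of chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

stat = imt_statistic(D=0.5, V=2.0, m=100)
print(stat, chi2_1_pvalue(stat))
```

Under a conventional 5% level (a choice not stated in the talk), a $(u, K)$ pair would be flagged as misspecified when `chi2_1_pvalue(stat) < 0.05`, equivalently `stat > 3.841`.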

Example: AR(2)
$Y_i = 0.95\,Y_{i-1} - 0.89\,Y_{i-2} + Z_i$, where $Z_i \sim \mathrm{GP}(1, 1/2)$ and $n = 8000$; 100 simulations.
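A minimal simulation sketch of this example. It assumes $\mathrm{GP}(1, 1/2)$ denotes a generalized Pareto distribution with scale 1 and shape 1/2, draws innovations by inverse-CDF sampling, starts from zeros and discards a burn-in of 1000 steps (all assumptions). The AR(2) polynomial has complex roots, so a single large innovation produces several large values within a handful of steps, and exceedances arrive in clusters.

```python
import numpy as np

def simulate_ar2_gp(n=8000, phi1=0.95, phi2=-0.89, scale=1.0, shape=0.5,
                    burn_in=1000, seed=None):
    """Simulate Y_i = phi1 * Y_{i-1} + phi2 * Y_{i-2} + Z_i with generalized
    Pareto innovations Z_i(scale, shape), starting from zeros and dropping a burn-in."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    # Inverse-CDF sampling of GP(scale, shape) innovations
    z = scale / shape * ((1.0 - rng.random(total)) ** (-shape) - 1.0)
    y = np.zeros(total)
    for i in range(2, total):
        y[i] = phi1 * y[i - 1] + phi2 * y[i - 2] + z[i]
    return y[burn_in:]

series = [simulate_ar2_gp(seed=s) for s in range(100)]  # 100 simulated series

# Exceedances of the 98% empirical quantile in the first series
y0 = series[0]
exceed = np.flatnonzero(y0 > np.quantile(y0, 0.98))
print(len(exceed), np.mean(np.diff(exceed) <= 7))  # share of short gaps (<= 7 steps)
```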

Adjusting inter-exceedance times
It is common to justify the stationarity assumption through seasonal blocking. For fitting, inter-exceedance times are then collapsed across seasonal blocks using the memoryless property of the exponential distribution.
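One way to implement the collapsing, sketched under assumptions: `in_season` is a hypothetical boolean mask marking the time steps of the season being analysed, and out-of-season steps are simply removed from the time axis. By the memoryless property, the waiting time still running at the end of one season can be continued at the start of the next, so a gap that spans the break is the in-season time on either side of it.

```python
import numpy as np

def collapsed_gaps(x, u, in_season):
    """Inter-exceedance times counted in in-season time steps only."""
    x = np.asarray(x, dtype=float)
    in_season = np.asarray(in_season, dtype=bool)
    season_clock = np.cumsum(in_season)        # in-season steps elapsed so far
    idx = np.flatnonzero(in_season & (x > u))  # in-season exceedances only
    return np.diff(season_clock[idx])

x = [0, 9, 0, 0, 0, 0, 0, 9, 0, 0]
in_season = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]  # three out-of-season steps between blocks
print(collapsed_gaps(x, u=5, in_season=in_season))  # raw gap is 6 steps
```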

Results: Gatton, South East Queensland

Results: Oenpelli, Northern Territory

Summary
Showed how to check whether the selected threshold and run parameter violate the assumptions of the model.
This gives confidence in threshold selection where there is no hard-and-fast rule and some subjectivity is unavoidable.

References
Ferro, C. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), pp.545-556.
Fukutome, S., Liniger, M. and Süveges, M. (2014). Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology.
Hsing, T., Hüsler, J. and Leadbetter, M. (1988). On the exceedance point process for a stationary sequence. Probability Theory and Related Fields, 78(1), pp.97-112.
Süveges, M. and Davison, A. (2010). Model misspecification in peaks over threshold analysis. The Annals of Applied Statistics, 4(1), pp.203-221.
White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), pp.1-25.

ANZAPW 2015: Barossa Valley, South Australia
This work has been supported by the ARC through the Laureate Fellowship FL130100039.
Questions?

Results: Kalamia, Far North Queensland

Results: Yamba, New South Wales