Investigation of an Automated Approach to Threshold Selection for Generalized Pareto

Kate R. Saunders
Supervisors: Peter Taylor & David Karoly
University of Melbourne
April 8, 2015

Outline 1 Extreme Value Theory 2

Problem
What climate processes drive extreme rainfall (e.g. El Niño Southern Oscillation, Interdecadal Pacific Oscillation)? How do these drivers differ across timescales: sub-daily, daily, and consecutive-day totals?

Data Extreme Value Theory

Extreme Value Theory

Block Maxima

Block Maxima
Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables with distribution function $F$, and define $M_n = \max\{X_1, X_2, \dots, X_n\}$. (The $X_i$ might be daily rainfall observations and $M_{365}$ the annual maximum rainfall.) Then
$$\Pr(M_n \le x) = \Pr(X_1 \le x, \dots, X_n \le x) = \Pr(X_1 \le x)\cdots\Pr(X_n \le x) = F(x)^n.$$
As $n \to \infty$, $F(x)^n$ itself degenerates, but after suitable linear normalisation the distribution of $M_n$ converges to a generalised extreme value distribution.
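A minimal Python sketch (standard library only) of extracting block maxima and of the identity $\Pr(M_n \le x) = F(x)^n$. The function names `block_maxima` and `prob_max_below` are hypothetical, and the Exp(1) "daily rainfall" in the check is an assumption chosen only because its $F$ is known in closed form.

```python
import math
import random


def block_maxima(x, block_length):
    """Maxima of consecutive, non-overlapping blocks; an incomplete final block is dropped."""
    return [max(x[i:i + block_length])
            for i in range(0, len(x) - block_length + 1, block_length)]


def prob_max_below(F, x, n):
    """Pr(M_n <= x) = F(x)**n for n i.i.d. variables with distribution function F."""
    return F(x) ** n


if __name__ == "__main__":
    # Monte Carlo check with Exp(1) values and 365-step blocks (an assumed toy model).
    rng = random.Random(0)
    n, n_blocks, x = 365, 2000, 7.0
    sample = [rng.expovariate(1.0) for _ in range(n * n_blocks)]
    maxima = block_maxima(sample, n)
    empirical = sum(m <= x for m in maxima) / len(maxima)
    exact = prob_max_below(lambda v: 1.0 - math.exp(-v), x, n)
    print(f"empirical {empirical:.3f} vs F(x)^n {exact:.3f}")
```

The two printed numbers should agree up to Monte Carlo error of a few hundredths.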

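Generalized Extreme Value Theorem (Fisher-Tippett-Gnedenko)
If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that
$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty$$
for a non-degenerate distribution function $G$, then $G$ is a member of the Generalized Extreme Value family
$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$
defined on $\{z : 1 + \xi(z - \mu)/\sigma > 0\}$, where $-\infty < \mu < \infty$, $\sigma > 0$ and $-\infty < \xi < \infty$.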
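The GEV distribution function can be evaluated directly. This is a sketch: `gev_cdf` is a hypothetical name, and the $\xi = 0$ branch uses the Gumbel form $\exp\{-\exp[-(z-\mu)/\sigma]\}$, which is the limit of the formula above as $\xi \to 0$.

```python
import math


def gev_cdf(z, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function G(z) with location mu, scale sigma > 0 and shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        # Gumbel limit of the GEV family as xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside {z : 1 + xi (z - mu)/sigma > 0}: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For example, for Exp(1) variables the choice $a_n = 1$, $b_n = \log n$ gives $\Pr(M_n - \log n \le z) = (1 - e^{-z}/n)^n \to \exp(-e^{-z})$, which is `gev_cdf(z)` with $\mu = 0$, $\sigma = 1$, $\xi = 0$.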

Leveraging more data

Generalized Pareto Distribution
Let $X_1, X_2, \dots, X_n$ be a sequence of i.i.d. random variables with marginal distribution function $F$. Then
$$\Pr\{X > u + y \mid X > u\} = \frac{1 - F(u + y)}{1 - F(u)}, \quad y > 0.$$
If $F$ satisfies the Generalized Extreme Value Theorem, then for a large enough threshold $u$ the distribution function of $(X - u)$, conditional on $X > u$, is approximately the GPD.
Generalized Pareto Distribution (Pickands, 1975):
$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi},$$
defined on $\{y : y > 0 \text{ and } 1 + \xi y/\tilde\sigma > 0\}$, where $\tilde\sigma = \sigma + \xi(u - \mu)$.
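A sketch of the GPD distribution function $H(y)$, the threshold-dependent scale $\tilde\sigma = \sigma + \xi(u - \mu)$, and the conditional exceedance probability that $1 - H$ approximates. All function names are hypothetical; the $\xi = 0$ case uses the exponential limit $1 - \exp(-y/\tilde\sigma)$.

```python
import math


def gpd_cdf(y, sigma_tilde, xi):
    """H(y) = 1 - (1 + xi*y/sigma_tilde)**(-1/xi) for excesses y > 0."""
    if sigma_tilde <= 0:
        raise ValueError("sigma_tilde must be positive")
    if y <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_tilde)  # exponential limit as xi -> 0
    t = 1.0 + xi * y / sigma_tilde
    if t <= 0.0:
        return 1.0  # beyond the upper end point (only possible when xi < 0)
    return 1.0 - t ** (-1.0 / xi)


def gpd_scale(sigma, xi, u, mu):
    """GPD scale implied by GEV parameters at threshold u: sigma + xi*(u - mu)."""
    return sigma + xi * (u - mu)


def conditional_exceedance(F, u, y):
    """Pr{X > u + y | X > u} = (1 - F(u + y)) / (1 - F(u))."""
    return (1.0 - F(u + y)) / (1.0 - F(u))
```

For Exp(1) data the conditional probability is exactly $e^{-y}$ at every threshold, so it matches $1 - H(y)$ with $\tilde\sigma = 1$ and $\xi = 0$.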

Dependence
Rainfall observations are dependent: heavy rainfall yesterday affects the probability of heavy rain today, but heavy rainfall a year ago does not. Extreme Value Theory extends to stationary series with weak long-range dependence. However, for processes with short-range dependence, extremes occur in clusters.

Clusters

Dependent Series
Let $\{X_i\}_{i \ge 1}$ be a stationary series and $\{X_i^*\}_{i \ge 1}$ an independent series of variables with the same marginal distribution. Define $M_n = \max\{X_1, \dots, X_n\}$ and $M_n^* = \max\{X_1^*, \dots, X_n^*\}$. Under suitable regularity conditions,
$$\Pr\left\{\frac{M_n^* - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty$$
for normalizing sequences $\{a_n > 0\}$ and $\{b_n\}$, where $G$ is a non-degenerate distribution function, if and only if
$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G^{\theta}(z)$$
for a constant $\theta$ such that $0 < \theta \le 1$.

Extremal Index
$\theta = (\text{limiting mean cluster size})^{-1} \in (0, 1]$; for example, $\theta = 0.5$ corresponds to 2 observations per cluster on average.

1. Select a threshold.
2. Decluster the data to obtain independent observations.

Declustering
Blocks: partition the observation sequence into blocks of length $b$, and assume that extreme observations within the same block belong to the same cluster.
Runs: specify a run length $K$, and assume that extreme observations with an inter-exceedance time of less than $K$ belong to the same cluster.
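The two declustering schemes can be sketched as follows, assuming exceedances are given as sorted integer time indices. The run rule follows the slide's convention (same cluster when the inter-exceedance time is strictly less than $K$); some authors use "at most $K$", so treat the comparison as an assumption. The ratio of clusters to exceedances gives the matching estimate of $\theta = (\text{mean cluster size})^{-1}$. All names are hypothetical.

```python
def runs_decluster(exceed_idx, K):
    """Runs declustering: a new cluster starts when the gap to the previous exceedance is >= K."""
    clusters = []
    for idx in exceed_idx:
        if clusters and idx - clusters[-1][-1] < K:
            clusters[-1].append(idx)
        else:
            clusters.append([idx])
    return clusters


def blocks_decluster(exceed_idx, b):
    """Blocks declustering: exceedances in the same block of length b form one cluster."""
    groups = {}
    for idx in exceed_idx:
        groups.setdefault(idx // b, []).append(idx)
    return [groups[key] for key in sorted(groups)]


def extremal_index_from_clusters(clusters):
    """theta = 1 / (mean cluster size) = number of clusters / number of exceedances."""
    return len(clusters) / sum(len(c) for c in clusters)


if __name__ == "__main__":
    idx = [2, 3, 5, 12, 13, 30]
    runs = runs_decluster(idx, K=3)
    print(runs, extremal_index_from_clusters(runs))
    print(blocks_decluster(idx, b=4))
```

Note how the two schemes can disagree: with $b = 4$ the exceedances at times 3 and 5 fall in different blocks even though they are only two steps apart.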

Intervals
The limiting point process of exceedance times of a stationary series is compound Poisson (Hsing et al. 1988). Ferro and Segers (2003) showed that the limiting distribution of the inter-exceedance times is a mixture with weight $\theta$,
$$T_\theta(t) = (1 - \theta)\,\epsilon_0 + \theta^2 \exp(-\theta t),$$
where $\epsilon_0$ is the distribution degenerate at zero and $T_\theta$ is the distribution of the times between exceedances of the threshold $u$. Equating moments gives a non-parametric estimator of $\theta$. Of the $N - 1$ inter-exceedance times between the $N$ exceedances, the largest $\theta(N - 1)$ can then be interpreted as arrival times between clusters.
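Equating moments gives the intervals estimator of Ferro and Segers (2003). A minimal sketch, assuming the inter-exceedance times $T_i$ are positive integers (time steps); the second branch is their less-biased form, used when some $T_i$ exceeds 2, and the result is capped at 1. The function name is hypothetical.

```python
def intervals_estimator(times):
    """Intervals estimator of the extremal index from the N-1 inter-exceedance times T_i."""
    m = len(times)
    if m == 0:
        raise ValueError("need at least two exceedances")
    if max(times) <= 2:
        num = 2.0 * sum(times) ** 2
        den = m * sum(t * t for t in times)
    else:
        # less-biased moment form, used when some gap exceeds 2
        num = 2.0 * sum(t - 1 for t in times) ** 2
        den = m * sum((t - 1) * (t - 2) for t in times)
    return min(1.0, num / den)


if __name__ == "__main__":
    print(intervals_estimator([1, 1, 10, 1, 1, 12]))
```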


Mean Residual Life Plots
For sufficiently high thresholds $u$, the mean excess above the threshold, $E(X - u \mid X > u)$, should change linearly as $u$ increases.
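If excesses of some threshold $u_0$ follow a GPD with scale $\sigma_{u_0}$ and shape $\xi < 1$, then for $u > u_0$ the excesses of $u$ are GPD with scale $\sigma_{u_0} + \xi(u - u_0)$, so $E(X - u \mid X > u) = [\sigma_{u_0} + \xi(u - u_0)]/(1 - \xi)$, which is linear in $u$. A sketch of the empirical values to plot, with hypothetical function names:

```python
def mean_residual_life(x, thresholds):
    """Rows (u, empirical mean excess over u, number of excesses) for each threshold u."""
    rows = []
    for u in thresholds:
        excess = [v - u for v in x if v > u]
        mean_excess = sum(excess) / len(excess) if excess else float("nan")
        rows.append((u, mean_excess, len(excess)))
    return rows


def gpd_mean_excess(u, u0, sigma_u0, xi):
    """Theoretical mean excess over u >= u0 when excesses of u0 are GPD(sigma_u0, xi), xi < 1."""
    return (sigma_u0 + xi * (u - u0)) / (1.0 - xi)
```

In practice the row counts matter as much as the means: at high thresholds few excesses remain and the empirical curve becomes noisy.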

Parameter Stability Plots
Estimates of the modified scale parameter, $\sigma^* = \sigma_u - \xi u$, and of the shape parameter $\xi$ should be approximately constant across the range of valid thresholds.
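A sketch of the values behind a parameter stability plot. To stay within the standard library it substitutes method-of-moments GPD estimates, $\xi = (1 - \bar{x}^2/s^2)/2$ and $\sigma_u = \bar{x}(1 + \bar{x}^2/s^2)/2$, for the maximum likelihood fits usually used; this is an assumption of the sketch and only sensible when $\xi < 1/2$. Function names are hypothetical.

```python
import statistics


def gpd_moment_fit(excesses):
    """Method-of-moments GPD estimates (sigma_u, xi) from threshold excesses.

    Uses mean = sigma/(1 - xi) and variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    """
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    r = m * m / v
    xi = 0.5 * (1.0 - r)
    sigma_u = 0.5 * m * (1.0 + r)
    return sigma_u, xi


def parameter_stability(x, thresholds, min_excesses=3):
    """Rows (u, modified scale sigma* = sigma_u - xi*u, xi) for each usable threshold."""
    rows = []
    for u in thresholds:
        excesses = [v - u for v in x if v > u]
        if len(excesses) < min_excesses:
            continue  # too few excesses to estimate two parameters
        sigma_u, xi = gpd_moment_fit(excesses)
        rows.append((u, sigma_u - xi * u, xi))
    return rows
```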

Alternative
Set the threshold at a high quantile of the non-zero observations, e.g. the 90th percentile. Is this an appropriate threshold? Is our model misspecified? The approach suggested by Süveges and Davison (2010) is to test each pair of threshold, $u$, and run parameter, $K$, for model misspecification.
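A sketch of the quantile-based threshold and the resulting exceedances and inter-exceedance times. The function names are hypothetical, and the "inclusive" interpolation method passed to `statistics.quantiles` is an assumption; other quantile definitions give slightly different thresholds.

```python
import statistics


def nonzero_quantile_threshold(x, percent=90):
    """Threshold u at the given percentile (1-99) of the strictly positive observations."""
    wet = [v for v in x if v > 0]
    cuts = statistics.quantiles(wet, n=100, method="inclusive")  # 99 cut points
    return cuts[percent - 1]


def exceedances(x, u):
    """Indices i with x[i] > u, and the inter-exceedance times between consecutive ones."""
    idx = [i for i, v in enumerate(x) if v > u]
    gaps = [b - a for a, b in zip(idx, idx[1:])]
    return idx, gaps
```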

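Log-Likelihood
Limiting distribution of inter-exceedance times:
$$T_\theta(t) = (1 - \theta)\,\epsilon_0 + \theta^2 \exp(-\theta t).$$
Log-likelihood (strictly positive inter-exceedance times):
$$\sum_{i=1}^{N-1} \log\left[(1 - \theta)^{I(t_i = 0)} \left(\theta^2 \exp(-\theta t_i)\right)^{I(t_i > 0)}\right] = \sum_{i=1}^{N-1} \left[2 I(t_i > 0) \log\theta - \theta t_i\right],$$
where $t_i = N T_i / n$, $T_i$ is the $i$-th inter-exceedance time, $n$ is the total number of observations and $N$ is the number of exceedances. However, since no observed time is exactly zero, the $(1 - \theta)$ term drops out and, as $n$ gets large, the estimate $\hat\theta$ tends to 1, suggesting independence.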
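With no zero times, the log-likelihood reduces to $2(N-1)\log\theta - \theta\sum t_i$, whose maximiser $2(N-1)/\sum t_i$ is capped at 1 because $\theta \in (0, 1]$. A sketch with hypothetical names:

```python
import math


def loglik_positive(theta, t):
    """l(theta) = sum_i [2 log(theta) - theta * t_i], valid when every t_i > 0."""
    return sum(2.0 * math.log(theta) - theta * ti for ti in t)


def theta_mle_positive(t):
    """Maximiser of loglik_positive on (0, 1]: the stationary point 2m / sum(t), capped at 1."""
    return min(1.0, 2.0 * len(t) / sum(t))


def normalised_times(T, N, n):
    """t_i = N * T_i / n from raw inter-exceedance times T_i."""
    return [N * Ti / n for Ti in T]
```

Because the raw times $T_i$ add up to roughly $n$, the normalised times sum to roughly $N$, so the stationary point is near 2 and the capped estimate is 1; this is the drift towards independence described above.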

Log-Likelihood
Adjust the inter-exceedance times using the run parameter $K$, setting $c_i = \max\{t_i - K, 0\}$. The log-likelihood becomes
$$\ell(\theta; c_i) = \sum_{i=1}^{N-1} \left[I(c_i = 0) \log(1 - \theta) + 2 I(c_i > 0) \log\theta - \theta c_i\right].$$
This approach is used in Fukutome et al. (2014) and Süveges and Davison (2010). Test combinations of threshold, $u$, and run parameter, $K$, for misspecification of the likelihood function, and select the $(u, K)$ pair that maximizes the number of independent clusters.
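Setting $\ell'(\theta) = 0$ gives the quadratic $S\theta^2 - (S + m + N_C)\theta + 2N_C = 0$, where $m = N - 1$ is the number of gaps, $N_C$ the number of non-zero gaps and $S$ their sum; its smaller root lies in $(0, 1]$ and is the maximum likelihood estimate. A minimal sketch with hypothetical names. The helper `kgaps` assumes, following our reading of Süveges and Davison (2010), that $K$ is subtracted on the raw time-step scale before scaling by $N/n$.

```python
import math


def kgaps(times, K, N, n):
    """Normalised K-gaps c_i = (N/n) * max(T_i - K, 0) from raw inter-exceedance times T_i.

    Assumption: K is applied to the raw times before normalising by N/n.
    """
    return [N / n * max(t - K, 0) for t in times]


def kgaps_mle(c):
    """Closed-form maximiser of the K-gaps log-likelihood (smaller root of the score quadratic)."""
    m = len(c)
    nc = sum(1 for ci in c if ci > 0)
    s = sum(c)
    if nc == 0:
        return 0.0  # all gaps zero: m*log(1 - theta) is maximised at theta = 0
    b = s + m + nc
    return (b - math.sqrt(b * b - 8.0 * nc * s)) / (2.0 * s)
```

When every gap is positive the quadratic has roots $1$ and $2m/S$, so the estimate reduces to $\min(1, 2m/S)$, the unadjusted case of the previous slide.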

Model Misspecification
If a parametric model is misspecified, then there is no $\theta$ such that $g = f(\theta)$, where $g$ is the true model and $f$ is the assumed parametric model. For a correctly specified model, the Fisher information $I(\theta) = -E\{\ell''(\theta; c_j)\}$ is equal to the variance of the score, $J(\theta) = \mathrm{Var}\{\ell'(\theta; c_j)\}$. Test the hypothesis on $D(\theta) = J(\theta) - I(\theta)$, with $H_0: D(\theta) = 0$ and $H_1: D(\theta) \neq 0$.

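Empirically:
$$I_{N-1}(\hat\theta) = -\frac{1}{N-1} \sum_{j=1}^{N-1} \ell''(\hat\theta; c_j), \qquad J_{N-1}(\hat\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \ell'(\hat\theta; c_j)^2,$$
$$D_{N-1}(\hat\theta) = J_{N-1}(\hat\theta) - I_{N-1}(\hat\theta),$$
$$V_{N-1}(\hat\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \left[d_j(\hat\theta; c_j) + D'_{N-1}(\hat\theta)\, I_{N-1}(\hat\theta)^{-1}\, \ell'(\hat\theta; c_j)\right]^2,$$
where $d_j(\hat\theta; c_j) = \ell'(\hat\theta; c_j)^2 + \ell''(\hat\theta; c_j)$ is the $j$-th term of $D_{N-1}$, $D'_{N-1}$ is the derivative of $D_{N-1}$ with respect to $\theta$, and $V_{N-1}(\hat\theta)$ estimates the variance of $\sqrt{N-1}\, D_{N-1}(\hat\theta)$.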

Model Misspecification
Theorem (Information Matrix Test, White 1982): If the assumed model $\ell(\theta; c_i)$ contains the true model for some $\theta = \theta_0$, then as $n \to \infty$:
(i) $\sqrt{N-1}\, D_{N-1}(\hat\theta) \xrightarrow{w} N(0, V(\theta_0))$;
(ii) $V_{N-1}(\hat\theta_{N-1}) \xrightarrow{a.s.} V(\theta_0)$, and $V_{N-1}(\hat\theta)$ is non-singular for sufficiently large $N$;
(iii) the Information Matrix Test statistic $(N-1)\, D_{N-1}(\hat\theta)\, V_{N-1}(\hat\theta)^{-1}\, D_{N-1}(\hat\theta)$ is asymptotically $\chi^2_1$ distributed.
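For the K-gaps likelihood the test needs only the first three derivatives of each gap's log-likelihood term. A minimal sketch, assuming the normalised gaps $c_j$ are already computed; it repeats a closed-form `kgaps_mle` so the block is self-contained, uses $I = -\text{mean}\,\ell''$ (so $D = J - I$ vanishes under a correct model), and converts the statistic to a p-value with $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The function names are hypothetical.

```python
import math


def kgaps_mle(c):
    """Closed-form maximiser of the K-gaps log-likelihood."""
    m = len(c)
    nc = sum(1 for ci in c if ci > 0)
    s = sum(c)
    if nc == 0:
        return 0.0
    b = s + m + nc
    return (b - math.sqrt(b * b - 8.0 * nc * s)) / (2.0 * s)


def information_matrix_test(c):
    """Return (theta_hat, IMT statistic, chi-square(1) p-value) for normalised K-gaps c."""
    theta = kgaps_mle(c)
    if not 0.0 < theta < 1.0:
        raise ValueError("the test needs an interior estimate 0 < theta < 1")
    m = len(c)
    d1, d2, d3 = [], [], []  # per-gap l', l'', l''' evaluated at theta_hat
    for ci in c:
        if ci == 0:
            # term log(1 - theta)
            d1.append(-1.0 / (1.0 - theta))
            d2.append(-1.0 / (1.0 - theta) ** 2)
            d3.append(-2.0 / (1.0 - theta) ** 3)
        else:
            # term 2 log(theta) - theta * c_i
            d1.append(2.0 / theta - ci)
            d2.append(-2.0 / theta ** 2)
            d3.append(4.0 / theta ** 3)
    info = -sum(d2) / m                                            # I_{N-1}
    J = sum(g * g for g in d1) / m                                 # J_{N-1}
    D = J - info                                                   # D_{N-1}
    d = [g * g + h for g, h in zip(d1, d2)]                        # d_j
    dD = sum(2.0 * g * h + k for g, h, k in zip(d1, d2, d3)) / m   # D'_{N-1}
    V = sum((dj + dD / info * g) ** 2 for dj, g in zip(d, d1)) / m
    stat = m * D * D / V
    p_value = math.erfc(math.sqrt(stat / 2.0))                     # Pr(chi2_1 > stat)
    return theta, stat, p_value


if __name__ == "__main__":
    gaps = [0.0, 0.0, 0.0, 0.5, 1.2, 0.0, 2.5, 0.3, 0.0, 1.8, 0.7, 0.0, 3.1, 0.2]
    print(information_matrix_test(gaps))
```

Repeating this over a grid of $(u, K)$ pairs and keeping the pairs whose p-value is not small is one way to screen candidates before choosing among them.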

Example: AR(2)
$Y_i = 0.95\,Y_{i-1} - 0.89\,Y_{i-2} + Z_i$, where $Z_i \sim \mathrm{GP}(1, 1/2)$ and $n = 8000$; 100 simulations.
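A sketch for simulating one replicate of this AR(2) example. It assumes $\mathrm{GP}(1, 1/2)$ means scale 1 and shape 1/2, draws the innovations by inverting $H$, and discards a burn-in of 500 steps (an assumption, not from the slide) so that the zero starting values are forgotten. Function names are hypothetical.

```python
import random


def gp_sample(rng, sigma=1.0, xi=0.5):
    """One GP(sigma, xi) draw by inverting H(y) = 1 - (1 + xi*y/sigma)**(-1/xi)."""
    u = rng.random()  # u in [0, 1), so 1 - u in (0, 1] and the draw is >= 0
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def simulate_ar2(n=8000, phi1=0.95, phi2=-0.89, sigma=1.0, xi=0.5, burn_in=500, seed=1):
    """Y_i = phi1*Y_{i-1} + phi2*Y_{i-2} + Z_i with GP innovations; returns the last n values."""
    rng = random.Random(seed)
    y_prev2, y_prev1 = 0.0, 0.0
    out = []
    for i in range(n + burn_in):
        y = phi1 * y_prev1 + phi2 * y_prev2 + gp_sample(rng, sigma, xi)
        y_prev2, y_prev1 = y_prev1, y
        if i >= burn_in:
            out.append(y)
    return out


if __name__ == "__main__":
    # 100 replicates, as on the slide, each with its own seed
    replicates = [simulate_ar2(seed=s) for s in range(100)]
    print(len(replicates), len(replicates[0]))
```

The characteristic roots of $1 - 0.95z + 0.89z^2$ lie just outside the unit circle, so the series is stationary but strongly oscillating, which produces clusters of large values.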

Adjusting inter-exceedance times
It is common to justify the stationarity assumption by seasonal blocking, i.e. analysing one season at a time. For fitting, inter-exceedance times are then collapsed across seasonal blocks using the memoryless property of the exponential distribution.
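One reading of collapsing across seasonal blocks, sketched below: out-of-season days are removed from the time axis before inter-exceedance times are computed, so a gap that spans an off-season counts only in-season days. This is a hypothetical helper; the memoryless property of the exponential is what justifies joining the seasons end to end.

```python
def collapsed_gaps(exceed_days, in_season):
    """Inter-exceedance times on a time axis that keeps only in-season days.

    exceed_days: sorted day indices of exceedances; in_season: one bool per day.
    Exceedances falling on out-of-season days are ignored.
    """
    position, k = {}, 0
    for day, flag in enumerate(in_season):
        if flag:
            position[day] = k  # index of this day once off-season days are removed
            k += 1
    pos = [position[d] for d in exceed_days if d in position]
    return [b - a for a, b in zip(pos, pos[1:])]


if __name__ == "__main__":
    season = [True] * 3 + [False] * 3 + [True] * 4
    print(collapsed_gaps([1, 7], season))
```

Here the raw gap between days 1 and 7 is 6, but only 3 of those steps are in season.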

Results: Gatton, South East Queensland

Results: Oenpelli, Northern Territory

Summary
Shown how to check whether the selected threshold and run parameter violate the assumptions of the model.
This gives confidence in threshold selection in the absence of a hard-and-fast rule and in the presence of subjectivity.

References
Ferro, C. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), pp.545-556.
Fukutome, S., Liniger, M. and Süveges, M. (2014). Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology.
Hsing, T., Hüsler, J. and Leadbetter, M. (1988). On the exceedance point process for a stationary sequence. Probability Theory and Related Fields, 78(1), pp.97-112.
Süveges, M. and Davison, A. (2010). Model misspecification in peaks over threshold analysis. The Annals of Applied Statistics, 4(1), pp.203-221.
White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), p.1.

ANZAPW 2015: Barossa Valley, South Australia This work has been supported by the ARC through the Laureate Fellowship FL130100039. Questions?

Results: Kalamia, Far North Queensland

Results: Yamba, New South Wales