What Can We Infer From Beyond The Data? The Statistics Behind The Analysis Of Risk Events In The Context Of Environmental Studies

Sibusisiwe Khuluse, Sonali Das, Pravesh Debba, Chris Elphinstone
Logistics & Quantitative Methods CA, CSIR (Built Environment), South Africa
E-mail: skhuluse@csir.co.za
African Digital Scholarship and Curation Conference 2009

Abstract

When events occur outside the range of acceptable fluctuations, they may be either (a) more favourable than usual or (b) less favourable than usual. The latter has serious implications if such occurrences trigger a chain of subsequent negative events. Such events are termed risk events. Extreme Value Theory (EVT) is a tool that attempts to best estimate the probability of adverse risk events. Extreme value methods have been used in several environmental studies. In this paper, the behaviour of very high levels of the McArthur Fire Danger Index (FDI) at four sites in the Kruger National Park is described using the threshold exceedance approach in EVT. Of particular interest is whether the FDI series at each site exhibits dependence at high levels, seasonality and trend. We review how the model for threshold excesses, the generalized Pareto distribution, has to be modified to incorporate these features, and the effect this has on the parameter estimates.

1 Introduction

Extreme Value Theory is concerned with probabilistic and statistical questions pertaining to very large or very small values of a sequence of random variables (Smith, 2003). Extreme events have a low probability of occurrence, so the risk associated with their frequency of occurrence is of practical importance. Quantifying this risk entails estimating the probabilities in the tail of the underlying distribution of the process of interest. Extreme value theory serves as a tool for finding the best estimate of the tail area of the distribution, and consequently of the probabilistic risk associated with an extreme event.

In environmental studies, probabilistic risk is often defined in terms of the return interval of natural hazards. Such information is often an input to decision support systems that provide guidance on the allocation of sufficient resources for disaster management. An example is wildfire risk management. Advance knowledge of the risk of severe fire events is important to fire managers, as they need to ensure the efficiency of their suppression efforts in anticipation of severe fire seasons.

Fire Danger Indices are indicators of fire potential at a particular site and a particular time. Individually they are not direct predictors of fire occurrence and spread, but they do provide useful input for understanding fire behaviour, which is important in wildfire risk management systems such as the National Fire Danger Rating System (NFDRS) (Wybo et al., 1995; Preisler and Westerling, 2007). Fire Danger Indices are commonly defined as the resultant of constant and variable danger factors (Wybo et al., 1995). Constant danger factors include fuel types and topography, whilst variable danger factors are mainly related to meteorological conditions, which affect the ignition and spread of fires, the difficulty of controlling them and the damage they cause.

The objective of this study is to apply the threshold exceedance approach within the Extreme Value Theory (EVT) framework, using calculated McArthur Fire Danger Index (FDI) values from four sites in the Kruger National Park (KNP). The McArthur Fire Danger Index was developed in Australia in 1966 from statistical analysis of large quantities of field data. The index has a closed scale ranging from zero to 100, and is a measure of the level of fire danger, the potential speed of a fire and the difficulty of extinguishing it (Wybo et al., 1995). The McArthur FDI is calculated from a combination of meteorological variables. Non-stationarity, in the form of seasonality and long-term climate trends, and temporal dependence are inherent features of meteorological processes, hence it is anticipated that these features will also exist in the FDI series. This would violate the i.i.d. assumption, which forms the theoretical basis of univariate extreme value models. This paper demonstrates methods for treating temporal dependence and incorporating seasonality in a model for high values of the FDI.
Section 2 contains an outline of Extreme Value Theory, followed by a detailed description of the data and the preliminary data analysis carried out in preparation for fitting the generalized Pareto distribution to exceedances of the threshold. The results are presented and discussed in Section 3, and Section 4 states the conclusions of this research.

2 Methods of Study

The objective of this study is to demonstrate the application of the threshold exceedance approach in Extreme Value Theory (EVT) using the FDI data. It is not an exercise to quantify or predict the probability of wildfire occurrence. The focus is on the resulting parameter estimates, and on how these change as temporal dependence and seasonality are taken into account. This section contains a brief outline of EVT, the methods employed in this study, and a description of the data.

2.1 Extreme Value Theory in Brief

The fundamental assumption in univariate Extreme Value Theory (EVT) is that the random variables are independently and identically distributed (i.i.d.). The theory was initially developed by Fisher and Tippett (1928) to quantify the behaviour of the smallest or largest member of a sample. Their formulation was that, if $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with marginal cumulative distribution function $F$, then the behaviour of the maximum of a sample of length $n$ ($n$ is called the block size; see footnote 1), i.e. $M_n = \max(X_1, X_2, \ldots, X_n)$, can be asymptotically characterized by only three types of distributions: the Gumbel, Fréchet and Weibull types. This is known as the extremal types theorem. Following the mathematical justification given by Gnedenko (1943), for large $n$ the three types may be combined into a single general distribution, namely the Generalized Extreme Value (GEV) distribution, which is defined for suitable normalising constants $a_n$ and $b_n > 0$ as:

\[ P\left(\frac{M_n - a_n}{b_n} \le x\right) \approx \exp\left\{-\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]_+^{-1/\xi}\right\} \tag{1} \]

where $y_+ = \max(y, 0)$, $-\infty < \mu < \infty$ is the location parameter, $\sigma > 0$ is the scale parameter and $\xi$ is the shape parameter. The limit $\xi \to 0$ corresponds to the Gumbel distribution, $\xi > 0$ to the Fréchet distribution and $\xi < 0$ to the negative Weibull distribution. The shape parameter $\xi$ is important because it describes how the tail of the distribution of extremes behaves: for the Fréchet, Gumbel and Weibull types, the GEV has a long, medium and short tail (with finite upper end point), respectively.

The approach based on fitting a generalized Pareto distribution (GPD) to threshold exceedances is an alternative to the classical approach of modelling block maxima. The advantage of this method over the block maxima approach is its more economical use of the data. An important condition is that a suitably high threshold $u$ is chosen, so that the excesses over that threshold can be approximated by the GPD. Given the series $X_1, X_2, \ldots$, construct the excesses $Y = X - u$ for those observations with $X > u$, so that $Y > 0$.
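
The two constructions above (block maxima modelled by equation (1), and excesses over a threshold) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names `gev_cdf`, `block_maxima` and `excesses` are hypothetical, and the GEV function is written directly from equation (1) with the Gumbel case handled as the limit $\xi \to 0$.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function exp{-[1 + xi (x - mu)/sigma]_+^(-1/xi)}."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:
        # Outside the support: below the lower end point when xi > 0,
        # above the finite upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def block_maxima(series, block_size):
    """Maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

def excesses(series, u):
    """Threshold excesses Y = X - u for the observations with X > u."""
    return [x - u for x in series if x > u]
```

With a daily series and `block_size=365`, `block_maxima` keeps one value per year, whereas `excesses` keeps every observation above `u`, which is the economy of data referred to above.
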
As $u$ approaches the maximum value of the series, that is $u \to \sup\{x : F(x) < 1\}$, the limit often found is

\[ F_u(y) \approx G(y; \sigma_u, \xi) \tag{2} \]

where $F_u$ is the distribution function of the excesses over $u$ and $G$ is the Generalized Pareto Distribution (GPD), given by

\[ G(y; \sigma_u, \xi) = 1 - \left(1 + \xi\,\frac{y}{\sigma_u}\right)_+^{-1/\xi} \tag{3} \]

The existence of this limit is subject to $y > 0$ and $1 + \xi y/\sigma_u > 0$. In equation (2), $\sigma_u$ is used to indicate that the scale parameter depends on the threshold $u$, that is:

\[ \sigma_u = \sigma + \xi(u - \mu) \tag{4} \]

James Pickands III was the first to formulate the idea of a generalized Pareto upper tail of a continuous distribution function. Pickands (1975) made precise the connection that, if the maxima/minima ($M_n$) of a process can be approximated by the GEV distribution, then for sufficiently large thresholds $u$ the threshold excesses $\{Y\}$ have a corresponding approximate distribution within the generalized Pareto family. Further, Pickands showed that the parameter $\xi$ of the GPD is the same as the shape parameter in the GEV family of distributions.

Footnote 1: The block size n could be a day, week, month, year, etc. However, use of an annual block is common because of the ease of interpretation. Care must be taken in selecting the block size, as this affects the trade-off between bias and variance of the estimator.

The main assumption in EVT is that the observed values are independently and identically distributed (i.i.d.); however, dependence and non-stationarity are inherent in many physical processes. In the case of temporal dependence the extremal types theorem can be modified by assuming a condition that limits the extent of long-range dependence at extreme levels of a process. This is termed the $D(u_n)$ condition, formulated and rigorously justified by Leadbetter (Leadbetter, 1983; Leadbetter et al., 1983). As a consequence of this modification, the relationship between the extreme value models of the dependent and the independent sequence is:

\[ G_{\mathrm{dep}}(z) = \left(G_{\mathrm{indep}}(z)\right)^{\theta} \tag{5} \]

where $0 < \theta \le 1$ is termed the extremal index. The extremal index is a measure of the tendency of the process to cluster at extreme levels. For independent sequences $\theta = 1$, but the converse is not necessarily true. In the case of block maxima, provided the assumption of long-range independence is satisfied, inference is similar to that of the i.i.d. case. The parameter estimates are not affected; however, the effective number of independent observations per block is reduced from $n$ to $n\theta$, which reduces the quality of the estimation. For threshold exceedances the GPD remains the appropriate approximating distribution, but the tendency of dependent observations to cluster means that the joint distribution of neighbouring excesses cannot be specified. Various methods for dealing with the clustering have been proposed, of which runs declustering methods are the simplest and most popular. Here, clusters of exceedances of the threshold $u$ are defined with the objective that the clusters are far enough apart for the cluster maxima to be independent. The GPD is then fitted to the independent cluster maxima.

Non-stationarity is another feature common in most practical situations. It refers to the marginal distribution of the process not remaining the same as time changes, which could be due to seasonality, long-term climate trends, etc. When extremes are temporally dependent, extreme value models are still applicable subject to special restrictions. In contrast, in the case of non-stationarity, extreme value models are used as basic templates upon which the model parameters can be statistically modelled to derive problem-specific solutions. In the threshold exceedance approach the generalized Pareto is assumed to be the common distribution for excesses, with the parameters assumed to have parametric forms guided by the data or the nature of the problem. The basic formulation is given by

\[ Y_t \sim \mathrm{GP}\left(\hat{\sigma}(t), \hat{\xi}(t)\right) \tag{6} \]

where $t$ is usually, but not necessarily, the time index.
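
As a hedged illustration of equations (3) and (4), the sketch below evaluates the GPD distribution function, its inverse, the threshold-dependent scale and, for $\xi < 0$, the finite upper end point $u - \sigma_u/\xi$ implied by the support condition $1 + \xi y/\sigma_u > 0$. All function names are hypothetical and the code uses only the Python standard library; it is not the software used for the fits reported in this paper.

```python
import math

def gpd_scale(sigma, xi, u, mu):
    """Threshold-dependent scale sigma_u = sigma + xi (u - mu), equation (4)."""
    return sigma + xi * (u - mu)

def gpd_cdf(y, sigma_u, xi):
    """GPD distribution function of an excess y, equation (3)."""
    if sigma_u <= 0:
        raise ValueError("sigma_u must be positive")
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        # Exponential limit (xi -> 0)
        return 1.0 - math.exp(-y / sigma_u)
    t = 1.0 + xi * y / sigma_u
    if t <= 0:
        # Only possible for xi < 0: y lies beyond the upper end point
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_quantile(p, sigma_u, xi):
    """Excess y such that gpd_cdf(y, sigma_u, xi) = p, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if abs(xi) < 1e-12:
        return -sigma_u * math.log(1.0 - p)
    return sigma_u / xi * ((1.0 - p) ** (-xi) - 1.0)

def upper_end_point(u, sigma_u, xi):
    """Largest attainable value of X = u + Y; infinite unless xi < 0."""
    return u - sigma_u / xi if xi < 0 else math.inf
```

For example, `upper_end_point(30, 4.06, -0.37)` evaluates the bound implied by a threshold of 30 with scale 4.06 and shape -0.37, which are the Letaba case (a) estimates reported in Table 3.
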

2.2 Study Area and Data

The Kruger National Park (KNP), which spans approximately 1.95 million hectares, is situated in the north-east of South Africa (between latitudes 22° S and 26° S and longitudes 31° E and 32° E), covering parts of the Limpopo and Mpumalanga provinces, with Zimbabwe and Mozambique on the park's northern and eastern boundaries. It is rich in the quantity and diversity of plant and animal species. Altitude varies from 260 m to 839 m above sea level, with an average of 300 m. The park's climate consists of warm and dry cycles, which meteorologists have recorded as lasting between 8 and 12 years. Average rainfall varies from 375 mm in the north to 750 mm in the south, whilst the average temperature is estimated to be 18 °C in winter and 30 °C in summer, with maximums capable of rising beyond 40 °C.

The data under study consist of McArthur Forest Fire Danger Index values from four sites in the KNP. Figure 1 shows the location of the KNP in relation to South Africa, with the magnified map of the park itself showing where the four sites are located. These are, from north to south: Shingwedzi, Letaba, Satara and Skukuza. The index was calculated using local weather variables (see footnote 2), including rainfall, temperature and relative humidity. The period under study is 1960 to 2007, that is, 48 years of non-missing daily McArthur FDI values for each site.

Figure 1: Map of the KNP in relation to South Africa, showing the locations of Shingwedzi, Letaba, Satara and Skukuza.

Figure 2 suggests that for each site the distribution of the data is skewed to the right, with a long upper tail. Shingwedzi has higher FDI values than the other sites, and the summary statistics presented in Table 1 support this view. Interestingly, during the period of study FDI values at Letaba never exceeded 40, whilst this value was exceeded four times at Shingwedzi, twice at Satara and once at Skukuza. Further, from Figure 2 and Table 1, the mean and median values are similar for Shingwedzi and Letaba, and for Satara and Skukuza. This indicates that the behaviour in the centre of the distribution of the FDI process is similar for sites in the north of the KNP, and similar for sites in the south. When the uppermost percentiles and the maxima are considered, however, these differ across sites. In most statistical methods based on the normality assumption, the values beyond the whiskers of the box plot would be considered outliers and possibly even discarded from the analysis. In contrast, in Extreme Value (EV) methods these values, which lie beyond the normal range of values observed for the process, are the focus of the analysis.

According to the McArthur Fire Danger Index rating system, very high fire danger values are those between 24 and 50 (Willis et al., 2001). Table 1 suggests that these were roughly the top 5% of the observed values across the four sites; however, over the 48 years none of the sites experienced extreme fire danger, since 50 was never exceeded during this period.

The components of the Fire Danger Index are meteorological factors. Inherent features of meteorological variables are seasonality, evidence of long-term trends and short-range correlations. Since the FDI is a combination of meteorological factors, it is anticipated that it will exhibit some of these features. Figure 3 shows a monthly scatter plot of the daily FDI series, which clearly exhibits seasonal features. High values of the index appear to occur most frequently during the July to December period at all four sites.

Footnote 2: The weather variables were obtained from different sources, including the South African Weather Service (http://www.weathersa.co.za) and the European Centre for Medium-Range Weather Forecasts (http://www.ecmwf.int/).

Figure 2: Box plot showing the empirical distribution of FDI values for: (1) Shingwedzi, (2) Letaba, (3) Satara and (4) Skukuza.

Summary statistics    Shingwedzi    Letaba    Satara    Skukuza
Minimum                     0.03      0.03      0         0.02
Median                     10.42     10.12      9.1       8.83
Mean                       11.13     10.91      9.99      9.9
95th percentile            23.4      22.45     21.5      22.28
99th percentile            29.61     28.13     27.3      28.42
Maximum                    46.74     39.41     44.3      43.32

Table 1: Summarized description of the observed FDI values (1960-2007) for the 4 sites.

Figure 3: Scatter plot of the FDI values per month for the 4 sites.

2.3 Exploratory Analysis

The aim of the study is not to model the entire series of FDI values, but rather to model the values exceeding a certain threshold. Once a suitable threshold has been chosen, the assumption is that the generalized Pareto distribution is a suitable model for the independent excesses. Threshold selection is still an area of ongoing research in Extreme Value (EV) modelling. The methods described by Davison and Smith (1990) are used in this study.

The first method is based on the mean excess plot (also known as the mean residual life plot). This plot describes the behaviour of the mean excess given that an exceedance has occurred. In the notation of equations (3) and (4), and for $\xi < 1$, it is based on

\[ E(X - u \mid X > u) = \frac{\sigma_u}{1 - \xi} = \frac{\sigma + \xi(u - \mu)}{1 - \xi} \tag{7} \]

so that the estimates are expected to change linearly with the threshold $u$ at levels of $u$ for which the GPD is appropriate. The suitable threshold is therefore selected from the mean excess plot as the value of $u$ at the onset of linearity. The mean excess plots for the four sites are given in Figure 4. The dotted lines in these plots are confidence bounds based on the approximate normality of sample means. The onset of linearity in the mean excess plots seems to lie between 29 and 31 for the four sites.

Figure 4: Mean excess plots of the daily FDI values for the four sites in the KNP: (a) Shingwedzi, (b) Letaba, (c) Satara, (d) Skukuza.

The second method for selecting an appropriate threshold is described by Davison and Smith (1990) as an assessment of the stability of the parameter estimates, based on fitting the GPD across a range of thresholds. The idea is that too high a threshold results in too few excesses, and hence in a decline in the stability of the GPD parameter estimates. Figure 5 shows the scale and shape parameter estimates plotted against values of the threshold in the range 25 to 35 for Shingwedzi and Skukuza. Similar plots were computed for Letaba and Satara, but are not included in the interest of space. In these plots the stability of the parameter estimates deteriorates beyond a threshold value of 30. Based on the mean excess plots and the stability plots (Figures 4 and 5), 30 was selected as the threshold for fitting the GPD.

As mentioned in Section 2.2, one of the features to be aware of when modelling data related to meteorology is the possible presence of long-term trends. A plot of the daily FDI excesses over the years (Figure 6) shows no evidence of a long-term temporal trend at any of the four sites.
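
The mean excess plot described above can be sketched as follows. For each candidate threshold the code computes the sample mean of the excesses and an approximate confidence half-width based on the normality of sample means, as stated in the text. The function name `mean_excess`, the returned tuple layout and the default multiplier `z=1.96` (a 95% level) are assumptions made for illustration.

```python
import math
import statistics

def mean_excess(series, thresholds, z=1.96):
    """Empirical mean excess for each threshold.

    Returns one tuple (u, n_exceedances, mean_excess, half_width) per
    threshold that is exceeded at least once. half_width is the normal
    approximation z * s / sqrt(n), or None when fewer than two
    exceedances are available.
    """
    rows = []
    for u in thresholds:
        exc = [x - u for x in series if x > u]
        if not exc:
            continue  # no observation exceeds this threshold
        mean = statistics.mean(exc)
        half_width = None
        if len(exc) >= 2:
            half_width = z * statistics.stdev(exc) / math.sqrt(len(exc))
        rows.append((u, len(exc), mean, half_width))
    return rows
```

Plotting `mean_excess` against `u`, with bounds `mean_excess ± half_width`, gives a plot of the kind described for Figure 4; the threshold is then read off at the onset of linearity.
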

Figure 5: Parameter stability plots of the daily FDI values for (a) Shingwedzi and (b) Skukuza.

Another feature that is common amongst meteorological data is the presence of short-range temporal dependence. At high levels this appears as clustering of the values of the series. In extreme value analysis the degree of clustering of the data at high levels is measured by the extremal index $\theta$. The extremal index is loosely defined by Coles (2001) as the reciprocal of the limiting mean cluster size, where "limiting" refers to the behaviour of the clustering as the threshold increases towards the maximum observed value.

Two approaches to estimating the extremal index based on a run length $r$ are discussed in this paper. In the first, clusters are formed by arbitrarily specifying the run length $r$, such that a cluster is considered active until $r$ consecutive values fall below the threshold $u$. The extremal index is then estimated as the number of clusters divided by the number of exceedances of the threshold $u$. In the second, an optimal run length is obtained by estimating the extremal index under the assumption that the exceedance times are observed values of a point process whose limit is a Poisson process. This latter method was formulated by Ferro and Segers (2003), extending the work of Hsing et al. (1988) and Smith and Weissman (1994), amongst others.

The choice of $r$ affects the bias-variance trade-off. A value of $r$ that is too small raises concerns over the validity of the assumption of independence of the cluster maxima. Conversely, large values of $r$ could result in too few cluster maxima, raising concerns over the precision of the GP distribution's parameter estimates. In this study, using the threshold 30, the extremal index was estimated for the FDI data at each site using the method of Ferro and Segers (2003).
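
The first of the two approaches (runs declustering with a user-specified run length) is simple enough to sketch directly. The code below is an illustrative implementation of that rule only; it is not the Ferro and Segers (2003) estimator used for the results in this study, and the function names are hypothetical.

```python
def runs_decluster(series, u, r):
    """Group exceedances of u into clusters using run length r.

    A cluster stays active until r consecutive values fail to exceed u.
    Returns a list of clusters, each a list of the exceedances it contains.
    """
    if r < 1:
        raise ValueError("run length r must be at least 1")
    clusters, current, below = [], [], 0
    for x in series:
        if x > u:
            current.append(x)
            below = 0
        elif current:
            below += 1
            if below >= r:
                clusters.append(current)
                current, below = [], 0
    if current:
        clusters.append(current)  # cluster still active at the end of the series
    return clusters

def cluster_maxima(series, u, r):
    """Maximum of each cluster; these are the values the GPD is fitted to."""
    return [max(c) for c in runs_decluster(series, u, r)]

def extremal_index_runs(series, u, r):
    """Runs estimate: number of clusters / number of exceedances of u."""
    clusters = runs_decluster(series, u, r)
    n_exceedances = sum(len(c) for c in clusters)
    if n_exceedances == 0:
        raise ValueError("no exceedances of the threshold")
    return len(clusters) / n_exceedances
```

Increasing `r` merges neighbouring clusters, so the estimate of the extremal index can only stay the same or decrease, and fewer cluster maxima remain for fitting; this is the bias-variance trade-off noted above.
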
The optimal run lengths were determined from these estimates. The results are presented in Table 2.

Site          ˆθ      95% C.I. for θ    r
Shingwedzi    0.54    (0.42; 0.74)      19
Letaba        0.84    (0.69; 1.02)       4
Satara        0.60    (0.44; 0.88)      26
Skukuza       0.65    (0.52; 0.80)      13

Table 2: Estimates of the extremal indices and run lengths for the site-wise FDI data, with u = 30. The acronym C.I. refers to Wald-type confidence intervals.

The results in Table 2 suggest that clustering at high levels could be expected at all sites except Letaba. The estimate of the extremal index for Letaba is close to 1, with a confidence interval that contains 1 within its range. This suggests that at Letaba there is only weak dependence between FDI values that are far beyond the threshold 30. Satara has the highest optimal run length; however, the potential consequence of a high run length is a reduction in the precision of the GPD parameter estimates.

Figure 6: Plot of the daily McArthur FDI excess values for the four sites at KNP.

Finally, the FDI series, as shown in Figure 3 of Section 2.2, exhibits seasonality. It is anticipated that the exceedances will behave similarly. Figure 7 shows a monthly scatter plot of the exceedances of 30. It is evident that FDI values higher than 30 occur mainly during the July-December period. The aim in fitting a generalized Pareto model that accounts for seasonality is to capture this systematic variation.

In EVT there are two approaches to incorporating seasonality. The first is the separate seasons approach, based on splitting the data into the respective seasons and then fitting a GP model to each season's data. The second approach is based on fitting continuous parametric functions to capture the seasonality. Across the four sites, Figure 7 shows most exceedances occurring in the second half of the year. High values occur most frequently in September at all sites, but Satara has also experienced frequent high FDI values in January.

Figure 7: Monthly scatter plot of McArthur FDI values beyond 30 for each of the 4 sites.

In this study seasonality is incorporated into the model by fitting a generalized Pareto distribution whose scale parameter takes the form of a sinusoidal function. Model comparison is based on the likelihood ratio statistic, to determine whether there is sufficient evidence in support of incorporating seasonality in the model of high FDI values.
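
The model comparison just described can be illustrated with a short sketch. The sinusoidal form of the scale parameter (log link, period 365.25 days) is the one given in Section 3, and the log-likelihood values in the usage note are those reported in Tables 3 (case b) and 5. Treating the comparison as a chi-square test on two degrees of freedom (the two added sinusoidal coefficients) is an assumption made here, as are the function names; for two degrees of freedom the chi-square upper tail probability has the closed form exp(-D/2), so only the standard library is needed.

```python
import math

PERIOD_DAYS = 365.25  # annual cycle, as in the scale model of Section 3

def seasonal_scale(t, s0, s1, s2):
    """Scale at day t under log(sigma(t)) = s0 + s1*sin(w t) + s2*cos(w t)."""
    w = 2.0 * math.pi / PERIOD_DAYS
    return math.exp(s0 + s1 * math.sin(w * t) + s2 * math.cos(w * t))

def lrt_two_extra_parameters(loglik_stationary, loglik_seasonal):
    """Deviance and p-value for adding two parameters to a nested model.

    Assumes the deviance is compared with a chi-square distribution on
    2 degrees of freedom, whose survival function is exp(-D / 2).
    """
    deviance = 2.0 * (loglik_seasonal - loglik_stationary)
    p_value = min(1.0, math.exp(-deviance / 2.0))
    return deviance, p_value
```

For instance, `lrt_two_extra_parameters(-98.56, -93.81)` uses the Satara log-likelihoods from Tables 3 (b) and 5, and `lrt_two_extra_parameters(-189.64, -189.44)` those for Shingwedzi. Because the scale is modelled on the log scale, `seasonal_scale` is positive for every `t`.
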

3 Results and Discussion

The most important consideration in modelling threshold exceedances is the selection of the threshold. Details of the procedure followed in selecting the threshold for the FDI series at each site are given in Section 2.3. A threshold of 30 was selected at all four sites.

In the first case (a), the generalized Pareto distribution (GPD) was fitted to the exceedances at each site, under the assumption of independent observations. Comparing the log-likelihoods and the 10-year return level estimates across the four sites (see Table 3), the characteristics of very high fire danger in Shingwedzi differ markedly from those of the other sites. Interestingly, the maxima observed at each site lie beyond the range of the 95% confidence intervals for the 10-year return level estimates. The N-year return level is the level that is expected to be exceeded on average once in N years. It was found that the maxima are within the range of values that occur on average once in 50 years.

There is strong evidence that, for Letaba, the distribution of high FDI has an upper bound. There is doubt for the other sites, however, since zero falls within the 95% confidence interval for the shape parameter. The hypothesis that the shape parameter is zero was tested by fitting an exponential model to the data and comparing its log-likelihood with that of the GPD model. The resulting deviance statistics (0.3 for Shingwedzi, 0.53 for Satara and 1.08 for Skukuza) are all below $\chi^2_{0.95,1} = 3.84$. Hence, at the 5% significance level, there is no strong evidence to support an upper-bounded distribution ($\xi < 0$) for very high FDI values in Shingwedzi, Satara and Skukuza.

| Site | Case | log-lik. | $\hat{\sigma}$ (s.e.) | $\hat{\xi}$ (95% prof. c.i.) | 10-yr r.l. est. (95% prof. c.i.) |
|---|---|---|---|---|---|
| Shingwedzi | (a) | -337.94 | 3.23 (0.39) | -0.03 (-0.18; 0.18) | 40.65 (39.07; 43.76) |
| Shingwedzi | (b) | -189.64 | 3.61 (0.63) | -0.03 (-0.03; 0.31) | 37.87 (36.43; 40.09) |
| Letaba | (a) | -194.73 | 4.06 (0.54) | -0.37 (-0.55; -0.16) | 37.33 (36.66; 38.48) |
| Letaba | (b) | -164.17 | 4.30 (0.61) | -0.41 (-0.55; -0.18) | 36.96 (36.23; 38.01) |
| Satara | (a) | -153.44 | 3.21 (0.55) | -0.07 (-0.26; 0.26) | 38.01 (36.53; 40.81) |
| Satara | (b) | -98.56 | 4.41 (0.91) | -0.19 (-0.59; 0.19) | 36.34 (34.60; 38.22) |
| Skukuza | (a) | -194.59 | 2.72 (0.37) | -0.08 (-0.21; 0.15) | 37.41 (36.24; 39.48) |
| Skukuza | (b) | -137.39 | 3.62 (0.57) | -0.17 (-0.33; 0.09) | 36.56 (35.38; 38.15) |

Table 3: Summary of results from fitting the GPD to FDI excesses, where in: (a) exceedances were assumed to be independent; (b) the FDI series was assumed to be stationary.

In the second case (b), the assumption of independence of observations was relaxed, assuming instead that observations at any point in time are dependent on previously observed values. To filter out clustered observations, the FDI series at each site was declustered using the optimal run lengths given in Table 2. The GPD was fitted to the declustered series (independent cluster maxima) using 30 as the threshold. There is a substantial improvement in the GPD fit after clustered observations were filtered from the analysis. The 10-year return level estimates decreased, as did the uncertainty about these estimates. Similar to the findings in (a), there is no evidence in support of a GPD with $\xi < 0$ for exceedances at
Shingwedzi as the confidence interval is mostly non-negative. In contrast, the shape estimates for Satara and Skukuza are negative, although their 95% intervals in Table 3 (b) still include zero; only for Letaba does the interval lie entirely below zero.

It was discovered in Figure 3 that seasonality is inherent in the FDI series. Further, Figure 7 suggests that values of the McArthur FDI above 30 occur more frequently in the period July to December of each year. One approach to describing the variation in exceedances due to seasonality is a separate-seasons analysis. This approach could not be applied in this study because there were too few exceedances in the first half of the year. However, since most exceedances occur during the July-December period, the GPD was fitted to the declustered exceedances in this period. The estimates for the July-December period (Table 4) are very similar to those of case (b) in Table 3. This implies that, irrespective of whether the annual series of observed daily FDI values is used or only the July-December period in which exceedances are most frequent, there is no impact on the level that is expected to be exceeded on average once every 10 years.

| Site | log-lik. | $\hat{\sigma}$ (s.e.) | $\hat{\xi}$ (95% prof. c.i.) | 10-yr r.l. est. (95% prof. c.i.) |
|---|---|---|---|---|
| Shingwedzi | -181.69 | 3.09 (0.56) | 0.06 (0.06; 0.41) | 37.65 (36.13; 40.21) |
| Letaba | -150.38 | 4.37 (0.64) | -0.42 (-0.56; -0.18) | 37.01 (36.67; 38.10) |
| Satara | -82.26 | 4.68 (1.08) | -0.43 (-0.55; 0.01) | 35.51 (35.10; 36.81) |
| Skukuza | -134.98 | 3.33 (0.56) | -0.13 (-0.30; 0.17) | 36.79 (35.53; 38.67) |

Table 4: Summary of results from fitting the GPD to FDI excesses for the months July-December, under the assumption of stationarity of the series at each site.

The alternative approach to modelling seasonality is to express the model parameters as continuous sinusoidal functions of time. Because the shape parameter is important yet difficult to model and interpret, it is usually held constant. From equation (6), the shape parameter in this study is assumed to be constant, that is $\hat{\xi}(t) = \xi$, whilst for $t = 1, 2, \ldots, 17532$

$$\log \hat{\sigma}(t) = \sigma_0 + \sigma_1 \sin\left(\frac{2\pi}{365.25}\, t\right) + \sigma_2 \cos\left(\frac{2\pi}{365.25}\, t\right).$$

Working with the logarithm of the scale parameter ensures that the estimated scale parameter is positive for all values of $t$.

| Site | log-lik. | $\hat{\sigma}_0$ (s.e.) | $\hat{\sigma}_1$ (s.e.) | $\hat{\sigma}_2$ (s.e.) | $\hat{\xi}$ (s.e.) | p-value |
|---|---|---|---|---|---|---|
| Shingwedzi | -189.44 | 1.25 (0.21) | -0.07 (0.20) | -0.10 (0.21) | -0.04 (0.14) | 0.82 |
| Letaba | -163.71 | 1.40 (0.16) | -0.11 (0.11) | 0.03 (0.21) | -0.43 (0.11) | 0.63 |
| Satara | -93.81 | 1.90 (0.22) | 0.02 (0.16) | 0.45 (0.14) | -0.62 (0.23) | 0.01 |
| Skukuza | -134.19 | 0.91 (0.26) | -0.57 (0.19) | -0.18 (0.30) | -0.26 (0.10) | 0.04 |

Table 5: Summary of results from fitting the GPD to FDI excesses with the systematic variation due to seasonality described by the covariates in the scale parameter.

Comparing the log-likelihood values in Table 5 with those presented in Table 3 case (b), there is an increase in log-likelihood after seasonality has been incorporated. According to the p-values
of the likelihood ratio comparison between the GPD model obtained in case (b) and the GPD model in which seasonality was incorporated (see Table 5), this increase is not statistically significant for Shingwedzi and Letaba. This suggests that the stationary GPD model adequately describes the tail behaviour of the distribution of the FDI series at Shingwedzi and Letaba, once temporal dependence has been dealt with. In contrast, accounting for seasonal effects in the GPD of FDI excesses at Satara and Skukuza leads to a significant improvement in fit at the 5% level.

4 Conclusion

There is merit, especially when working with environmental data, in investigating whether there is temporal dependence and non-stationarity. In this study, accounting for temporal dependence by declustering the data reduced the estimated 10-year return levels. In studies where the occurrence of an extreme event could endanger society, property or the livelihood of flora and fauna, an inflated estimate of the return level can lead managers to over-invest unnecessarily in mitigation and prevention efforts. In this study the temporal dependence was only summarised by the extremal index; in practical cases, however, knowledge about the dependence structure between observations might be important. This involves using multivariate extreme value methods in conjunction with statistical modelling techniques, as discussed by Smith (1988), Tawn (1988) and Coles (1993), amongst others.

Non-stationarity in extremes increases the complexity of modelling. This is mainly due to the subjectivity involved in choosing appropriate parametric functions for seasonality and long-term trends. Further, more complex methods have to be used to obtain return level estimates, as they are unattainable under maximum likelihood inference.
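As a check on the likelihood ratio comparisons reported in Section 3, the following Python sketch recomputes the p-values in Table 5 from the tabulated log-likelihoods, and the p-values implied by the exponential-versus-GPD deviances. It assumes that the seasonal model adds two parameters ($\sigma_1$ and $\sigma_2$) to the stationary case (b), so that the deviance is referred to a chi-squared distribution with 2 degrees of freedom; this choice is inferred from the model structure rather than stated explicitly. The function names are illustrative only, and the original analysis was not necessarily carried out in this way.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable.

    Only df = 1 and df = 2 are handled, since both have closed forms
    that need nothing beyond the standard library.
    """
    if x < 0:
        raise ValueError("the deviance must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def lrt_p_value(loglik_null, loglik_alt, df):
    """p-value of a likelihood ratio test between two nested models."""
    deviance = 2.0 * (loglik_alt - loglik_null)
    return chi2_sf(max(deviance, 0.0), df)


# Log-likelihoods: (stationary case (b), Table 3; seasonal model, Table 5).
LOGLIK = {
    "Shingwedzi": (-189.64, -189.44),
    "Letaba": (-164.17, -163.71),
    "Satara": (-98.56, -93.81),
    "Skukuza": (-137.39, -134.19),
}

# Assumption: two extra parameters (sigma_1, sigma_2), hence df = 2.
p_values = {
    site: lrt_p_value(l0, l1, df=2) for site, (l0, l1) in LOGLIK.items()
}

# Deviances quoted in the text for the exponential (xi = 0) versus GPD
# comparison; one restricted parameter, hence df = 1.
EXP_DEVIANCE = {"Shingwedzi": 0.3, "Satara": 0.53, "Skukuza": 1.08}
exp_p_values = {site: chi2_sf(d, df=1) for site, d in EXP_DEVIANCE.items()}

for site, p in p_values.items():
    print(f"{site}: seasonal vs stationary, p = {p:.2f}")
for site, p in exp_p_values.items():
    print(f"{site}: exponential vs GPD, p = {p:.2f}")
```

With the log-likelihoods from Table 3 (b) and Table 5, the sketch gives p-values of 0.82, 0.63, 0.01 and 0.04 for Shingwedzi, Letaba, Satara and Skukuza respectively, in agreement with Table 5. The exponential-versus-GPD p-values all exceed 0.05, consistent with the deviances lying below 3.84.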
Recent research has sought solutions to these problems, such as using Markov chain Monte Carlo methods to obtain return level estimates for non-stationary extremes. However, many such approaches are largely problem-specific.

It was interesting to find that the return level estimates obtained using data from the July-December period were similar to those obtained using the entire 12-month period. Further, for the northern regions of the park, incorporating seasonality in the form of sinusoidal functions did not improve the fit of the GPD over the case where seasonality was ignored. In contrast, in the southern regions of the park, accounting for seasonality did lead to a significant improvement in model fit.

An important factor that was not pursued in this study is the possibility that very high FDI values are spatially dependent. From the model results, very high FDI values at Shingwedzi have characteristics that are distinct from those at the other sites. From Section 2.2, the climate varies in moving from north to south, hence it is expected that the fire potential would also vary across space. However, the spatial variation, if present, is not expected to be substantial. An extension of this research would be to describe the extent of spatial variation and its impact on the estimates of the 10-year return level, to gain a more complete understanding of fire potential over the
Kruger National Park.

Acknowledgements

We wish to acknowledge Sally Archibald and Brian van Wilgen, of the Natural Resources and the Environment unit of the CSIR, for the data.

References

Coles, S. G. (1993). Regional modelling of extreme storms via max-stable processes. Journal of the Royal Statistical Society, 55:797-816.

Coles, S. G. (2001). An Introduction to Statistical Modeling of Extreme Values. Statistics. Springer Verlag, London.

Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds (with discussion). Journal of the Royal Statistical Society, 52:393-442.

Ferro, C. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society, 65:545-556.

Fisher, R. A. and Tippett, L. H. C. (1928). Limiting forms of the frequency distributions of the largest or smallest member of the sample. Proceedings of the Cambridge Philosophical Society, 24:180-190.

Gnedenko, B. V. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics, 44:423-453.

Hsing, T., Hüsler, J., and Leadbetter, M. R. (1988). On the exceedance point process for stationary sequences. Probability Theory and Related Fields, 78(1):97-112.

Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Probability Theory and Related Fields, 65(2):291-306.

Leadbetter, M. R., Lindgren, G., and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer Verlag, New York.

Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3:119-131.

Preisler, H. and Westerling, A. (2007). Statistical model for forecasting monthly large wildfire events in western United States. Journal of Applied Meteorology and Climatology, 46:1020-1030.

Smith, R. L. (1988). Extreme value for dependent sequences via the Stein-Chen method of Poisson approximation. Stochastic Processes and Applications, 30:317-327.

Smith, R. L. (2003). Statistics of extremes with applications in environment, insurance and finance. Technical report, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA.

Smith, R. L. and Weissman, I. (1994). Estimating the extremal index. Journal of the Royal Statistical Society, 56:515-528.

Tawn, J. A. (1988). Bivariate extreme value theory: Models and estimation. Biometrika, 75:397-415.

Willis, C., van Wilgen, B., Tolhurst, K., Everson, C., D'Abreton, P., Pero, L., and Fleming, G. (2001). The development of a national fire danger rating system for South Africa. Technical report, prepared for the Department of Water Affairs and Forestry, Pretoria, by CSIR Water, Environment and Forestry Technology.

Wybo, J., Guarniéri, F., and Richard, B. (1995). Forest fire danger assessment methods and decision support. Safety Science, 20:61-70.