Likelihood inference for threshold exceedances


31 Spatial Extremes

Anthony Davison, École Polytechnique Fédérale de Lausanne, Switzerland
Raphaël Huser, King Abdullah University, Thuwal, Saudi Arabia
Emeric Thibaud, École Polytechnique Fédérale de Lausanne, Switzerland

Handbook of Environmental and Ecological Statistics

CONTENTS
31.1 Introduction
31.2 Max-stable and related processes
    Poisson process
    Classical results
    Spectral representation
    Exceedances
31.3 Models
    General
    Brown-Resnick process
    Extremal-t process
    Other models
    Asymptotic independence
31.4 Exploratory procedures
31.5 Inference
    General
    Likelihood inference for maxima
    Likelihood inference for threshold exceedances
31.6 Examples
    Saudi Arabian rainfall
    Spanish temperatures
Discussion
Computing
Acknowledgement
Bibliography

31.1 Introduction

Climate change is perceptible in shifting patterns of rainfall, sea ice, temperatures and other phenomena. Although this affects entire distributions of observations, the largest impacts on humans and on the environment that sustains us are likely to be due to extreme events. The fact that such events take place over time and space has led to a surge of research on modelling complex extreme events over the past decade, with many developments in statistical theory and methods that have already influenced applications. The purpose of this chapter, which should be read in conjunction with Chapter 8, is to summarise these developments. Space limitations and the rapid development of this area of research make it impossible to describe it fully. Other recent summaries are [13], [18], [20] and [67].

Spatial analysis of extremes may be performed for a variety of reasons, including:
(a) the estimation of changes in extremes, e.g., increases in daily maximum air temperatures, by combining data from different sites while accounting for the dependence among them;
(b) the attribution of particular rare events, such as the 2003 European heatwave, to possible causes, such as human impacts on climate;
(c) the estimation of risk at a single important location, such as the site of a nuclear installation. Borrowing strength by including data from elsewhere will often reduce estimation uncertainty, particularly when the data available at the site itself are limited, which is often the case;
(d) the estimation of overall risk for single large events, such as hurricanes or floods, which may be key to assessing potential losses for insurance companies or for planning public security interventions. Likewise risk of crop failure due to drought is a crucial element of food security.

These different settings involve different emphases in modelling. Any spatial dependence in cases (a), (b) and (c) needs to be accounted for, but the details are not usually critical, whereas in case (d) the pattern of dependence will be a key feature.
An individual farmer seeking insurance against loss of income will need the probability of disastrous weather at a specific spatial site, whereas failure of an important food crop due to a prolonged heatwave will depend on its spatial extent, so an overall risk assessment for public authorities must take into account the probability of simultaneous crop failure throughout a large region. In the former case a model providing accurate spatial interpolation to a single point will be adequate, whereas the second case will entail accurate modeling of a complex joint distribution, which is much harder and requires specialized models such as max-stable processes.

An obvious question is why specialized models are needed in such settings. The key issue is that, just as the univariate Gaussian distribution provides a good model for averages but a poor one for extremes, leading to inaccurate estimates of tail probabilities, joint tail probabilities may be badly mis-estimated by standard geostatistical models. The Gaussian distribution has no shape parameter analogous to the degrees of freedom in the Student t distribution, and thus cannot encompass different rates of tail probability decay. Moreover the rate of decay of Gaussian probabilities for joint events is determined by the correlation coefficient, and, in addition to being insufficiently flexible, this model implies that the variables become ever more independent as the events become rarer, which is not always true in applications. This motivates the study of special models particularly adapted to multivariate and spatial extremes.

We consider some quantity $Y(x)$, say, to be concrete, annual maximum rainfall, where the point $x$ lies in a domain $\mathcal{X}$. Usually $x = (s, t)$ has a spatial component $s \in \mathcal{S}$ and a temporal component $t \in \mathcal{T}$, and for simplicity we suppose that $\mathcal{X} = \mathcal{S} \times \mathcal{T}$.
We wish to model the properties of $\{Y(x) : x \in \mathcal{X}\}$, in order to estimate the probabilities of rare events, $\Pr\{Y(x) \in \mathcal{R}\}$, where the set $\mathcal{R}$ is extreme in some suitable sense. We might, for example, attempt to represent rainfall liable to lead to flooding at some point on a river by taking $\mathcal{R}$ to represent very large aggregate rainfall over a short period in a catchment area upstream.

Although the goal is to understand the properties of $Y(x)$ within $\mathcal{X}$, data are generally available only at a finite subset $\mathcal{X}' = \mathcal{S}' \times \mathcal{T}'$ of $\mathcal{X}$. One basic classification of such problems depends on $\mathcal{X}'$: if $\mathcal{S}'$ consists of a few sites, each with a long series of measurements at times $t \in \mathcal{T}'$, as might be the case with long-term temperature measurements, then the data may be "time-rich but space-poor"; whereas if $\mathcal{S}'$ consists of a grid of thousands of points but the set $\mathcal{T}'$ is rather limited then the data may be "space-rich but time-poor". Space- and time-rich data, such as five-minute radar observations of rainfall on a detailed spatial grid, in principle allow rich modelling of complex phenomena. In these three cases the observation times and sites are non-random, but in others they may be haphazard: for example, large forest fires appear at random points in space and time, and it is essential to model this in addition to the areas of the fires.

A second classification depends on whether extrapolation from $\mathcal{X}'$ to $\mathcal{X}$ is needed. The purpose of analysing temperature, rainfall or wind observations at measurement stations in $\mathcal{S}'$ is often to make predictions for the entire set $\mathcal{S}$, and this must be allowed by the model. The relationship between gridded data and the underlying phenomenon, e.g., between climate model rainfall reanalysis data and observed point rainfall, is often much less clear, and often in this setting it makes sense to take $\mathcal{S} = \mathcal{S}'$, so only temporal extrapolation may be required.

In Section 31.2 we set out a framework for complex extremes that generalizes the discussion in Chapter 8 and which allows a subsequent treatment of models both for maxima and for threshold exceedances. Section 31.3 describes some prominent extremal models, and discusses how they may be extended to encompass an important phenomenon known as asymptotic independence.
The following two sections concern inference: in Section 31.4 we sketch some ideas useful for initial data analysis and for the assessment of model fit, and in Section 31.5 we discuss the fitting of models by likelihood methods. Section 31.6 illustrates the earlier ideas and techniques in the context of Saudi Arabian rainfall in the region of Jeddah, and of extreme summer temperatures around the Spanish capital, Madrid. The chapter ends by outlining some topics that we have been unable to treat here.

31.2 Max-stable and related processes

Poisson process

Max-stable and related processes are the space- and space-time analogues of the extreme-value distributions arising in scalar and multivariate settings, and play a key role below. A general discussion is possible in terms of the Poisson process [51].

A Poisson process is a stochastic model for a set of random points $\mathcal{P}$ lying in a state space $\mathcal{E}$, and is defined by two properties of the random variables $N(A) = |\{x : x \in \mathcal{P} \cap A\}|$, $A \subset \mathcal{E}$, that count how many points of $\mathcal{P}$ lie in a set $A$:
for any collection of disjoint subsets $A_1, \ldots, A_n \subset \mathcal{E}$, the $N(A_1), \ldots, N(A_n)$ are independent; and
$N(A)$ has the Poisson distribution with mean $\mu(A)$, where $\mu$ is called the mean measure of the process.
The measure $\mu$ must be non-atomic, i.e., $\mu(\{x\}) = 0$ for any singleton $x \in \mathcal{E}$, and moreover $\mu(\emptyset) = 0$, whereas $\mu(\mathcal{E})$ may be infinite, in which case $N(\mathcal{E})$ is infinite with probability one. For technical reasons, below we only consider sets $A$ for which $\mu(A) < \infty$.
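The two defining properties can be checked numerically. The following is a minimal sketch under assumptions of our own choosing: the state space is the unit square, the mean measure is `rate` times area (so it is finite), and the helper names `poisson_count`, `simulate_points` and `count_in_strip` are hypothetical. Counts in two disjoint strips should then have means close to their measures and be roughly uncorrelated.

```python
import random

def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean)."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n

def simulate_points(rate, rng):
    """Points of a homogeneous Poisson process on the unit square, mu(A) = rate * area(A)."""
    n = poisson_count(rate, rng)
    return [(rng.random(), rng.random()) for _ in range(n)]

def count_in_strip(points, x0, x1):
    """N(A) for the vertical strip A = [x0, x1) x [0, 1)."""
    return sum(1 for x, _ in points if x0 <= x < x1)

rng = random.Random(1)
rate, reps = 50.0, 2000
counts_a, counts_b = [], []
for _ in range(reps):
    pts = simulate_points(rate, rng)
    counts_a.append(count_in_strip(pts, 0.0, 0.2))  # mu(A) = 10
    counts_b.append(count_in_strip(pts, 0.5, 0.9))  # mu(B) = 20, disjoint from A

mean_a = sum(counts_a) / reps
mean_b = sum(counts_b) / reps
cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(counts_a, counts_b)) / reps
print(mean_a, mean_b, cov_ab)  # means near 10 and 20, covariance near 0
```

Simulating the total count first and then placing the points independently is valid only because the mean measure of the whole square is finite here.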

FIGURE 31.1 Poisson process example (axes $r$ and $w$). Left panel: first 1000 points $(r, w)$ of a Poisson process sequentially generated on $\mathbb{R}_+^2$ with intensity function (31.3). Right panel: mapping of the points shown in the left panel to $q = rw$, shown on the diagonal, with the mapping function shown by the curved grey lines.

Consider, for example, a Poisson process with $\mathcal{E} = \mathbb{R}_+^2$, that generates points $x = (r, w)$ according to the mean measure
\[ \mu\{(r, \infty) \times (w, \infty)\} = \frac{1}{r}\, \Phi(-\sigma^{-1} \log w - \sigma/2), \quad r, w > 0, \tag{31.1} \]
where $\sigma > 0$ and $\Phi$ is the standard normal distribution function. This corresponds to setting $\mathcal{P} = \{(R_i, W_i) : i = 1, 2, \ldots\}$, where $R_1 > R_2 > \cdots > 0$ are generated sequentially by setting $R_i = (E_1 + \cdots + E_i)^{-1}$, with $E_i \overset{\mathrm{iid}}{\sim} \exp(1)$, and $W_i = \exp(\sigma \varepsilon_i - \sigma^2/2)$, where $\varepsilon_i \overset{\mathrm{iid}}{\sim} N(0, 1)$, independent of the $E_i$; note that $E(W_i) = 1$. The first 1000 points of a realisation of such a process are shown in the left-hand panel in Figure 31.1; the full realisation would have an infinity of points at the left-hand edge of the panel, because $\mu\{(r, \infty) \times (0, \infty)\} = 1/r \to \infty$ as $r \to 0$. The mean measure has an intensity function $\dot\mu$ given by its derivative at the upper right corner of a rectangle $(r_0, r) \times (w_0, w)$, i.e.,
\[ \dot\mu(r, w) = \frac{\partial^2 \mu\{(r_0, r) \times (w_0, w)\}}{\partial r\, \partial w} \tag{31.2} \]
\[ = \frac{\partial^2}{\partial r\, \partial w} \{\mu(r_0, w_0) - \mu(r, w_0) - \mu(r_0, w) + \mu(r, w)\} \]
\[ = \frac{1}{r^2}\, \frac{1}{\sigma w}\, \phi(-\sigma^{-1} \log w - \sigma/2), \quad r, w > 0, \tag{31.3} \]
where we have written $\mu(r, w) = \mu\{(r, \infty) \times (w, \infty)\}$ and so forth, and $\phi$ denotes the standard normal density function. Note that $\mu(A)$ equals the integral of $\dot\mu$ over $A$.

A key role in constructing models for spatial extremes is played by the mapping theorem, which under mild conditions states that if a function $g$ does not create atoms, then $\mathcal{P}' = g(\mathcal{P})$ also follows a Poisson process. As a simple example we might take $g(r, w) = rw$, corresponding to setting $Q_i = R_i W_i$, which amounts to collapsing the points shown in the left-hand panel of Figure 31.1 onto the diagonal line shown in the right-hand panel. Clearly $\mu[\{(r, q/r) : r > 0\}] = 0$ for each $q > 0$, so this transformation does not create atoms. We obtain the mean measure of this new Poisson process by noting that $Q = RW > q$ if and only if $R > q/W$, and the corresponding set $A_q = \{(r, w) : rw > q\}$ has measure
\[ \mu(A_q) = \int_0^\infty \frac{1}{\sigma w}\, \phi(-\sigma^{-1} \log w - \sigma/2) \int_{r = q/w}^\infty \frac{1}{r^2}\, dr\, dw = \int_0^\infty \frac{1}{\sigma w}\, \phi(-\sigma^{-1} \log w - \sigma/2) \left[ -\frac{1}{r} \right]_{q/w}^\infty dw = \frac{1}{q}\, E(W) = \frac{1}{q}, \quad q > 0. \tag{31.4} \]
Hence the process with points $Q_i = R_i W_i$ is also Poissonian, with the same mean measure as the $R_i$. Note that the calculation leading to (31.4) requires only that $W$ is positive and satisfies $E(W) = 1$.

The restriction of $\mathcal{P}$ to a subset $\mathcal{E}'$ of $\mathcal{E}$ clearly also follows a Poisson process, with mean measure $\mu'(A) = \mu(\mathcal{E}' \cap A)$. For example, if we let $\mathcal{E} = (0, \infty)$, consider $R_1, R_2, \ldots$ and let $\mathcal{E}' = (z', \infty)$ for some $z' > 0$, then we retain only those points $R_i$ exceeding $z'$. As $\mu(\mathcal{E}') = 1/z'$ is finite, these $R_i$ can be generated by first simulating a Poisson variable $N$ with mean $1/z'$, and if $N = n$, simulating $n$ independent variables on the interval $(z', \infty)$ with survivor function $z'/z$; these Pareto variables have probability density function $z'/z^2$ $(z > z')$.

Classical results

To connect the above rather abstract discussion with classical results for maxima of a random sample $X_1, \ldots, X_n$, note for any real $b_n$ and positive $a_n$ that
\[ \frac{\max(X_1, \ldots, X_n) - b_n}{a_n} \le y \iff N_n\{(y, \infty)\} = 0, \tag{31.5} \]
where $N_n(A) = \sum_{j=1}^n I\{(X_j - b_n)/a_n \in A\}$ for $A \subset \mathbb{R}$; $N_n(A)$ has a binomial distribution. If the extremal types theorem applies to $X_1, \ldots, X_n$, then there exist sequences $\{b_n\}$ and $\{a_n\}$, with $a_n > 0$, such that the limiting probability of (31.5) is of generalized extreme-value (GEV) form, and this implies that the binomial random variables $N_n\{(y, \infty)\}$ satisfy
\[ \lim_{n \to \infty} \Pr[N_n\{(y, \infty)\} = 0] = \exp\left\{ -\left( 1 + \xi\, \frac{y - \eta}{\tau} \right)_+^{-1/\xi} \right\}, \tag{31.6} \]
where $a_+ = \max(a, 0)$ and $\xi, \eta \in \mathbb{R}$ and $\tau > 0$ are respectively shape, location and scale parameters.
Thus as $n \to \infty$ the point processes $\mathcal{P}_n = \{(X_j - b_n)/a_n : j = 1, \ldots, n\}$ converge to a limiting Poisson process $\mathcal{P}$ with mean measure $\mu\{(y, \infty)\} = \{1 + \xi(y - \eta)/\tau\}_+^{-1/\xi}$. The limiting generalized extreme-value distribution for the maximum corresponds to the probability that the set $(y, \infty) \cap \mathcal{P}$ contains no points, a void probability of $\mathcal{P}$. This distribution arises as a limit, and thus provides an approximation that should improve as $n$ increases. Likewise the Poisson process approximation may be poor unless $n$ is sufficiently large.

The process $\mathcal{P}$ can be transformed to have mean measure $1/z$ on $\mathbb{R}_+$ by setting $z = \{1 + \xi(y - \eta)/\tau\}_+^{1/\xi}$, and fitting (31.6) to sample maxima can be regarded as finding the $\eta$, $\tau$ and $\xi$ that best achieve this. On this transformed scale the maximum has limiting distribution function $\exp(-1/z)$ $(z > 0)$; this, the standard Fréchet distribution, corresponds to (31.6) with $\eta = \tau = \xi = 1$.
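The transformation to the standard Fréchet scale can be sketched as follows. The function names `to_frechet` and `gev_cdf` and the parameter values in the usage lines are illustrative assumptions, and the sketch covers only $\xi \ne 0$. Applying the standard Fréchet distribution function to the transformed value should reproduce the GEV probability (31.6).

```python
import math

def to_frechet(y, eta, tau, xi):
    """z = {1 + xi (y - eta)/tau}_+^{1/xi}, for xi != 0."""
    base = max(1.0 + xi * (y - eta) / tau, 0.0)
    if base == 0.0:
        # y is at or beyond the endpoint of the GEV support
        return 0.0 if xi > 0 else math.inf
    return base ** (1.0 / xi)

def gev_cdf(y, eta, tau, xi):
    """GEV distribution function exp[-{1 + xi (y - eta)/tau}_+^{-1/xi}], for xi != 0."""
    base = max(1.0 + xi * (y - eta) / tau, 0.0)
    if base == 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-base ** (-1.0 / xi))

# Illustrative values only: location 25, scale 4, shape 0.2, observation 30.
z = to_frechet(30.0, eta=25.0, tau=4.0, xi=0.2)
p_frechet = math.exp(-1.0 / z)                     # standard Frechet probability at z
p_gev = gev_cdf(30.0, eta=25.0, tau=4.0, xi=0.2)   # GEV probability at y
print(z, p_frechet, p_gev)
```

With $\eta = \tau = \xi = 1$ the transformation is the identity, in line with the remark that the standard Fréchet distribution is the corresponding special case of (31.6).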

The generalized extreme-value distribution is max-stable; in the case of independent standard Fréchet variables $Z, Z_1, \ldots, Z_n$, this means that
\[ n^{-1} \max(Z_1, \ldots, Z_n) \overset{D}{=} Z, \quad n = 1, 2, \ldots, \tag{31.7} \]
where $\overset{D}{=}$ means "has the same distribution as"; these equalities correspond to setting $b_n = 0$ and $a_n = n$ in (31.5).

The generalized Pareto distribution (GPD) emerges on noting that large values of the transformed original variables $Z_j = \{1 + \xi(X_j - \eta)/\tau\}_+^{1/\xi}$ approximately follow a Poisson process on $\mathbb{R}_+$ with mean measure $1/z$. Those points for which $X_j > u$, or equivalently $Z_j > \{1 + \xi(u - \eta)/\tau\}_+^{1/\xi}$, for some high threshold $u$, satisfy
\[ \Pr(X_j > y \mid X_j > u) = \frac{\Pr(X_j > y)}{\Pr(X_j > u)} \doteq \frac{\{1 + \xi(y - \eta)/\tau\}_+^{-1/\xi}}{\{1 + \xi(u - \eta)/\tau\}_+^{-1/\xi}} = \left( 1 + \xi\, \frac{y - u}{\sigma_u} \right)_+^{-1/\xi}, \quad y > u, \]
where $\sigma_u = \tau + \xi(u - \eta)$; this yields the generalized Pareto distribution, which is therefore the limiting model for threshold exceedances that corresponds to fitting (31.6) to maxima. This distribution is threshold-stable: it is easy to check that the distribution of each $X_j$, conditional on the event $X_j > u' > u$, is also generalized Pareto, with the same shape parameter $\xi$. Like the GEV and Poisson process, the GPD approximation stems from a limit and may be poor if the threshold $u$ is too low.

As we shall now see, these results for scalar extremes, which are also derived in […], extend to multivariate and spatial settings.

Spectral representation

The Poisson process allows a general discussion of the classical extreme-value models in terms of a spectral representation, to be described below. This applies to the univariate setting, the multivariate setting, and to the functional setting, which we now describe.

Max-stability is the key property of the generalized extreme-value distribution that underpins its use for the estimation of rare event probabilities.
On the standard Fréchet scale this may be expressed as (31.7), and the analogous equation for max-stable processes, which generalize the GEV to multivariate and functional settings, is
\[ n^{-1} \max\{Z_1(x), \ldots, Z_n(x)\} \overset{D}{=} Z(x), \quad x \in \mathcal{X}, \quad n = 1, 2, \ldots; \tag{31.8} \]
we require (31.8) to hold for the entire process $\{Z(x) : x \in \mathcal{X}\}$. When $\mathcal{X}$ is a finite set this corresponds to the max-stability of the multivariate extreme-value distributions.

To generalize our previous discussion, we extend the Poisson process $(R, W)$ so that $\{W(x) : x \in \mathcal{X}\}$ is a random process taking values in a suitable space of functions, with $W(x) \ge 0$ and $E\{W(x)\} = 1$ for each $x$. The topology on the function space, and the distribution of $W$, must be sufficiently rich that the measure of $(R, W)$ remains non-atomic. Despite this added complexity, we continue to use the term "points" for $(R, W)$ and $Q(x) = RW(x)$.

FIGURE 31.2 Construction of a max-stable process. Left panel: first 100 points of a Poisson process $\{(r_i, w_i) : i \in \mathbb{N}\}$, where the $w_i(x)$ are realizations of a log-Gaussian process on $(0, 1)$ and the $r_i$ are shown at the left of the panel. Right panel: construction of the resulting realization of the max-stable process, $z(x) = \sup_i q_i(x)$ (heavy), as the pointwise supremum of individual processes $q_i(x) = r_i w_i(x)$. Most of the $q_i$ are tiny because of the small associated $r_i$.

Any max-stable process may then be constructed through the spectral representation [24]
\[ Z(x) = \sup_{i \ge 1} Q_i(x), \quad x \in \mathcal{X}, \tag{31.9} \]
where $Q_i(x) = R_i W_i(x)$, with the $R_i$ defined as before and the $W_i(x)$ independent replicates of $W(x)$ independent of the $R_i$. Then the point $Q_i(x)$ may be interpreted as the $i$th event, whose overall scale and profile are respectively $R_i$ and $W_i(x)$. The left-hand panel of Figure 31.2 shows an example in which $\mathcal{X} = (0, 1)$ and $W(x) = \exp\{\sigma \varepsilon(x) - \sigma^2/2\}$, where $\varepsilon(x)$ is a stationary Gaussian process with mean zero, unit variance and correlation function $\mathrm{corr}\{\varepsilon(x), \varepsilon(x + h)\}$ $(x, x + h \in \mathcal{X})$; in the plot $\sigma = 1$ and the correlation function is $\exp(-|h|/\beta)$, with $\beta = 0.5$.

Since $W(x) \ge 0$ and $E\{W(x)\} = 1$ for each $x$, the mean measure for $Q(x) = RW(x)$ is (31.4), so $\Pr\{Z(x) \le z\} = \exp(-1/z)$ for each $x$; the max-stable process is then called simple. It may be shown that any max-stable process may be constructed via (31.9), though the representation is not unique; recall that in deriving (31.4) we saw that the same mean measure, and hence the same Poisson process, would have been obtained for any positive random variable $W$ with unit expectation. This non-uniqueness may be exploited to provide efficient simulation algorithms [27, 28, 57, 59, 81].
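The construction (31.9) with the log-Gaussian profile described above can be sketched on a regular grid. This is only an approximate illustration, not one of the efficient algorithms cited: the supremum is truncated to the `n_points` largest $R_i$, the grid, seed and function name `simulate_max_stable` are assumptions, and the Gaussian process is generated by an autoregressive recursion that is exact for the exponential correlation $\exp(-|h|/\beta)$ on equally spaced points.

```python
import math
import random

def simulate_max_stable(grid, sigma, beta, n_points, rng):
    """Approximate Z(x) = sup_i R_i W_i(x) on a regular grid, using the n_points largest R_i."""
    step = grid[1] - grid[0]
    phi = math.exp(-step / beta)          # correlation of eps at neighbouring grid points
    innov_sd = math.sqrt(1.0 - phi * phi)
    z = [0.0] * len(grid)
    arrival = 0.0
    for _ in range(n_points):
        arrival += rng.expovariate(1.0)   # E_1 + ... + E_i
        r = 1.0 / arrival                 # R_i, decreasing in i
        eps = rng.gauss(0.0, 1.0)         # stationary start with unit variance
        for k in range(len(grid)):
            if k > 0:                     # AR(1) step, exact for exponential correlation
                eps = phi * eps + innov_sd * rng.gauss(0.0, 1.0)
            w = math.exp(sigma * eps - 0.5 * sigma ** 2)   # E{W(x)} = 1
            z[k] = max(z[k], r * w)       # pointwise supremum of the Q_i(x)
    return z

rng = random.Random(2024)
grid = [(k + 0.5) / 20 for k in range(20)]   # 20 equally spaced points in (0, 1)
z = simulate_max_stable(grid, sigma=1.0, beta=0.5, n_points=200, rng=rng)
print(min(z), max(z))
```

Because the $R_i$ decrease roughly like $1/i$, later points rarely contribute to the supremum, which is why a moderate truncation is usually adequate for illustration; it is not exact.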
To obtain the joint distribution of $Z(x)$ for values of $x$ lying in a subset $\mathcal{D}$ of $\mathcal{X}$, note that the event $\{Z(x) \le z(x) : x \in \mathcal{D}\}$ occurs if and only if $R_i W_i(x) \le z(x)$ for all $x \in \mathcal{D}$ and all $i$, and this is equivalent to $R_i \le \inf_{x \in \mathcal{D}} z(x)/W_i(x)$ $(i = 1, 2, \ldots)$. In terms of the corresponding Poisson process this implies that the set
\[ \{(r, w) : r w(x) \le z(x),\ x \in \mathcal{D}\}^c = \left\{ (r, w) : r > \inf_{x \in \mathcal{D}} z(x)/w(x) \right\}, \]
where superscript $c$ denotes complement, is void. The measure of this set may be expressed as
\[ \int \int_{\inf_{x \in \mathcal{D}} \{z(x)/w(x)\}}^\infty \frac{dr}{r^2}\, \nu(dw) = E\left[ \sup_{x \in \mathcal{D}} \left\{ \frac{W(x)}{z(x)} \right\} \right] = V\{z(x) : x \in \mathcal{D}\}, \tag{31.10} \]
say, where the expectation is over the measure $\nu$ of $W$, called the angular measure by [14]. Thus the required probability for $Z(x)$ is a void probability of the Poisson process defined by $Q(x) = RW(x)$, for $x \in \mathcal{D}$, and so
\[ \Pr\{Z(x) \le z(x),\ x \in \mathcal{D}\} = \exp[-V\{z(x) : x \in \mathcal{D}\}]. \tag{31.11} \]
If $\mathcal{D}$ is finite, then this probability equals expression (8.5); it equals the standard Fréchet distribution $\exp\{-1/z(x)\}$, $z(x) > 0$, for a singleton $\mathcal{D} = \{x\}$.

The exponent function $V$ plays a central role in inference and modelling. Since $Z(x) \le z(x)$ $(x \in \mathcal{D})$ if and only if all the $Q_i(x) = R_i W_i(x)$ fall into the set $\{(r, w) : r w(x) \le z(x),\ x \in \mathcal{D}\}$, the measure of the Poisson process with points $Q_i(x)$ is determined by setting $\mu[\{q : q(x) \le z(x),\ x \in \mathcal{D}\}^c] = V\{z(x) : x \in \mathcal{D}\}$; thus (31.11) is the void probability of $\{q : q(x) \le z(x),\ x \in \mathcal{D}\}^c$ for the Poisson process.

The case $\mathcal{D} = \{x_1, \ldots, x_D\}$ is particularly important, because in practice data are observed only on finite sets, and inference must therefore be based on observations of $Z(x_1), \ldots, Z(x_D)$. Writing $z(x_d) = z_d$ for simplicity, we let
\[ V(z_1, \ldots, z_D) = \mu(A_z), \quad z_1, \ldots, z_D > 0, \]
where $A_z$ is the complement of $[0, z_1] \times \cdots \times [0, z_D]$ in $\mathcal{E} = [0, \infty)^D \setminus \{0\}$; the origin cannot be included because the limiting Poisson process gives infinite measure for sets that contain it. Expression (31.10) implies that $V$ is homogeneous of order $-1$, i.e.,
\[ V\{a z(x) : x \in \mathcal{D}\} = a^{-1} V\{z(x) : x \in \mathcal{D}\}, \quad a > 0. \]
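Expression (31.10) suggests a direct Monte Carlo approximation of the exponent function at $D = 2$ sites. The sketch below assumes a log-Gaussian $W$ with unit mean, an assumed correlation of 0.5 between the underlying Gaussian variables and $\sigma = 1$; the names `draw_w` and `exponent_mc` are hypothetical. It illustrates the homogeneity of order $-1$ by reusing the same draws at rescaled arguments.

```python
import math
import random

def draw_w(corr, sigma, rng):
    """One draw of (W(x1), W(x2)) for a log-Gaussian W with E{W(x)} = 1."""
    e1 = rng.gauss(0.0, 1.0)
    e2 = corr * e1 + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
    return tuple(math.exp(sigma * e - 0.5 * sigma ** 2) for e in (e1, e2))

def exponent_mc(z, w_draws):
    """Monte Carlo estimate of V(z_1, ..., z_D) = E[max_d W(x_d)/z_d]."""
    total = sum(max(w / zd for w, zd in zip(ws, z)) for ws in w_draws)
    return total / len(w_draws)

rng = random.Random(7)
w_draws = [draw_w(0.5, 1.0, rng) for _ in range(20000)]

v = exponent_mc((1.0, 2.0), w_draws)
v_scaled = exponent_mc((3.0, 6.0), w_draws)    # homogeneity: equals v / 3 up to rounding
theta_hat = exponent_mc((1.0, 1.0), w_draws)   # V(1, 1), between 1 and 2 up to Monte Carlo error
print(v, 3.0 * v_scaled, theta_hat)
```

Plain averaging of this kind can be noisy when $\sigma$ is large, because $W$ is then heavy-tailed; it is meant to clarify the definition rather than to serve as an estimation method.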
The max-stability property (31.8) is easily established, because the event $n^{-1} \max\{Z_1(x), \ldots, Z_n(x)\} \le z(x)$ is equivalent to $Z_j(x) \le n z(x)$ $(j = 1, \ldots, n)$, and since the $Z_j(x)$ are independent, this occurs with probability
\[ \left( \exp[-V\{n z(x) : x \in \mathcal{D}\}] \right)^n = \left( \exp[-n^{-1} V\{z(x) : x \in \mathcal{D}\}] \right)^n = \exp[-V\{z(x) : x \in \mathcal{D}\}]. \]
If we set $z(x) \equiv z$, then the homogeneity of $V$ yields
\[ \Pr\{Z(x) \le z,\ x \in \mathcal{D}\} = \Pr\left\{ \max_{x \in \mathcal{D}} Z(x) \le z \right\} = \exp(-\theta_{\mathcal{D}}/z), \quad z > 0, \tag{31.12} \]
where the extremal coefficient $\theta_{\mathcal{D}}$ equals $V$ evaluated with $z(x) \equiv 1$ $(x \in \mathcal{D})$. Hence $\max_{x \in \mathcal{D}} Z(x)$ has a Fréchet distribution; its parameter $\theta_{\mathcal{D}}$ can be shown to lie between 1 and $|\mathcal{D}|$, the number of points in $\mathcal{D}$. These bounds respectively correspond to perfect dependence and complete independence of the components of $Z(x)$. In particular, if $\mathcal{D} = \{x_1, \ldots, x_D\}$, then $1 \le \theta_{\mathcal{D}} \le D$, with smaller values indicating stronger dependence of the extremes. Like the variogram in equation (5.8), empirical estimates of $\theta_{\mathcal{D}}$ are useful for exploratory purposes and for checking the adequacy of fitted models; see §31.4 and Figure […].

Exceedances

Max-stable processes provide models for quantities such as annual maximum rainfall at a number of sites in a region, but in some applications it is preferable to model individual extreme events. In terms of the discussion above, this requires deciding which of the $Q_i(x)$ are to be regarded as extreme, and then basing inference on them, so we need to restrict the Poisson process $\{Q_i(x)\}$ to some suitable set $\mathcal{E}'$, for example by taking $\mathcal{E}' = \{q : \rho(q) > 1\}$ for some risk functional $\rho$ [23, 81]. We might, for example, take
\[ \rho_1(Q) = \sup_{x \in \mathcal{D}} Q(x)/z(x), \quad \rho_2(Q) = \inf_{x \in \mathcal{D}} Q(x)/z(x), \quad \rho_3(Q) = \int_{\mathcal{D}} Q(x)/z(x)\, dx, \]
corresponding respectively to events that exceed a threshold function $z(x)$ at at least one site in $\mathcal{D}$, those that exceed $z(x)$ everywhere in $\mathcal{D}$ and those for which the average of $Q(x)/z(x)$ over $\mathcal{D}$ is sufficiently large. Thus the type of rare event selected can be tailored to the problem, as in […]. Some care is needed in the choice of $\mathcal{E}'$, since likelihood inference involves the integral $\mu(\mathcal{E}')$, which must be both finite and capable of being computed rapidly.
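For an event observed on a finite grid, the three risk functionals above can be evaluated as in the following sketch. The grid, the event values and the function names are illustrative assumptions, and the integral in $\rho_3$ is replaced by a trapezoidal approximation over a one-dimensional domain.

```python
def rho_sup(q, z):
    """rho_1: largest ratio q(x)/z(x), so rho_1 > 1 means exceedance at some site."""
    return max(qk / zk for qk, zk in zip(q, z))

def rho_inf(q, z):
    """rho_2: smallest ratio q(x)/z(x), so rho_2 > 1 means exceedance at every site."""
    return min(qk / zk for qk, zk in zip(q, z))

def rho_int(q, z, grid):
    """rho_3: trapezoidal approximation to the integral of q(x)/z(x) over the grid."""
    ratios = [qk / zk for qk, zk in zip(q, z)]
    return sum(0.5 * (ratios[k] + ratios[k + 1]) * (grid[k + 1] - grid[k])
               for k in range(len(grid) - 1))

grid = [0.0, 0.5, 1.0]     # three sites on the unit interval (illustrative)
q = [2.0, 0.5, 1.5]        # one hypothetical event q(x)
z = [1.0, 1.0, 1.0]        # constant threshold function z(x)

selected = {
    "sup": rho_sup(q, z) > 1.0,
    "inf": rho_inf(q, z) > 1.0,
    "int": rho_int(q, z, grid) > 1.0,
}
print(selected)
```

All three functionals are linear under rescaling of the event, in the sense that multiplying `q` by a positive constant multiplies each value by the same constant.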
If the risk functional satisfies $\rho(aq) = a\rho(q)$ for $a > 0$, then $\rho(q) > 1$ implies that $R\rho(W) > 1$, and then the argument leading to (31.4) can be extended to show that $\mu(\mathcal{E}') = E\{\rho(W)\}$ depends only on the distribution of $W$.

31.3 Models

General

By determining their profiles, and thus their extent, their shape, their roughness, and their degree of spatial dependence, the random processes $W(x)$ play a key role in the construction of extreme events using (31.9). The $W(x)$ must be non-negative and satisfy $E\{W(x)\} = 1$ $(x \in \mathcal{X})$, but many processes, both stationary and non-stationary, satisfy these minimal requirements. In applications it will usually be important to consider the scale of extreme events, their roughness, and their orientation, if any, and all of these may vary over the spatial domain $\mathcal{X}$ if it is large or very heterogeneous. For example, extreme rainfall events are typically smaller in size and more variable than major heat waves, and they may have a directionality given by particular weather patterns.

Below we describe some widely-used forms for $W(x)$ based on an underlying Gaussian or Student-t process; ideas from the extensive literature on Gaussian-based geostatistics may then be ported to the extremal setting. Anisotropic and non-stationary variograms can be used [4, 45].

Throughout the following discussion $\{\varepsilon(x)\}$ represents a zero-mean Gaussian process with variogram
\[ \gamma(x_1, x_2) = \mathrm{var}\{\varepsilon(x_1) - \varepsilon(x_2)\}, \quad x_1, x_2 \in \mathcal{X}, \]
and $\{\varepsilon(x)\}$ may be either stationary or intrinsically stationary. In the stationary case $\mathrm{var}\{\varepsilon(x)\} = \sigma^2$ is constant on $\mathcal{X}$, so the variogram is bounded,
\[ 0 \le \gamma(x_1, x_2) = 2[\sigma^2 - \mathrm{cov}\{\varepsilon(x_1), \varepsilon(x_2)\}] \le 4\sigma^2, \]
and we may define a correlation function $c(x_1, x_2) = \mathrm{corr}\{\varepsilon(x_1), \varepsilon(x_2)\}$. In the intrinsically stationary case the increments $\varepsilon(x_1) - \varepsilon(x_2)$ are stationary, and then the variogram may be unbounded. In this case we may define $\varepsilon^*(x) = \varepsilon(x) - \varepsilon(x^*)$, where $x^* \in \mathcal{X}$ is a fixed site, so that $\varepsilon^*(x^*) = 0$ with probability one, and then it can be shown that
\[ \mathrm{cov}\{\varepsilon^*(x_1), \varepsilon^*(x_2)\} = \tfrac{1}{2} \{\gamma(x_1, x^*) + \gamma(x_2, x^*) - \gamma(x_1, x_2)\}, \]
thereby expressing the covariance function of the process $\{\varepsilon^*(x)\}$ in terms of the variogram of the original process.

In many applications the variogram is taken to be stationary, i.e., it depends only on $h = x_1 - x_2$, and isotropic, i.e., it depends only on the length $\|h\|$ of $h$. A common choice is the so-called stable variogram, $\gamma(h) = (\|h\|/\lambda)^\kappa$, with $\lambda > 0$ and $\kappa \in (0, 2]$, as variation in $\kappa$ and $\lambda$ yields max-stable processes with quite different roughnesses and scales for spatial dependence. The corresponding correlation function is $c(x_1, x_2) = \exp\{-\gamma(h)\}$. Use of a stationary or intrinsically stationary process may impact the properties of the resulting extremal model: roughly speaking, models with bounded variance cannot represent independence of extremes at long distances, whereas those with unbounded variance can.

As mentioned in §5.2.1, new variograms and covariance functions can be constructed from existing ones.
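The link between the stable variogram and the covariance of the pinned process can be illustrated for scalar sites. In this sketch the function names `stable_variogram` and `pinned_cov` and the numerical values are assumptions. With $\kappa = 1$, $\lambda = 1$ and the fixed site at the origin, the formula reduces to $\min(x_1, x_2)$ for positive sites, the covariance of Brownian motion, which provides a simple check.

```python
def stable_variogram(x1, x2, lam, kappa):
    """gamma(h) = (|h|/lam)^kappa for scalar sites, with lam > 0 and 0 < kappa <= 2."""
    return (abs(x1 - x2) / lam) ** kappa

def pinned_cov(x1, x2, x_star, lam, kappa):
    """Covariance of the pinned process: {gamma(x1, x*) + gamma(x2, x*) - gamma(x1, x2)}/2."""
    def g(a, b):
        return stable_variogram(a, b, lam, kappa)
    return 0.5 * (g(x1, x_star) + g(x2, x_star) - g(x1, x2))

# kappa = 1, lam = 1, fixed site x* = 0: covariance min(x1, x2) for positive sites.
c12 = pinned_cov(0.3, 0.7, x_star=0.0, lam=1.0, kappa=1.0)
var_at_site = pinned_cov(0.7, 0.7, x_star=0.0, lam=1.0, kappa=1.0)   # equals gamma(x, x*)
cov_with_pin = pinned_cov(0.0, 0.7, x_star=0.0, lam=1.0, kappa=1.0)  # zero at the fixed site
print(c12, var_at_site, cov_with_pin)
```

The variance grows without bound as the site moves away from the fixed site, which is the unbounded-variance behaviour referred to above.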
For example, [1] found that a linear combination of a variogram representing meteorological dependence and a covariance function representing flow-dependence along a river network gave a good spatial model for extreme river flows.

Brown-Resnick process

A simple idea is to set $W(x) = \exp[\varepsilon^*(x) - \mathrm{var}\{\varepsilon^*(x)\}/2]$, where $\{\varepsilon^*(x)\}$ is based on an intrinsically stationary Gaussian process. Clearly $W(x)$ is non-negative and has unit expectation throughout $\mathcal{X}$, and it turns out that (31.9) is a strictly stationary simple max-stable process, known as a Brown-Resnick process [6, 49, 50], whose distribution depends only on $\gamma$. Such processes are popular because a wide range of variograms can be employed with them, they are relatively easily simulated [28, 29, 57], and likelihoods for them have an explicit form; see […]. If the definition of $W(x)$ is modified by replacing $\varepsilon^*(x)$ by a stationary Gaussian process, then $\gamma(x_1, x_2)$ is bounded and the resulting max-stable process will be dependent even at very long ranges, which is often unrealistic in applications. Hence an intrinsically stationary process is often preferred.

The bivariate distribution function for such models may be written as
\[ \Pr\{Z(x_1) \le z_1, Z(x_2) \le z_2\} = \exp\{-V(z_1, z_2)\}, \quad z_1, z_2 > 0, \quad x_1, x_2 \in \mathcal{X}, \]
with
\[ V(z_1, z_2) = \frac{1}{z_1}\, \Phi\left\{ \frac{a}{2} + \frac{1}{a} \log\left( \frac{z_2}{z_1} \right) \right\} + \frac{1}{z_2}\, \Phi\left\{ \frac{a}{2} + \frac{1}{a} \log\left( \frac{z_1}{z_2} \right) \right\}, \tag{31.13} \]
where $\Phi$ is the standard normal cumulative distribution function and $a$ is the positive square root of $\gamma(x_1, x_2)$. In this case the pairwise extremal coefficient, corresponding to taking $\mathcal{D} = \{x_1, x_2\}$, may be written as
\[ \theta(x_1, x_2) = V(1, 1) = 2\Phi\left\{ \gamma^{1/2}(x_1, x_2)/2 \right\}, \quad x_1, x_2 \in \mathcal{X}, \]
so small and large values of $\gamma(x_1, x_2)$ respectively correspond to strong and to weak dependence; as $\gamma(x_1, x_2) \to 0$ and $\gamma(x_1, x_2) \to \infty$, we see that
\[ V(z_1, z_2) \to \max(1/z_1, 1/z_2), \quad V(z_1, z_2) \to 1/z_1 + 1/z_2, \]
corresponding respectively to complete dependence of $Z(x_1)$ and $Z(x_2)$ and to their independence.

The generalization of (31.13) to $D$ variables involves the $(D - 1)$-dimensional multivariate normal distribution function; see [56, Remark 2.5]. One drawback with using it is that repeated computation of this multivariate normal distribution as part of an iterative estimation algorithm can be costly, so in realistic cases one is constrained to $D \lesssim 50$ or so, at least for maximum likelihood estimation [23].

Extremal-t process

Extremal-t processes [60] arise as limits of renormalized maxima of elliptical processes, which include Student t processes. In this case $W(x) = m_\alpha\, \varepsilon(x)_+^\alpha$, where $\alpha > 0$, $m_\alpha = \pi^{1/2}\, 2^{1 - \alpha/2} / \Gamma\{(\alpha + 1)/2\}$ with $\Gamma(\cdot)$ the gamma function, and $\varepsilon(x)$ is a stationary Gaussian process with mean zero, unit variance and correlation function $\mathrm{corr}\{\varepsilon(x_1), \varepsilon(x_2)\} = c_\alpha(x_1, x_2)$. When $\alpha = 1$ this yields the so-called Schlather process [72]. Under certain conditions the extremal-t process converges to a Brown-Resnick process as $\alpha \to \infty$. For example, if $c_\alpha(x_1, x_2) = \exp\{-(2\alpha)^{-1} (\|x_2 - x_1\|/\lambda)^\kappa\}$ for some $\lambda > 0$ and $\kappa \in (0, 2]$, then the limiting Brown-Resnick process has $\gamma(x_1, x_2) = (\|x_2 - x_1\|/\lambda)^\kappa$. Hence an extremal-t process will typically fit data at least as well as a Brown-Resnick process.
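The bivariate Brown-Resnick exponent function (31.13) and the pairwise extremal coefficient are simple to evaluate. The sketch below uses only the Python standard library; the function names `br_exponent` and `br_extremal_coefficient` are assumptions. It evaluates the two limiting cases numerically with a very small and a very large variogram value.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal distribution function

def br_exponent(z1, z2, gamma):
    """Bivariate exponent function (31.13), with a the positive square root of gamma > 0."""
    a = math.sqrt(gamma)
    return (PHI(a / 2.0 + math.log(z2 / z1) / a) / z1
            + PHI(a / 2.0 + math.log(z1 / z2) / a) / z2)

def br_extremal_coefficient(gamma):
    """Pairwise extremal coefficient theta = V(1, 1) = 2 Phi(sqrt(gamma)/2)."""
    return 2.0 * PHI(math.sqrt(gamma) / 2.0)

theta = br_extremal_coefficient(1.0)
near_dependence = br_exponent(1.0, 2.0, 1e-6)    # close to max(1/z1, 1/z2) = 1
near_independence = br_exponent(1.0, 2.0, 1e6)   # close to 1/z1 + 1/z2 = 1.5
print(theta, near_dependence, near_independence)
```

The extremal coefficient increases from 1 towards 2 as the variogram value grows, in line with the interpretation of small and large $\gamma(x_1, x_2)$ given above.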
Its bivariate exponent function $V(z_1, z_2)$ equals
\[ \frac{1}{z_1}\, T_{\alpha+1}\left\{ -\frac{c}{b} + \frac{1}{b} \left( \frac{z_2}{z_1} \right)^{1/\alpha} \right\} + \frac{1}{z_2}\, T_{\alpha+1}\left\{ -\frac{c}{b} + \frac{1}{b} \left( \frac{z_1}{z_2} \right)^{1/\alpha} \right\}, \quad z_1, z_2 > 0, \tag{31.14} \]
where $T_\nu(\cdot)$ is the cumulative distribution function of the Student t distribution with $\nu$ degrees of freedom, $c = c_\alpha(x_1, x_2)$ and $b^2 = (1 - c^2)/(\alpha + 1)$. It is straightforward to see that $\theta(x_1, x_2) = 2T_{\alpha+1}[\{(1 - c)(1 + \alpha)/(1 + c)\}^{1/2}]$, which has upper bound $2T_{\alpha+1}\{(1 + \alpha)^{1/2}\}$; thus independence can only be attained as $\alpha \to \infty$. The higher-order exponent functions are given by [56, Theorem 2.3]; see also [81].

Other models

A different class of max-stable models [65, 66, 76] is based on a hierarchical specification of the joint distribution of generalized extreme-value variables in terms of latent positive stable random variables that account for spatial dependence of the extremes; it can be viewed as a noisy version of the Smith model described below. One advantage of this formulation over those described above is that it allows Bayesian inference based on standard Markov chain Monte Carlo methods, and a second advantage is that the latent variable construction can itself have a hierarchical structure, though Monte Carlo algorithms for this model can be slow.

Max-stable processes may be modified in various ways. For example, skew-normal and skew-t processes can be used, rather than Gaussian or t processes, and these may be useful when asymmetries are present [2]. Another extension is multiplication by the indicator of a random compact set $B$ independent of $W(\cdot)$, resulting in
\[ W_B(x) = W(x)\, I(x \in B)/E(|B|), \quad x \in \mathcal{X}, \]
where the expectation $E(|B|)$ must be finite in order to ensure that $E\{W_B(x)\} \equiv 1$. If we write $\alpha(x_1, x_2) = \Pr(x_1, x_2 \in B)$, then it is straightforward to check that the corresponding exponent function is
\[ \{1 - \alpha(x_1, x_2)\} \left( \frac{1}{z_1} + \frac{1}{z_2} \right) + \alpha(x_1, x_2)\, V(z_1, z_2), \quad z_1, z_2 > 0. \]
Thus if, for example, $B$ is a disk of fixed diameter $L$ centred at a point uniformly distributed on $\mathcal{X}$, then $x_1$ and $x_2$ cannot both lie in $B$ when $\|x_1 - x_2\| > L$, and $Z(x_1)$ and $Z(x_2)$ must be independent, whereas if $x_1 \approx x_2$, then the exponent function will almost be that of the max-stable process based on $W$. [43] use this idea to model extreme hourly rainfall; their random sets are stylized representations of the space-time extent of rainfall cells, each containing an independent process $W(x)$.

Yet more processes may be constructed by letting $W(x) = f(x; Y)/f_y(Y)$, where $Y$ has density $f_y$ on some space $\mathcal{Y}$, and $\int_{\mathcal{Y}} f(x; y)\, dy = 1$ for each $x$. The simplest such model, the Smith process [77], has $\mathcal{Y} = \mathcal{X} = \mathbb{R}^k$, $f_y(y)$ an arbitrary density supported on $\mathbb{R}^k$ and $f(x; y) = \phi_k(x - y; \Omega)$, where $\phi_k(\cdot\,; \Omega)$ is the $k$-variate Gaussian density with covariance matrix $\Omega$. The corresponding random functions $Q(x) = RW(x)$ consist of $k$-variate Gaussian densities centered at random points of $\mathbb{R}^k$, whose shape is determined by $\Omega$ and whose maximum height is determined by $R$ and $\Omega$.
These and similar processes obtained by using other standard probability density functions [25] are typically too smooth to be realistic in applications, and, though [85] proposed a rougher variant, we do not recommend them for use in practice.

Asymptotic independence

Most of the models described above presuppose that the transformed extremes of the phenomenon under investigation are asymptotically dependent, i.e.,
\[ \lim_{z \to \infty} \Pr\{Z(x_2) > z \mid Z(x_1) > z\} = 2 - \theta(x_1, x_2) > 0, \quad x_1, x_2 \in \mathcal{X}, \tag{31.15} \]
where $\theta(x_1, x_2) = V(1, 1)$, with $V$ the exponent function corresponding to $Z(x_1)$ and $Z(x_2)$. This implies that dependence between $Z(x_1)$ and $Z(x_2)$ persists even for very rare events, since the conditional probability of a rare event at $x_2$, given the occurrence of an equally rare event at $x_1$, remains non-zero as $z \to \infty$. An alternative, which arises if $V(z_1, z_2) = 1/z_1 + 1/z_2$, is that extremes are exactly independent, but this is not a useful model because data almost always show some degree of dependence. An intermediate possibility is needed, whereby the limit in (31.15) is zero, but the conditional probability decreases as $z \to \infty$. This is often plausible from physical arguments and is common in applications, of which it can be an important feature. Such asymptotic independence models are a topic of current research [e.g., 46, 48, 61, 85]. Earlier models that have this property are based on copulas [38, 70, 71]. Spatial applications of asymptotic independence models may also be found in [19] and [80].
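The limit (31.15) can be examined at finite levels. For a pair from a simple max-stable process with extremal coefficient $\theta$, the margins are standard Fréchet and $\Pr\{Z(x_1) \le z, Z(x_2) \le z\} = \exp(-\theta/z)$, so the conditional probability has a closed form. The sketch below, in which the function name `chi` and the chosen values of $\theta$ and $z$ are assumptions, evaluates it directly.

```python
import math

def chi(z, theta):
    """Pr{Z(x2) > z | Z(x1) > z} for a simple max-stable pair with extremal coefficient theta."""
    # Joint survivor function: 1 - 2 exp(-1/z) + exp(-theta/z).
    # Marginal survivor function: 1 - exp(-1/z).
    # Their ratio simplifies to 2 - {1 - exp(-theta/z)}/{1 - exp(-1/z)};
    # expm1 avoids cancellation when z is large.
    return 2.0 - math.expm1(-theta / z) / math.expm1(-1.0 / z)

for level in (1.0, 10.0, 1e2, 1e4):
    # theta = 1.4: limit 0.6 (asymptotic dependence); theta = 2: exact independence, limit 0
    print(level, chi(level, 1.4), chi(level, 2.0))
```

Under exact independence the expression reduces to $1 - \exp(-1/z)$, which tends to zero, whereas for $\theta < 2$ it stabilises at $2 - \theta$. No max-stable model of this form can therefore give a conditional probability that decays to zero while retaining dependence at finite levels.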

Exploratory procedures

Simple nonparametric estimation procedures are valuable for initial analysis and for checking the validity of fitted models. Most are based on pairs of extremal observations, which offer direct insight into the dependence structure of the data, in analogy with quantities such as the variogram of classical geostatistics, which is defined in terms of the average squared difference between observations. Since extremal observations need not possess moments, a rank-based approach is preferable.

Suppose that observations $Z_1 \equiv Z(x_1)$ and $Z_2 \equiv Z(x_2)$ at sites $D = \{x_1, x_2\} \subset \mathcal{X}$ arise from a simple max-stable process. A natural measure of the dependence between $Z_1$ and $Z_2$ is the extremal coefficient $\theta_D$ defined in (31.12); we have $\theta_D \in [1, 2]$, with smaller values corresponding to stronger dependence, so we can expect $\theta_D$ to increase as the distance $\|x_1 - x_2\|$ grows. Estimation of $\theta_D$ may be based on the F-madogram [15]

$$\psi_D = \tfrac{1}{2}\,\mathrm{E}\{|F(Z_1) - F(Z_2)|\},$$

where $F(z) = \exp(-1/z)$ $(z > 0)$ is the standard Fréchet distribution. Since $|a - b| = 2\max(a, b) - a - b$, and the distribution functions of $Z_1$, $Z_2$ and $\max(Z_1, Z_2)$ are respectively $F(z)$, $F(z)$ and $\exp(-\theta_D/z)$ $(z > 0)$, it is straightforward to check that

$$\psi_D = \tfrac{1}{2}(\theta_D - 1)/(\theta_D + 1)$$

and therefore that

$$\theta_D = \frac{1 + 2\psi_D}{1 - 2\psi_D}.$$

An estimator $\hat\theta_D$ based on independent pairs $(Z_{1,1}, Z_{2,1}), \ldots, (Z_{1,n}, Z_{2,n})$ may be obtained by replacing $F(Z_1)$ by the rank estimator $R_{1,j}/(n + 1)$ $(j = 1, \ldots, n)$, where $R_{1,j}$ is the rank of $Z_{1,j}$ among $Z_{1,1}, \ldots, Z_{1,n}$, and likewise with $F(Z_2)$, and then replacing $\psi_D$ by $\{2n(n + 1)\}^{-1} \sum_{j=1}^{n} |R_{1,j} - R_{2,j}|$, its empirical counterpart. If the underlying process is stationary, then $\theta_D$ depends only on $h = x_1 - x_2$, so a plot of the $\hat\theta_D$ for all possible pairs of points $D$ can be smoothed or binned and used for diagnostic or confirmatory purposes.
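The rank-based F-madogram estimator described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `fmadogram_theta` is hypothetical, ties are assumed absent, and clipping the estimate to the admissible range [1, 2] is a convenience added here rather than something prescribed in the text.

```python
import numpy as np

def fmadogram_theta(z1, z2):
    """Rank-based F-madogram estimate of the pairwise extremal coefficient."""
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    n = z1.size
    # Ranks 1..n of each margin (assumes no ties).
    r1 = np.argsort(np.argsort(z1)) + 1
    r2 = np.argsort(np.argsort(z2)) + 1
    # Empirical F-madogram: sum |R1 - R2| / {2 n (n + 1)}.
    psi = np.abs(r1 - r2).sum() / (2.0 * n * (n + 1))
    theta = (1 + 2 * psi) / (1 - 2 * psi)
    # Clip to the admissible range (an added convenience).
    return float(min(max(theta, 1.0), 2.0))

# Illustrative check with simulated unit Frechet variables, -1/log(U).
rng = np.random.default_rng(0)
n = 20_000
z_a = -1.0 / np.log(rng.random(n))
z_b = -1.0 / np.log(rng.random(n))
print("independent pair:", fmadogram_theta(z_a, z_b))
print("identical pair:  ", fmadogram_theta(z_a, z_a))
```

For a stationary process one would apply such a function to every pair of sites and plot the estimates against inter-site distance.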
As in the classical case, some care must be taken in interpreting such plots, since data from $D$ underlying sites yield a cloud of $D(D - 1)/2$ correlated estimates $\hat\theta_D$; moreover these empirical estimates may be very variable. [54] extend the F-madogram to estimation of the function $V_D(a, 1 - a)$ $(0 < a < 1)$.

The anticipated increase in $\theta_D$ with $\|x_1 - x_2\|$ makes a natural analogy with the variogram, but since $\Pr(Z_2 > z \mid Z_1 > z) \to 2 - \theta_D$ as $z \to \infty$, the quantity $2 - \theta_D$ may be interpreted as the probability of a large event at $x_2$ given a correspondingly large event at $x_1$. A plot of $2 - \theta_D$ against distance is known as an extremogram, analogous to the correlogram of classical time series analysis, and has been proposed for use particularly with time series [17]; resampling can be used to gauge which probabilities are indistinguishable from zero. Cross-extremograms for extremes of two or more series can also be defined, but like cross-correlograms, they are vulnerable to spurious correlations for which allowance must be made. The conditional probability $2 - \theta_D$ has a natural interpretation for asymptotically dependent processes, but equals zero for asymptotically independent processes, whatever their rate of tail decay, so it is natural to seek a measure of asymptotic independence. See [11] and [52] for examples and further discussion.

As many standard models are based on underlying Gaussian processes, which are defined in terms of their first and second moments, standard fitting procedures will tend to match these to the data, either explicitly or implicitly. We might therefore expect theoretical and
empirical estimates of pairwise quantities, such as the extremal coefficient $\theta(h)$ of two sites a distance $h$ apart, to match well enough to make model failure more difficult to detect. This suggests using higher-order, in addition to pairwise, quantities for checking model fit. For example, if $D = \{x_1, \ldots, x_D\}$ is a subset of sites, then it is easy to see that $\max_{x \in D} Z(x)$ has the Fréchet distribution $\exp(-\theta_D/z)$. Thus we might compare quantiles of the observed maxima for $D$ with those from a fitted model, for sets $D$ that are spatially close or spatially far-flung; if the fitted model is adequate, the empirical and simulated distributions should match. Similarly, empirical probabilities of certain spatial events involving more than two sites could be compared with estimates based on the model.

Inference

General

Although there is a growing literature on non-parametric inference for extremal processes [e.g., 32], parametric modelling is most used in applications. One reason for this is that it is usually more straightforward to interpret a parametric model, and a second is that a common goal of fitting such processes to data is the estimation of rare event probabilities. Typically this is performed by Monte Carlo simulation from one or more fitted models, which may be easier for a parametric model. Simulation from max-stable processes is discussed by [28] and references therein. [29] give an algorithm for simulation conditional on the observed values of the max-stable process at certain points.

Statistical estimation for extreme values is always subject to mis-specification bias, since the classes of models fitted are typically based on asymptotic arguments that do not apply to the data themselves. It can be hard to detect lack of fit, because extremal models tend to be rather flexible and power for goodness-of-fit assessment is limited owing to the small number of extreme events.
Moreover the uncertainty surrounding extrapolation to events even rarer than those already observed is typically so large that bias is a secondary consideration. It is essential to assess the sensitivity of conclusions both to the choice of the model and to the degree of extremeness of the data: this entails checking that the fitted models do not change greatly when the threshold or block size is varied over a reasonable range. Below we focus on likelihood-based estimation, but other proposals include robust estimation based directly on the joint distribution function [89], approximate Bayesian computation [35] and score-based methods [23].

Most of the discussion above has presupposed that the data have been transformed to the standard Fréchet scale, but in practice this transformation should be built into the data analysis. For example, if annual maxima at points $D = \{x_1, \ldots, x_D\} \subset \mathcal{X}$ are available, then it will often be reasonable to suppose that their marginal distributions are generalized extreme-value with location, scale and shape parameters $\eta(x)$, $\tau(x)$ and $\xi(x)$ that vary smoothly over $\mathcal{X}$ as functions of parameters $\varphi$. Often we express $\eta(x)$ and $\log\tau(x)$ as linear combinations of basis functions and explanatory variables such as altitude, but assume that $\xi(x)$ is constant, since variation in the shape parameter can be hard to detect and the value of $\xi$ may be seen as an intrinsic aspect of the phenomenon being modelled. Since we can write

$$Z(x; \varphi) = [1 + \xi(x; \varphi)\{Y(x) - \eta(x; \varphi)\}/\tau(x; \varphi)]^{1/\xi(x; \varphi)},$$

where $Y(x)$ is the observed maximum at $x$ and $Z(x; \varphi)$ has a standard Fréchet distribution, the joint probability density function of $Y(x_1), \ldots, Y(x_D)$ can be written in terms of the
standardized variables

$$z(x_d; \varphi) = [1 + \xi(x_d; \varphi)\{y_d - \eta(x_d; \varphi)\}/\tau(x_d; \varphi)]^{1/\xi(x_d; \varphi)}, \quad d = 1, \ldots, D,$$

as

$$f_{Z(x_1), \ldots, Z(x_D)}\{z(x_1; \varphi), \ldots, z(x_D; \varphi); \vartheta\} \prod_{d=1}^{D} \frac{dz(x_d; \varphi)}{dy_d}, \qquad (31.16)$$

where $\vartheta$ represents parameters in the joint distribution of the $Z$s and the second term is a Jacobian. Maximum likelihood estimation will involve maximising a product of expressions of the form (31.16), for example with a term for each different year of observed maxima, over both $\varphi$ and $\vartheta$. Computational considerations may entail the estimation of $\varphi$ separately from $\vartheta$ in a two-step procedure, though in principle this is undesirable because an overall measure allowing for the uncertainty of both $\vartheta$ and $\varphi$ is then awkward to obtain. In the applications below we found it necessary to use this two-step approach, but then obtained a combined assessment of uncertainty by applying it to bootstrap datasets obtained by resampling subsets of the original data that could be taken as independent. Initial values for $\varphi$ can be obtained by maximising an independence likelihood, which uses the same smooth functions for spatial variation of the marginal parameters $\eta$, $\tau$ and $\xi$ as in (31.16), but treats the $Z$s as independent standard Fréchet variables. Similar computations are needed for likelihoods based on copulas.

Likelihood inference for maxima

We now discuss the computation of the first term in (31.16) in the case of inference based on maxima. The key problem is that this involves $D$-fold differentiation of the joint cumulative distribution function (31.11), resulting in a combinatorial explosion. For example, with $D = 3$ we have

$$f(z_1, z_2, z_3; \vartheta) = \exp(-V)\,(-V_{123} + V_1 V_{23} + V_2 V_{13} + V_3 V_{12} - V_1 V_2 V_3), \qquad (31.17)$$

using shorthand notation in which

$$V = V(z_1, z_2, z_3; \vartheta), \quad V_d = \frac{\partial V(z_1, z_2, z_3; \vartheta)}{\partial z_d}, \quad V_{d_1 d_2} = \frac{\partial^2 V(z_1, z_2, z_3; \vartheta)}{\partial z_{d_1}\,\partial z_{d_2}},$$

and so forth.
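As a quick sanity check on the form of the trivariate density, consider the independence case, in which the exponent function is a sum of reciprocals. The short derivation below is a worked illustration added here, writing the density as $\exp(-V)$ times a signed sum of products of derivatives of $V$; it confirms that the density reduces to a product of standard Fréchet densities.

```latex
% Independence case: all mixed derivatives of V vanish.
V(z_1, z_2, z_3) = \frac{1}{z_1} + \frac{1}{z_2} + \frac{1}{z_3},
\qquad
V_d = -\frac{1}{z_d^{2}},
\qquad
V_{12} = V_{13} = V_{23} = V_{123} = 0 .

% Only the last term, -V_1 V_2 V_3, survives:
f(z_1, z_2, z_3)
  = \exp(-V)\,\bigl(-V_1 V_2 V_3\bigr)
  = \prod_{d=1}^{3} z_d^{-2} \exp(-1/z_d) .
```

Each factor $z_d^{-2}\exp(-1/z_d)$ is the derivative of $F(z_d) = \exp(-1/z_d)$, as it should be for independent standard Fréchet margins.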
The number of terms in the density is the number of partitions of the set $D$; with $D = \{x_1, x_2, x_3\}$ there are five such partitions, corresponding to the terms on the right-hand side of (31.17), but with $D = 10$ there are around $10^5$ terms. Thus in realistic settings it is infeasible to compute the full likelihood, and a number of ways to avoid doing so have been proposed.

Composite likelihood inference [53, 82] uses simpler likelihood components. The most common is a pairwise likelihood, in which the components involve only pairs of observations, for example replacing (31.17) by

$$f(z_1, z_2; \vartheta)\, f(z_1, z_3; \vartheta)\, f(z_2, z_3; \vartheta). \qquad (31.18)$$

Here only bivariate exponent functions such as (31.13) and (31.14) and their first and second derivatives are needed. For this approach to be useful, $\vartheta$ must be identifiable from the chosen marginal distributions, as is generally the case with Gaussian-based models. Although maximising a composite likelihood yields consistent estimators, $\hat\vartheta_C$, say, measures of uncertainty and tools for model comparison must compensate for the re-use of observations in
terms such as (31.18). For example, if $z_1, z_2, z_3$ were independent, then (31.18) would equal $\{f(z_1; \vartheta) f(z_2; \vartheta) f(z_3; \vartheta)\}^2$, i.e., the square of the correct likelihood contribution, and although $\hat\vartheta_C$ would then equal the usual maximum likelihood estimate, standard errors based on the observed information matrix $\hat J$ would be too small by a factor $\sqrt{2}$. Standard errors for $\hat\vartheta_C$ can be obtained by resampling or using a sandwich variance matrix $\hat J^{-1} \hat K \hat J^{-1}$.

Suppose that the data fall into independent blocks indexed by $i \in I$, and that the $i$th block consists of dependent observations $z_{i,d}$. In many cases $I$ will correspond to data from different years or extreme events, assumed to be independent, whereas the data $z_{i,1}, \ldots, z_{i,D}$ for block $i$ are likely to be dependent. Then with $l_{i;d,d'}(\vartheta) = \log f(z_{i,d}, z_{i,d'}; \vartheta)$ and pairwise log likelihood $\sum_i \sum_{d' < d} l_{i;d,d'}(\vartheta)$, we have

$$\hat K = \sum_{i \in I} \left\{\sum_{d' < d} \frac{\partial l_{i;d,d'}(\vartheta)}{\partial \vartheta}\right\} \left\{\sum_{d' < d} \frac{\partial l_{i;d,d'}(\vartheta)}{\partial \vartheta^{T}}\right\} \Bigg|_{\vartheta = \hat\vartheta_C}, \qquad \hat J = -\sum_{i \in I} \sum_{d' < d} \frac{\partial^2 l_{i;d,d'}(\vartheta)}{\partial \vartheta\, \partial \vartheta^{T}} \Bigg|_{\vartheta = \hat\vartheta_C}.$$

Data are often missing from some blocks, and then the internal sum is taken over only those pairs available for that block. The matrix $\hat K$ can be unstable, and then resampling of blocks to obtain standard errors is preferable, even though it can be slow.

[9], [39] and [42] investigate the potential increase in estimator variance due to use of composite rather than full likelihoods, and conclude that pairwise likelihoods will often provide an acceptable compromise between statistical efficiency and computational complexity. In some cases the computational burden can be reduced and the statistical properties of $\hat\vartheta_C$ can be enhanced by weighting likelihood contributions $l_{i;d,d'}(\vartheta)$, for example downweighting pairs that are spatially distant, since these might be expected to contribute less information.
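The effect of re-using observations, and the correction supplied by the sandwich matrix, can be seen in a toy setting. The sketch below is an illustration under loudly stated assumptions, not the chapter's computation: the observations are independent N(mu, 1) variables rather than max-stable data, the function name `pairwise_se_normal_mean` is hypothetical, and the variability matrix is built from block-level scores (the sum of the pairwise scores within each independent block). With three observations per block, the naive standard error from the observed information should be too small by a factor of about the square root of two.

```python
import numpy as np
from itertools import combinations

def pairwise_se_normal_mean(z):
    """Naive and sandwich standard errors for a pairwise-likelihood mean.

    z has shape (n_blocks, D); entries are treated as independent N(mu, 1),
    a toy assumption. The log density of a pair (d, e) is, up to a constant,
    -0.5 * {(z_d - mu)^2 + (z_e - mu)^2}.
    """
    z = np.asarray(z, dtype=float)
    n_blocks, D = z.shape
    pairs = list(combinations(range(D), 2))
    # Every observation appears in D - 1 pairs, so the maximiser is the mean.
    mu_hat = z.mean()
    # Block-level score: sum of the pairwise scores within each block.
    block_score = np.zeros(n_blocks)
    for d, e in pairs:
        block_score += (z[:, d] - mu_hat) + (z[:, e] - mu_hat)
    K = np.sum(block_score ** 2)          # variability matrix (scalar here)
    J = 2.0 * len(pairs) * n_blocks       # minus the second derivative
    naive_se = 1.0 / np.sqrt(J)
    sandwich_se = np.sqrt(K) / J
    return mu_hat, naive_se, sandwich_se

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=1.0, size=(5000, 3))
mu_hat, naive_se, sandwich_se = pairwise_se_normal_mean(z)
print("estimate:", mu_hat)
print("naive SE:", naive_se, "sandwich SE:", sandwich_se)
print("ratio sandwich/naive:", sandwich_se / naive_se)
```

The sandwich standard error should be close to the usual value for the mean of all 3 x 5000 observations, whereas the naive one is not.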
Models are compared using the composite likelihood information criterion, $\mathrm{CLIC} = 2\{\mathrm{tr}(\hat J^{-1} \hat K) - l_C\}$, where $l_C$ is the maximum log composite likelihood; low values of CLIC are preferred. The re-use of observations can lead to very large values of CLIC, and it may be useful to rescale it for comparability with standard information criteria such as AIC; in (31.18), for example, the log likelihood contribution should be halved, and if there were $D$ maxima, it would be divided by $D - 1$. See [20] and [63] for more details.

A second solution to the explosion of terms in expressions such as (31.17) is due to [78] and stems from noticing that the terms in expressions such as the right-hand side of (31.17) correspond to different ways in which the extremes can occur: the first corresponds to a single event leading to all three maxima, the last to maxima from three separate events, and the others to the ways in which three maxima might arise from two separate events. Thus if it was known that $z_1$ arose from one event, whereas $z_2$ and $z_3$ occurred together, then the appropriate likelihood contribution would be $V_1 V_{23} \exp(-V)$ rather than the whole of (31.17). If the required derivatives of $V$ are available then the resulting likelihood is computationally feasible even in high-dimensional cases, but the resulting maximum likelihood estimators depend on the choice of partition and can be badly biased if this choice is wrong [84]. In practice an empirical declustering rule is typically used to estimate the partition, and the approach does not seem to be very robust.
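The correspondence between terms of the full density and partitions of the set of sites can be made concrete by enumeration. The sketch below lists the partitions of three sites, labelling each by the product of derivatives of V that it indexes (signs omitted), and counts the partitions for ten sites. The helper names `set_partitions` and `term_label` are illustrative assumptions.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Either `first` forms a block on its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

def term_label(partition):
    """Label such as 'V_1 V_23' for the partition {{1}, {2, 3}}."""
    blocks = sorted(sorted(block) for block in partition)
    return " ".join("V_" + "".join(str(d) for d in block) for block in blocks)

for partition in set_partitions([1, 2, 3]):
    print(term_label(partition))

n_terms = sum(1 for _ in set_partitions(range(10)))
print("number of terms with 10 sites:", n_terms)
```

A likelihood that conditions on a known partition uses just one of these terms, which is why it remains feasible in high dimensions.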
It can be improved by estimating the partition in a more formal way, either using a stochastic EM algorithm [44] or, in a Bayesian context, by including the partition in a Monte Carlo sampling algorithm, and then integrating over it [79].

Likelihood inference for threshold exceedances

Analysis of individual events, typically those exceeding some suitable threshold, allows more detailed modelling and more precise inferences than analysis of maxima, and is particularly useful when assessing simultaneous risks. In principle the approach based on the Poisson
process approximation using a parametric measure $\mu_\vartheta(\cdot)$ is straightforward: a set $E$ having finite measure is chosen to select rare events, and then estimation is based on a Poisson process likelihood. Suppose, for example, that events $q_1, \ldots, q_n$ have been observed at $D$ points $x_1, \ldots, x_D$, so that $q_j \equiv \{q_j(x_1), \ldots, q_j(x_D)\}$, and that $E = \{q : \rho(q) > 1\}$ for some functional $\rho$. Then the selected data will be those $q_j$ for which $\rho(q_j) > 1$, say $q_1, \ldots, q_n$, and the likelihood is

$$L_{\mathrm{Pois}}(\vartheta) = \exp\{-\mu_\vartheta(E)\} \prod_{j=1}^{n} \mu_\vartheta(q_j), \qquad (31.19)$$

where $\mu_\vartheta(q)$ denotes the Poisson process intensity evaluated at $q$. If $\rho(q) = \sum_{d=1}^{D} q(x_d)/u_d$ for some $u_1, \ldots, u_D > 0$, then $E = \{(r, w) : \sum_{d=1}^{D} r w(x_d)/u_d > 1\}$ and $\mu_\vartheta(E)$ is a constant, so only the second term on the right-hand side of (31.19) is needed; it is known as the spectral likelihood [12, 34], though perhaps angular likelihood would be a better term. Its drawback is that the events $q_j$ thus chosen may have some very small components, making the extremal approximation questionable. An alternative ensuring that all events are far from the axes would be to take $\rho(q) = \min_d q(x_d)/u_d$, but this can greatly reduce the number of events available for estimation, especially if extremal dependence is weak. We therefore often take $\rho(q) = \max_d q(x_d)/u_d$, but use a censored likelihood to avoid including the exact values of small components, as we now describe. This uses less precise information and thus decreases the precision of our estimates, but it reduces the bias due to inclusion of non-extreme values.

Censored likelihood is most easily explained in the bivariate case shown in Figure 31.3, where $\mathbb{R}^2$ is partitioned into four regions $R_{I_1, I_2}$, corresponding to the values of the indicators $I_1 = I(Q_1 > u_1)$ and $I_2 = I(Q_2 > u_2)$ of the events that the $Q_d$ exceed thresholds $u_d$.
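The practical consequences of the three choices of the functional rho (sum, minimum and maximum of the scaled components) can be compared on simulated data, together with the exceedance indicators that determine which components would be censored. The following sketch is illustrative only: the function names `select_events` and `censoring_indicators`, the simulated independent unit Fréchet events and the threshold value are all assumptions made here.

```python
import numpy as np

def select_events(q, u, kind):
    """Boolean mask of events with rho(q) > 1.

    q: array of shape (n_events, D) with non-negative entries;
    u: array of D positive thresholds; kind: 'sum', 'min' or 'max'.
    """
    ratio = np.asarray(q, dtype=float) / np.asarray(u, dtype=float)
    if kind == "sum":
        rho = ratio.sum(axis=1)
    elif kind == "min":
        rho = ratio.min(axis=1)
    elif kind == "max":
        rho = ratio.max(axis=1)
    else:
        raise ValueError("kind must be 'sum', 'min' or 'max'")
    return rho > 1.0

def censoring_indicators(q, u):
    """Indicators I_d = I(q_d > u_d); components with I_d = 0 are the ones
    that a censored likelihood would treat as left-censored at u_d."""
    return (np.asarray(q, dtype=float) > np.asarray(u, dtype=float)).astype(int)

# Illustrative events: independent unit Frechet components (an assumption).
rng = np.random.default_rng(3)
q = -1.0 / np.log(rng.random((10_000, 2)))
u = np.array([20.0, 20.0])
for kind in ("sum", "max", "min"):
    print(kind, int(select_events(q, u, kind).sum()))
print(censoring_indicators(q[select_events(q, u, "max")][:5], u))
```

Because the components are non-negative, every event selected by the minimum is also selected by the maximum, and every event selected by the maximum is also selected by the sum, which is why the minimum criterion leaves the fewest events.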
If $I_1 = I_2 = 1$, i.e., both variables exceed the thresholds, then $q \in R_{1,1}$, the extremal model is regarded as valid and the likelihood contribution from $q = (q_1, q_2)$ is $\mu_\vartheta(q)$. If $I_1 = 1$, $I_2 = 0$, then $q \in R_{1,0}$, so the precise value of $q_1$ is used but $q_2$ is left-censored at $u_2$, giving likelihood contribution $\int_0^{u_2} \mu_\vartheta\{(q_1, y)\}\,dy$. Likewise when $I_1 = 0$, $I_2 = 1$, the likelihood contribution is $\int_0^{u_1} \mu_\vartheta\{(x, q_2)\}\,dx$. If $I_1 = I_2 = 0$, then $q \in R_{0,0}$, where the extremal model cannot be trusted, so we use a Poisson process likelihood with region $E = \mathbb{R}^2 \setminus R_{0,0} = E_u$, say, thus having censored contributions from $R_{0,1}$ and $R_{1,0}$ and uncensored contributions from $R_{1,1}$. This likelihood involves computation of $\mu(E_u) = V(u_1, u_2)$, where $V$ is the exponent function corresponding to $W(x)$; see (31.11). One can check that $\mu_\vartheta(q) = -\partial^2 V(q_1, q_2)/\partial q_1 \partial q_2$ and that the terms needed for the censored contributions have forms such as $-\partial V(u_1, q_2)/\partial q_2$, with obvious extensions to higher dimensions.

Two important examples of the computations above correspond to the Brown-Resnick and extremal-t processes, for which $\mu_\vartheta$ is given in [86] and [81] respectively. The corresponding censored likelihoods require integrals of $\mu_\vartheta$, convenient forms for which are also given in these papers.

FIGURE 31.3 Censored likelihood inference for individual events $q_i(x)$. Left panel: 100 realizations $q_i(x)$ with threshold $u = 2$ shown by the horizontal line, and dashed vertical lines at $x_1 = 0.2$ and $x_2 = 0.6$. Right panel: pairs $(q_i(x_1), q_i(x_2))$ corresponding to the intersections of the dashed lines in the left-hand panel with the $q_i(x)$, and threshold $u = 2$ shown by the black lines. Points lying in $R_{1,1}$ have $q_i(x_1), q_i(x_2) > u$, and are treated as uncensored, those lying in $R_{0,1}$ have likelihood contributions corresponding to the event $q_i(x_1) \le u$ but $q_i(x_2)$ known exactly, those in $R_{0,0}$ correspond to $q_i(x_1), q_i(x_2) \le u$, etc.

Examples

Saudi Arabian rainfall

To illustrate some of the challenges arising when modeling spatial extremes, we analyze block maxima of Saudi Arabian rainfall data, collected by the Tropical Rainfall Measuring Mission from 1 January 1998 to 31 December of the final year of the 17-year record. Over this period, three-hourly rainfall satellite measurements (in mm/hr) were gathered over the globe at a resolution of 0.25°, corresponding to around 28km on the equator. Here we consider a tropical arid region consisting of 750 of these grid cells near the Red Sea in Saudi Arabia; see Figure 31.4. We used these rainfall data in our illustration because they are freely available online and provide fairly good spatio-temporal coverage with almost no missing values, although their spatial resolution is quite coarse and might not accurately represent the most localized rainfall events. Data with higher spatial resolution would clearly be desirable, but using them would increase the associated computational burden.

In the last decade, Jeddah, the second largest city in Saudi Arabia, has been hit several times by convective storms, with short but intense rainfall causing extensive flash-floods, damage and deaths [26, 88]. In order to distinguish independent storms and to reduce the number of zero observations, we compute the daily cumulative rainfall at each grid cell, and then extract the 17 annual daily rainfall maxima for each cell, yielding a space-rich but time-poor dataset comprising a total of 17 × 750 annual maxima, though spatial dependence means that the number of equivalent independent maxima each year is lower than 750.
Basic statistics for the daily rainfall totals are reported in Table 31.1, and clearly suggest that the distribution of positive rainfall intensities is highly right-skewed and heavy-tailed. Because the study region is arid, some grid cells experienced no rain in certain years, so six annual maxima, in different grid cells, are exactly equal to zero. We deal with this by censoring of low maxima, as described below. An alternative would be to take maxima over multiple years, but this could significantly reduce the precision of our estimates, due to the low number of temporal replicates.

FIGURE 31.4 Satellite map and limits (white square) of the study region (left-hand panel), shown in detail in the right-hand panel for the Saudi Arabian rainfall data example.

TABLE 31.1 Basic statistics for the daily rainfall totals (measured in mm/day), combined over all grid cells in the study region in Western Saudi Arabia. The central moments (first row) and empirical quantiles (second row) concern positive rainfall intensities (i.e., non-zero observations).
First row: Prop. wet days 4.2%; Minimum 0.3; Mean 5.2; Variance 56.2; Skewness 3.9; Kurtosis [value missing].
Second row: six empirical quantiles [levels and values missing], 36.0; Maximum [value missing].

We assume that any annual rainfall maxima exceeding $u = 3$mm/day follow the generalized extreme-value (GEV) distribution with spatially-varying location, scale and shape parameters, $\eta(s)$, $\tau(s)$ and $\xi(s)$, where $s \in \mathcal{S}_0$ indicates a specific grid cell and $\mathcal{S}_0$ is the collection of all grid cells in the study region. That is,

$$\Pr\{Z(s) \le z\} = \exp\left[-\left\{1 + \xi(s)\,\frac{z - \eta(s)}{\tau(s)}\right\}_+^{-1/\xi(s)}\right], \quad z > u = 3\text{mm/day},$$

where $a_+ = \max(0, a)$ and $Z(s)$ denotes an annual maximum observed at site $s \in \mathcal{S}_0$. As there are only 17 temporal replicates, it is important to borrow strength across sites for accurate estimation of marginal parameters, but on a topographically and climatically diverse region of this size (see Figure 31.4), it is difficult to find simple relationships that capture the spatial variation of the marginal parameters well. We therefore adopt a local censored likelihood approach and, neglecting spatial dependence at this stage, we maximize the locally-weighted log likelihood [3, 8, 21]

$$\ell_{s_0}(\eta, \tau, \xi) = \sum_{i=1}^{17} \sum_{d \in N(s_0)} \omega(\|s_d - s_0\|) \log g_u(z_{i,d}; \eta, \tau, \xi) \qquad (31.20)$$

for each grid cell $s_0 \in \mathcal{S}_0$; here $z_{i,d}$ is the $i$th annual maximum observed at the $d$th nearby grid cell, $N(s_0)$ is an index set corresponding to grid cells within a small neighborhood of $s_0$, $\omega(h)$ is a weight function that depends on the distance $h = \|s_d - s_0\| \ge 0$ between $s_0$ and $s_d$, and $g_u(z; \eta, \tau, \xi)$ is the censored GEV likelihood contribution, i.e., $g_u(z; \eta, \tau, \xi) = g(z; \eta, \tau, \xi)$ if $z > u$, and $g_u(z; \eta, \tau, \xi) = G(u; \eta, \tau, \xi)$ if $z \le u$, where $g$ and $G$ respectively denote the GEV density and distribution functions. To obtain smooth marginal surfaces, we weight the log likelihood contributions by the biweight function $\omega(h) = \{1 - (h/b)^2\}_+^2$ with bandwidth $b > 0$. The choice of $b$ entails a bias-variance tradeoff, with larger values producing smoother surfaces with less local detail. Some experimentation showed that taking $b = 80$km yields reasonable marginal fits, while greatly reducing parameter estimation uncertainty by increasing the number of observations used to estimate the margins. The uncertainty of our estimates was assessed using a non-parametric bootstrap, whereby we resampled the 17 years of data, in order to retain the spatial structure of the observations, but assuming that the years are independent replicates.

FIGURE 31.5 Estimated GEV shape parameter $\hat\xi(s)$ at all grid cells (black dots) with 95% confidence intervals (grey segments), from a local likelihood fit using a biweight function with bandwidth $b = 80$km, plotted against the index of the grid cell. Estimates are highly correlated across grid cells. The horizontal lines show $\hat\xi$ (solid) when assumed to be constant over the study region, with 95% confidence intervals (dashed).
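The locally-weighted censored log likelihood (31.20) can be sketched as follows for a single target grid cell. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the array layout (years in rows, neighbouring cells in columns), the restriction to a non-zero shape parameter and the simulated inputs are all hypothetical.

```python
import numpy as np

def biweight(h, b):
    """Biweight kernel {1 - (h/b)^2}_+^2, zero for h >= b."""
    h = np.asarray(h, dtype=float)
    return np.where(h < b, (1.0 - (h / b) ** 2) ** 2, 0.0)

def gev_cdf(z, eta, tau, xi):
    """GEV distribution function for xi != 0 (assumption: tau > 0)."""
    t = np.maximum(1.0 + xi * (np.asarray(z, dtype=float) - eta) / tau, 0.0)
    with np.errstate(divide="ignore"):
        return np.exp(-t ** (-1.0 / xi))

def censored_gev_logcontrib(z, u, eta, tau, xi):
    """log g(z) if z > u, and log G(u) if z <= u (left-censoring at u)."""
    z = np.asarray(z, dtype=float)
    t = 1.0 + xi * (z - eta) / tau
    safe_t = np.where(t > 0, t, 1.0)  # avoid invalid logs outside the support
    log_g = np.where(
        t > 0,
        -np.log(tau) - (1.0 + 1.0 / xi) * np.log(safe_t) - safe_t ** (-1.0 / xi),
        -np.inf,
    )
    log_G_u = np.log(gev_cdf(u, eta, tau, xi))
    return np.where(z > u, log_g, log_G_u)

def local_loglik(eta, tau, xi, z, dist, u=3.0, b=80.0):
    """Weighted sum over years (rows) and neighbouring cells (columns)."""
    w = biweight(dist, b)
    keep = w > 0  # cells outside the bandwidth contribute nothing
    contrib = censored_gev_logcontrib(z[:, keep], u, eta, tau, xi)
    return float(np.sum(w[keep][None, :] * contrib))

# Hypothetical inputs: 17 years at 5 cells, distances in km from the target.
rng = np.random.default_rng(0)
z = 8.0 + 4.0 * rng.gumbel(size=(17, 5))
dist = np.array([0.0, 28.0, 40.0, 56.0, 90.0])
print(local_loglik(eta=8.0, tau=4.0, xi=0.14, z=z, dist=dist))
```

A numerical optimiser could then be applied to the negative of `local_loglik` at each target cell; that step is left out of this sketch.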
The estimated location parameter $\eta(s)$ and, to some extent, the scale parameter $\tau(s)$, are quite well-estimated, but the shape parameter estimates (corresponding standard errors) vary from 0.28 (0.30) to 0.49 (0.13), implying completely different tail behavior for the extremes, which is harder to interpret. The shape parameter is typically difficult to estimate, see Figure 31.5, and here it is plausible that there are too few years to obtain reliable results. To deal with this, we fix $\xi(s) \equiv \xi$ over the entire region $\mathcal{S}_0$, and use a profile likelihood based on (31.20) to estimate $\xi$ over the grid $G_\xi = \{-0.50, -0.49, \ldots, 1.00\}$, i.e.,

$$\hat\xi = \arg\max_{\xi \in G_\xi} \sum_{s \in \mathcal{S}} \ell_s(\hat\eta_\xi(s), \hat\tau_\xi(s), \xi),$$

where $\hat\eta_\xi(s)$ and $\hat\tau_\xi(s)$ are the estimated location and scale parameters for fixed $\xi$. We obtain $\hat\xi = 0.14$ (0.03), which implies that the rainfall distribution is slightly heavy-tailed. Figure 31.5 suggests that a constant shape parameter is reasonable for most grid cells, while being estimated with much lower uncertainty. The estimated location and scale parameters, $\hat\eta_{\hat\xi}(s)$ and $\hat\tau_{\hat\xi}(s)$, are displayed in Figure 31.6 and reveal interesting spatial patterns reflecting the topography of the study region. The middle
and bottom panels of Figure 31.6 show the $M$-year return levels,

$$z_M(s) = G^{-1}\{1 - 1/M; \eta(s), \tau(s), \xi(s)\}, \quad s \in \mathcal{S},$$

for $M = 10$, 20, 50 and 100 years, estimated at all grid points of the study region. The city of Jeddah appears to be at high risk, which corroborates empirical evidence, although our results must be interpreted with care, because of the large estimation uncertainty due to the heavy tails and the relatively short time series, and the rather coarse spatial resolution.

After transforming the data to a common unit Fréchet scale, we used four stationary isotropic max-stable processes to assess the spatial dependence of extreme rainfall events over the entire study region. The models used were the Brown-Resnick process with variogram $\gamma(s_1, s_2) = (\|s_1 - s_2\|/\lambda)^\kappa$, $\lambda > 0$, $\kappa \in (0, 2]$, the Schlather process with powered exponential correlation function $c(s_1, s_2) = \exp\{-\gamma(s_1, s_2)\}$, the extremal-t process with the same correlation $c(s_1, s_2)$ and degrees of freedom $\alpha > 0$, and the Smith process with Gaussian density kernels defined through the diagonal covariance matrix $\Omega = \lambda^2 I_2$, which corresponds to the Brown-Resnick process with $\kappa = 2$. We also considered geometrically anisotropic models, obtained by replacing the Euclidean distance $\|s_1 - s_2\|$ in the models above by the Mahalanobis distance $h_M$, where

$$h_M^2 = (s_1 - s_2)^T \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & a^{-2} \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^T (s_1 - s_2),$$

with $a > 0$ and $\theta \in [-\pi, \pi]$. The parameter $a$ reflects the degree of anisotropy, as it corresponds to the ratio of the principal axes of dependence contours, while $\theta$ is the angle with respect to the west-east direction; see [4]. All models were fitted by pairwise likelihood, using a random selection of 5100 pairs of sites less than 800km apart with pairwise distances being approximately uniform in [25, 800]km, and conditioning on the observed maxima exceeding the threshold $u = 3$mm/day.
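The M-year return levels used in this example are quantiles of the fitted GEV distribution, and for a non-zero shape parameter they have a closed form obtained by inverting the distribution function. The sketch below illustrates the computation; the function names are hypothetical, and the location and scale values are invented for illustration (only the shape value 0.14 is taken from the text).

```python
import numpy as np

def gev_cdf(z, eta, tau, xi):
    """GEV distribution function for xi != 0, evaluated inside its support."""
    t = 1.0 + xi * (np.asarray(z, dtype=float) - eta) / tau
    return np.exp(-t ** (-1.0 / xi))

def gev_return_level(M, eta, tau, xi):
    """Solve G(z) = 1 - 1/M for z, assuming xi != 0 and M > 1."""
    p = 1.0 - 1.0 / np.asarray(M, dtype=float)
    return eta + (tau / xi) * ((-np.log(p)) ** (-xi) - 1.0)

# Hypothetical location and scale (mm/day); shape 0.14 as estimated above.
eta, tau, xi = 10.0, 5.0, 0.14
for M in (10, 20, 50, 100):
    print(M, float(gev_return_level(M, eta, tau, xi)))
```

Applying such a function cell by cell to the fitted marginal surfaces gives return-level maps of the kind described for Figure 31.6.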
The pairwise conditional log likelihood may be expressed in shorthand notation as

$$\ell(\vartheta) = \sum_{i=1}^{17} \sum_{d' < d} w_{d,d'}\, I(z_{i,d} > u_d, z_{i,d'} > u_{d'}) \log\left\{\frac{\exp(-V)(V_1 V_2 - V_{12})}{p(u_d, u_{d'})}\right\},$$

where $V$, $V_1$, $V_2$ and $V_{12}$ are the bivariate exponent function and its partial and mixed derivatives of the corresponding max-stable model, evaluated at $(z_{i,d}, z_{i,d'})$, $u_d$ and $u_{d'}$ denote the threshold $u = 3$mm/day transformed to the unit Fréchet scale, $p(u_d, u_{d'}) = 1 - \exp(-1/u_d) - \exp(-1/u_{d'}) + \exp\{-V(u_d, u_{d'})\}$ is the probability that the threshold $u$ is exceeded at both sites simultaneously, and $w_{d,d'}$ is a binary weight determined by the selection of pairs. In less arid climates it would often be reasonable to assume that all the annual maxima were large enough for the extremal model to apply, in which case one would effectively set $u_d = u_{d'} = 0$ and $p(u_d, u_{d'}) = 1$ in the expression above, so that all the indicator functions equal unity.

Table 31.2 reports the parameter estimates and 95% confidence intervals obtained using a non-parametric bootstrap to resample years of data and thus reflect the overall uncertainty from estimating both the margins and the dependence structure. It also reports the composite likelihood information criterion, CLIC, rescaled to be comparable to the AIC in the independence case, which may be used to compare fitted models. As the confidence interval for the anisotropy parameter $a$ always includes unity, there is no strong evidence against isotropy. Overall, the CLIC values are only very slightly in favor of anisotropic models, except for the extremal-t model, for which the isotropic model seems to perform better. The CLIC suggests that the isotropic extremal-t model is the best, but the estimated degrees of freedom $\alpha$ is quite unstable, which results in large and asymmetric confidence intervals for $\alpha$ and the range $\lambda$. Therefore, the isotropic Brown-Resnick model, which provides a good balance of fit and parsimony, may be preferred. Its estimated shape parameter $\kappa$ suggests that the fitted process is fairly rough, as might be expected, and the scale parameter $\lambda$ seems to be both small and quite well-estimated. The Schlather model does not allow independence at long ranges, and gives the worst fit, followed by the Smith model, which is too smooth to be realistic. The relatively small difference between the CLIC values for the Brown-Resnick and extremal-t models suggests that their fits are similar.

Figure 31.7 plots the fitted bivariate extremal coefficients $\theta(s_1, s_2)$ as a function of distance $h = \|s_1 - s_2\|$ for all isotropic models, compared to the empirical counterpart, binned by distance class.

FIGURE 31.6 Saudi Arabian rainfall analysis. Top: location $\eta(s)$ (left) and scale $\tau(s)$ (right) parameters of the GEV distribution estimated from a local likelihood fit using a biweight function with bandwidth $b = 80$km. The estimated shape parameter, constant over the region, is $\hat\xi = 0.14$. Middle and bottom: Estimated 10-year (middle left), 20-year (middle right), 50-year (bottom left) and 100-year (bottom right) return levels (in mm/day) at each grid point of the study region, plotted on a common logarithmic scale. The maps mark the cities of Medina, Makkah and Jeddah.

TABLE 31.2 Estimated dependence parameters for all max-stable processes fitted to the Saudi Arabian rainfall annual maxima. Bracketed values are 95% confidence intervals obtained using a nonparametric bootstrap procedure. Results in the top rows are for isotropic models, while the bottom rows are for models with geometric anisotropy. The last column reports the difference in composite likelihood information criterion with respect to the best model, rescaled to be comparable to the AIC in the independence case; lower values are better.

Isotropic max-stable models
Model | λ [km] | κ | α | a | θ | CLIC
Smith | 34 [26,39] | - | - | - | - | 124
Schl. | 44 [34,53] | 1.46 [1.19,1.84] | - | - | - | 362
B.-R. | 13 [8,16] | 0.71 [0.52,0.94] | - | - | - | 23
Ext.-t | 333 [165,1357] | 0.90 [0.63,1.13] | 5.9 [3.9,13.1] | - | - | 0

Anisotropic max-stable models
Model | λ [km] | κ | α | a | θ | CLIC
Smith | 31 [28,37] | - | - | 0.82 [0.72,1.33] | 0.19 [−1.48,2.02] | 119
Schl. | 42 [31,54] | 1.47 [1.20,1.82] | - | 0.89 [0.70,1.28] | 0.23 [−0.32,1.26] | 362
B.-R. | 12 [7,21] | 0.72 [0.53,0.95] | - | 0.71 [0.52,1.81] | 0.12 [−0.25,1.41] | 21
Ext.-t | 424 [176,1352] | 0.90 [0.64,1.10] | 6.2 [4.1,14.8] | 1.37 [0.55,1.66] | 1.37 [−0.30,1.42] | 41
The Brown-Resnick and extremal-t max-stable processes provide reasonable fits, the Smith model is too rigid to appropriately capture the decay of dependence with distance, and the Schlather model is unable to capture the long-range independence. These problems are common in applications, where these models are rarely better than the others.

To further explore how the fitted isotropic max-stable models capture higher-dimensional distributions, Table 31.3 compares empirical and fitted extremal coefficients $\theta_D \in [1, D]$ for sets of sites $D = \{s_1, \ldots, s_D\}$ around Jeddah. All the fitted models tend to overestimate the strength of dependence, but the Brown-Resnick process provides the best fit and the Schlather model the worst fit, as one might expect from Figure 31.7. Surprisingly, the fit of the extremal-t model in high dimensions differs slightly from the Brown-Resnick model, despite their similar results for pairs of sites.

Figure 31.8 compares the map of annual rainfall maxima for 2009, a year with intense rainfall in Jeddah, with data simulated from the fitted isotropic Brown-Resnick model, using the exact simulation algorithm of [28]. Overall, the spatial patterns and forms of dependence observed in the simulations tend to agree with the data, although the 2009 annual maxima seem to be slightly smoother. To illustrate the ability of this modeling framework to assess spatial aggregated risk, we generate $10^5$ independent simulations from the fitted isotropic
Brown–Resnick model, which we then use to compute the probability $p(v)$ that the annual maximum averaged over the 14 grid cells less than 50 km from Jeddah or Makkah (the second and third largest cities in Saudi Arabia, with about 4 and 2 million inhabitants, respectively) exceeds a given high threshold $v$, i.e., $p(v) = \Pr\{|S|^{-1} \sum_{s \in S} Z(s) > v\}$ with $S = N(s_J) \cup N(s_M) \subset \mathcal{S}$ for some neighborhoods $N(s_J)$ and $N(s_M)$ of Jeddah and Makkah, respectively. We obtain p(50 mm/day) = (return period 13.9 years), p(71.1 mm/day) = (return period 53.6 years) and p(100 mm/day) = (return period years), where 71.1 mm/day corresponds to the daily rainfall level observed on November 25, 2009, during the Jeddah floods, which caused 122 fatalities.

FIGURE 31.7 Saudi Arabian rainfall analysis. Empirical and fitted bivariate extremal coefficients $\theta(s_1, s_2)$, plotted as functions of distance $\|s_1 - s_2\|$ [km], for the isotropic Smith, Schlather, Brown–Resnick and extremal-t models. The empirical extremal coefficients are binned by distance class, and the grey shaded area is a bootstrap 95% pointwise confidence band.

Spanish temperatures

To illustrate challenges in the modeling of threshold exceedances, we analyze extreme temperatures measured at 12 stations near Madrid in Spain; see Figure 31.9. Five of the stations are located in the Madrid conurbation. The station elevations vary from 540 m for Toledo to 1894 m for Navacerrada, resulting in different climates over this region. Daily maximum temperatures for the years were obtained from the European Climate Assessment and Dataset website. To avoid having to treat seasonality we focus on the months from June to September. Most of the time series have missing values; just one is complete, and only five stations have records before . Missing values are typical of environmental applications based on observational data.
This application is an example of modeling a time-rich but space-poor dataset, as estimating extremal dependence based on 12 stations is challenging. Space-time modeling of extreme temperatures must take into account two aspects: the space-time variability of marginal distributions (climate) and the space-time dependence of individual observations resulting from heatwaves over periods of a few days (weather). In this application we focus on modeling the spatial dependence at both the marginal and observation levels. We assume that the marginal distributions are constant over time

and attempt to identify clusters of extreme high temperatures over space, to avoid modeling temporal dependence.

TABLE 31.3 Saudi Arabian rainfall analysis. Empirical and fitted extremal coefficients $\theta_{\mathcal{D}} \in [1, D]$ for sets of sites $\mathcal{D} = \{s_1, \ldots, s_D\}$ around Jeddah. The fitted coefficients are for the four isotropic models and are calculated using the exact expression of the exponent function V in dimension D. Values in brackets are 95% bootstrap confidence intervals.

Region               [39,40]°E × [21,22]°N   [39,41]°E × [21,23]°N   [39,42]°E × [21,24]°N
Empirical estimate   4.17 [1.90,6.44]        14.25 [6.50,22.00]      20.90 [9.54,32.26]

The rows giving D and the fitted coefficients for the Smith, Schl., B. R. and Ext.-t models are missing here.

FIGURE 31.8 Saudi Arabian rainfall analysis. Annual rainfall maxima [mm/day] for 2009 (left), and two simulated maps (middle and right) based on the fitted isotropic Brown–Resnick max-stable model. Values below the threshold of u = 3 mm are marked distinctly. The maps show Medina, Makkah and Jeddah.

We identified extreme observations by retaining only observations exceeding thresholds taken to equal the 0.95 quantiles of each of the 12 series. These observations occur in groups because of the temporal dependence resulting from hot spells. To avoid modeling this dependence, we identified clusters of extremes over space and time and reduced each cluster to one observation at each site. Two extreme observations, possibly occurring at different sites, were assumed to be in the same cluster if they occurred within five days of each other. Applying this procedure gave 311 separate spatial extreme events with observations at all 12 stations, which we used for our modeling. This is equivalent to selecting extreme clusters using the risk functional $\rho(q) = \max_{d=1,\ldots,D} q(s_d)/u_d$, where the $u_d$ are the thresholds at the D = 12 sites.
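The declustering step just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the data are taken to be a complete array with one row per day and one column per site, thresholds are assumed positive so that the risk functional exceeds 1 exactly when some site exceeds its threshold, clusters are formed by chaining extreme days separated by at most five days, and each cluster is reduced to its site-wise maximum (the text does not say which summary was used). The function name `decluster` and its arguments are hypothetical.

```python
import numpy as np


def decluster(temps, days, q=0.95, max_gap=5):
    """Group extreme days into clusters; keep one value per site per cluster.

    temps   : array (n_days, n_sites) of daily maxima, assumed complete
    days    : increasing integer day indices, one per row of temps
    q       : quantile level defining the site-wise thresholds u_d
    max_gap : extreme days at most this far apart share a cluster
    """
    temps = np.asarray(temps, dtype=float)
    days = np.asarray(days)
    u = np.quantile(temps, q, axis=0)        # site-wise thresholds u_d (assumed > 0)
    risk = np.max(temps / u, axis=1)         # rho(q) = max_d q(s_d) / u_d
    extreme = np.flatnonzero(risk > 1.0)     # days on which some site exceeds u_d
    if extreme.size == 0:
        return u, np.empty((0, temps.shape[1]))
    # Start a new cluster whenever the gap to the previous extreme day is too long.
    breaks = np.flatnonzero(np.diff(days[extreme]) > max_gap) + 1
    clusters = np.split(extreme, breaks)
    # Assumption: summarise each cluster by the site-wise maximum.
    events = np.array([temps[idx].max(axis=0) for idx in clusters])
    return u, events
```

Each row of `events` then plays the role of one spatial extreme event. With real station records, missing values would need handling before this step, for example by keeping only clusters observed at every site.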
As with the rainfall example, the marginal and dependence structures were estimated separately. We first fitted the extreme observations at each site using the generalized Pareto distribution with spatially varying parameters, then used the fitted marginal model to transform the data to the unit Fréchet scale, and finally estimated the dependence model using the censored Poisson process approach. For the marginal model we used an independence likelihood to fit generalized Pareto distributions with constant shape parameter and with a scale parameter that varies linearly with the station elevation. The estimated shape parameter was $\hat{\xi} = -0.39_{[-0.43,\,-0.37]}$; the


subscripts here and below are 95% confidence intervals obtained by bootstrapping the 311 multivariate extreme events. The estimated shape parameter for extreme temperatures is often negative, though here the uncertainty is particularly small. This model gave a reasonably good fit at the 12 sites of our dataset. The right-hand panel of Figure 31.9 shows the ten-year maximum summer temperature return levels given by this marginal model.

FIGURE 31.9 Spanish temperature analysis. Left panel: the study region around Madrid. Right panel: ten-year return level for summer maximum temperatures. The axes are easting and northing coordinates in km. Scale labels: Elevation [m], Temperature [Celsius]. Stations labelled on the maps include Soria, Valladolid, Segovia, Navacerrada, Avila, Madrid, Cuenca and Toledo.

Extremal models for threshold exceedances presuppose that the data are asymptotically dependent and threshold-stable. To assess this we computed conditional exceedance probability curves, $\Pr\{F(Y_2) > u \mid F(Y_1) > u\}$, $u \in (0, 1)$, for all pairs of stations. The left-hand panel of Figure shows the conditional probability curve for one of the Madrid stations and Avila. This curve supports asymptotic dependence, as the estimate of (31.15) seems to be strictly positive. Moreover, taking into account the large uncertainty close to u = 1, the curves showed no evidence of instability above the 0.95 quantile, and thus cast no doubt on the suitability of the Poisson process model. Exploratory analysis shows that extremal dependence is strong and barely weakens with distance over this small region; see the right-hand panel of Figure. Heatwaves affect large regions simultaneously, resulting in strong spatial dependence.
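An empirical version of the conditional exceedance probability curves used above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes paired observations without ties or missing values, and it replaces the unknown marginal distribution functions by ranks divided by n + 1. The function name `conditional_exceedance` is hypothetical.

```python
import numpy as np


def conditional_exceedance(y1, y2, levels):
    """Empirical estimate of Pr{F(Y2) > u | F(Y1) > u} for each u in levels."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = y1.size
    # Rank transform to the uniform scale (assumes no ties): values in (0, 1).
    u1 = (np.argsort(np.argsort(y1)) + 1) / (n + 1)
    u2 = (np.argsort(np.argsort(y2)) + 1) / (n + 1)
    out = np.full(len(levels), np.nan)
    for i, u in enumerate(levels):
        cond = u1 > u
        if cond.any():                      # leave NaN when nothing exceeds u
            out[i] = np.mean(u2[cond] > u)
    return out
```

A curve that stays well above zero as u approaches 1 points towards asymptotic dependence, whereas a curve decaying to zero suggests asymptotic independence. Few observations remain close to u = 1, which is why the uncertainty there is large.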
A process $q(s)$ corresponding to a Brown–Resnick model with variogram $\gamma(s_1, s_2) = (\|s_1 - s_2\|/\lambda)^{\kappa}$, $\lambda > 0$, $\kappa \in (0, 2]$, was fitted to the transformed data using the method described in Section. We used the censored full likelihood approach of [86] to obtain $\lambda = 139_{[49,303]}$ km and $\kappa = 0.17_{[0.13,0.21]}$. The large value and uncertainty for λ are due to the strong dependence in extreme temperatures, which makes the likelihood flat. The small value of κ is due to the small-scale variability in the data; here the fitted log-Gaussian random fields in the Poisson process model are non-differentiable. Figure shows the fitted extremogram for our Brown–Resnick model, in terms of distances between any two sites. The model seems to underestimate the dependence somewhat. The empirical exceedance probability at very small distances is less than unity, which forces the value of κ to be very small.
