Geophysical Journal International, Geophys. J. Int. (2011) 184, 759-776, doi: 10.1111/j.1365-246X.2010.04857.x

Global earthquake forecasts

Yan Y. Kagan and David D. Jackson
Department of Earth and Space Sciences, University of California, Los Angeles, CA 90095-1567, USA. E-mail: ykagan@ucla.edu

Accepted 2010 October 17. Received 2010 September 24; in original form 2010 June 17

SUMMARY
We have constructed daily worldwide long- and short-term earthquake forecasts. These forecasts specify the earthquake rate per unit area, time and magnitude on a 0.5° grid for a global region between 75°N and 75°S latitude (301 by 720 grid cells). We use both the Global Centroid Moment Tensor (GCMT) and Preliminary Determinations of Epicenters (PDE) catalogues. Like our previous forecasts, the new forecasts are based largely on smoothed maps of past seismicity and assume spatial and temporal clustering. The forecast based on the GCMT catalogue, with a magnitude completeness threshold of 5.8, includes an estimate of the focal mechanisms of future earthquakes and of the mechanism uncertainty. The forecasted tensor focal mechanism makes it possible in principle to calculate an ensemble of seismograms for each point of interest on the Earth's surface. We also introduce a new approach that circumvents the need for focal mechanisms. This permits the use of the PDE catalogue, which reliably documents many smaller quakes with a higher location accuracy. The result is a forecast at a higher spatial resolution and down to a magnitude threshold below 5.0. Such new forecasts can be prospectively tested within a relatively short time, such as a few years, because smaller events occur with greater frequency. The forecast's efficiency can be measured by its average probability gain per earthquake compared to a spatially or temporally uniform Poisson distribution. For the short-term forecast the gain is about 2.0 for the GCMT catalogue and 3.7 for the PDE catalogue relative to a temporally random but spatially localized null hypothesis. Preliminary tests indicate that for the long-term global spatial forecast the gain is of the order 20-25 compared to a uniform event distribution over the Earth's surface. We can also prospectively test the long-term forecast to check whether it can be improved.

Key words: Probabilistic forecasting; Probability distributions; Earthquake interaction, forecasting, and prediction; Seismicity and tectonics; Statistical seismology; Dynamics: seismotectonics.

1 INTRODUCTION

The importance of earthquake forecasting for seismic hazard and risk estimation, and the difficulty of resolving basic differences between forecast models, have motivated an international effort. Giardini (1999) and Giardini et al. (1999) presented a global seismic hazard assessment program and a hazard map. The hazard map was an assemblage of many regional maps constructed by local geologists and seismologists. A recent editorial (Anonymous 2008) in the journal Nature Geoscience proposes a program of global seismic risk evaluation using a unified standard. Stillwell (2009) discussed the Global Earthquake Model (GEM), a five-year, public-private partnership launched in 2009 to create a global database of earthquake information and reliable earthquake catalogues for magnitude >5 events. Marzocchi & Lombardi (2008) analysed two worldwide catalogues in different time-magnitude windows to investigate potential earthquake forecasting performance.
The Southern California Earthquake Center (SCEC) recently organized the Collaboratory for the Study of Earthquake Predictability (CSEP). A major focus of CSEP is to develop international collaborations between the regional testing centres and to accommodate a wide-ranging set of prediction experiments involving geographically distributed fault systems in different tectonic environments. CSEP is extending the tests to several natural laboratories around the globe, and it also tests long- and short-term forecasts updated daily. Our forecasting technique is to establish a statistical model which fits the catalogue of earthquake times, locations and seismic moments, and then to base forecasts on this model. While most components of the model have been tested (Kagan & Knopoff 1987; Kagan 1991; Jackson & Kagan 1999; Kagan & Jackson 2000; Kagan et al. 2010), some require further exploration and can be modified as our research progresses. Our purpose in this work is to extend the clustering model that we used earlier to make testable forecasts worldwide. Our previous forecast model was based on constructing a map of smoothed rates of past earthquakes. We used the Global Centroid Moment Tensor (GCMT) catalogue explained by Ekström et al. (2005), because it employs relatively consistent methods and reports tensor focal mechanisms.

The focal mechanisms allow us to estimate the fault-plane orientation for past earthquakes, through which we can identify a preferred direction for future events. Using the forecasted tensor focal mechanism, it may be possible to calculate an ensemble of seismograms for each point of interest on the Earth's surface. The GCMT catalogue imposes some restrictions: it began only in 1977, and it is complete only for earthquakes with magnitudes of about 5.8 and larger (Kagan 2003). We also experiment with a forecast based on the U.S. Geological Survey (2008) PDE (Preliminary Determinations of Epicenters) catalogue. Although until recently the catalogue did not report focal mechanisms routinely, it uses relatively stable methods and reliably reports earthquakes about one magnitude unit smaller than does the GCMT. As we show later, the PDE catalogue yields results similar to the GCMT and can thus enable credible forecasts for tests in the CSEP effort. Both the GCMT and PDE are global catalogues employing data from worldwide seismic networks. We advocate the use of global catalogues even in the CSEP test laboratories where regional catalogues with lower magnitude completeness thresholds are available. The argument in favour of using global catalogues is that they are more homogeneous than local catalogues, and they lack the spatial boundary effects which greatly complicate the analysis of local catalogues. Moreover, local seismicity may be dominated by a few aftershock sequences of strong events, such as the m7.5 1952 Kern County and the m7.3 1992 Landers earthquakes in southern California. Parameter estimation in local earthquake catalogues is usually biased because seismicity outside the catalogue's space-time window is not taken into account. Global catalogues do not have these spatial boundary effects, and the influence of past seismicity is averaged out over the many independent earthquake sequences present in a catalogue. Explosions and earthquakes caused by volcanic and geothermal activity are also more likely to contaminate earthquake records in local and regional catalogues. Expanding our forecast program worldwide presents some new challenges. Until now the earthquake forecast technique was applied to areas of rapid tectonic deformation: subduction zones and other plate boundaries. In applying the forecast globally we extend it to plate interiors and continental areas (active and non-active) where the earthquake rate is low, but the human population is large and more exposed to seismic risk. Our long-term forecast is based on averaging and smoothing the earthquake record. In areas of high seismicity, the available catalogues are adequate to forecast future earthquake activity. For example, aftershock sequences in subduction zones last on average 10-15 yr (see fig. 2b of Kagan & Jackson 1999), thus a catalogue with a duration of 30-40 yr may be sufficient to evaluate the long-term earthquake rate with an insignificant bias due to aftershocks. In slowly deforming plate interiors, the recurrence time of large earthquakes is hundreds to thousands of years, and the places of relatively high activity are likely to be aftershock sequences of long-past earthquakes (Ebel et al. 2000; Parsons 2009; Stein & Liu 2009). Averaging the seismic history of the last 30-40 yr may therefore exaggerate the long-term seismic hazard in aftershock zones of past large events.
For a correct forecast we need to evaluate the decaying earthquake rate in an aftershock sequence. The rate can be estimated more accurately in such areas by using older instrumental as well as historical data, and the computation technique employed in this paper can in principle be adjusted for such a task. Our short-term forecast, on the other hand, is estimated on the basis of recent earthquake activity. Ebel (2009) suggests that aftershock sequences in stable continental regions have statistical properties similar to those of California. The results of the likelihood search in different regions (Kagan 1991) and in various tectonic zones (Kagan et al. 2010) suggest that our short-term model should work reasonably well on slowly deforming plate boundaries and in plate interiors. The program we propose in this paper should be considered a scientific demonstration project; implementing this technique for real-life earthquake forecasting would require additional work. The global earthquake forecasts described in this paper are based on two worldwide catalogues and have common properties: (1) they cover the area -75° < latitude < 75°; (2) they are based on smoothed seismicity; (3) they use separable number, magnitude, temporal and spatial distributions. The present forecast is uniform over the Earth, but we briefly comment on its regionalization by the tectonic style of deformation (Kagan et al. 2010).

2 METHODS AND DATA

For our long-term model, we factor the rate density into spatial and magnitude distributions, assuming they are independent of one another. We estimate the spatial distribution using all earthquakes above the assumed completeness threshold in the catalogue. For the magnitude distribution, we assume a tapered Gutenberg-Richter relation (Bird & Kagan 2004; Kagan et al. 2010). At any location the spatial rate serves as a multiplicative constant, proportional to the a-value of the magnitude-frequency relation. Thus, any forecast based on a catalogue with a given lower magnitude threshold applies to larger earthquakes as well. We test the models over all magnitudes above the threshold, and because large earthquakes are expected to be less frequent, they count more than smaller ones in the likelihood test. We could forecast earthquakes smaller than the threshold magnitude. We choose not to do so because the smoothing kernels that determine our spatial distribution are influenced indirectly by earthquake magnitudes. Thus, forecasts with smaller earthquakes provide a higher spatial resolution. In our previous papers (Kagan & Jackson 1994; Jackson & Kagan 1999; Kagan & Jackson 2000) we studied earthquake distributions and clustering for the GCMT catalogue of moment tensor inversions compiled by Ekström (2007) and Ekström et al. (2005). The present catalogue contains more than 30 000 earthquake entries for the period 1977 January 1 to 2008 December 31. This catalogue characterizes earthquake size by the scalar seismic moment M. Here we use the scalar seismic moment directly, but for easy comparison we convert it into an approximate moment magnitude using the relationship

m_W = \frac{2}{3} (\log_{10} M - 9.0),   (1)

(Hanks & Kanamori 1979; Hanks 1992), where M is measured in newton metres (Nm). Note that various authors use slightly different values for the final constant (here 9.0), so the magnitude values we use in the tables and diagrams may not be fully consistent with the values used by other researchers.
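As a concrete illustration of eq. (1), a minimal Python sketch (not part of the original paper; function names are hypothetical) converting a scalar seismic moment in Nm to an approximate moment magnitude and back:

    import math

    def moment_to_magnitude(moment_nm: float, c: float = 9.0) -> float:
        """Eq. (1): approximate moment magnitude m_W from scalar moment M in N m."""
        return (2.0 / 3.0) * (math.log10(moment_nm) - c)

    def magnitude_to_moment(m_w: float, c: float = 9.0) -> float:
        """Inverse of eq. (1): scalar seismic moment in N m for a given m_W."""
        return 10.0 ** (1.5 * m_w + c)

    # The reference moment M_r = 1e15 N m used in Section 3.3 corresponds to m_W = 4.0.
    print(moment_to_magnitude(1.0e15))   # 4.0

Changing the final constant c reproduces the small magnitude offsets between authors mentioned above.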
The PDE worldwide catalogue is published by the USGS (U.S. Geological Survey 2008). In spite of its name (Preliminary Determinations of Epicenters), the catalogue is distributed in its final form with a few months' latency. However, truly preliminary information on earthquakes in the PDE catalogue is usually available with a delay of just a few minutes. The PDE catalogue reports the earthquake size using several magnitude scales, providing the body-wave (m_b) and surface-wave (M_S) magnitudes for most of the moderate and large events since 1965 and 1968, respectively.

From the start the PDE catalogue also listed two additional magnitudes contributed by other seismic stations or institutions. In early parts of the catalogue these contributed magnitudes were local magnitudes (M_L) or surface-wave magnitudes determined by California stations. The moment magnitude (m_W) estimate has been added recently. This magnitude is taken either from the GCMT catalogue or is estimated by the USGS (up to three different USGS m_W values are currently available). To construct our smoothed seismicity maps, we need a single standard magnitude estimate for each earthquake. Ideally, we would like to convert all other magnitude estimates into moment magnitude and use that as a standard. Unfortunately, we have not found a reliable way to do that (see Kagan 2003). Kagan (1991) used a weighted average of several different magnitude scales but found the results not fully satisfying, especially since several additional magnitudes have been added lately. In this work we use the maximum magnitude among those listed as the standard for each earthquake. This choice is imperfect, but it is apparently the only practical way to use the PDE catalogue for our study. For smaller and moderate earthquakes the largest magnitude is usually m_b, because other magnitude estimates are unavailable. For larger events M_S would be selected; for even larger recent earthquakes the maximum magnitude is likely to be m_W. If the two additional contributed magnitudes are of the same type (M_S or m_W), we use their average before calculating the maximum magnitude. The PDE catalogue has substantial advantages over the GCMT one. The PDE has a longer observation period (the surface-wave magnitude M_S has been determined since the middle of 1968) and a lower magnitude threshold (m_t). Depending on the time period and the region, the threshold is of the order 4.5-4.7 (Kagan 2003), that is, much lower than the GCMT catalogue threshold (around 5.4-5.8). This means that forecast estimates can be obtained for all seismically active areas of the globe. The PDE reports earthquake hypocentres, which can be estimated much more precisely than the moment centroid locations reported by the GCMT catalogue (ibid.). On the other hand, the PDE catalogue has a few drawbacks compared to the GCMT data set. First, the PDE catalogue generally lacks focal mechanism solutions. Also, the PDE reports a somewhat inconsistent mix of different magnitudes (local, body wave, surface wave, moment, etc.) with less accuracy than the moment magnitude inferred from the GCMT catalogue. Moreover, some of the PDE magnitudes are influenced by strong systematic effects and biases (Kagan 2003). Another drawback is that the hypocentre, which the PDE catalogue uses to represent location, can lie at the edge of the rupture zone for a large earthquake. The moment centroid reported by the GCMT provides a more meaningful location, even though the centroid is generally more uncertain than the hypocentre.
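The magnitude-selection rule just described can be summarized in a short sketch (a minimal illustration with hypothetical field names, not the actual catalogue-processing code): contributed magnitudes of the same type are averaged first, and the standard magnitude is then the maximum over the available estimates.

    from collections import defaultdict

    def standard_magnitude(estimates):
        """estimates: list of (scale, value) pairs, e.g. [('mb', 5.1), ('MS', 5.4), ('MS', 5.2)].
        Average duplicate scales first, then take the maximum, as described in the text."""
        by_scale = defaultdict(list)
        for scale, value in estimates:
            by_scale[scale].append(value)
        averaged = {s: sum(v) / len(v) for s, v in by_scale.items()}
        return max(averaged.values())

    # The two contributed MS values are averaged to 5.3 before the maximum is taken.
    print(standard_magnitude([('mb', 5.1), ('MS', 5.4), ('MS', 5.2)]))   # 5.3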
We forecast the earthquake rate per unit area, time and magnitude on a 0.5° grid for a global region between 75°N and 75°S latitude (301 by 720 grid cells). The large number of cells and earthquakes (especially in the PDE catalogue) makes forecast testing a time-consuming task; many calculations in FORTRAN or MATLAB require more than 1 hr of CPU time. Because of this we were unable to test many features of the global forecasts. It is quite feasible that parallelization could substantially increase the speed of the calculations, and faster computers would also clearly improve performance. Since our task in this work is to demonstrate the validity of a global forecast and evaluate its major properties, we leave such improvements for future work.

3 FORECAST MODELS

3.1 Long-term rate density estimates

3.1.1 Technique

We take the distribution of earthquake sizes to follow the tapered Gutenberg-Richter (TGR) model, which has two parameters: the corner moment and the asymptotic spectral slope (β). For the TGR the cumulative frequency/moment relation is

F(M, M_t, \beta, M_c) = (M_t / M)^{\beta} \exp[(M_t - M)/M_c]   for M_t \le M,   (2)

where F is the fraction of earthquakes (by event count) in the catalogue with moment exceeding M, and M_t is the threshold moment for catalogue completeness (Jackson & Kagan 1999; Kagan & Jackson 2000; Bird & Kagan 2004). The corner moment (M_c) can be regarded as the earthquake size which is rarely exceeded (e.g. only a few times per century for subduction earthquakes). Although the form of (2) is chosen for its simplicity and acceptable fit, some kind of upper magnitude limit is required to keep the seismic moment rates finite (Bird & Kagan 2004). Our long-term forecast procedure relies on earthquake data alone (Kagan & Jackson 1994, 2000) and is based on smoothing of past earthquake locations. For the GCMT catalogue the degree of spatial smoothing is controlled by the kernel function

f(r) = \frac{1}{\pi} \, \frac{1}{r^2 + r_s^2},   (3)

where r is the epicentroid distance from a cell centre to an earthquake and r_s is the scale parameter. For the PDE catalogue we employ a more complicated smoothing kernel

f(r) = \mathrm{sgn}(\lambda - 1) \, \frac{\lambda - 1}{\pi} \, \frac{r_s^{2(\lambda - 1)}}{(r^2 + r_s^2)^{\lambda}},   (4)

where λ controls the degree of spatial smoothing; we use λ = 1.1 (Kagan & Jackson 2000). This formula has one more degree of freedom than equation (3). Our GCMT forecast uses an additional parameter to describe the anisotropy of the spatial kernel, so that the GCMT and PDE forecasts have the same total number of degrees of freedom. Similar techniques based on the smoothing of earthquake locations have been applied to long-term forecasts in California and other regions (Rhoades & Evison 2005, 2006; Rhoades 2007; Helmstetter et al. 2007; Schorlemmer & Gerstenberger 2007; Console et al. 2010; Kagan & Jackson 2010; Schorlemmer et al. 2010). Because our forecast models cover large areas, we restrict the distance influence function to a maximum of 1000 km to speed up the calculations. The distance r in kilometres is calculated by an approximate formula (Bullen 1979, eq. 9, p. 155)

r \approx 111.111 \sqrt{(\eta_1 - \eta_2)^2 + (\upsilon_1 - \upsilon_2)^2 \cos^2\!\left(\frac{\eta_1 + \eta_2}{2}\right)},   (5)

where η is latitude, υ is longitude, and an appropriate correction is made if the longitudes of the two points lie on different sides of υ = 180°. The scaling distance r_s may depend on many factors, including the tectonic environment, the Earth structure, earthquake location accuracy, etc. Thus, ideally one should determine it for each forecast region, but in this example we use one global value. We accept that 1 per cent of all earthquakes are surprises, assumed uniformly likely anywhere in our study area (Kagan & Jackson 2000).

Bird et al. (2010) calculate that intraplate earthquakes represent 2.7 per cent of global shallow seismicity. For the global forecast based on the GCMT catalogue, we use the same values for the long-term smoothing kernel as Kagan & Jackson (2000, see their eq. 3): the spatial scale parameter r_s is 2.5 km, the same as for the southwest Pacific, and the azimuthal concentration factor (δ in eq. 6, ibid.) is 25. The resulting anisotropic spatial kernel function has a butterfly pattern, which can be seen in the regional spatial forecast displays (see Jackson & Kagan 1999; Kagan & Jackson 2000), with the assumed focal plane going through the middle of the wings. In both kernel functions (eqs 3 and 4) we use two adjustable parameters which have been selected on the basis of previous studies. As we see below, those parameter values may not be optimal for the global forecast. However, to get that forecast running as quickly as possible, we adopted these somewhat arbitrary parameter values without full optimization. We plan to issue an improved forecast based on optimized parameters and the inclusion of global strain rate data (Bird et al. 2010). In Fig. 1 we display the global long-term earthquake rate density for magnitudes of 5.8 and above based on the GCMT catalogue. A similar forecast based on the PDE catalogue is shown in Fig. 2. For the PDE catalogue we apply an isotropic kernel function (4) with r_s = 15 km. Both forecasts are similar in appearance. Given that the magnitude threshold is lower for the PDE catalogue (m_t = 5.0), the forecast in Fig. 2 shows more detail, since many more earthquakes were used in the computation. Our procedure allows us to optimize the parameters by choosing the r_s values which best predict the second part of a catalogue from its first part. Kagan & Jackson (1994) subdivided the catalogue into two equal parts, whereas Kagan & Jackson (2000) used the GCMT catalogue for 1977-1996 as the training (or learning) part and the data for 1997-1998 as the test catalogue. In this paper the 2004-2006 earthquakes are mostly used as a control set. We compare the observed distribution of earthquake numbers to several theoretical curves (see Jackson & Kagan 1999; Kagan 2010). The Poisson cumulative distribution is calculated using

F(k) = P(N < k) = \frac{1}{k!} \int_{\lambda}^{\infty} y^{k} e^{-y} \, dy = 1 - \Gamma(k + 1, \lambda),   (6)

where λ is the event rate of occurrence and Γ(k + 1, λ) is an incomplete gamma function. The negative-binomial distribution (NBD) probability density is defined as (Kagan 1973, 2010)

f(k) = \binom{\tau + k - 1}{k} \theta^{\tau} (1 - \theta)^{k},   (7)

where k = 0, 1, 2, ... The parameter limits are 0 ≤ θ ≤ 1 and τ > 0. The NBD cumulative distribution is calculated using

F(k) = P(N < k) = \frac{1}{B(\tau, k + 1)} \int_{0}^{\theta} y^{\tau - 1} (1 - y)^{k} \, dy,   (8)

where B(τ, k + 1) is the beta function. The right-hand side of the equation corresponds to the incomplete beta function B(τ, k + 1, x) (Gradshteyn & Ryzhik 1980).

Figure 1. Global earthquake long-term potential based on smoothed seismicity, calculated 2010/10/22. Earthquakes from the GCMT catalogue since 1977 are used. Earthquake occurrence is modelled by a time-independent (Poisson) process. Colours show the long-term probability of earthquake occurrence.
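To make the construction of the long-term rate maps of Figs 1 and 2 concrete, the following minimal sketch sums kernel contributions (eq. 3) from past epicentres over a set of cell centres, using the approximate distance of eq. (5) and a small uniform background for the one per cent 'surprise' fraction. The function names, the toy grid and the way the background is mixed in are illustrative assumptions, not the paper's production code.

    import numpy as np

    def epicentral_distance_km(lat1, lon1, lat2, lon2):
        """Approximate distance of eq. (5); coordinates in degrees."""
        dlon = (lon1 - lon2 + 180.0) % 360.0 - 180.0      # wrap across the 180 deg meridian
        mean_lat = np.radians(0.5 * (lat1 + lat2))
        return 111.111 * np.sqrt((lat1 - lat2) ** 2 + (dlon * np.cos(mean_lat)) ** 2)

    def long_term_rate(cell_lats, cell_lons, eq_lats, eq_lons,
                       r_s=15.0, max_dist=1000.0, surprise=0.01):
        """Unnormalized smoothed rate per cell: eq. (3) kernel truncated at 1000 km,
        plus a uniform background carrying the 'surprise' fraction of the total rate."""
        eq_lats, eq_lons = np.asarray(eq_lats), np.asarray(eq_lons)
        rates = np.zeros(len(cell_lats))
        for k, (clat, clon) in enumerate(zip(cell_lats, cell_lons)):
            r = epicentral_distance_km(clat, clon, eq_lats, eq_lons)
            kern = 1.0 / (np.pi * (r ** 2 + r_s ** 2))
            rates[k] = kern[r <= max_dist].sum()
        return (1.0 - surprise) * rates + surprise * rates.sum() / len(rates)

    # Toy example: three 'earthquakes' smoothed onto a handful of cell centres.
    print(long_term_rate([0.0, 0.5, 10.0], [140.0, 140.5, 150.0],
                         [0.1, 0.2, 9.8], [140.1, 140.3, 150.2]))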

Figure 2. Earthquake long-term potential based on smoothed seismicity, calculated 2010/10/22. Earthquakes from the PDE catalogue since 1969 are used. Earthquake occurrence is modelled by a time-independent (Poisson) process. Colours show the long-term probability of earthquake occurrence.

Fig. 3 shows the cumulative distribution of annual earthquake numbers in the GCMT catalogue. The visual fit by the Poisson law (6) is poor, whereas the NBD (8) is clearly the better approximation. In Fig. 4 we display the fit of the cumulative distribution of annual earthquake numbers in the PDE catalogue. The difference between the Poisson and NBD distributions is greater in this plot, because of the lower magnitude threshold of the PDE catalogue.

3.1.2 Information scores

Several statistics can be used to characterize a long-term forecast and its fit to future earthquake occurrence (Kagan 2009; Molchan 2010). For an inhomogeneous Poisson process in which n points (x_1, ..., x_n) are observed in a region A, the log-likelihood can be written as (Daley & Vere-Jones 2003, eq. 7.1.2)

\log L(x_1, \ldots, x_n) = \sum_{i=1}^{n} \log \lambda(x_i) - \int_{A} \lambda(x)\, dx,   (9)

where λ(x_i) is the theoretical rate density at a point x_i. The likelihood for an inhomogeneous Poisson process is normally compared to a similar likelihood, L_0, calculated for a Poisson process with constant intensity ξ, to obtain the log-likelihood ratio (Daley & Vere-Jones 2003, Ch. 7; Schorlemmer et al. 2007)

\log(L/L_0) = \sum_{i=1}^{n} \log\left[\lambda(x_i)/\xi\right] - \int_{A} \left[\lambda(x) - \xi\right] dx.   (10)

In our calculations we did not carry out the number test, that is, we did not statistically test whether the observed earthquake number corresponds to the forecasted one (Kagan & Jackson 1995; Schorlemmer et al. 2007). Figs 3 and 4 show that the number distribution is well approximated by the NBD (7), so to spare space we decided to skip the N-test. We normalize both rates (λ, ξ) by the observed event number n, hence the integral term in (10) is zero. Kagan & Knopoff (1977) suggested measuring the performance of an earthquake prediction algorithm by evaluating the likelihood ratio, to test how well a model approximates earthquake occurrence. In particular, they estimated the information score per event, Î, as

\hat{I} = \frac{\ell - \ell_0}{n} = \frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{\lambda_i}{\xi},   (11)

where ℓ - ℓ_0 is the log-likelihood ratio [log(L/L_0)], n is the number of earthquakes in the catalogue, log_2 is used so that the score is measured in Shannon bits of information, and λ_i is the rate of earthquake occurrence according to a stochastic model. For the long-term forecast we calculate the forecast information rate as

I_0 = \sum_{j=1}^{N} \nu_j \log_2 \frac{\nu_j}{\tau_j},   (12)

where j are cell numbers, N is the total number of grid points, and ν_j and τ_j are the normalized theoretical rate of occurrence and cell area, respectively (Kagan 2009, eq. 14).
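Returning to the annual event counts compared in Figs 3 and 4 (captions below), the Poisson and negative-binomial cumulative distributions of eqs (6)-(8) can be evaluated with standard routines. The sketch below uses toy counts and simple moment matching rather than the catalogues and fitted parameters of the figures.

    import numpy as np
    from scipy import stats

    # Toy annual earthquake counts (placeholders, not the GCMT or PDE numbers).
    counts = np.array([95, 110, 130, 102, 88, 150, 123, 99, 141, 118])
    mean, var = counts.mean(), counts.var(ddof=1)

    # Moment matching: NBD mean = tau*(1-theta)/theta and variance = mean/theta (eq. 7),
    # so theta = mean/var and tau = mean*theta/(1-theta); valid when var > mean.
    lam = mean
    theta = mean / var
    tau = mean * theta / (1.0 - theta)

    k = np.arange(0, 201)
    poisson_cdf = stats.poisson.cdf(k, lam)        # eq. (6)
    nbd_cdf = stats.nbinom.cdf(k, tau, theta)      # eqs (7)-(8); scipy parametrizes as (n=tau, p=theta)

    # The NBD accommodates overdispersion (variance > mean), which is why its tails
    # fit the observed annual counts better than the Poisson in Figs 3 and 4.
    print(f"mean={mean:.1f}, var={var:.1f}, tau={tau:.2f}, theta={theta:.3f}")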

Figure 3. Cumulative distribution of yearly global earthquake numbers 1977-2007, m ≥ 5.8, according to the GCMT catalogue. The step-function shows the observed distribution, the dotted curve is the theoretical Poisson distribution and the dashed curve is the fitted negative binomial curve. The negative binomial curve fits the tails much better than the Poisson does. R1 and P1 are estimates of the negative binomial distribution parameters (τ and θ in eq. 7), and M and V are estimates of its mean and variance.

Figure 4. Cumulative distribution of yearly earthquake numbers for the global PDE catalogue, 1969-2007, m ≥ 5.0. The step-function shows the observed distribution, the dotted curve is the theoretical Poisson distribution and the dashed curve is the fitted negative binomial curve. The negative binomial curve fits the tails much better than the Poisson does. R1 and P1 are estimates of the negative binomial distribution parameters (τ and θ in eq. 7).

Using the forecasted rate values (λ_i for the cell centres in which earthquakes occurred) we compute

I_1 = \frac{1}{n_2} \sum_{i=1}^{n_2} \log_2 \frac{\lambda_i}{\xi},   (13)

where n_2 is the number of target earthquakes during the forecast period and ξ is the corresponding rate for event occurrence according to a Poisson process with a uniform rate over the region (Kagan 2009, eq. 7). As another measure of forecast efficiency we compute the information score for the actual epicentre (centroid) locations (λ_k),

I_2 = \frac{1}{n_2} \sum_{k=1}^{n_2} \log_2 \frac{\lambda_k}{\xi}.   (14)
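A minimal sketch of the per-event scoring of eqs (13)-(14) and the probability gain of eq. (15); the cell rates, areas and target cells are toy placeholders, and the normalization (forecast and reference both normalized over the grid) follows the description in the text.

    import numpy as np

    def information_score(cell_rates, cell_areas, target_cells):
        """I_1-style score (eq. 13): mean log2 ratio of the normalized forecast rate to a
        spatially uniform reference, evaluated in the cells containing the target events.
        Returns the score in bits and the probability gain G = 2**I (eq. 15)."""
        rates = np.asarray(cell_rates, dtype=float)
        areas = np.asarray(cell_areas, dtype=float)
        lam = rates / rates.sum()          # normalized forecast rate per cell
        xi = areas / areas.sum()           # uniform-rate reference per cell
        score = np.mean(np.log2(lam[target_cells] / xi[target_cells]))
        return score, 2.0 ** score

    # Toy example: equal-area cells, one target event in the most active cell and one
    # in a quiet cell; the quiet-cell event pulls the average score down.
    score, gain = information_score([8.0, 1.0, 0.5, 0.5], [1, 1, 1, 1], [0, 2])
    print(score, gain)

Evaluating the same expression at the actual epicentre (centroid) locations rather than at cell centres gives the I_2 variant of eq. (14).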

Table 1. Forecast testing, GCMT catalogue.

Training interval   n_1    Forecast interval   n_2   I_0     I_1     I_0 - I_1   I'_1
1977-1991           2412   1992-1994           508   4.516   3.339   1.1767      5.068
1977-1994           2920   1995-1997           574   4.438   3.693   0.7455      4.975
1977-1997           3494   1998-2000           477   4.374   3.547   0.8264      4.929
1977-2000           3971   2001-2003           473   4.382   3.725   0.6572      4.901
1977-2003           4444   2004-2006           573   4.304   3.340   0.9638      4.851
1977-2006           5017   2007-2008           467   4.260   3.597   0.7065      4.792

Notes: The GCMT catalogue is used: m ≥ 5.8, 1977-2008/12/31; n_1 is the earthquake number in the training interval and n_2 is the earthquake number in the forecast interval. Smoothing parameters are r_s = 2.5 km and δ = 25 (eq. 3). For the information scores I_0, I_1, I_2 and I'_1 (the latter computed for training-period earthquakes) see the text (eqs 12-14). The tests are pseudo-prospective, in that they are performed on data not used in the forecast. They provide a good template for truly prospective tests.

Table 2. Forecast testing, PDE catalogue.

Training interval   n_1     Forecast interval   n_2    I_0     I_1     I_0 - I_1   I'_1
1969-1991           26360   1992-1994           3799   3.241   3.994   -0.753      4.554
1969-1994           30154   1995-1997           3358   3.236   4.117   -0.881      4.529
1969-1997           33511   1998-2000           3024   3.229   3.961   -0.732      4.525
1969-2000           36530   2001-2003           3385   3.204   3.848   -0.644      4.500
1969-2003           39908   2004-2006           4972   3.179   3.604   -0.425      4.469
1969-2006           44870   2007-2008           3568   3.149   3.914   -0.765      4.432

Notes: The PDE catalogue is used: m ≥ 5.0, 1969-2008/12/31; n_1 is the earthquake number in the training interval and n_2 is the earthquake number in the forecast interval. Smoothing parameters are r_s = 15 km and λ = 1.1 (eq. 4). For the information scores I_0, I_1, I_2 and I'_1 see the text (eqs 12-14).

As can be seen from Tables 1 and 2, the values of I_1 and I_2 may be significantly different. To obtain the likelihood ratio score, or the average probability gain G, we calculate

G_j = 2^{I_j}.   (15)

Similarly to (13), we also calculate the score I'_1 in both catalogues for earthquakes which occurred in the training period. These values are significantly larger than those calculated for the forecast intervals. Therefore, as expected, our forecast predicts the locations of past earthquakes better than those of future (target) events (see Kagan 2009, fig. 10 and its explanations). The best r_s value for forecast purposes corresponds to I_0 - I_1 = 0, that is, when the score for the earthquakes which occurred in 2004-2006 equals the average score obtained by simulating events with the smoothed forecast spatial distribution. Kagan & Jackson (1994, fig. 7) discuss the interpretation of I_0 - I_1 differences in more detail. We compare the general properties of the GCMT and PDE forecasts in the next section. In a future paper we intend to make a formal statistical comparison using the R-test (Kagan & Jackson 1995; Rong et al. 2003; Schorlemmer et al. 2007). In this test, many simulated catalogues generated from each of two forecast hypotheses are evaluated for consistency with the other hypothesis. This procedure provides a context for evaluating the likelihood scores of the observed catalogue according to the two hypotheses.

3.2 Long-term forecasts comparison

In Fig. 5 we illustrate the long-term forecast testing for the PDE catalogue. We compare the forecast, based on the 1969-2003 earthquake record, to the 2004-2006 earthquakes. On visual inspection, the forecast predicts the spatial distribution of seismic activity in 2004-2006 reasonably well.
In our previous forecasts (Jackson & Kagan 1999; Kagan & Jackson 2000) we used Monte Carlo simulation to test the forecast. In this work we use the Kagan (2009) results, described above (Section 3.1.2), to test forecasts without simulation. Fig. 6 displays several error diagram curves for the GCMT catalogue. Since most of the curves are close to the abscissa and ordinate axes, we display the curves in a semi-logarithmic format, as in fig. 6 of Kagan (2009). Fig. 7 shows several error diagrams for the PDE catalogue in a new format, using the 1969-2003 forecast density as a template or baseline for the likelihood score calculation. This display is equivalent to calculating the information scores using λ_i as a reference density,

I_m = \frac{1}{n} \sum_{i=1}^{n} \nu_i \log_2 \frac{\zeta_i}{\lambda_i},   (16)

where ζ_i is the rate density for each of the other point distributions (Kagan 2009, fig. 10). An interesting feature of the blue line is a step at the abscissa value τ ≈ 0.44. This step is caused by the aftershocks of the 2004 December 26 giant Sumatra earthquake. In 2005 there were 98 aftershocks in the cell 94°E ± 0.25°; 8°N ± 0.25°. Before 2004 only eight events had been registered in the PDE catalogue for this cell. Therefore the forecast, which is based on past seismicity levels, is not fully successful there. Fig. 8 displays the numbers of earthquakes in all cells, sorted by the cell rate density. The cell with the most events (98) is close to the left edge of the plot (it is the 2385th cell of 216 720 in total). In an ideal prediction this cell would be closer to the density maximum (at the left edge of the diagram). In Fig. 9 the cell earthquake numbers are displayed against the cumulative rate in all cells (equivalent to the τ-axis in Fig. 7). The difference in the abscissa arrangement between Figs 8 and 9 is caused by the change of normalization in the two plots: the uniform rate ξ for the former diagram (as in eq. 13) versus the forecast rate λ_i for the latter (as in eq. 16).
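The error diagram curves of Figs 6-9 can be constructed by sorting cells in decreasing order of forecast rate density and accumulating area and event fractions. The sketch below uses one common convention (τ as the cumulative normalized area covered, ν as the fraction of observed events still missed); the inputs are toy placeholders.

    import numpy as np

    def error_diagram(cell_rate_density, cell_areas, observed_counts):
        """Return (tau, nu): tau is the cumulative fraction of area when cells are taken
        in decreasing order of forecast rate density; nu is the fraction of observed
        earthquakes not yet covered at that point."""
        density = np.asarray(cell_rate_density, dtype=float)
        areas = np.asarray(cell_areas, dtype=float)
        counts = np.asarray(observed_counts, dtype=float)
        order = np.argsort(-density)                  # most active cells first
        tau = np.cumsum(areas[order]) / areas.sum()
        nu = 1.0 - np.cumsum(counts[order]) / counts.sum()
        return tau, nu

    # Toy example: a concentrated forecast captures most events within a small area.
    tau, nu = error_diagram([5.0, 2.0, 0.5, 0.1], [1, 1, 1, 1], [7, 2, 1, 0])
    print(list(zip(tau, nu)))

Roughly speaking, replacing the uniform area weighting by a reference forecast density yields the template-based display of Fig. 7 and eq. (16).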

Figure 5. Earthquake long-term potential based on smoothed seismicity, as in Fig. 2. White circles are earthquakes in 2004-2006, with radius proportional to magnitude.

Figure 6. Error diagram (τ, ν) for the global long-term seismicity (m_W ≥ 5.8) forecast. The solid black line is the strategy of spatially random guessing. The solid thick red diagonal line is the curve for the global forecast, based on 1977-2003 earthquakes. The blue line is the earthquake distribution from the GCMT catalogue in 2004-2006 (forecast); the magenta line corresponds to the earthquake distribution from the GCMT catalogue in 1977-2003.

Figs 10 and 11 display the dependence of the information scores on the distance parameter r_s (eqs 3 and 4) for both global catalogues. The forecast score (I_0) increases as r_s decreases: as the smoothing kernel becomes narrower, the potential efficiency of the prediction improves. However, this score should be compared to the scores calculated for target earthquakes in the forecast period (I_1 and I_2, eqs 13 and 14), because these likelihood values show the real efficiency of the forecast procedure. The intersections of the I_1 and I_2 curves with the I_0 line indicate a potentially optimal value for the r_s distance. The forecast for r_s values below the intersection is likely under-smoothed (Kagan & Jackson 1994), whereas for larger values of the spatial scale parameter the maps are over-smoothed.

Figure 7. Error diagram (τ, ν) for the global long-term seismicity (m ≥ 5.0) forecast (see Fig. 5). The solid black line is the strategy of spatially random guessing. The solid thick red diagonal line is the curve for the global forecast, based on 1969-2003 earthquakes. The blue line is the earthquake distribution from the PDE catalogue in 2004-2006 (forecast); the magenta line corresponds to the earthquake distribution from the PDE catalogue in 1969-2003.

Figure 8. The forecast for the PDE catalogue in 2004-2006 (see Figs 5 and 7). Number of earthquakes in cells, sorted in decreasing order of predicted rate density.

Table 1 in Kagan & Jackson (2010) displays a similar dependence of information scores on the smoothing distance (their I_3 is similar to I_0): the optimal r_s for the North-west Pacific region is between 10 and 15 km for both catalogues, GCMT and PDE. Like the results shown by Kagan (2009), the scores for cell centres (I_1) are lower than the scores obtained for the actual epicentres (I_2). The scores for the past earthquakes (magenta line) exceed the scores for the events in the forecast period; as mentioned by Kagan (2009), it is natural that the smoothing procedure forecasts better in the training period. We used different smoothing kernel functions (eqs 3 and 4) and different parameter values for the two catalogues, but the plots in Figs 10 and 11 look similar, though the PDE score values are slightly lower. One might expect the PDE scores to be higher because the PDE catalogue epicentres have better location accuracy (see Helmstetter et al. 2006, p. 99). However, the GCMT catalogue kernel is anisotropic, extending along the assumed fault plane, hence the forecasted earthquake rate density is usually higher for this catalogue.

Figure 9. The forecast for the PDE catalogue in 2004-2006. Number of earthquakes in cells versus cumulative cell rate, as on the abscissa of Fig. 7.

Figure 10. Dependence of information scores on the smoothing scale parameter r_s (eq. 3) for the 2007-2008 forecast based on the GCMT catalogue for 1977-2006. The red line is the I_0 score, the blue line is the I_1 score, the green line is the I_2 score and the magenta line is for I'_1.

These factors may explain the observed score values, but additional investigations are needed to confirm the above conjecture. In Tables 1 and 2 we show the scores for the forecasts in both catalogues. We increase the training period in 3-year steps, starting with 1991. The forecast period is 3 yr; only the last period (2007-2008) is 2 yr. The forecast score I_0 in both tables decreases with time, apparently because seismicity becomes more uniform as the time interval increases. The score I_1, which corresponds to the likelihood for the cell centres where earthquakes occur during the test period, fluctuates up and down. The low value for the 2004-2006 period is likely to be caused by the influence of the 2004 Sumatra aftershocks (see Figs 7-9 and their discussion). The score I'_1 also decreases with time; with a longer training period the likelihood score for the cells where past earthquakes occurred is smaller. The difference between the forecast score and the score for the events in the forecast period (I_0 - I_1) shows the efficiency of our forecast (see Section 3.1.2). As explained earlier, the absolute value of the difference depends on the degree of smoothing controlled by the kernel functions (eqs 3 and 4): the forecast was likely undersmoothed for the GCMT catalogue and oversmoothed for the PDE catalogue. However, the behaviour of the information score difference over the forecast time intervals is similar for both catalogues. Fig. 12 shows the curves for both catalogues; they are approximately similar, exhibiting a peak at the 2004-2006 forecast interval, obviously due to the 2004 Sumatra event.

Figure 11. Dependence of information scores on the smoothing scale parameter r_s (eq. 3) for the 2004-2006 forecast based on the PDE catalogue for 1977-2003. The red line is the I_0 score, the blue line is the I_1 score, the green line is the I_2 score and the magenta line is for I'_1.

Figure 12. Dependence of the information score difference (I_0 - I_1) on the forecast time period (Tables 1 and 2). The red line is for the GCMT catalogue, the blue line is for the PDE catalogue.

3.3 Short-term rate density estimates

3.3.1 Branching models

As clustering models are increasingly used to model earthquake occurrence and to forecast future events, the variability of modelling results and the disparities among published results point to a serious need to understand the stability and reliability of these models better. Most clustering models were developed to explain the occurrence of aftershock clusters on a fairly local scale. Here we discuss one of these models, the Critical Branching Model (CBM), which describes clustering at both local and global scales. The CBM is an example of a class of branching point process models known in the statistical literature as Hawkes or self-exciting point processes (Hawkes 1971). For a temporal Hawkes process, the conditional rate of events at time t, given the information H_t on all events prior to time t, can be written

\lambda(t \mid H_t) = \nu + \sum_{i: t_i < t} g(t - t_i),   (17)

where ν > 0 is the background rate, g(u) ≥ 0 is the triggering function which describes the aftershock activity induced by a prior event, and \int_0^{\infty} g(u)\, du < 1 to ensure stationarity (Hawkes 1971).
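A minimal sketch of the conditional rate of eq. (17) for a purely temporal Hawkes process; the particular triggering function g(u) used below (a truncated Omori-type power law with arbitrary parameter values) is an illustrative choice, not the paper's CBM parametrization.

    import numpy as np

    def omori_kernel(u, k=0.1, c=0.01, theta=0.28):
        """Illustrative triggering function g(u) >= 0, u in days; its integral over
        (0, inf) equals k < 1, so the stationarity condition of eq. (17) holds."""
        return k * theta * c ** theta * (u + c) ** (-1.0 - theta)

    def conditional_rate(t, event_times, background=0.05, **kernel_args):
        """Eq. (17): lambda(t | H_t) = background rate + sum of g(t - t_i) over past events."""
        past = np.asarray([ti for ti in event_times if ti < t])
        return background + omori_kernel(t - past, **kernel_args).sum()

    # Example: the event rate one hour after the latest of three prior events.
    print(conditional_rate(10.0 + 1.0 / 24.0, [2.0, 7.5, 10.0]))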

The assumptions we make in constructing our initial branching model of earthquake occurrence have been summarized in Kagan & Knopoff (1987), Kagan (1991) and Kagan & Jackson (2000). A similar branching model, called ETAS (Epidemic Type Aftershock Sequence), was proposed by Ogata (1988, 1998) and by Ogata & Zhuang (2006). In both our model and ETAS, seismicity is approximated by a Poisson cluster process, in which clusters or sequences of parent earthquakes are statistically independent, although individual earthquakes in a cluster (offspring) are triggered. The parent events are assumed to form a spatially inhomogeneous Poisson point process with a constant temporal rate. The major assumption regarding the relationships of events within a cluster is that the interdependence of earthquakes is closely approximated by a stochastic magnitude-space critical branching process which develops in time. Under the branching assumption there is a sole trigger for any given dependent event. As shown below, the space-time distribution of the interrelated earthquake sources within a sequence is controlled by simple relations justified by analysing available statistical data on seismicity. The CBM is used to produce short-term hazard estimates based on earthquake times, locations and seismic moments. While most of the components of the model have been tested (Kagan & Knopoff 1987; Kagan 1991; Jackson & Kagan 1999; Kagan & Jackson 2000; Kagan et al. 2010), some require further exploration and need to be modified as our research progresses. The CBM was first proposed and applied to the central California earthquake record by Kagan & Knopoff (1987). The ETAS model (Ogata 1988, 1998) is essentially similar in design to the CBM and was initially applied to Japanese seismicity. The main difference between these two models lies in the parametrization of the influence functions of a dependent event and the normalization of these functions. Kagan et al. (2010, section 6.2) discuss certain drawbacks of the ETAS model arising from its parametrization of seismicity parameters. These defects may influence the implementation and quality of forecast programs based on the ETAS model. Both the ETAS and CBM models posit that earthquakes occur in clusters of one or more events. Earthquakes within a cluster are strongly correlated, but the clusters themselves occur spontaneously, presumably by a Poisson process. Both models provide estimates of the conditional probability of triggered events. In both models the branching property includes triggering by foreshocks and aftershocks as well as main shocks. Traditional nomenclature uses the term main shock for the largest earthquake in a cluster, foreshock for events within the cluster before the main shock, and aftershock for events within the cluster after the main shock. Clearly this labelling can only be applied after the cluster is complete and identified by some rule. We prefer to distinguish between spontaneous events, which are the first in their cluster, and main shocks, which are the largest. They may be the same, but some main shocks follow a spontaneous foreshock. In practice the time assigned to a cluster is that of its main shock, but logically it should be that of its spontaneous event. A common measure of success of a clustering model is that the identified main shocks should be temporally uniform. However, this presents practical problems for a couple of reasons.
One is that clusters may be independent of one another but jointly affected by some external process such as distant earthquakes or viscoelastic stress variations. Another is that the spontaneous earthquakes might be below the magnitude threshold of the catalogue. One advantage of global clustering models is that clustering by large distant earthquakes can be included explicitly. 3.3.2 Model description In the CBM model, the conditional rate λ(t, x, M) at which earthquakes are expected to occur at time t, location x and scalar seismic moment M, given the history of previous seismicity, is given by λ (t, x, M) = νφ(x, M) + ψ(t t i, x x i, M M i ), (18) i where ν is the rate per time unit of the Poisson occurrence of independent (spontaneous) earthquakes in the observed spatial region S with scalar seismic moment M above the catalogue s moment threshold M t ; function φ (x, M) governs their space-seismic moment distribution; ψ (M) (t u, x y, M) is the conditional density of succeeding events at time t and location x, if preceding earthquakes have occurred at times t i in locations with coordinates x i.we subdivide the spatial coordinates x into s z, wheres are surface coordinates, and z is depth. If the duration of the catalogue is T,then ν T is the number of independent events, and ν T/n is the fraction of independent events (n is the total number of earthquakes in a catalogue). We assume that the rate density resulting from a given event within a cluster may be broken down into a product of its marginal distributions, that is, the conditional rate density of the j-th shock dependent on the i-th shock ( j > i and t j > t i ) with seismic moment M i modelled as ψ ( t,ρ,m j M i ) = ψ t ( t) ψ ρ (ρ) ψ M (M i ) φ M (M j ), (19) where t = t j t i and ρ is the horizontal distance between the i-th and the j-th epicentroids (or epicentres), calculated by (5), ψ t,ψ ρ and ψ M are the marginal temporal, spatial and moment densities, and are detailed later. The seismic moment probability density φ M can be obtained by differentiating eq. 2 (see also eq. 11 in Kagan & Jackson 2000). The total time-dependent rate density is a sum of effects from all previous earthquakes, (t j, x j, M j ) = i< j ψ ( t,ρ,m j M i ). (20) The function ψ decays rapidly with time and distance so that only neighbouring events substantially contribute to the sum, although the range of strong earthquakes is much longer than that of weak events (see the next two equations). A power-law relation is used for the probability density of time intervals between earthquakes within a cluster: ψ t ( t) = θ t θ M ( t) 1 θ, (21) for t t M, in accord with Omori-Utsu s law (Utsu 1961). The parameter θ is an earthquake memory factor, and t M is the coda duration time of an earthquake with seismic moment M i. We assume for the coda duration t M = t r M 1/3 i, (22) where t r is the standard (constant) coda duration time, taken here as t r = 0.0035 d (about 5 min), of an earthquake with the reference seismic moment M r = 10 15 Nm, corresponding to m r = 4.0 (Kagan 1991). The probability of the next dependent shock to occur in the time interval (t 1, t 2 ), for 0 < t 1 < t 2, given an event of a cluster occurring at time 0, can then be calculated simply as ( ) Prob (t 1 < t < t 2 ) = t θ M t θ 1 t θ 2. (23)

Figure 13. The average aftershock numbers in logarithmic time intervals following m_W ≥ 8.0 GCMT earthquakes during the 1977-2003 period. Solid lines show real aftershocks, dashed lines are approximate theoretical estimates. Red lines are for aftershocks 4.9 ≤ m_b ≤ 5.1, blue lines are for 5.2 ≤ m_b ≤ 5.4. The black vertical line is the estimate of the coda duration t_M (eq. 22) for an earthquake of m = 8.1.

The non-normalized function ψ_M(M_i), which indicates the number of triggered shocks generated on average by an earthquake with seismic moment M_i, is assumed to obey

\psi_M(M_i) = \mu \left( \frac{M_i}{M_t} \right)^{\kappa},   (24)

(Kagan 1991). To illustrate our fit of the temporal distribution of dependent earthquakes, the average numbers of aftershocks following 15 m_W ≥ 8.0 GCMT earthquakes are displayed in Fig. 13 (similar to fig. 13 in Kagan 2004). We use the time period 1977-2003, so that all large earthquakes are approximately of the same size (8.0 ≤ m_W ≤ 8.45, that is, excluding the 2004 Sumatra earthquake and its aftershocks). Main et al. (2008) show that after this earthquake the seismicity parameters changed substantially. Since the GCMT catalogue has relatively few dependent events, we selected the aftershocks from the PDE catalogue. The aftershock rate is approximately constant above our estimate of the coda duration t_M (see eq. 22); for logarithmic intervals this corresponds to the standard form of the Omori law: the aftershock number n_a is proportional to 1/t. For smaller time intervals the aftershock numbers decline compared to the Omori law prediction (1/t). This decline is caused by several factors, the interference of main shock coda waves being the most influential (Kagan 2004). The decline is faster for weaker events (ibid.). Because our forecast is calculated once per day, these immediate aftershocks usually die out before the forecast is updated. The theoretical estimates for aftershock numbers are based on eqs (21) and (24) and on measured parameter values. We used the parameter values obtained during the likelihood function search (Kagan et al. 2010, table 4) for the full PDE catalogue, m_t = 5.0: the branching coefficient μ = 0.141, the parent productivity exponent a = 0.63 (a = κ - 1.5) and the time decay exponent θ = 0.28. The theoretical estimates in Fig. 13 seem to be reasonably good for forecasting time intervals of the order of one day. For larger intervals the expected numbers decrease as n_a ∝ (Δt)^{-1.28} (see eq. 21), that is, faster than the regular Omori law would predict. The Omori law assumes that all aftershocks are a direct consequence of the main shock, whereas a branching model takes any earthquake as a possible progenitor of later events. Thus, later aftershocks are the combined offspring of the main shock and all the consequent earthquakes; as time increases, the difference between the Omori law and the CBM predictions should increase as well. If errors in earthquake location have a Gaussian distribution, the inter-earthquake distances should be distributed according to the Rayleigh distribution (Kagan 2003); the Rayleigh law corresponds to the distribution of the length of a vector in two dimensions whose components have a Gaussian distribution with zero mean and a common standard error. Thus, we approximate the probability density of the horizontal distance (ρ) between two earthquake epicentroids (epicentres) in a cluster by a Rayleigh distribution:

\psi_{\rho}(\rho) = \frac{\rho}{\sigma_{\rho}^{2}} \exp\left[ -\rho^{2} / \left( 2 \sigma_{\rho}^{2} \right) \right],   (25)

where σ_ρ is a spatial standard deviation, which depends on the standard errors of epicentroid (epicentre) determination and on the seismic moment M_i of the triggering event according to the relation

\sigma_{\rho}^{2} = \epsilon_{\rho}^{2} + s_{r}^{2} (M_i / M_r)^{2/3},   (26)

(Kagan 1991). Here ε_ρ is the standard error in epicentroid determination and s_r is the characteristic size of the focal zone of an earthquake with the reference seismic moment M_r (see eq. 22). The spatial kernel for the short-term forecast differs substantially from that of the long-term forecast (eqs 3 and 4). The reason is that nearby earthquakes in a long-term forecast usually belong to separate clusters: during a long time interval many different earthquake clusters may accumulate in the same region. In contrast, for a short-term forecast we specifically search for earthquake pairs which are closely connected in space and time. The parameters in eqs (25) and (26) characterize the spread of dependent earthquakes (usually aftershocks) in the neighbourhood of a parent event. The
Thus, we approximate the probability density of the horizontal distance (ρ) between two earthquake epicentroids (epicentres) in a cluster by a Rayleigh distribution: ψ ρ (ρ) = ρ exp [ ρ 2 / ( )] 2σ 2 σρ 2 ρ, (25) where σ ρ is a spatial standard deviation, which depends on the standard errors of epicentroid (epicentre) determination and on the seismic moment M i of the triggering event according to the relation σ 2 ρ = ɛ2 ρ + s2 r (M i /M r ) 2/3, (26) (Kagan 1991). Here ɛ ρ is the standard error in epicentroid determination and s r is a characteristic size of a focal zone of an earthquake with the reference seismic moment M r (see eq. 22). The spatial kernel for short-term forecast differs substantially from that of the long-term forecast (eqs 3 and 4). The reason is that nearby earthquakes in a long-term forecast usually belong to separate clusters. During a long time interval many different earthquake clusters may accumulate in the same region. In contrast, for a short-term forecast we specifically search for earthquake pairs which are closely connected in space and time. The parameters in eqs (25) and (26) characterize the spread of dependent earthquakes (usually aftershocks) in the neighbourhood of a parent event. The