PostScript file created: April 24, 2012; time 860 minutes

WHOLE EARTH HIGH-RESOLUTION EARTHQUAKE FORECASTS

Yan Y. Kagan and David D. Jackson

Department of Earth and Space Sciences, University of California, Los Angeles, California 90095-1567, USA; Emails: ykagan@ucla.edu, david.d.jackson@ucla.edu

Abstract. Since 1977 we have developed statistical short- and long-term earthquake forecasts that predict the earthquake rate per unit area, time, and magnitude. The forecasts are based on smoothed maps of past seismicity and assume spatial and temporal clustering. Our new program forecasts earthquakes on a 0.1° grid for the global region from 90°N to 90°S latitude. We use the PDE catalog, which reports many smaller earthquakes (M ≥ 5.0). For the long-term forecast we test two types of smoothing kernels, one based on a power law and the other on the spherical Fisher distribution. The effective width of these kernels is much larger than the cell size, so map discretization effects are insignificant. We employ adaptive kernel smoothing, which improves our forecast in both seismically quiet and active areas. Our forecasts can be tested within a relatively short time period, since smaller events occur with greater frequency. Forecast efficiency can be measured by likelihood scores, expressed as average probability gains per earthquake compared with a spatially or temporally uniform Poisson distribution. Another method uses the error diagram to display the forecasted point density and the point events. Our short-term forecasts also assume temporal clustering, described by a variant of Omori's law. Like the long-term forecast, the short-term version is expressed as a rate density in location, magnitude, and time. Any forecast with a given lower magnitude threshold can be recalculated, using the tapered Gutenberg-Richter relation, to larger earthquakes with the maximum (corner) magnitude determined for appropriate tectonic zones.

Short running title: Global high-resolution earthquake forecasts

Key words: Probabilistic forecasting; Probability distributions; Earthquake interaction, forecasting, and prediction; Seismicity and tectonics; Statistical seismology; Dynamics: seismotectonics.

1 Introduction

We have developed time-independent (long-term) and time-dependent (short-term) earthquake forecasts using several earthquake catalogs (Kagan & Knopoff, 1977; 1987; Kagan & Jackson, 1994; Jackson & Kagan, 1999). The importance of earthquake forecasting for seismic hazard and risk estimation, and the difficulty of resolving basic differences between forecast models, have motivated an international effort to report and test earthquake forecasts. That effort is organized by the Collaboratory for the Study of Earthquake Predictability (CSEP) (Schorlemmer & Gerstenberger, 2007; Schorlemmer et al., 2010; Marzocchi & Zechar, 2011). Our purpose is to adapt a clustering model that we used to make testable forecasts over large regions of the western Pacific (Kagan & Jackson, 1994; Jackson & Kagan, 1999; Kagan & Jackson, 2000) to long- and short-term regional and worldwide forecasts in areas designated as natural laboratories by CSEP. Our earlier effort (Kagan & Jackson, 2011) forecast seismicity only within 75°N to 75°S latitude on low-resolution 0.5° cells. To get the test running as quickly as possible, we adopted arbitrary parameter values similar to those that Kagan & Jackson (2000) used. Later we calculated optimized forecast parameters (Kagan & Jackson, 2011, Fig. 11), which are implemented here in our new high-resolution forecast.

We had to solve two problems to produce the high-resolution (0.1° cells) whole-Earth forecasts: (a) Close to the poles we needed to use the exact spherical distance formula (for example, Bullen, 1979, Eq. 5, p. 155), which requires about twice the computation time. (b) The size of the grid increased by a factor of 30 compared to our quasi-global 0.5° forecast. Therefore, executing the forecast programs on the computers we employed previously (0.6 GHz Alpha machines) required several days of CPU time. On faster computers the time can be reduced by a factor of 10 at least.

Our new program forecasts earthquakes on a 0.1° grid for the global region 90°N to 90°S latitude. We used the PDE earthquake catalog (Preliminary determinations of epicenters, 2011), which reports many smaller events (M ≥ 5.0), and a maximum cell size around the equator of about 11 km. The resolution of these forecasts is probably the maximum desirable: the earthquake depth distribution peaks around 10 km, and the PDE catalog location uncertainty is about the same (Kagan et al., 2010). Many local forecasts are now produced within the CSEP infrastructure (Schorlemmer et al., 2010). For these forecasts the maximum resolution is usually 0.1°; therefore, localized forecasts can be extracted from ours (as in Kagan & Jackson, 2010) and compared to those predictions. Moreover, a combined forecast employing local and global data can be calculated.

In this work we rigorously test long-term forecasts. The advantage of worldwide forecast testing is that many earthquakes are available for testing. Because these quakes occur over the whole globe, there are many practically independent sequences of events; thus seismicity is not dominated by only a few large earthquakes and their aftershocks, as happens in local and regional earthquake catalogs. Therefore, the testing results are more reliable and reproducible. Another problem we avoid is that earthquakes outside of a regional catalog area may trigger earthquakes inside it (Wang et al., 2011).
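To make the spherical-distance step in item (a) concrete, here is a minimal Python sketch (ours, not the paper's code; Bullen's exact formula is not reproduced in the text, so the equivalent and numerically stable haversine form is used instead, and the mean Earth radius value is an assumption):

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def spherical_distance_km(lat1, lon1, lat2, lon2):
    """Exact great-circle (spherical) distance between two epicenters,
    in km, via the haversine formula; numerically stable near the poles,
    where planar distance approximations break down."""
    p1, l1, p2, l2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((p2 - p1) / 2.0) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin((l2 - l1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```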

2 Methods and Data

For our long-term model we factor the rate density into spatial, magnitude, and number distributions, assuming their independence from one another. We estimate the spatial distribution using all earthquakes above the assumed completeness threshold in the catalog. For the magnitude distribution, we assume a tapered Gutenberg-Richter (TGR) distribution (Bird & Kagan, 2004; Kagan et al., 2010). Any forecast with a given lower magnitude threshold applies as well to larger earthquakes, with the maximum (corner) magnitude determined for appropriate tectonic zones. The map of tectonic zones shown in Kagan et al. (2010, Fig. 1) assigns each 0.1° cell to a specific tectonic category. We can then recalculate any forecast rate using the TGR distribution with the appropriate corner magnitude value (ibid., Table 1).

The PDE catalog is distributed in preliminary form within a few minutes after each earthquake and in final form with a few months' latency. The catalog reports earthquake size using several magnitude scales, providing the body-wave (m_b) and surface-wave (M_S) magnitudes. The moment magnitude (m_W) estimate has been added recently. Depending on the time period and region, the completeness threshold is on the order of 4.5 to 4.7 (Kagan, 2003; Romashkova, 2009). To construct our smoothed seismicity maps, we need a single standard magnitude estimate for each earthquake. Like Kagan & Jackson (2011), in this work we use the maximum magnitude among those reported as the standard magnitude for each event.
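To illustrate the TGR recalculation described above, a minimal Python sketch (an illustration under stated assumptions, not the paper's code: the β value and the corner magnitude below are placeholders, whereas the paper takes zone-specific corner magnitudes from Kagan et al., 2010, Table 1):

```python
import numpy as np

def moment_Nm(m):
    """Scalar seismic moment (N*m) from moment magnitude."""
    return 10.0 ** (1.5 * m + 9.05)

def tgr_rate(rate_at_threshold, m_t, m, beta=0.63, m_corner=8.0):
    """Extrapolate the rate of events above magnitude m from the rate
    above threshold m_t using the tapered Gutenberg-Richter survivor
    function S(M) = (M_t / M)**beta * exp((M_t - M) / M_c) in moment.
    beta and m_corner are illustrative placeholder values."""
    Mt, M, Mc = moment_Nm(m_t), moment_Nm(m), moment_Nm(m_corner)
    return rate_at_threshold * (Mt / M) ** beta * np.exp((Mt - M) / Mc)
```

For example, tgr_rate(r, 5.0, 7.5) rescales a cell's M ≥ 5.0 rate r to an M ≥ 7.5 rate under the assumed β and corner magnitude.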

3 Long-term Rate Density Estimates

3.1 Smoothing Kernel Selection

Our previous regional and global forecasts (Kagan & Jackson, 1994; Jackson & Kagan, 1999; Kagan & Jackson, 2000; 2011) were based on fixed-kernel smoothing. We selected a fixed kernel with the degree of spatial smoothing controlled by a function that is asymptotic to a power law at distances much larger than r_s:

    f(r) = \frac{1}{\pi} \, \frac{1}{r^2 + r_s^2},    (1)

where r is the epicenter distance, r_s is a scale parameter of about 7.5 km, and r ≤ 1000 km (Kagan & Jackson, 2011). Unfortunately, the 1000 km distance limit causes sharp discontinuities in the smoothed maps, as can be seen in Fig. 4 (see below) around the Hawaiian Islands. These discontinuities can be avoided if we use a kernel supported on the whole Earth. However, in that case the choice of available kernels is significantly restricted. If we employ a power-law kernel like (1), its normalization on a sphere involves cumbersome hypergeometric functions. Practically, the best simple expression for a spherical surface kernel is the Fisher distribution (Fisher, 1953, p. 296; Fisher et al., 1987, Eqs. 4.19-4.23; Mardia & Jupp, 2000, Eq. 9.3.4). These authors give expressions for the spherical Fisher distribution in general form. For our purpose we assume that the distribution center is at a pole of the sphere and that the distribution has rotational symmetry. Then the probability density function (PDF) of the spherical Fisher distribution is

    f(\rho, \eta) = \frac{\kappa}{4\pi \sinh \kappa} \exp(\kappa \cos \rho) \sin(\rho) \, \phi(\eta) = \frac{\kappa}{2\pi (e^{\kappa} - e^{-\kappa})} \exp(\kappa \cos \rho) \sin(\rho) \, \phi(\eta),    (2)

where η is an azimuthal angle, φ(η) is the angular azimuthal distribution density, ρ = r/R is the epicenter distance in radians, R is the Earth radius, and κ is a scale parameter. It is more convenient to consider the Fisher distribution as depending only on distance, i.e., we take φ(η) = 1/(2π). For the distance distribution alone, since sin(ρ) dρ is the differential distance element on a sphere, the sin(ρ) term in (2) can be omitted, as can the 1/(2π) term. Then the cumulative spherical Fisher distribution function is

    F(\rho) = \frac{\exp[\kappa (\cos \rho - 1)] - 1}{e^{-2\kappa} - 1}.    (3)

For κ > 100, these equations can be simplified:

    f(\rho) \approx \kappa \exp[\kappa (\cos \rho - 1)] = \kappa \exp[-2\kappa \sin^2(\rho/2)].    (4)

Since for small distances sin ρ ≈ ρ, the above equation shows that the probability density decays like a Gaussian function. In our California forecasts we applied both Gaussian and power-law smoothing kernels (Werner et al., 2011) and found that they yield similar results.

To forecast California seismicity, we also employed adaptive kernel smoothing (Helmstetter et al., 2006; 2007; Werner et al., 2011), in which the smoothing distance associated with an earthquake is proportional to the horizontal distance between the event and its n_v-th closest neighbor (Silverman, 1986, Ch. 2.6). Zhuang (2011) applied such variable kernels in forecasts for the region around Japan. The number of neighbors, n_v, is an adjustable parameter, estimated by optimizing the forecasts. However, adaptive smoothing based on the distance between an epicenter and its n-th neighbor has a certain disadvantage: if the (n−1)-th and n-th events are separated by a large distance, the kernel width jumps from one value to another. Thus, we propose a method which in effect smooths the differences in the kernel width.
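Before detailing that method, here is a minimal Python sketch (ours, not from the paper) of the two kernel shapes introduced above, the fixed power-law kernel of Eq. (1) and the large-κ Fisher kernel of Eq. (4); spherical normalization constants are omitted:

```python
import numpy as np

def power_law_kernel(r_km, r_s=7.5):
    """Fixed power-law kernel of Eq. (1); r_s in km, as in the paper."""
    return 1.0 / (np.pi * (r_km ** 2 + r_s ** 2))

def fisher_kernel(rho, kappa):
    """Large-kappa Fisher kernel of Eq. (4), with rho the epicentral
    distance in radians; decays like a Gaussian at small distances."""
    return kappa * np.exp(kappa * (np.cos(rho) - 1.0))
```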

To carry out adaptive smoothing based on the Fisher distribution, we follow the advice of Silverman (1986, Ch. 5.3): we first create an initial weight estimate for each earthquake epicenter location, χ_i, by using Eq. (4):

    \chi_i = \sum_{j=1}^{N} f(\rho_{ij}),    (5)

where N is the number of earthquakes and ρ_{ij} is the distance between two epicenters. Then local bandwidth factors Λ_i are defined as

    \Lambda_i(\rho) = \kappa \, (\chi_i / \chi_g)^{\alpha},    (6)

where χ_g is the geometric mean of the χ_i:

    \log \chi_g = \frac{1}{N} \sum_{i=1}^{N} \log(\chi_i).    (7)

The κ-values in Eqs. (5) and (6) could be different, but as Silverman (1986, Ch. 5.3) suggests and as we tested (see below), the parameter value in the initial estimate does not significantly influence the final result. Since larger κ-values correspond to a narrower kernel (see Eq. 4), in actual computations we use the inverse of the Λ_i(ρ) function in (6). The forecast density at any point \vec{r} is then estimated by

    \mu(\vec{r}) = \sum_{i=1}^{N} \Lambda_i(|\vec{r} - \vec{r}_i|).    (8)

In Fig. 1 we display two kernel examples: the densities for the power-law and for the spherical Fisher distribution. The density maximum for the Fisher law can be calculated by setting the derivative of its PDF (2) to zero. For large κ we obtain

    \cos \rho - \kappa \sin^2 \rho \approx 0.    (9)

Since for large κ, cos ρ ≈ 1, the distance of the maximum is

    \rho_m \approx \arcsin \sqrt{1/\kappa}.    (10)
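The adaptive scheme of Eqs. (5)-(8) can be sketched compactly in Python (an illustration only; it assumes a precomputed matrix of pairwise angular distances, and the sign of the exponent α follows Eq. (6) as printed, which is our interpretation of the "inverse" remark above):

```python
import numpy as np

def adaptive_fisher_density(rho, kappa=1.0e5, alpha=0.5):
    """Adaptive Fisher-kernel density evaluated at the N epicenters.
    rho: (N, N) matrix of pairwise angular distances in radians."""
    # Eq. (5): pilot weight at each epicenter (self-term included).
    chi = np.sum(kappa * np.exp(kappa * (np.cos(rho) - 1.0)), axis=1)
    # Eq. (7): geometric mean of the pilot weights.
    chi_g = np.exp(np.mean(np.log(chi)))
    # Eq. (6): local concentration; denser neighborhoods get larger
    # kappa_i, i.e. narrower kernels (see Eq. 4).
    kappa_i = kappa * (chi / chi_g) ** alpha
    # Eq. (8): sum each event's own kernel over all evaluation points.
    f = kappa_i[:, None] * np.exp(kappa_i[:, None] * (np.cos(rho) - 1.0))
    return f.sum(axis=0)
```

Note that the pilot pass and the final pass are each O(N²) in the number of epicenters, consistent with the computational burden discussed in the Introduction.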

To demonstrate the difference between the kernels, Fig. 2 shows several examples of the cumulative distribution for various kernels: the Fisher distribution for a range of κ values, and two variants of the power-law distribution, one normalized linearly out to a distance of 20,000 km and one normalized on a sphere. The latter kernel normalization was adjusted so that the distributions coincide at smaller distances.

According to Eq. (6), the two parameters κ and α control the choice of kernel bandwidth; κ is estimated iteratively from a starting value, but as described above, the final estimate depends very weakly on the starting one. The full spatial model includes one more parameter, ε: a background rate to cover surprise earthquakes in places far from any event locations in the catalog.

3.2 Testing Long-term Forecasts

Kagan (2009) proposed a new method for testing the performance of a spatial point-process forecast. It is based on a log-likelihood score for the predicted point density and on the information gain for the events that actually occurred in the test period. The method largely avoids simulation and allows us to calculate the information score for each event, as well as the standard error of each forecast. As the number of predicted events increases, the score distribution approaches a Gaussian distribution.

To display the forecasted point density and the point events, we use the error diagram (ED) (Molchan, 1990, 1997, 2010; Molchan & Kagan, 1992; Zaliapin & Molchan, 2004; Zechar & Jordan, 2008, 2010). Rong & Jackson (2002) describe an event concentration diagram which contains the same information in a slightly different format. The general idea is to specify rate densities λ_i, calculated at points on a grid, using a hypothesis to be tested against a null hypothesis with rate densities ξ_i. Here i are grid point numbers, and N is the total number of grid points. It is also useful to construct non-overlapping cells about each grid point, integrate the rate densities in each cell, and normalize to 1.0 to get cell probabilities. We can then pick either the test or the null hypothesis as the reference, and information scores relative to it can be computed from ν_i, τ_i, and n_i, the number of earthquakes in each cell (Kagan, 2009).

To construct the error diagram, we sort the cells in decreasing order of reference probability and compute a cumulative distribution function (CDF) for each hypothesis. A theoretical error diagram is a plot of the complementary CDF (CCDF = 1 − CDF) for each hypothesis as a function of the CDF for the reference. The horizontal axis represents a kind of water level, determined by a threshold cell probability; what is plotted is the probability that an event would be forecast by the reference model. The vertical axis represents the probability that the earthquake would not be forecast according to the model in question. If the reference CDF is p, then by definition its error curve is (1 − p). A forecast whose curve lies below that of the reference predicts better and will have a higher information score. Empirical error curves and information scores can also be computed for earthquake catalogs by assigning epicenters to cells, again sorting the cells in decreasing order of reference probability, and plotting the empirical CCDF vs. the CDF for the reference hypothesis.

We calculate the theoretical forecast information score, measured in bits of information per event, as

    I_0 = \sum_{i=1}^{N} \nu_i \log_2 \left( \frac{\nu_i}{\tau_i} \right),    (11)

where the reference τ_i is the corresponding probability for a spatially uniform probability density; under that assumption, τ_i is proportional to the cell area (Kagan, 2009, Eq. 14). I_0 contains no forecast earthquake data; rather, it measures the degree to which the forecast rate itself is concentrated relative to the null hypothesis of uniform rate density. The equation above can easily be generalized to measure the information score relative to any other reference hypothesis.
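A minimal Python sketch of the theoretical score of Eq. (11) and of the error-diagram construction just described (ours, not the paper's code; cells are ranked here by forecast rate density, the appropriate ordering when the reference is the spatially uniform model):

```python
import numpy as np

def theoretical_score_I0(cell_rates, cell_areas):
    """Eq. (11): information score, in bits per event, of a forecast
    relative to a spatially uniform reference (tau_i prop. to area)."""
    nu = cell_rates / cell_rates.sum()
    tau = cell_areas / cell_areas.sum()
    mask = nu > 0.0  # empty cells contribute nothing to the sum
    return np.sum(nu[mask] * np.log2(nu[mask] / tau[mask]))

def error_diagram(cell_rates, cell_areas, eq_counts):
    """Sort cells by decreasing forecast rate density and return the
    (tau, nu) curve: cumulative area fraction vs. the fraction of
    observed earthquakes not yet covered (the empirical CCDF)."""
    order = np.argsort(cell_rates / cell_areas)[::-1]
    tau = np.cumsum(cell_areas[order]) / cell_areas.sum()
    nu = 1.0 - np.cumsum(eq_counts[order]) / eq_counts.sum()
    return tau, nu
```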

Using the forecasted rate values λ_i for the cell centers in which earthquakes occurred, we compute the empirical information score

    I_1 = \frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{\lambda_i}{\xi},    (12)

where n is the number of earthquakes during the forecast period and ξ is the analogous rate for event occurrence according to a Poisson process with a uniform rate over the region (Kagan, 2009, Eq. 7). I_1 measures the degree to which the earthquakes in the test period are concentrated in cells where the forecast rate is high. If λ_i truly represents the event rate for each cell, then in the limit of infinitely many earthquakes I_1 converges to I_0. As another option, instead of (12) we compute the empirical information score for the actual epicenter locations (λ_k):

    I_2 = \frac{1}{n} \sum_{k=1}^{n} \log_2 \frac{\lambda_k}{\xi}.    (13)

As Fig. 3 demonstrates, the values of I_1 and I_2 may also be significantly different (Kagan & Jackson, 2011) unless the cell size is much smaller than the smoothing distance ρ. Similarly to (12), we also calculate the empirical score I′_1 for earthquakes which occurred in the training period. These values are significantly larger than those calculated for the forecast intervals. Therefore, as expected, our forecast predicts the locations of past earthquakes better than those of future (target) events (see Kagan, 2009, Fig. 10 and its explanations).

Fig. 3 demonstrates how the fixed Fisher kernel was optimized using the earthquake history from 1969-2005 to forecast earthquake occurrence in 2006-2010. The red curve displays the theoretical gain of a forecast (specificity): the scores increase for narrower kernels. However, to select the most internally consistent kernel, we should compare these scores with those obtained for the test period (2006-2010). The latter scores can be calculated for the cell centers where the earthquakes occur (I_1) or for the actual epicenter locations (I_2).

The intersection of the I_0 curve with the I_1 or I_2 curves suggests the best choice of kernel, because of the convergence property (I_1 → I_0) mentioned above.

To test whether the initial estimate significantly influences the score values, as Silverman (1986, Ch. 5.3) suggested (see also Eq. 7), we produced two forecasts. In one, the initial estimate for the earthquake epicenter weights (Eq. 4) was computed using κ = 10,000; in the second, κ was taken as 100,000. In both cases the κ for the actual forecast was 100,000. The resulting scores for the two forecasts are I_0 = 5.538 and 5.420, I_1 = 3.865 and 3.859, I_2 = 4.145 and 4.145, and I′_1 = 4.940 and 4.972, respectively: the differences connected with the change in the initial κ value are negligible.

In Fig. 4 we show the global long-term forecast made with the PDE catalog using M ≥ 5.0 earthquakes from 1969 to the present. The fixed power-law kernel (1) with r_s = 7.5 km is used on the 0.1° grid. Fig. 5 shows the worldwide long-term forecast made using an adaptive Fisher distribution kernel. Comparing this with Fig. 4, it is obvious that the width of the seismicity peaks is reduced at subduction zones. This is due to narrower kernels at concentrations of earthquake epicenters. Conversely, seismicity contours for the low-activity regions are smoother for the adaptive kernels than for the fixed ones.

Optimizing the three parameters κ, α, and ε is computationally challenging, so we have taken some shortcuts. We used a low-resolution grid (0.5° by 0.5°) to estimate κ and α, and we made only a partial search for the maximum likelihood values of all parameters. The model used in Fig. 5 is based on that approximate optimization. We regard it as a respectable model worthy of comparison to others, but the values could probably be improved by further optimization.
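The empirical score of Eq. (12) is similarly short to compute; a hedged Python sketch (the array names are ours):

```python
import numpy as np

def empirical_score_I1(rate_density, cell_areas, eq_cells):
    """Eq. (12): mean log2 probability gain, per event, of the forecast
    at the cells containing the test epicenters, relative to a uniform
    Poisson rate xi. eq_cells holds the cell index of each earthquake."""
    total_rate = np.sum(rate_density * cell_areas)
    xi = total_rate / cell_areas.sum()  # uniform rate density over region
    return np.mean(np.log2(rate_density[eq_cells] / xi))
```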

4 Tests and Forecast Optimization

Fig. 6 shows two theoretical and two empirical error curves relevant to the fixed power-law forecast. To spread the curves on the graph, we use as a reference the forecast based on the power-law kernel (Eq. 1) with r_s = 7.5 km and the earthquake catalog from 1969-2005. This display is equivalent to calculating the information scores by using the forecast λ_i as the reference density:

    I = \frac{1}{n} \sum_{i=1}^{n} \nu_i \log_2 \frac{\zeta_i}{\lambda_i},    (14)

where ζ_i is the rate density of any of the other point distributions (see Kagan, 2009, Fig. 10; Kagan & Jackson, 2011, Fig. 7 and their explanations). The information scores reported below are still calculated with respect to the null hypothesis of uniform spatial rate density.

The red line describes the reference hypothesis, so it satisfies ν = 1 − τ. The black line, corresponding to the null hypothesis, lies above the reference curve, indicating that its performance is worse than that of the power-law kernel. Also shown are the two empirical distributions, one based on the learning catalog (1969-2005) and the other on the test catalog (2006-2010). Information scores for the curves shown in the plot are as follows: for the forecast (red curve), I_0 = 3.13; for earthquakes in the 2006-2010 test period (blue curve), I_1 = 3.85; for the backward forecast (magenta), I′_1 = 4.46. The probability gain G can be calculated as G = 2^I for all of the information rate values. The backward forecast does best, as expected, because the test and learning data are the same (Kagan, 2009).

In principle, we could adjust our forecast so that I_0 = I_1, because for a correct forecast the empirical information score should converge to the theoretical one. In this case the empirical score exceeds the theoretical one, indicating that the earthquakes are more concentrated than the forecast predicts. As shown in Fig. 4, this occurs when the kernel width is too large.

Decreasing the value of r_s would bring closer agreement. However, the sample size for I_1 is fairly small, so the agreement between I_0 and I_1 is close enough. The major part of the numerical difference between the forecasts in Fig. 6 corresponds to the curves' behaviour in the upper-left part of the plot; higher I values are associated with curves closer to the vertical axis (Kagan, 2009). This is best observed when only the upper-left corner of a diagram is plotted. In Fig. 6 the complete curves are shown, so this feature is not readily observable.

Fig. 7 displays error diagrams for adaptive kernel smoothing. The scores are higher than in the former case. The diagram shows that the adaptive Fisher kernels approximate earthquake productivity more effectively than the fixed power-law kernel does in Fig. 6: I_0 = 4.26, I_1 = 4.04, and I′_1 = 4.57. In Figs. 8 and 9 we show the same error diagrams in a different, more regular format (Kagan, 2009, Fig. 5; Kagan & Jackson, 2011, Fig. 6) to better display the approximation in low-seismicity regions. The adaptive kernel smoothing forecast is closer to the test-period earthquake distribution. The scores for Fig. 8 are the same as for Fig. 6, and Fig. 9 has scores identical to Fig. 7.

Our forecasts do not predict well in regions where the previous earthquake rate was low. We have investigated whether our predictions can be improved by using the strain-based forecast (Bird et al., 2010a; 2010b). We normalized the seismicity-based and strain-based spatial densities and tested linear and geometric combinations that preserve the normalization. Preliminary results show that a 50/50 linear combination works best.

The tests indicated above are pseudo-prospective, in that, except for I′_1, the tests are performed on data not used in the forecast. They provide a good template for truly prospective tests.
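The normalization-preserving linear combination mentioned above reduces to a few lines; a sketch under the assumption that the two spatial densities are given on the same grid (the 50/50 weight reflects the preliminary result quoted in the text):

```python
import numpy as np

def blended_density(seismicity, strain, w=0.5):
    """Linear blend of normalized seismicity-based and strain-based
    spatial densities; w = 0.5 is the preliminary best weight."""
    s = np.asarray(seismicity, dtype=float)
    t = np.asarray(strain, dtype=float)
    return w * s / s.sum() + (1.0 - w) * t / t.sum()
```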

5 Short-term Rate Density Estimates

The short-term forecast in this work was carried out according to the technique described by Kagan & Jackson (2000, Section 3.1.2) and by Kagan & Jackson (2011, Section 3.3.3). Fig. 10 shows the calculated short-term forecast. Since our forecasts are calculated daily, they are valid only for the next moment after the end of the catalog (Kagan & Jackson, 2000; 2011). To extend the forecast's horizon in time, we must allow for the possibility that other earthquakes occur in the time between the end of the catalog and the forecast time. Accounting for this can be managed by Monte Carlo simulation of multiple generations of dependent events (Kagan, 1973; Kagan & Knopoff, 1977; Zhuang, 2011).

The short-term forecast's efficiency can be measured by its average probability gain per earthquake compared to the temporally uniform Poisson distribution (Kagan et al., 2010). For the short-term forecast, the gain G is about 3.7 for the PDE catalog relative to the temporally random but spatially localized null hypothesis (see Table 4 in the above paper); an illustrative sketch of such an Omori-type rate calculation is given below. Forecasting and testing in this way does not provide information for rapid decisions, but it is an effective prospective operation as long as the algorithms and parameters are fixed in advance. Short-term probabilities computed by these programs can be used in the operational earthquake forecasting discussed by Jordan & Jones (2010), van Stiphout et al. (2010), and Jordan et al. (2011).

6 Discussion

The earthquake forecast technique described here continues, and in a certain sense completes, the work we have pursued since 1977 (see Introduction). Our question has been: how can we use the available seismicity record to optimally evaluate future seismic risk? Because of the fundamentally random character of earthquakes, such a forecast must be statistical.
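As referenced in Section 5, a minimal, illustrative sketch of an Omori-type temporal rate (the functional form is generic and all parameter values below are placeholders, not the fitted parameters of Kagan & Jackson, 2000; 2011):

```python
import numpy as np

def omori_rate(t_now, event_times, k=1.0, c=0.1, p=1.0):
    """Omori-type short-term rate at time t_now (days): each past event
    at t_i contributes k / (t_now - t_i + c)**p. k, c, p are placeholders."""
    dt = t_now - np.asarray(event_times, dtype=float)
    dt = dt[dt > 0.0]  # only events that precede t_now contribute
    return np.sum(k * (dt + c) ** (-p))
```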

Several problems must be solved to make such a forecast quantitatively efficient and applicable to local and global environments. Relevant properties of earthquake occurrence need to be thoroughly investigated, including the classical statistical earthquake distributions. How do Omori's law and the Gutenberg-Richter relation apply in various regions? We also need to consider, in a quantitative and objective manner, the spatial distribution of seismicity (Kagan, 2007). As with any application of measurement results to practical problems, we must have quantitative estimates of the measurement uncertainties and data completeness (Kagan, 2003).

Our results (Bird & Kagan, 2004, and references therein; Kagan et al., 2010) indicate that whereas the b-value of the Gutenberg-Richter relation has a universal value of 0.93-0.98, the maximum or corner magnitude changes significantly between tectonic zones, from a high of M9.6 for subduction zones to a low of M5.9 for oceanic spreading-ridge normal-faulting earthquakes. Moreover, the maximum earthquake size was shown to be the same, at least statistically, for all the studied subduction zones (Kagan, 1997). These maximum-size determinations, combined with the very large (m ≥ 9.0) events observed in other subduction zones before 1997, warned us that such an earthquake could occur in any major subduction zone, including the Sumatra and Tohoku (Japan) regions.

The Bird & Kagan (2004) regionalization is based, among other factors, on the earthquake focal mechanism. However, the PDE catalog lacks this information for many earthquakes. Kagan et al. (2010) subdivide global seismicity into five tectonic zones without using focal mechanism information. These zones are displayed in their Fig. 1; the global zone assignment table on a 0.1° grid is available at http://bemlar.ism.ac.jp/wiki/index.php/Bird%27s_Zones. The corner magnitudes for these zones are given in Table 1 of Kagan et al. (2010). These results allow us to extend our forecasts to evaluate the rate of the strongest earthquakes in each tectonic zone by extrapolating the tapered Gutenberg-Richter distribution (Bird & Kagan, 2004; Kagan et al., 2010).

Our tests of forecast efficiency, described in Sections 3.1 and 4, are conducted mostly for M5 to M6 earthquakes. These earthquakes are less interesting from a practical point of view, because they usually cause little damage. Preliminary tests of the forecast for higher-magnitude earthquakes have shown that, although M ≥ 7.5 events are mostly concentrated at trenches, the forecast skill for these events is similar to that for medium-size earthquakes.

Our investigation of earthquake time behavior (Kagan, 2011) suggests that Omori's law parameters are also universal: even strong earthquakes are clustered in time rather than being quasi-periodic. We assume that in addition to earthquakes resulting from clustering, spontaneous earthquakes, approximately Poissonian, occur at a rate proportional to the regional strain rate. Thus, as clustered earthquakes decay in frequency according to Omori's law, the total rate becomes asymptotically Poissonian. Therefore, for tectonic plate interiors where the strain rate is low, only the Omori's-law decay of historical and paleo-earthquakes is usually observed; no transition to the Poisson rate can be unambiguously seen.

England and Jackson (2011) indicate that most earthquake-related casualties occur not at tectonic plate boundaries such as subduction zones but in continental interiors far from the boundaries. This presents a challenge for seismicity-based earthquake forecasts: if earthquakes are rare, our technique cannot give a good estimate of future seismic hazard from temporally limited earthquake catalogs. However, a forecast that incorporates the global strain-rate data (Bird et al., 2010a; 2010b) can significantly improve the seismicity-based forecasts, especially in regions where space geodesy and geologic data allow us to evaluate the strain rate accurately. Using the calibration provided by Bird & Kagan (2004) and Kagan et al. (2010), Bird et al. (2010a) converted the global strain rate to an earthquake hazard map.

For most of the active continents (the Alpine-Himalayan belt) we have enough earthquake occurrence data and tectonic rate estimates to evaluate seismic risk, albeit with some difficulty. However, for plate interiors where fault deformation rates are less than one millimeter per year, this remains a problem. In these regions only rough estimates of long-term earthquake rates can be obtained (Kagan, 2011). However, by using our results, extrapolation of seismic activity in aftershock zones can be carried out.

As we indicated above, the forecast method described in this paper is a concluding step in the development of seismicity-based earthquake forecasts. Though a significant effort would still be needed to obtain optimally chosen parameters for the smoothing kernel and to organize the details of catalog preparation and processing, we feel that the main scientific problems have been solved. The forecast covers the whole surface of the Earth. The resolution of the forecast is unlikely to be significantly improved with the available data, and the adaptive feature of the smoothing kernel produces a nearly optimally adjusted forecast map. However, as discussed in Section 5, if a temporally extended forecast is needed, then future seismic activity needs to be simulated. That may involve significant computation and would require greater resources than we have available.

7 Data and Resources

The USGS National Earthquake Information Center Preliminary Determination of Epicenters (PDE) database is available at http://earthquake.usgs.gov/earthquakes/eqarchives/epic/ (last accessed December 2011).

Acknowledgments

We are grateful to Peter Bird and Paul Davis of UCLA, and to Stefan Hiemer of ETH Zurich, for useful discussions and suggestions. Reviews by Warner Marzocchi and an anonymous reviewer have been helpful in revising the manuscript. We thank Kathleen Jackson, who edited the manuscript. The authors appreciate support from the National Science Foundation through grants EAR-0711515, EAR-0944218, and EAR-1045876, as well as from the Southern California Earthquake Center (SCEC). SCEC is funded by NSF Cooperative Agreement EAR-0529922 and USGS Cooperative Agreement 07HQAG0008. Publication 1634, SCEC.

References

Bird, P. & Y. Y. Kagan, 2004. Plate-tectonic analysis of shallow seismicity: apparent boundary width, beta, corner magnitude, coupled lithosphere thickness, and coupling in seven tectonic settings, Bull. Seismol. Soc. Amer., 94(6), 2380-2399 (plus electronic supplement); see also the update at http://peterbird.name/publications/2004_global_coupling/2004_global_coupling.htm

Bird, P., C. Kreemer & W. E. Holt, 2010a. A long-term forecast of shallow seismicity based on the global strain rate map, Seismol. Res. Lett., 81(2), 184-194 (plus electronic supplement).

Bird, P., Y. Y. Kagan & D. D. Jackson, 2010b. Time-dependent global seismicity forecasts with a tectonic component: Retrospective tests, AGU Fall Meet., Abstract S44B-03.

Bullen, K. E., 1979. An Introduction to the Theory of Seismology, 3rd ed., Cambridge University Press, Cambridge, 381 pp.

England, P. & J. Jackson, 2011. Uncharted seismic risk, Nature Geoscience, 4, 348-349.

Fisher, R. A., 1953. Dispersion on a sphere, Proc. Roy. Soc. London, Ser. A, 217, 295-305.

Fisher, N. I., T. Lewis & B. J. J. Embleton, 1987. Statistical Analysis of Spherical Data, Cambridge University Press, Cambridge, 329 pp.

Helmstetter, A., Y. Y. Kagan & D. D. Jackson, 2006. Comparison of short-term and time-independent earthquake forecast models for southern California, Bull. Seismol. Soc. Amer., 96(1), 90-106.

Helmstetter, A., Y. Y. Kagan & D. D. Jackson, 2007. High-resolution time-independent grid-based forecast for M ≥ 5 earthquakes in California, Seism. Res. Lett., 78(1), 78-86.

Jackson, D. D. & Y. Y. Kagan, 1999. Testable earthquake forecasts for 1999, Seism. Res. Lett., 70(4), 393-403.

Jordan, T. H. & L. M. Jones, 2010. Operational earthquake forecasting: some thoughts on why and how, Seismol. Res. Lett., 81(4), 571-574.

Jordan, T. H., Y.-T. Chen, P. Gasparini, R. Madariaga, I. Main, W. Marzocchi, G. Papadopoulos, G. Sobolev, K. Yamaoka & J. Zschau, 2011. Operational earthquake forecasting: state of knowledge and guidelines for utilization, Annals Geophysics, 54(4), 315-391, doi: 10.4401/ag-5350.

Kagan, Y. Y., 1973. Statistical methods in the study of the seismic process (with discussion: comments by M. S. Bartlett, A. G. Hawkes, and J. W. Tukey), Bull. Int. Statist. Inst., 45(3), 437-453.

Kagan, Y. Y., 1997. Seismic moment-frequency relation for shallow earthquakes: Regional comparison, J. Geophys. Res., 102(B2), 2835-2852.

Kagan, Y. Y., 2003. Accuracy of modern global earthquake catalogs, Phys. Earth Planet. Inter., 135(2-3), 173-209.

Kagan, Y. Y., 2007. Earthquake spatial distribution: the correlation dimension, Geophys. J. Int., 168(3), 1175-1194, doi: 10.1111/j.1365-246X.2006.03251.x.

Kagan, Y. Y., 2009. Testing long-term earthquake forecasts: likelihood methods and error diagrams, Geophys. J. Int., 177(2), 532-542.

Kagan, Y. Y., 2011. Random stress and Omori's law, Geophys. J. Int., 186(3), 1347-1364, doi: 10.1111/j.1365-246X.2011.05114.x.

Kagan, Y. Y., P. Bird & D. D. Jackson, 2010. Earthquake patterns in diverse tectonic zones of the globe, Pure Appl. Geoph. (The Frank Evison Volume), 167(6/7), 721-741, doi: 10.1007/s00024-010-0075-3.

Kagan, Y. Y. & D. D. Jackson, 1994. Long-term probabilistic forecasting of earthquakes, J. Geophys. Res., 99, 13,685-13,700.

Kagan, Y. Y. & D. D. Jackson, 2000. Probabilistic forecasting of earthquakes, Geophys. J. Int., 143, 438-453.

Kagan, Y. Y. & D. D. Jackson, 2010. Earthquake forecasting in diverse tectonic zones of the globe, Pure Appl. Geoph. (The Frank Evison Volume), 167(6/7), 709-719, doi: 10.1007/s00024-010-0074-4.

Kagan, Y. Y. & D. D. Jackson, 2011. Global earthquake forecasts, Geophys. J. Int., 184(2), 759-776, doi: 10.1111/j.1365-246X.2010.04857.x.

Kagan, Y. Y. & L. Knopoff, 1977. Earthquake risk prediction as a stochastic process, Phys. Earth Planet. Inter., 14(2), 97-108, doi: 10.1016/0031-9201(77)90147-9.

Kagan, Y. Y. & L. Knopoff, 1987. Statistical short-term earthquake prediction, Science, 236, 1563-1567.

Mardia, K. V. & P. E. Jupp, 2000. Directional Statistics, Wiley, Chichester, 429 pp.

Marzocchi, W. & J. D. Zechar, 2011. Earthquake forecasting and earthquake prediction: different approaches for obtaining the best model, Seismol. Res. Lett., 82(3), 442-448, doi: 10.1785/gssrl.82.3.442.

Molchan, G. M., 1990. Strategies in strong earthquake prediction, Phys. Earth Planet. Inter., 61(1-2), 84-98.

Molchan, G. M., 1997. Earthquake prediction as a decision-making problem, Pure Appl. Geophys., 149(1), 233-247.

Molchan, G., 2010. Space-time earthquake prediction: the error diagrams, Pure Appl. Geoph. (The Frank Evison Volume), 167(8/9), 907-917, doi: 10.1007/s00024-010-0087-z.

Molchan, G. M. & Y. Y. Kagan, 1992. Earthquake prediction and its optimization, J. Geophys. Res., 97, 4823-4838.

Preliminary determinations of epicenters (PDE), 2011. U.S. Geological Survey, U.S. Dep. of Inter., Natl. Earthquake Inf. Cent., http://neic.usgs.gov/neis/epic/epic.html and http://neic.usgs.gov/neis/epic/code_catalog.html

Romashkova, L. L., 2009. Global-scale analysis of seismic activity prior to the 2004 Sumatra-Andaman mega-earthquake, Tectonophysics, 470, 329-344, doi: 10.1016/j.tecto.2009.02.011.

Rong, Y.-F. & D. D. Jackson, 2002. Earthquake potential in and around China: Estimated from past earthquakes, Geophys. Res. Lett., 29(16), art. no. 1780, doi: 10.1029/2002GL015297.

Schorlemmer, D. & M. C. Gerstenberger, 2007. RELM Testing Center, Seism. Res. Lett., 78(1), 30-35.

Schorlemmer, D., J. D. Zechar, M. Werner, D. D. Jackson, E. H. Field, T. H. Jordan & the RELM Working Group, 2010. First results of the Regional Earthquake Likelihood Models experiment, Pure Appl. Geoph. (The Frank Evison Volume), 167(8/9), 859-876, doi: 10.1007/s00024-010-0081-5.

Silverman, B. W., 1986. Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 175 pp.

van Stiphout, T., S. Wiemer & W. Marzocchi, 2010. Are short-term evacuations warranted? Case of the 2009 L'Aquila earthquake, Geophys. Res. Lett., 37, L06306, doi: 10.1029/2009GL042352.

Wang, Q., D. D. Jackson & Y. Y. Kagan, 2011. California earthquake forecasts based on smoothed seismicity: Model choices, Bull. Seismol. Soc. Amer., 101(3), 1422-1430, doi: 10.1785/0120100125 (plus electronic supplement).

Watson, G. S., 1983. Statistics on Spheres, J. Wiley, New York, 238 pp.

Werner, M. J., A. Helmstetter, D. D. Jackson & Y. Y. Kagan, 2011. High-resolution long- and short-term earthquake forecasts for California, Bull. Seismol. Soc. Amer., 101(4), 1630-1648, doi: 10.1785/0120090340 (plus electronic supplement).

Zaliapin, I. & G. Molchan, 2004. Tossing the Earth: how to reliably test earthquake prediction methods, Eos Trans. AGU, 85(47), Fall Meet. Suppl., Abstract S23A-0302.

Zechar, J. D. & T. H. Jordan, 2008. Testing alarm-based earthquake predictions, Geophys. J. Int., 172(2), 715-724, doi: 10.1111/j.1365-246X.2007.03676.x.

Zechar, J. D. & T. H. Jordan, 2010. The area skill score statistic for evaluating earthquake predictability experiments, Pure Appl. Geoph. (The Frank Evison Volume), 167(8/9), 893-906, doi: 10.1007/s00024-010-0086-0.

Zhuang, J., 2011. Next-day earthquake forecasts for the Japan region generated by the ETAS model, Earth Planets Space, 63, 207-216.

[Figure 1 image: normalized density vs. distance (km), log-log axes.]

Figure 1: Kernel functions. Red: power-law kernel with fixed r_s = 7.5 km; blue: Fisher distribution kernels (Eq. 4) with κ = 10,000.

[Figure 2 image: cumulative probability vs. epicentral distance (km), log abscissa.]

Figure 2: Cumulative kernel functions. Solid and dashed diagonal lines correspond to power-law kernels (Eq. 1) with r_s = 7.5 km; the dashed line is integrated over a plane surface, the solid line over a spherical surface. Sigmoidal solid lines, from right to left, correspond to the Fisher distribution kernels (Eq. 4) with κ = 10, 100, 1,000, 10,000, and 100,000.

[Figure 3 image: information scores (bits per event) vs. distance parameter r_m (km), log abscissa.]

Figure 3: Optimization of Fisher kernels: dependence of the information scores on the smoothing scale parameter r_m = R ρ_m of the Fisher distribution (Eq. 10), for the 2006-2010 forecast based on the PDE catalog for 1969-2005. The abscissa r_m values correspond to κ = 1,000,000; 100,000; 10,000√10; 10,000; 1,000, in this order. The red line is the I_0 score, blue I_1, green I_2, and magenta I′_1.

[Figure 4 map: Global Long-term Forecast, PDE 1969-Today, full sphere, 0.1°; color scale: log10 probability of earthquake occurrence, M ≥ 5.0, eq/day·(100 km)², from −7 to −2.]

Figure 4: Fixed power-law kernel: long-term earthquake potential based on smoothed seismicity from the PDE catalog from 1969 to April 2012. Earthquake occurrence is modelled by a time-independent Poisson process. The Lambert projection is used for display. The fixed power-law kernel (1) is used on a 0.1° grid.

[Figure 5 map: Long-term Forecast, PDE 1969-2005, full sphere, 0.5°, Fisher_ad 10000/0.003/0.5; color scale: log10 probability of earthquake occurrence, M ≥ 5.0, eq/day·(100 km)², from −7 to −2.]

Figure 5: Adaptive Fisher kernel: long-term earthquake rates based on smoothed seismicity from the PDE catalog, 1969 to April 2012. An adaptive smoothing kernel based on the Fisher spherical distribution (Eqs. 6-8) is used. Parameter values are κ = 100,000, α = 0.5, and ε = 0.003. Earthquake occurrence is modelled by a time-independent Poisson process.

[Figure 6 image: fraction of earthquake rate / earthquake numbers, ν, vs. relative fraction of total area, τ_r, log abscissa.]

Figure 6: Error diagram (τ, ν) for the global long-term seismicity (M ≥ 5.0) forecast, relative to the power-law forecast based on earthquakes in the learning period, 1969-2005. Solid black line: theoretical curve for a spatially uniform rate density; red line: theoretical results for the reference power-law kernel (Eq. 1) with r_s = 7.5 km; blue line: empirical results for the test period, 2006-2010; magenta line: empirical results for retrospective application to the training period, 1969-2005.

[Figure 7 image: same axes as Fig. 6.]

Figure 7: Error diagram (τ, ν) for the global long-term seismicity (M ≥ 5.0) forecast, relative to the adaptive kernel forecast. Same format as in Fig. 6. Solid black line: theoretical curve for a spatially uniform rate density; red line: theoretical results for the reference forecast based on the adaptive Fisher kernel (Eq. 2) with κ = 10,000 and α = 0.5; blue line: empirical results for the test period, 2006-2010; magenta line: empirical results for retrospective application to the training period, 1969-2005.

[Figure 8 image: fraction of earthquake rate, ν, log ordinate, vs. fraction of total area, τ_r, linear abscissa from 0.5 to 1.0.]

Figure 8: Partial error diagram for the fixed power-law kernel. Same as Figure 6, except that the reference forecast is the spatially uniform rate density model, the horizontal axis is plotted on a linear scale, and only the right half of the diagram is shown.

[Figure 9 image: same axes as Fig. 8.]

Figure 9: Partial error diagram for the adaptive Fisher kernel. Same as Figure 7, except that the reference forecast is the spatially uniform rate density model, the horizontal axis is plotted on a linear scale, and only the right half of the diagram is shown.

[Figure 10 map: Global Short-term Forecast, PDE 1969-Today, full sphere, 0.1°; color scale: log10 probability of earthquake occurrence, M ≥ 5.0, eq/day·(100 km)², from −12 to 0.]

Figure 10: Earthquake short-term potential on a 0.1° grid based on smoothed seismicity from the PDE catalog since 1969. Earthquake occurrence is modelled by a branching temporal process controlled by an Omori's-law-type dependence.