UNBIASED ESTIMATE FOR b-value OF MAGNITUDE FREQUENCY


J. Phys. Earth, 34, 187-194, 1986

Yosihiko OGATA* and Ken'ichiro YAMASHINA**
* Institute of Statistical Mathematics, Tokyo, Japan
** Earthquake Research Institute, The University of Tokyo, Tokyo, Japan
(Received October 18, 1985; Revised May 8, 1986)

If we assume that magnitudes of earthquakes are distributed identically and independently according to a negative-exponential density, then the maximum likelihood estimate proposed by Utsu for the b-value is biased from the true value. We suggest an unbiased alternative estimate which is asymptotically equivalent to the maximum likelihood estimate. The relation between the unbiased estimate and the maximum likelihood estimate is presented from a Bayesian viewpoint. The two estimates are compared in order to show the superiority of the unbiased estimate over the maximum likelihood estimate, on the basis of the expected entropy maximization principle for the predictive distributions. From the same principle, the posterior with the noninformative improper prior is recommended for the inferential distribution of the b-value rather than the standardized likelihood.

1. Bias of the Maximum Likelihood Estimate

According to the Gutenberg-Richter law (GUTENBERG and RICHTER, 1944), the number N(M) of earthquakes having magnitude equal to or larger than M can be expressed by the equation

    \log_{10} N(M) = a - bM .    (1)

UTSU (1965) proposed the estimate for b with the equation

    \hat{b} = N \log_{10} e \Big/ \sum_{i=1}^{N} (M_i - M_0) ,    (2)

where N is the number of earthquakes and M_0 is the minimum magnitude in a given sample. AKI (1965) suggested that this is nothing but the maximum likelihood estimate which maximizes the likelihood function

    L(\beta) = \prod_{i=1}^{N} \beta e^{-\beta (M_i - M_0)} ,    (3)

where \beta = b / \log_{10} e.

Using the large sample theory for the maximum likelihood, AKI (1965) provided the asymptotic error band for \hat{b}. It is further seen that the maximum likelihood estimate Eq. (2) tends to the true value b_0 and that \hat{b} is an asymptotically unbiased estimate of b_0; i.e., E(\hat{b}) - b_0 \to 0 as N increases. For fixed N, however, \hat{b} is not unbiased. Indeed, each X_i = M_i - M_0 has the density

    f(x \mid \beta) = \beta e^{-\beta x} , \quad x \ge 0 ,    (4)

and using the exact distribution density of \hat{b} in (2), which is derived from Eq. (4) by UTSU (1966),

    p(\hat{b}) = \frac{(N b_0)^N}{\Gamma(N)}\, \hat{b}^{-(N+1)} e^{-N b_0 / \hat{b}} , \quad \hat{b} > 0 ,    (5)

we have

    E(\hat{b}) = \frac{N}{N-1}\, b_0 .    (6)

This suggests that the bias is not small when N is small or b is large. A natural way of correcting the bias of \hat{b} arises immediately from Eq. (6); that is,

    \tilde{b} = \frac{N-1}{N}\, \hat{b} = (N-1) \log_{10} e \Big/ \sum_{i=1}^{N} (M_i - M_0) ,    (7)

which is the unbiased estimate of b_0. Although there are many unbiased estimators of b_0 besides \tilde{b}, we will show that the estimator Eq. (7) enjoys a certain optimality.

2. Relation between \hat{b} and \tilde{b} from a Bayesian Viewpoint

Without loss of generality we shall consider the relationship between \hat{\beta} = N / \sum X_i and \tilde{\beta} = (N-1) / \sum X_i instead of \hat{b} and \tilde{b}, where X_i = M_i - M_0 is distributed according to f(x \mid \beta) = \beta e^{-\beta x}. Suppose there is a prior distribution \pi(\beta) for the parameter \beta. Then by applying Bayes' theorem we have the posterior distribution

    p(\beta \mid x) = \pi(\beta) \prod_{i=1}^{N} f(x_i \mid \beta) \Big/ \int_0^\infty \pi(\beta) \prod_{i=1}^{N} f(x_i \mid \beta)\, d\beta .    (8)

Consider the case where

    \pi(\beta) = 1 , \quad (0 < \beta < \infty) ,    (9)

although this is not a probability density; this is called the uniform improper prior. Then Eq. (8) provides the standardized likelihood which is conventionally used for the confidence probability distribution;

    p(\beta \mid x) = \frac{\left( \sum x_i \right)^{N+1}}{N!}\, \beta^{N} e^{-\beta \sum x_i} ,    (10)

where the sum \sum is taken from i = 1 up to i = N, and the mode is given by the maximum likelihood estimate \hat{\beta}. In general, the uniform improper prior Eq. (9) does not seem to be appropriate when characterizing a situation where "nothing (or, more realistically, little) is known a priori." JEFFREYS (1961) suggested the rule that a noninformative prior distribution is given by the square root of Fisher's information measure. The rule is justified on the grounds of its invariance under parameter transformations; see also BOX and TIAO (1973) and AKAIKE (1978) for examples. In the present case Fisher's information measure is

    I(\beta) = -E \left[ \frac{\partial^2}{\partial \beta^2} \log f(X \mid \beta) \right] = \frac{1}{\beta^2} .

Thus the noninformative prior is improper and \pi(\beta) = 1/\beta, (0 < \beta < \infty). Therefore, the posterior probability is given by

    p(\beta \mid x) = \frac{\left( \sum x_i \right)^{N}}{(N-1)!}\, \beta^{N-1} e^{-\beta \sum x_i} ,    (11)

where the sum \sum is taken from i = 1 up to i = N. It is easily seen that the mode of the posterior distribution Eq. (11) is \tilde{\beta}, while the posterior mean (Bayes estimate) is

    \int_0^\infty \beta\, p(\beta \mid x)\, d\beta = N \Big/ \sum x_i ,    (12)

which is equal to the maximum likelihood estimate \hat{\beta}.

3. Entropy Maximization Principle and the Performance of the Estimates

AKAIKE (1977) introduced and formulated the entropy maximization principle, which may be specifically described for our purpose as follows. Denote by x the vector of observations (x_1, x_2, ..., x_N). Assume that the true distribution of x is specified by an unknown density function f(z) = f(z_1, z_2, ..., z_N) which is considered to be included in a parameterized family of densities g(z \mid \theta). Considering y = (y_1, y_2, ..., y_N) as a future observation with the same distribution as x, and supposing that we predict it based on the present data set x, we may call g(y \mid \hat{\theta}) a predictive distribution, where \hat{\theta} = \hat{\theta}(x) is an estimate of the true parameter \theta_0 satisfying f(\cdot) = g(\cdot \mid \theta_0). As the measure of the fit of g(y \mid \hat{\theta}) to f(y) we use the entropy of f(y) with respect to g(y \mid \hat{\theta}), defined by

    B(f; g) = - \int f(y) \log \frac{f(y)}{g(y \mid \hat{\theta})}\, dy .    (13)

Note here that B(f; g) is a function of x. Thus we consider the expected entropy

    J(\hat{\theta}) = E_X E_Y \left[ \log \frac{g(Y \mid \hat{\theta}(X))}{f(Y)} \right] ,    (14)

where E_X and E_Y are expectations with respect to the random vectors X and Y, respectively. AKAIKE (1977) justified the use of this measure based on the process of conceptual sampling experiments, and described its natural relation to the log likelihood. Further, the AIC was derived from this quantity (AKAIKE, 1973). Here we use the expected entropy J(\hat{\theta}) to measure the performance of the estimates; a larger value shows the better fit. Suppose {X_i} and {Y_i}, i = 1, 2, ..., N, are identically and independently distributed according to the density function g(x \mid \beta_0) = \beta_0 e^{-\beta_0 x}. Then we have

    J(\hat{\beta}) = N \left[ \log N - \psi(N) - \frac{1}{N-1} \right]    (15)

and

    J(\tilde{\beta}) = N \left[ \log (N-1) - \psi(N) \right] .    (16)

Here we have used the equalities

    E \left[ \log \sum X_i \right] = \psi(N) - \log \beta_0 \quad \text{and} \quad E \left[ 1 \Big/ \sum X_i \right] = \frac{\beta_0}{N-1} ,

where \psi(z) is the digamma function such that \psi(z) = (d/dz) \log \Gamma(z). Note that both J(\hat{\beta}) and J(\tilde{\beta}) are independent of \beta_0. Since

    J(\tilde{\beta}) - J(\hat{\beta}) = N \left[ \frac{1}{N-1} - \log \frac{N}{N-1} \right] > 0 ,

we see the superiority of the estimate \tilde{\beta} = (N-1) / \sum X_i. Furthermore, consider the family of estimators of the form c / \sum X_i, where c is a positive constant. Here we find the optimal c as follows. By substituting this family for \hat{\theta}(X) in (14), we have

    J \left( c \Big/ \sum X_i \right) = N \left[ \log c - \psi(N) - \frac{c}{N-1} + 1 \right] .

From this we see that c = N-1 maximizes the expected entropy Eq. (14). There are many losses other than that in Eq. (14). For example, one may consider the expected square loss E_X[(\hat{\theta}(X) - \theta_0)^2] to measure the performance of the estimators. For the estimators of the form c / \sum X_i we have

    E_X \left[ \left( c \Big/ \sum X_i - \beta_0 \right)^2 \right] = \beta_0^2 \left[ \frac{c^2}{(N-1)(N-2)} - \frac{2c}{N-1} + 1 \right] .    (17)

This equality suggests that the unbiased estimator \tilde{\beta} = (N-1) / \sum X_i is better than the

maximum likelihood estimator \hat{\beta} = N / \sum X_i, although the minimum value of Eq. (17) is attained by the estimator (N-2) / \sum X_i. Of further interest is the loss given by the absolute value of the difference, |\hat{\theta}(X) - \theta_0|. The comparison has been performed by a Monte Carlo experiment for the estimators N / \sum X_i, (N-1) / \sum X_i, and (N-2) / \sum X_i. The trial was repeated 10,000 times for \beta_0 such that b_0 = \beta_0 \log_{10} e = 0.5, 1.0, and 1.5. The sample averages of the loss are shown in Table 1, which shows that the unbiased estimator \tilde{\beta} performs best.

Table 1. Comparison of the average absolute loss.

Thus we see that different losses lead to different preferences among the estimators. Therefore we should be cautious about the meaning of losses. Our purpose here is to compare the performance of the estimator \tilde{b} with the maximum likelihood estimator \hat{b}. The maximum likelihood method (as well as AIC) is viewed as an optimal realization of the entropy maximization principle (AKAIKE, 1973, 1977); we already know that the expected log likelihood is essentially the probabilistic definition of entropy itself. Thus, we have used the loss Eq. (14) to examine the effect of \tilde{b} as a refinement of the maximum likelihood estimator.

4. Predictive Distributions Using Posteriors

In the previous section we have only considered the case where the predictive distribution is given in the form g(y \mid \hat{\theta}) by substituting the point estimate \hat{\theta} = \hat{\theta}(x).
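The loss comparisons of Section 3 are easy to check numerically. The sketch below is ours, not the authors' (the sample size N, the seed, and the trial count are arbitrary choices): it verifies that the expected entropy J(c) = N[log c - psi(N) - c/(N-1) + 1] is maximized at c = N-1, that the square-loss factor of Eq. (17) is minimized at c = N-2, and that a Monte Carlo run in the spirit of Table 1 makes c = N-1 best in average absolute loss.

```python
import math
import random

def digamma(x):
    """psi(x) = d/dx log Gamma(x), via the recurrence psi(x) = psi(x+1) - 1/x
    and an asymptotic expansion; ample accuracy for this comparison."""
    r = 0.0
    while x < 8:
        r -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    return r + math.log(x) - inv / 2 - inv ** 2 * (1 / 12 - inv ** 2 * (1 / 120 - inv ** 2 / 252))

N = 25

def J(c):
    """Expected entropy of the estimator c / sum(X_i) (Section 3)."""
    return N * (math.log(c) - digamma(N) - c / (N - 1) + 1)

def square_loss(c):
    """Expected square loss of c / sum(X_i) in units of beta0**2, Eq. (17)."""
    return c * c / ((N - 1) * (N - 2)) - 2 * c / (N - 1) + 1

grid = [k / 100 for k in range(1500, 3500)]        # c from 15.00 to 34.99
c_entropy = max(grid, key=J)                       # expect N - 1
c_square = min(grid, key=square_loss)              # expect N - 2

# Monte Carlo for the average absolute loss |c/sum(X_i) - beta0|,
# using common random samples for all three estimators.
random.seed(0)
beta0, trials = 1.0, 50000
abs_loss = {c: 0.0 for c in (N, N - 1, N - 2)}
for _ in range(trials):
    s = sum(random.expovariate(beta0) for _ in range(N))
    for c in abs_loss:
        abs_loss[c] += abs(c / s - beta0) / trials
best_abs = min(abs_loss, key=abs_loss.get)         # expect N - 1
```

Note that the three criteria really do pick three different constants, which is the point of the cautionary remark about the meaning of losses.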

Table 2. Comparison of the expected entropy. (N is the sample size.)

AKAIKE (1978) suggested that the predictive distribution of the form

    h(y \mid x) = \int g(y \mid \beta)\, p(\beta \mid x)\, d\beta

fits better on average than a particular g(y \mid \beta) with \beta randomly chosen according to the posterior distribution p(\beta \mid x). The implication of this is that the full use of the posterior distribution performs better than the point estimates. For example, the posteriors Eqs. (10) and (11) for the estimation of the b-value provide the so-called confidence limits or error bars of the inferential distributions. Incidentally, both distributions Eqs. (10) and (11) tend to normal distributions in a certain sense as the sample size increases, whence the confidence limits provided in AKI (1965), for example, are obtained. To see the performance of the posteriors with a fixed sample size we calculate the expected entropy. For the predictive distribution with the posterior Eq. (10), we have

    J_{(10)} = (N+1)\, \psi(N) - (2N+1)\, \psi(2N) + \log \frac{\Gamma(2N+1)}{\Gamma(N+1)} + N ,    (18)

and for the predictive distribution with the posterior Eq. (11), we have

    J_{(11)} = N\, \psi(N) - 2N\, \psi(2N) + \log \frac{\Gamma(2N)}{\Gamma(N)} + N .    (19)

Note that these quantities also are independent of the true value \beta_0. Therefore

    J_{(11)} - J_{(10)} = \psi(2N) - \psi(N) - \log 2 > 0 ,    (20)

which suggests that the posterior distribution with the noninformative prior works

better than the one with the uniform prior from the predictive viewpoint. To compare the values of Eqs. (15), (16), (18), and (19), we carried out a numerical calculation using the approximation of the digamma function for large x together with the recurrence relation \psi(x+1) = \psi(x) + 1/x. Table 2 suggests that the predictive distribution obtained by averaging over the posterior is more effective than those based on the point estimates.

5. Discussion

In a personal communication Dr. Masuda pointed out that the different parameterization of the Gutenberg-Richter law, \log_{10} n(M) = a - M/s instead of Eq. (1), or the density function

    f(m \mid \sigma) = \sigma^{-1} e^{-(m - M_0)/\sigma} , \quad \sigma = s \log_{10} e ,

instead of Eq. (4), leads to an unbiased maximum likelihood estimator for \sigma. Indeed, the standardized likelihood is given by

    p(\sigma \mid x) = \frac{\left( \sum x_i \right)^{N-1}}{(N-2)!}\, \sigma^{-N} e^{- \sum x_i / \sigma} ,    (21)

and its mode \hat{\sigma} = \sum X_i / N is unbiased. However, if we examine a family of estimators of the form \sum X_i / c, where c is a positive constant, then we have, as in Section 3,

    J \left( \sum X_i \Big/ c \right) = N \left[ \log c - \psi(N) - \frac{c}{N-1} + 1 \right] .

From this we see that c = N-1 maximizes the expected entropy, and \tilde{\sigma} = \sum X_i / (N-1) performs best as the point estimator. Furthermore, since the noninformative prior is \pi(\sigma) = 1/\sigma, (0 < \sigma < \infty), the posterior corresponding to Eq. (11) is given by

    p(\sigma \mid x) = \frac{\left( \sum x_i \right)^{N}}{(N-1)!}\, \sigma^{-(N+1)} e^{- \sum x_i / \sigma} .    (22)

By a computation similar to that in the previous section we have

    \psi(N) - \psi(2N) + \log \frac{2N-1}{N-1} > 0    (23)

instead of Eq. (20). This suggests that the same consequence as that in Section 4 holds for the present parameterization of the Gutenberg-Richter law.
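The entropy comparisons of Sections 3-5 can likewise be checked numerically. The sketch below is our own code: the closed forms for Eqs. (15), (16), (18), (19), (20), and (23) are our reconstruction from the gamma and inverse-gamma posteriors, the digamma helper is ours, and N = 10 is an arbitrary choice. The four quantities order as Table 2 indicates: both point estimates fall below both posterior-averaged predictive distributions, with the Jeffreys-prior posterior best.

```python
import math

def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an
    asymptotic expansion."""
    r = 0.0
    while x < 8:
        r -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    return r + math.log(x) - inv / 2 - inv ** 2 * (1 / 12 - inv ** 2 * (1 / 120 - inv ** 2 / 252))

N = 10

# Point estimates, Eqs. (15) and (16).
J_mle = N * (math.log(N) - digamma(N) - 1 / (N - 1))
J_unbiased = N * (math.log(N - 1) - digamma(N))

# Posterior-averaged predictive distributions, Eqs. (18) and (19).
J_pred_uniform = ((N + 1) * digamma(N) - (2 * N + 1) * digamma(2 * N)
                  + math.lgamma(2 * N + 1) - math.lgamma(N + 1) + N)
J_pred_jeffreys = (N * digamma(N) - 2 * N * digamma(2 * N)
                   + math.lgamma(2 * N) - math.lgamma(N) + N)

gap_beta = J_pred_jeffreys - J_pred_uniform        # Eq. (20)
gap_sigma = digamma(N) - digamma(2 * N) + math.log((2 * N - 1) / (N - 1))  # Eq. (23)
```

The difference gap_beta collapses algebraically to psi(2N) - psi(N) - log 2, which is positive for every N since psi(2N) - psi(N) exceeds log 2.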

6. Concluding Remarks

The entropy maximization principle suggests that the full use of the posterior Eq. (11) is the most effective among the considered estimation procedures for the b-value. Thus the confidence limits, for example, should be based on the posterior Eq. (11). As far as point estimation is concerned, the unbiased estimate \tilde{\beta} = (N-1) / \sum X_i is better than the maximum likelihood estimate \hat{\beta} = N / \sum X_i, and furthermore is the best among the family of estimators of the form c / \sum X_i.

We are grateful to Sigeo Aki for helpful information on the numerical calculation of the digamma function. We also wish to thank Dr. T. Masuda of Tohoku University for his useful comments.

REFERENCES

AKAIKE, H., Information theory and an extension of the maximum likelihood principle, in 2nd International Symposium on Information Theory, ed. B. N. Petrov and F. Csaki, pp. 267-281, Akademiai Kiado, Budapest, 1973.
AKAIKE, H., On entropy maximization principle, in Applications of Statistics, ed. P. R. Krishnaiah, pp. 27-41, North-Holland, Amsterdam, 1977.
AKAIKE, H., A new look at the Bayes procedure, Biometrika, 65, 53-59, 1978.
AKI, K., Maximum likelihood estimate of b in the formula log N = a - bM and its confidence limits, Bull. Earthq. Res. Inst., 43, 237-239, 1965.
BOX, G. E. P. and G. C. TIAO, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Massachusetts, 1973.
GUTENBERG, B. and C. F. RICHTER, Frequency of earthquakes in California, Bull. Seismol. Soc. Am., 34, 185-188, 1944.
JEFFREYS, H., Theory of Probability, 3rd ed., Clarendon Press, Oxford, 1961.
UTSU, T., A method for determining the value of b in a formula log n = a - bM showing the magnitude-frequency relation for earthquakes, Geophys. Bull. Hokkaido Univ., 13, 99-103, 1965 (in Japanese with English summary).
UTSU, T., A statistical significance test of the difference in b-value between two earthquake groups, J. Phys. Earth, 14, 37-40, 1966.
Note Added in Proof: The exact form of AIC can be derived from Eq. (14) for the estimates \hat{\beta}_c = c / \sum X_i; that is,

    AIC(c) = -2 \sum_{i=1}^{N} \log g(x_i \mid \hat{\beta}_c) + \frac{2c}{N-1} ,

where g(x \mid \beta) = \beta e^{-\beta x}. When we use the unbiased estimate for the comparison of b-values between two seismic activities (in time or space), we can adopt AIC(N-1), for which the correction term c/(N-1) = 1 always holds. The AIC(c) is derived in a way similar to AKAIKE (1973, 1977).
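A small simulation supports the form of the correction term. This is our sketch, not the authors' computation: it treats AIC(c) as an estimate of -2 E_X E_Y log g(Y | \hat{\beta}_c), whose closed form for beta0 = 1 is -2N(log c - psi(N)) + 2Nc/(N-1); the choices N = 20, c = N-1, the seed, and the trial count are assumptions.

```python
import math
import random

def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an
    asymptotic expansion."""
    r = 0.0
    while x < 8:
        r -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    return r + math.log(x) - inv / 2 - inv ** 2 * (1 / 12 - inv ** 2 * (1 / 120 - inv ** 2 / 252))

random.seed(2)
N, c, trials = 20, 19, 20000          # c = N - 1, the unbiased estimate
mean_aic = 0.0
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(N))
    # log-likelihood at beta_c = c/s:  N*log(c/s) - (c/s)*s = N*log(c/s) - c
    log_lik = N * math.log(c / s) - c
    mean_aic += (-2 * log_lik + 2 * c / (N - 1)) / trials

# Exact expected predictive log-likelihood target (beta0 = 1).
target = -2 * N * (math.log(c) - digamma(N)) + 2 * N * c / (N - 1)
```

With c = N-1 the penalty 2c/(N-1) is exactly 2, i.e., one unit per estimated parameter on the AIC scale, matching the usual 2k penalty without any large-sample approximation.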