Bias of the Maximum Likelihood Estimator of the Generalized Rayleigh Distribution


Bias of the Maximum Likelihood Estimator of the Generalized Rayleigh Distribution

by

Xiao Ling
B.Sc., Beijing Normal University, 2007

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF ARTS

in the Department of Economics

© Xiao Ling, 2011
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.

Bias of the Maximum Likelihood Estimator of the Generalized Rayleigh Distribution

by

Xiao Ling
B.Sc., Beijing Normal University, 2007

Supervisory Committee

Dr. David E.A. Giles, Supervisor (Department of Economics)
Dr. Kenneth G. Stewart, Departmental Member (Department of Economics)

Supervisory Committee

Dr. David E.A. Giles, Supervisor (Department of Economics)
Dr. Kenneth G. Stewart, Departmental Member (Department of Economics)

ABSTRACT

We derive analytic expressions for the biases, to O(n^{-1}), of the maximum likelihood estimators of the parameters of the generalized Rayleigh distribution family. Using these expressions to bias-correct the estimators is found to be extremely effective in terms of bias reduction, and generally results in a small reduction in relative mean squared error. In general, the analytic bias-corrected estimators are also found to be superior to the alternative of bias-correction via the bootstrap.

Contents

Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction
1.1 The Generalized Rayleigh Distribution
1.2 Estimation Issues

2 Theoretical Results
2.1 Notation and Definition
2.2 Motivating Examples
2.3 Bias of the MLE for Parameters in Generalized Rayleigh Distribution

3 Numerical Evaluations
3.1 Simulation design
3.2 Simulation results

4 Illustrative Example
5 Conclusions
References
A The Rayleigh Distribution
B R code for Bias-Adjusted Estimators of the Generalized Rayleigh Distribution
C Inversion Method to Generate a Random Variable

List of Tables

Table 3.1 Parameter setting
Table 3.2 Parameter setting for bootstrapping
Table 3.3 Percentage biases and MSEs of the Rayleigh distribution
Table 3.4 Percentage biases and MSEs of the Maxwell distribution
Table 3.5 Percentage biases and MSEs of the Half-Normal distribution
Table 3.6 Percentage biases and MSEs of the Chi distribution
Table 4.1 Total Expenditure on Alcohol and Conditional Budget Shares for Alcohol: United Kingdom
Table 4.2 Estimates of Generalized Regression Model with Half-Normal distribution
Table 4.3 Mean Marginal Effects

List of Figures

Figure 1.1 Four density plots illustrating specific distributions in the Generalized Rayleigh distribution family: (a) the Rayleigh distribution with parameter λ = 2, (b) the Maxwell distribution with parameter λ = 2, (c) the Half-Normal distribution with σ = 2, and (d) the Chi distribution with parameters a = 1 and τ = 1
Figure 3.1 The Rayleigh probability density plot with different parameter values: λ = 0.5, 1, 2 and 4
Figure 3.2 The Maxwell probability density plot with different parameter values: λ = 0.5, 1 and 2
Figure 3.3 The Half-Normal probability density plot with different parameter values: σ = 0.5, 1, 2 and 3
Figure 3.4 The Chi probability density plot with different parameter values: τ = 0.5, 1, 2 and 3 in both (a) and (b); a = 1 in (a) and a = 3 in (b)
Figure C.1 The plot of the c.d.f. of the Rayleigh distribution with scale parameter equal to one

ACKNOWLEDGEMENTS

I would first like to acknowledge my gratitude to my supervisor, Dr. David Giles, who guided me into the field of bias correction. During my study at the University of Victoria, he has provided infinite help and patience. I would also like to express my thanks to my committee members, who have been very helpful in assisting in the completion of this thesis. Finally, I would like to thank my family members for their support.

Chapter 1

Introduction

1.1 The Generalized Rayleigh Distribution

Voda (1976) derived a generalized version of the Rayleigh distribution called the Generalized Rayleigh distribution (GRD). Its density function, which differs from another version introduced by Burr, is given as follows:

f(x; \theta, k) = \frac{2\theta^{k+1}}{\Gamma(k+1)} x^{2k+1} \exp\{-\theta x^2\},  (1.1)

with x > 0, θ > 0, and k ≥ 0, where Γ(u) is the well-known gamma function: \Gamma(u) = \int_0^\infty t^{u-1} e^{-t} \, dt. A wide range of probability distributions emerge as particular cases of the distribution given in (1.1), as follows:

(a) For k = 0 and θ = 1/(2λ²) we obtain the one-parameter Rayleigh distribution, with density function

f(x; \lambda) = \frac{x}{\lambda^2} \exp\left\{-\frac{x^2}{2\lambda^2}\right\}, \quad x > 0, \ \lambda > 0;  (1.2)

(b) For k = 1/2 and θ = 1/(2λ²) we obtain the one-parameter Maxwell distribution

with density function

f(x; \lambda) = \frac{2}{\lambda^3 (2\pi)^{1/2}} \, x^2 \exp\left\{-\frac{x^2}{2\lambda^2}\right\}, \quad x > 0, \ \lambda > 0;  (1.3)

(c) For k = (a/2) - 1 and θ = 1/(2τ²) we obtain the Chi distribution with a degrees of freedom, with density function

f(x; \tau, a) = \frac{x^{a-1}}{2^{(a/2)-1} \tau^a \Gamma(a/2)} \exp\left\{-\frac{x^2}{2\tau^2}\right\}, \quad x > 0, \ a \in \mathbb{N}, \ \tau > 0;  (1.4)

(d) If we drop the positivity requirement for k and take k = -1/2 and θ = 1/(2σ²), we obtain the Half-Normal distribution, with density function

f_{HN}(x; \sigma) = \frac{2}{\sigma (2\pi)^{1/2}} \exp\left\{-\frac{x^2}{2\sigma^2}\right\}, \quad x > 0, \ \sigma > 0.  (1.5)

Figure 1.1 shows some illustrative density functions for each distribution in the Generalized Rayleigh family. Figure 1.1 (a) shows the density plot of the Rayleigh distribution with parameter λ = 2, (b) shows the density plot of the Maxwell distribution with parameter λ = 2, (c) is the density plot of the Half-Normal distribution with σ = 2, and (d) is the density plot of the Chi distribution with parameters a = 1 and τ = 1. Given this illustration of the special cases that emerge with different parameter values, a model based on the Generalized Rayleigh distribution family is potentially very useful. For example, it is more flexible than the widely used Weibull model for statistical modeling of reliability, since the latter includes only the Rayleigh distribution as a special case, while the former also encompasses the Maxwell distribution. In addition, other distributions, such as the Half-Normal distribution in (1.5), can also be applied widely in social science research.

[Figure 1.1: Four density plots illustrating specific distributions in the Generalized Rayleigh distribution family: (a) the Rayleigh distribution with parameter λ = 2, (b) the Maxwell distribution with parameter λ = 2, (c) the Half-Normal distribution with σ = 2, and (d) the Chi distribution with parameters a = 1 and τ = 1.]
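The density in (1.1) is straightforward to evaluate numerically. The following small R sketch (ours, not part of the thesis; dgrd is a hypothetical helper name) computes f(x; θ, k) and recovers the special cases (1.2)-(1.5) through the reparameterizations given above:

dgrd <- function(x, theta, k) {
  # Generalized Rayleigh density, equation (1.1)
  2 * theta^(k + 1) / gamma(k + 1) * x^(2 * k + 1) * exp(-theta * x^2)
}

lambda <- 2
dgrd(1, theta = 1 / (2 * lambda^2), k = 0)    # Rayleigh(lambda = 2) at x = 1
dgrd(1, theta = 1 / (2 * lambda^2), k = 1/2)  # Maxwell(lambda = 2) at x = 1
dgrd(1, theta = 1 / (2 * 2^2), k = -1/2)      # Half-Normal(sigma = 2) at x = 1

Evaluating dgrd over a grid of x values reproduces panels like those in Figure 1.1.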

More precisely, let us look into some application examples for each distribution in the Generalized Rayleigh distribution family. The Rayleigh distribution has many applications in the life testing of electro-vacuum devices (Polovko, 1968) and in communication engineering (Dyer and Whisenand, 1973). It also arises when the overall magnitude of a vector is related to its directional components. One example where the Rayleigh distribution naturally arises is when wind speed is analyzed in terms of its orthogonal two-dimensional vector components. Assuming that the magnitudes of the components are uncorrelated and normally distributed with equal variance, the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. A second example of this distribution arises in the case of random complex numbers whose real and imaginary components are i.i.d. (independently and identically distributed) Gaussian. In that case, the absolute value (modulus) of the complex number is Rayleigh-distributed. These random variables play an important role in land mobile radio because they accurately describe the instantaneous amplitude and power, respectively, of a multipath fading signal (Wikipedia, 2011a). The Maxwell distribution describes how the speed of a mixture of moving particles varies at a particular temperature (Wikipedia, 2011b). In any particular mixture of moving molecules, the speed will vary a great deal, from very slow particles (low energy) to very fast particles (high energy). Most of the particles, however, will be moving at a speed very close to the average. In other words, this distribution describes particle speeds in gases, where the particles do not constantly interact with each other but move freely between short collisions. It describes the probability of a particle's speed (the magnitude of its velocity vector) being near a given value as a function of the temperature of the system, the mass of the particle, and that speed value.

The Half-Normal distribution is an extension of the Normal distribution. Whenever a difference or deviation is measured and the algebraic sign is unknown, disregarded, lost, or otherwise absent, the resulting distribution of these absolute measurements can range in shape from the Half-Normal to the Normal distribution as the limit. Typical examples arise in industrial practice, such as quality control. The Chi distribution usually arises when a k-dimensional vector's orthogonal components are independent and each follows a standard normal distribution. The length of the vector then has a Chi distribution. The most familiar example is the Maxwell distribution of (normalized) molecular speeds, which is a Chi distribution with 3 degrees of freedom. The variety of applications described above demonstrates the reason to investigate the Generalized Rayleigh distribution in this thesis. Most research on the Generalized Rayleigh distribution concerns moment properties or confidence intervals, not the goodness of fit. This thesis focuses on the bias and mean squared error of the maximum likelihood estimator of the parameters in the Generalized Rayleigh distribution family.

1.2 Estimation Issues

There are several general estimation strategies that can be used in a wide variety of situations, such as the generalized method of moments, maximum likelihood, simulation-based estimation, and Bayesian methods. Among these methods, the maximum likelihood estimator (MLE) is very well known and popular, and it is the one applied in this thesis to estimate the two unknown parameters in the Generalized Rayleigh distribution given in (1.1). It should be noted that these MLEs cannot be expressed

in closed form, but we will still be able to obtain an analytic expression for their bias, to O(n^{-1}). In statistical analysis, an estimator is a rule for calculating an estimate of a particular unknown parameter, based on observed data. Econometric theory is concerned, among other things, with the properties of estimators; that is, with defining properties that can be used to compare different estimators (different rules for creating estimates) for the same quantity, based on the same data. There are two kinds of properties: finite-sample properties, including mean squared error, variance, bias, and so on; and asymptotic properties, including consistency, asymptotic normality, asymptotic efficiency, and so on. Such properties can be used to determine the best rules to use under given circumstances. In this thesis, we focus on one particular issue: the bias of the MLE for the parameters of the Generalized Rayleigh distribution in finite samples. In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Suppose that ξ̂ is an estimator of the parameter ξ. Then Bias(ξ̂) = E(ξ̂) - ξ. An estimator with zero bias, Bias(ξ̂) = 0, is called unbiased; otherwise the estimator is said to be biased. As is well known, the MLE of the location parameter of the normal distribution is unbiased, but the MLE of the scale parameter is not. To reduce the latter bias, first we need to find the expectation of the MLE of the scale parameter. Fortunately, for the normal distribution it is easy to find a closed-form MLE for the scale parameter, so we can compute the bias of this estimator and eliminate it exactly. However, for many other distributions the MLEs are not available in closed form, which means their bias cannot be easily determined, or eliminated. One of the early considerations of analytic approximations to the bias of an MLE was by Bartlett (1953a), who derived the O(n^{-1}) bias of

the MLE of a one-dimensional parameter. To illustrate the basic strategy of this derivation, let l(θ) be the log-likelihood function for a single parameter, θ. Assume that l(θ) is regular with respect to all derivatives up to and including the third order. If θ̂ is the MLE, then l'(θ̂) ≡ (∂l/∂θ)_{θ=θ̂} = 0. Another very important property is that the expectation of the first derivative of the log-likelihood function is zero at the true parameter value θ₀, i.e. E(l'(θ₀)) = 0. Then if we consider the Taylor expansion of the first derivative of the log-likelihood function at θ = θ̂, we get the following:

l'(\hat{\theta}) = l'(\theta_0) + (\hat{\theta} - \theta_0) l''(\theta_0) + \frac{1}{2}(\hat{\theta} - \theta_0)^2 l'''(\theta_0) + \ldots = 0.

If we take the expectation of both sides of the equation above, we get the following:

E(\hat{\theta} - \theta_0) E(l''(\theta_0)) + \operatorname{cov}(\hat{\theta} - \theta_0, \, l''(\theta_0)) + \frac{1}{2} E\{(\hat{\theta} - \theta_0)^2\} E\{l'''(\theta_0)\} + \frac{1}{2} \operatorname{cov}\{(\hat{\theta} - \theta_0)^2, \, l'''(\theta_0)\} \approx 0.  (1.6)

We can then solve for a closed-form expression for E(θ̂ - θ₀), which is the bias of the MLE, to O(n^{-1}). Note, however, that we do not need a closed-form expression for θ̂ itself in order to derive the bias expression. The tractability of this analysis inspired many researchers to study this area during the latter half of the twentieth century. Haldane and Smith (1956) derived expressions for the first four cumulants to this same order of accuracy, and Shenton and Bowman (1963) obtained higher-order approximations for the first four moments of an MLE. Bartlett (1953b) and Haldane (1953) explored the bias of an MLE when p = 2, and Shenton and Wallington (1962) and Cox and Snell (1968) derived formulae for the O(n^{-1}) bias of an MLE in the multi-parameter case. In this thesis, we will apply the theory developed by Cox and Snell.
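To see where the O(n^{-1}) bias comes from in the one-parameter case, (1.6) can be rearranged as the following sketch suggests (this rearrangement is ours; Bartlett's paper should be consulted for the precise statement). Dropping the final covariance term, which is of lower order than the others, and solving for the bias gives

E(\hat{\theta} - \theta_0) \;\approx\; -\,\frac{\operatorname{cov}(\hat{\theta} - \theta_0, \, l''(\theta_0)) + \tfrac{1}{2} E\{(\hat{\theta} - \theta_0)^2\} \, E\{l'''(\theta_0)\}}{E\{l''(\theta_0)\}},

which is O(n^{-1}), since the numerator terms are O(1) while the denominator is O(n).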

On the other hand, there are several other methods to reduce the bias, for example jackknife bias reduction, bootstrap bias reduction, and the analytic correction introduced by Firth (1993). The idea of jackknife bias reduction is to recompute the estimate repeatedly, leaving out one or more observations at a time from the sample. From this set of replicates of the statistic, an estimate of the bias of the statistic can be calculated. The idea of bootstrap bias reduction is to recalculate the estimate from a large number of equal-sized samples drawn from the original data set with replacement; an estimate of the bias of the statistic can then be calculated from those resamples. Although there are substantial theoretical differences between these two methods, the main practical difference for researchers is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time. Firth considered reducing bias in a different way: the O(n^{-1}) bias may be removed from the maximum likelihood estimator by the introduction of an appropriate bias term into the score function. If the target parameter is the canonical parameter of an exponential family, the method simply penalizes the likelihood by the Jeffreys invariant prior. For other parameterizations of exponential family models, and for non-exponential families, a choice is available between corrections using the observed and the expected information. Outside the exponential family, use of the expected information results in a loss of second-order efficiency. The difference between the method introduced by Firth and the one by Cox and Snell is that Firth adjusts the score function itself, while Cox and Snell correct the MLE obtained from the original score function. We will focus on the method introduced by Cox and Snell and on the bootstrap. However, the theory introduced by Cox and Snell is the most important method we are going to discuss in this thesis. The basic idea is to improve the MLE by

subtracting the estimated bias. We work with an appropriate bias term derived from (1.6), introduced by Bartlett. With the early papers on this expansion idea for multiple parameters discussed above, the bias estimates for the two parameters of the Generalized Rayleigh distribution are straightforward to work out. The behavior of the bias-corrected estimates of the parameters of the Generalized Rayleigh distribution is then explored in a simulation study. Moreover, we also discuss the effectiveness of reducing the bias of the MLE for the Generalized Rayleigh distribution by using bootstrap simulation. This thesis investigates the bias properties of the MLE in the Generalized Rayleigh distribution step by step. The rest of the thesis is organized as follows. In Chapter 2, we review the important concepts and tools of Cox and Snell's approach. We first present the notation and definitions used by Cox and Snell (1968). Then we introduce some examples with well-known distributions, such as the binomial distribution and the normal distribution, to see how the idea of Cox and Snell works. We also derive the analytic expressions for the biases of the MLEs in the Generalized Rayleigh distribution, and then present the details of these biases and the corresponding bias-adjusted MLEs. In Chapter 3, a Monte Carlo experiment is used to compare the performance of the latter estimators with that of bias-adjusted estimators that use the bootstrap to estimate the finite-sample bias. We start by describing the experimental designs for both the Monte Carlo simulation and the bootstrap. Then we discuss the sampling properties of the bias of the MLE with different sample sizes. An empirical example using real data is included in Chapter 4. Finally, in Chapter 5 we provide some general discussion of the advantages and disadvantages of Cox and Snell's method and briefly discuss future work.

Chapter 2

Theoretical Results

To prepare for the discussion of our bias correction methods for the estimators of the two parameters of the Generalized Rayleigh distribution, we review in this chapter the important bias derivation introduced by Cox and Snell (1968). In Section 2.1, we discuss the idea proposed by Cox and Snell, with the basic notation and definitions. In Section 2.2, we apply the procedure described by Cox and Snell to two well-known distributions and compare the resulting biases with those obtained directly from the definition of bias. In Section 2.3, the derivatives of the Generalized Rayleigh distribution that we need in this thesis are calculated in detail.

2.1 Notation and Definition

First of all, we set up the notation, before we describe the procedure for estimating bias that we use in this thesis. Let l(θ) be the log-likelihood function based on a sample of n observations, with p-dimensional parameter vector θ. Assume that l(θ) is regular with respect to all derivatives up to and including the third order. Recall the basic strategy described in Chapter 1. The bias analysis we use involves the joint cumulants of the derivatives of

l(θ) and the derivatives of cumulants. The joint cumulants of the derivatives of l(θ) are denoted as follows:

k_{ij} = E\left(\frac{\partial^2 l}{\partial\theta_i \, \partial\theta_j}\right); \quad i, j = 1, 2, \ldots, p

k_{ijl} = E\left(\frac{\partial^3 l}{\partial\theta_i \, \partial\theta_j \, \partial\theta_l}\right); \quad i, j, l = 1, 2, \ldots, p  (2.1)

k_{ij,l} = E\left(\frac{\partial^2 l}{\partial\theta_i \, \partial\theta_j} \cdot \frac{\partial l}{\partial\theta_l}\right); \quad i, j, l = 1, 2, \ldots, p.

On the other hand, the derivatives of the cumulants are denoted as follows:

k_{ij}^{(l)} = \frac{\partial k_{ij}}{\partial\theta_l}; \quad i, j, l = 1, 2, \ldots, p.  (2.2)

All of the expressions in (2.1) and (2.2) are assumed to be O(n). Extending earlier work by Tukey (1949), Bartlett (1953a, 1953b), Haldane (1953), Haldane and Smith (1956), Shenton and Wallington (1962) and Shenton and Bowman (1963), Cox and Snell (1968) showed that when the sample data are independent (but not necessarily identically distributed) the bias of the s-th element of the MLE of θ, denoted θ̂, is:

Bias(\hat{\theta}_s) = \sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{l=1}^{p} k^{si} k^{jl} \left(\frac{1}{2} k_{ijl} + k_{ij,l}\right) + O(n^{-2}), \quad s = 1, 2, \ldots, p,

where k^{ij} is the (i, j)-th element of the inverse of the (expected) information matrix, K = \{-k_{ij}\}. Cordeiro and Klein (1994) note that this bias expression also holds if the data are non-independent, provided that all of the k terms are O(n), and show that it can be re-written as:

Bias(\hat{\theta}_s) = \sum_{i=1}^{p} k^{si} \sum_{j=1}^{p} \sum_{l=1}^{p} \left(k_{ij}^{(l)} - \frac{1}{2} k_{ijl}\right) k^{jl} + O(n^{-2}), \quad s = 1, 2, \ldots, p.

The computational advantage of the equation above is that it does not involve terms of the form defined in the third equation of (2.1). Now, let a_{ij}^{(l)} = k_{ij}^{(l)} - \frac{1}{2} k_{ijl}, for i, j, l = 1, 2, \ldots, p, and define the following matrices:

A^{(l)} = \{a_{ij}^{(l)}\}; \quad l = 1, 2, \ldots, p

A = [A^{(1)} \,|\, A^{(2)} \,|\, \cdots \,|\, A^{(p)}].

Cordeiro and Klein (1994) show that the expression for the O(n^{-1}) bias of θ̂ can be re-written as:

Bias(\hat{\theta}) = K^{-1} A \, \operatorname{vec}(K^{-1}) + O(n^{-2}).  (2.3)

A bias-corrected MLE for θ can then be obtained as:

\tilde{\theta} = \hat{\theta} - \hat{K}^{-1} \hat{A} \, \operatorname{vec}(\hat{K}^{-1}),  (2.4)

where \hat{K} = (K)|_{\theta = \hat{\theta}} and \hat{A} = (A)|_{\theta = \hat{\theta}}. It can be shown that the bias of θ̃ is O(n^{-2}). It is crucial to note that (2.3) and (2.4) can be evaluated even when the likelihood equation does not admit a closed-form analytic solution, so that the MLE has to be obtained numerically. For this reason, this methodology is very useful for the Generalized Rayleigh distribution. It is instructive to apply this procedure to establish the bias of the MLEs for some well-known distributions, such as the normal distribution; some such applications are illustrated in the next section.
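As a sketch of how (2.3) and (2.4) can be mechanized (this is our illustration, not the thesis code of Appendix B; K_fun and A_fun are hypothetical user-supplied functions returning K and A at a given parameter vector), the correction is a few lines of R:

bias_correct <- function(theta_hat, K_fun, A_fun) {
  K <- K_fun(theta_hat)                     # p x p expected information at the MLE
  A <- A_fun(theta_hat)                     # p x p^2 matrix [A^(1) | ... | A^(p)]
  K_inv <- solve(K)
  bias <- K_inv %*% A %*% as.vector(K_inv)  # (2.3) evaluated at theta_hat
  theta_hat - as.vector(bias)               # bias-corrected MLE, as in (2.4)
}

Note that as.vector() applied to a matrix stacks its columns, which is exactly the vec(·) operator used in (2.3).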

2.2 Motivating Examples

First of all, we would like to apply the Cox-Snell methodology in the context of a distribution with a single parameter, the binomial distribution for instance. Suppose there are n independent Bernoulli trials and the probability of success is θ. Then the likelihood function for this binomial distribution is

L(\theta \mid x) = \Pr(X = x) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}; \quad x = 0, 1, 2, \ldots, n; \ 0 < \theta < 1.

The log-likelihood function is

l(\theta) = const. + x \ln(\theta) + (n - x) \ln(1 - \theta).

According to the methodology introduced by Cox and Snell, we need the first, second and third derivatives of the log-likelihood function, which are:

\partial l / \partial\theta = (x/\theta) - (n - x)/(1 - \theta)

\partial^2 l / \partial\theta^2 = -(x/\theta^2) - (n - x)/(1 - \theta)^2

\partial^3 l / \partial\theta^3 = (2x/\theta^3) - 2(n - x)/(1 - \theta)^3.

Next, the information matrix K and the matrix A are worked out:

k_{11} = E(\partial^2 l / \partial\theta^2) = -n/[\theta(1 - \theta)]

K = \{-k_{11}\} = n/[\theta(1 - \theta)]; \quad K^{-1} = \theta(1 - \theta)/n

k_{111} = 2n(1 - 2\theta)/[\theta^2 (1 - \theta)^2]

k_{11}^{(1)} = n(1 - 2\theta)/[\theta^2 (1 - \theta)^2]

a_{11} = k_{11}^{(1)} - \frac{1}{2} k_{111} = n(1 - 2\theta)/[\theta^2 (1 - \theta)^2] - \frac{1}{2} \cdot 2n(1 - 2\theta)/[\theta^2 (1 - \theta)^2] = 0.

Since the matrix A = \{a_{11}\} = 0, according to the methodology applied in this thesis, the bias of the MLE θ̂ is

Bias(\hat{\theta}) = K^{-1} A \, \operatorname{vec}(K^{-1}) = 0.

On the other hand, as is well known, the MLE of θ in the binomial distribution is θ̂ = x/n, and this MLE is exactly unbiased: E(θ̂) = E(x)/n = nθ/n = θ, so Bias(θ̂) = E(θ̂) - θ = 0. This is exactly the same as the bias obtained by the method introduced by Cox and Snell. As a second example, we investigate the bias of the MLE of the parameters of a normal distribution. Suppose that X is normally distributed, with the data independently and identically distributed. Solving (2.1) from the log-likelihood function

l(\mu, \sigma^2) = -(n/2) \ln(2\pi) - (n/2) \ln(\sigma^2) - \sum_{i=1}^{n} (x_i - \mu)^2 / (2\sigma^2),

we get the information matrix

K = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.

So \operatorname{vec}(K^{-1}) = (\sigma^2/n, \ 0, \ 0, \ 2\sigma^4/n)'. The next step is to determine the matrix A.

A^{(1)} = \begin{pmatrix} 0 & -n/(2\sigma^4) \\ -n/(2\sigma^4) & 0 \end{pmatrix}, \quad A^{(2)} = \begin{pmatrix} n/(2\sigma^4) & 0 \\ 0 & 0 \end{pmatrix},

and

A = [A^{(1)} \,|\, A^{(2)}] = \begin{pmatrix} 0 & -n/(2\sigma^4) & n/(2\sigma^4) & 0 \\ -n/(2\sigma^4) & 0 & 0 & 0 \end{pmatrix}.

Through the expression for the bias to O(n^{-1}) given by Cox-Snell, as well as by Cordeiro-Klein, we obtain:

Bias \begin{pmatrix} \hat{\mu} \\ \hat{\sigma}^2 \end{pmatrix} = K^{-1} A \, \operatorname{vec}(K^{-1}) = \begin{pmatrix} 0 \\ -\sigma^2/n \end{pmatrix}.

From the result above, we see that the bias of µ̂ is zero and that of σ̂² is -σ²/n, which coincides with the known properties of the MLEs for the normal distribution. Again, note that this result was obtained without needing to write down the expressions for the MLEs themselves in closed form. If we form the bias-adjusted estimator of σ²,

\tilde{\sigma}^2 = \hat{\sigma}^2 - (-\hat{\sigma}^2/n) = (n + 1)\hat{\sigma}^2/n,

then Bias(σ̃²) = -σ²/n². Correcting for the O(n^{-1}) bias yields an estimator whose bias is O(n^{-2}). On the other hand, of course, in this particular example we also know how to eliminate the bias in σ̂² completely: we just use the estimator nσ̂²/(n - 1).
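The result for σ̂² is easy to confirm numerically. A quick R check (ours, for illustration only) compares the average of the MLE of σ² over many samples with the theoretical mean σ²(n - 1)/n:

set.seed(123)
n <- 10; sigma2 <- 4
mle <- replicate(50000, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  mean((x - mean(x))^2)     # MLE of sigma^2 (divides by n, not n - 1)
})
mean(mle)                   # approximately sigma2 * (n - 1) / n = 3.6
mean(mle) - sigma2          # approximately -sigma2 / n = -0.4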

2.3 Bias of the MLE for Parameters in the Generalized Rayleigh Distribution

The preceding discussion has been concerned with obtaining a closed-form solution for the O(n^{-1}) bias function introduced by Cox and Snell, and the approach works well for the binomial distribution and the normal distribution, as discussed in Section 2.2. It is now appropriate to turn to our problem: the bias function for the Generalized Rayleigh distribution. Suppose X is a random variable following the Generalized Rayleigh distribution. From Section 1.1, the likelihood function, based on a sample of n independent observations, is:

L = \prod_{i=1}^{n} f(x_i; \theta, k) = \left(\frac{2\theta^{k+1}}{\Gamma(k+1)}\right)^n \left(\prod_{i=1}^{n} x_i\right)^{2k+1} \exp\left\{-\theta \sum_{i=1}^{n} x_i^2\right\}.

Then the corresponding log-likelihood function is:

l = n \log(2) + n(k+1) \log\theta - n \log\Gamma(k+1) + (2k+1) \sum_{i=1}^{n} \log x_i - \theta \sum_{i=1}^{n} x_i^2.  (2.5)

Recalling the procedure for obtaining the O(n^{-1}) bias in Section 2.1, the next step is to obtain the joint cumulants of the derivatives and the derivatives of the cumulants shown in (2.1) and (2.2). First we find the higher-order derivatives of (2.5). These are:

\partial l / \partial\theta = n(k+1)/\theta - \textstyle\sum_{i=1}^{n} x_i^2;

\partial^2 l / \partial\theta^2 = -n(k+1)/\theta^2;

\partial^3 l / \partial\theta^3 = 2n(k+1)/\theta^3;

\partial l / \partial k = n \log\theta - n(1/k + \Psi(k)) + 2 \textstyle\sum_{i=1}^{n} \log x_i;

\partial^2 l / \partial k^2 = -n(-1/k^2 + \Psi^{(1)}(k));  (2.6)

\partial^3 l / \partial k^3 = -n(2/k^3 + \Psi^{(2)}(k)),

where Ψ(k) is the usual digamma function:

25 17 Ψ(k) = dlogγ(k)/dk = γ ξ(j + 3)( (k 1)) j, where ξ(s) = j=1 (n s ) is the Riemann zeta function and γ = lim n ξ[( u=1 1/k) j=1 log(n)] = is the Euler-Mascheroni constant. In what follows we also need the trigamma and tetragamma functions, these being Ψ (i) (k) = d i logψ(k)/dk i ; i = 1, 2. Second, we need to determine the cross derivatives of the two parameters in the generalized Rayleigh distribution. These are follows: 2 l θ k = n 2 l ; θ k θ = n θ ; 3 l θ 2 k = n 3 l ; θ 2 k 2 θ = 0 ; (2.7) 3 l θ k = 0 ; 3 l 2 k θ = n 2 θ ; 2 3 l θ k θ = n 3 l ; θ 2 k θ k = 0. Based on the higher-order derivatives in (2.6) and (2.7), we note that there are no observations, x s, i.e. we avoid the calculation of expectations. This shows us that the answers for the joint cumulants of the derivatives and the derivatives of cumulants shown in (2.1) and (2.2) are the same as the higher-order derivatives in (2.6) and (2.7) as follows:

k_{11} = -n(k+1)/\theta^2

k_{12} = k_{21} = n/\theta

k_{22} = -n(-1/k^2 + \Psi^{(1)}(k))

k_{111} = k_{11}^{(1)} = 2n(k+1)/\theta^3

k_{112} = k_{121} = k_{211} = k_{12}^{(1)} = k_{21}^{(1)} = k_{11}^{(2)} = -n/\theta^2

k_{122} = k_{212} = k_{221} = k_{22}^{(1)} = k_{12}^{(2)} = k_{21}^{(2)} = 0

k_{222} = k_{22}^{(2)} = -n(2/k^3 + \Psi^{(2)}(k)).

The information matrix is

K = \begin{pmatrix} n(k+1)/\theta^2 & -n/\theta \\ -n/\theta & n(-1/k^2 + \Psi^{(1)}(k)) \end{pmatrix},

and

A = \frac{n}{2} \begin{pmatrix} 2(k+1)/\theta^3 & -1/\theta^2 & -1/\theta^2 & 0 \\ -1/\theta^2 & 0 & 0 & -(2/k^3 + \Psi^{(2)}(k)) \end{pmatrix}.

So, to O(n^{-1}),

Bias(\hat{\theta}) = \frac{\theta\left\{[2(k+1)(-1/k^2 + \Psi^{(1)}(k)) - 3](-1/k^2 + \Psi^{(1)}(k)) - (k+1)(2/k^3 + \Psi^{(2)}(k))\right\}}{2n[(k+1)(-1/k^2 + \Psi^{(1)}(k)) - 1]^2},  (2.8)

Bias(\hat{k}) = \frac{(k+1)\left[(-1/k^2 + \Psi^{(1)}(k)) - (k+1)(2/k^3 + \Psi^{(2)}(k))\right] - 2}{2n[(k+1)(-1/k^2 + \Psi^{(1)}(k)) - 1]^2}.  (2.9)

Bias-adjusted estimators are then obtained as

\tilde{\theta} = \hat{\theta} - \frac{\hat{\theta}\left\{[2(\hat{k}+1)(-1/\hat{k}^2 + \Psi^{(1)}(\hat{k})) - 3](-1/\hat{k}^2 + \Psi^{(1)}(\hat{k})) - (\hat{k}+1)(2/\hat{k}^3 + \Psi^{(2)}(\hat{k}))\right\}}{2n[(\hat{k}+1)(-1/\hat{k}^2 + \Psi^{(1)}(\hat{k})) - 1]^2},  (2.10)

\tilde{k} = \hat{k} - \frac{(\hat{k}+1)\left[(-1/\hat{k}^2 + \Psi^{(1)}(\hat{k})) - (\hat{k}+1)(2/\hat{k}^3 + \Psi^{(2)}(\hat{k}))\right] - 2}{2n[(\hat{k}+1)(-1/\hat{k}^2 + \Psi^{(1)}(\hat{k})) - 1]^2}.  (2.11)
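The corrections (2.10) and (2.11) are simple to code. The author's full implementation is given in Appendix B; the sketch below (ours, with a hypothetical function name) uses R's psigamma() function, whose second argument selects the trigamma (deriv = 1) or tetragamma (deriv = 2) function:

gr_bias_correct <- function(theta.hat, k.hat, n) {
  p1 <- psigamma(k.hat, deriv = 1) - 1 / k.hat^2   # Psi^(1)(k) - 1/k^2
  p2 <- psigamma(k.hat, deriv = 2) + 2 / k.hat^3   # Psi^(2)(k) + 2/k^3
  d  <- (k.hat + 1) * p1 - 1                       # term in the squared denominator
  bias.theta <- theta.hat *
    ((2 * (k.hat + 1) * p1 - 3) * p1 - (k.hat + 1) * p2) / (2 * n * d^2)
  bias.k <- ((k.hat + 1) * (p1 - (k.hat + 1) * p2) - 2) / (2 * n * d^2)
  c(theta = theta.hat - bias.theta, k = k.hat - bias.k)  # (2.10) and (2.11)
}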

Chapter 3

Numerical Evaluations

In this chapter, we investigate the sampling properties of the bias-adjusted estimators given in (2.10) and (2.11) within the Generalized Rayleigh distribution family. Our primary objective is to compare the biases and mean squared errors (MSEs) of the maximum likelihood estimators and the bias-corrected maximum likelihood estimators. The analysis of the bias and MSE is conducted for the four special cases of the Generalized Rayleigh distribution family described in (1.2), (1.3), (1.4), and (1.5): the Rayleigh distribution, the Maxwell distribution, the Chi distribution, and the Half-Normal distribution. To get a good sense of the performance of the bias-adjusted estimators in (2.10) and (2.11), we apply two well-known simulation methods to construct the comparison: one is the Monte Carlo experiment, and the other is the bootstrap. The rest of this chapter is organized as follows. In Section 3.1, we provide detailed algorithm designs for both the Monte Carlo experiment and the bootstrap experiment. In Section 3.2, we analyze the differences between the biases and MSEs of the maximum likelihood estimators and the bias-corrected maximum likelihood estimators in a Monte Carlo experiment. Moreover, those estimators are also compared

with bias-adjusted estimators constructed using the bootstrap estimate of the bias.

3.1 Simulation design

We now describe the iteration algorithms for both the Monte Carlo experiment and the bootstrap experiment. The basic idea is to compare the bias and MSE of the MLE and of the bias-corrected estimators for each of the four distributions in the Generalized Rayleigh distribution family described in Section 1.1, using the two simulation experiments. We start with the Monte Carlo experiment. The objective here is to apply the analytic bias-corrected estimator to samples randomly generated from a given distribution function. In addition, we are also interested in how sensitive the results are to the parameter values used to generate the samples from a particular distribution, as well as to the sample size. To implement this, we first fix the sample sizes at 20, 30, 50, 100, and 200 observations. Then we need to select the parameter values for each of the four distributions in the Generalized Rayleigh distribution family used to generate the random samples. We start with the Rayleigh distribution. Figure 3.1 shows the density of the Rayleigh distribution for the parameter values λ = 0.5, 1, 2 and 4. In the Generalized Rayleigh distribution family, the parameter λ enters as θ = 1/(2λ²), with k = 0. We can see that as λ increases, the mode of the density falls and the shape of the density becomes flatter. Based on this, we restrict λ to values no greater than four (with λ > 0; see the definition of the Rayleigh distribution in (1.2)); otherwise, the mode of the density is too small (below 0.2) and its shape too flat for differences to be visible. Thus, we decide upon λ = 0.5, 1, 2 and 4 as the true

parameter values for the Rayleigh distribution to generate the random samples in the Monte Carlo simulations. Using the notation of the Generalized Rayleigh distribution family, the corresponding parameter settings for the Rayleigh distribution are θ = 2, 0.5, 0.125, and 0.03125. We then ran the simulation to test the sensitivity of the results to different parameter values. In addition, we conduct the sensitivity test only with sample size 50, to reduce the computational expense and to get more reliable comparisons. The same applies to the other three distributions in the Generalized Rayleigh distribution family.

[Figure 3.1: The Rayleigh probability density plot with different parameter values: λ = 0.5, 1, 2 and 4.]

Similarly, we decided on the parameter values of the other three distributions in the Generalized Rayleigh distribution family in the same way as for the Rayleigh distribution. For example, in Figure 3.2 the shape of the density of the Maxwell distribution differs substantially across λ = 0.5, 1, and 2, i.e. θ = 2, 0.5, and 0.125. When θ = 2 the density of the Maxwell distribution is sharply peaked, while its peak becomes much lower when θ = 0.5 and 0.125. When θ = 0.125, the shape

of the density becomes very flat. We choose the parameter values on the basis of the same considerations as in the case of the Half-Normal distribution.

[Figure 3.2: The Maxwell probability density plot with different parameter values: λ = 0.5, 1 and 2.]

For the Chi distribution, we have to decide on values for two parameters. One is the degrees of freedom parameter, a (corresponding to k = a/2 - 1 in the earlier notation of the Generalized Rayleigh distribution family). The other is τ (corresponding to θ = 1/(2τ²)). When a = 1, the density functions are monotonically decreasing, while they are unimodal, with an interior peak, when a > 1. The location of the density moves to the right as a gets bigger. On the other hand, the density becomes flatter as τ gets bigger; the largest value of τ is chosen as the point at which the shape becomes almost horizontal, with very little further change, leading to τ = 0.5, 1, 2 and 3. More details of the parameter settings and sample sizes are shown in Table 3.1 for each of the four distributions in the Generalized Rayleigh distribution family that we consider. To prepare for the Monte Carlo iterations in the subsequent simulations, Table 3.1 records the k values, the θ values (the values of the parameters in the Generalized Rayleigh distribution family), and the sample sizes n.

[Figure 3.3: The Half-Normal probability density plot with different parameter values: σ = 0.5, 1, 2 and 3.]

[Figure 3.4: The Chi probability density plot with different parameter values: τ = 0.5, 1, 2 and 3 in both (a) and (b); a = 1 in (a) and a = 3 in (b).]

Table 3.1: Parameter setting

Distribution   Sample sizes            k                        θ
Rayleigh       n = 20, 30, 100, 200    k = 0                    θ = 1/(2λ²), λ = 1
               n = 50                  k = 0                    θ = 1/(2λ²), λ = 0.5, 1, 2, and 4
Maxwell        n = 20, 30, 100, 200    k = 1/2                  θ = 1/(2λ²), λ = 1
               n = 50                  k = 1/2                  θ = 1/(2λ²), λ = 0.5, 1, 2
Half-Normal    n = 20, 30, 100, 200    k = -1/2                 θ = 1/(2σ²), σ = 1
               n = 50                  k = -1/2                 θ = 1/(2σ²), σ = 0.5, 1, 2, and 3
Chi            n = 20, 30, 100, 200    k = a/2 - 1, a = 1       θ = 1/(2τ²), τ = 1
               n = 50                  k = a/2 - 1, a = 1 or 3  θ = 1/(2τ²), τ = 0.5, 1, 2 or 3

We then need to compute the bias and the mean squared error (MSE) from the samples, for each parameter setting and each sample size, in every Monte Carlo simulation. The computation of the MLEs is undertaken using the maxLik package (Toomet, 2008) for the R statistical software environment (R, 2008), while the biases of the bias-corrected estimators are computed by applying (2.8) and (2.9). Note that the bias expressions in (2.8) and (2.9) are valid only to

O(n^{-1}). As shown in Table 3.1, there are thirty-five Monte Carlo simulations. Each simulation is designed as follows:

Step 1: Parameter setting: decide the values of the parameters, θ and k, for the particular distribution in the Generalized Rayleigh distribution family.

Step 2: Generate a sample of size n from that distribution using the inversion method in R, as described in Appendix C.

Step 3: Compute the MLEs, θ̂ and k̂, using the maxLik package, together with the squared errors (θ̂ - θ)² and (k̂ - k)².

Step 4: Apply (2.8) and (2.9), evaluated at the maximum likelihood estimates from Step 3, to estimate the biases of θ̂ and k̂. The corresponding analytic bias-corrected maximum likelihood estimators are then computed as in (2.10) and (2.11), and are denoted θ̃ and k̃. The squared errors of the analytic bias-corrected estimators are calculated as (θ̃ - θ)² and (k̃ - k)², where θ and k are the true values specified in Step 1.

Step 5: Repeat Steps 2 to 4 a total of 200,000 times.

Step 6: Calculate the bias of each estimator by subtracting the true parameter value set in Step 1 from the average of the estimates computed in Steps 3 and 4, and calculate the MSE of each estimator as the average of the corresponding squared errors.
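The following R sketch (ours; the thesis uses the maxLik package, but optim() is used here so that the fragment is self-contained) illustrates one replication of Steps 1 to 4 for the Maxwell case. Since X² ~ Gamma(k + 1, rate = θ) under (1.1), the inversion of Step 2 can be performed by passing uniform draws through the Gamma quantile function; the final step uses the gr_bias_correct() sketch from Section 2.3:

set.seed(1)
n <- 50; theta0 <- 0.5; k0 <- 0.5        # Step 1: Maxwell case (lambda = 1)

u <- runif(n)                            # Step 2: inversion method
x <- sqrt(qgamma(u, shape = k0 + 1, rate = theta0))

negll <- function(p, x) {                # negative of the log-likelihood (2.5)
  n <- length(x); theta <- p[1]; k <- p[2]
  -(n * log(2) + n * (k + 1) * log(theta) - n * lgamma(k + 1) +
      (2 * k + 1) * sum(log(x)) - theta * sum(x^2))
}
mle <- optim(c(1, 1), negll, x = x,      # Step 3: numerical MLEs of theta and k
             method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par

gr_bias_correct(mle[1], mle[2], n)       # Step 4: analytic correction (2.10)-(2.11)

Repeating Steps 2 to 4 many times and averaging, as in Steps 5 and 6, yields the simulated biases and MSEs.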

Besides the Monte Carlo experiment, we want to build a comprehensive picture of the bias reduction introduced by Cox and Snell. A good way to do that is to compare the results found by the Monte Carlo simulation with those derived by bootstrap simulation. Bootstrapping allows one to gather many alternative versions of the single statistic that would ordinarily be calculated from one sample. The basic idea is to use the bootstrap to calculate the finite-sample biases and MSEs for the four distributions in the Generalized Rayleigh distribution family, with the same considerations for choosing the parameter values and the sample sizes; using the same settings makes the comparison meaningful and reliable. To simplify the comparisons and reduce the computational expense, the sample sizes for each distribution are only 20 and 50, and the scale parameter of each is set to unity, implying θ = 0.5. In particular, the degrees of freedom parameter of the Chi distribution is taken as unity, implying k = -1/2. The details are provided in Table 3.2.

Table 3.2: Parameter setting for bootstrapping

Distribution   Sample sizes   k                    θ
Rayleigh       n = 20, 50     k = 0                θ = 1/(2λ²), λ = 1
Maxwell        n = 20, 50     k = 1/2              θ = 1/(2λ²), λ = 1
Half-Normal    n = 20, 50     k = -1/2             θ = 1/(2σ²), σ = 1
Chi            n = 20, 50     k = a/2 - 1, a = 1   θ = 1/(2τ²), τ = 1

The key point is computing the bias by recomputing the statistic over a number of equal-sized resamples drawn with replacement from the original dataset. Each of the bootstrap simulations is designed as follows:

Step 1: Generate a sample from the chosen distribution using the inversion method in R, with the parameter values in Table 3.2.

Step 2: Estimate the MLEs, θ̂ and k̂, using the maxLik package, together with their squared errors, as in Step 3 of the Monte Carlo simulation.

Step 3: Draw a bootstrap sample, with replacement, from the sample generated in Step 1.

Step 4: Estimate the MLEs, θ̂* and k̂*, for the bootstrap sample generated in Step 3.

Step 5: Repeat Steps 3 and 4 n_boot = 1,000 times.

Step 6: Compute the bootstrap bias estimates, Bias_θ,bs = (Σ θ̂*)/n_boot - θ̂ and Bias_k,bs = (Σ k̂*)/n_boot - k̂, where the sums run over the n_boot bootstrap samples. The bootstrap-bias-corrected estimators are then calculated as θ̌ = θ̂ - Bias_θ,bs and ǩ = k̂ - Bias_k,bs, with squared errors (θ̌ - θ)² and (ǩ - k)².

Step 7: Repeat Steps 1 to 6, 200,000 times.

Step 8: Calculate the bias of each estimator by subtracting the true parameter values chosen in Step 1 from the averages of θ̌ and ǩ, and calculate the MSEs as the averages of the squared errors from Step 6.

In these two experiments, the Monte Carlo experiment is analytic, using a closed-form expression for the bias, while the bootstrap experiment is numerical and computes the bias by resampling the original dataset. The bootstrap experiment is computationally more expensive. However, it is the only choice when no analytical solution for the bias is available, and it provides a benchmark against which to judge the absolute performance of the analytic corrections evaluated in the Monte Carlo experiment.
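A matching sketch of the bootstrap correction of Steps 3 to 6, reusing the sample x, the negll() function, and the MLE mle from the Monte Carlo sketch above (again ours, with a hypothetical helper name and the number of resamples as an argument):

boot_bias_correct <- function(x, mle, n_boot = 1000) {
  boot_mles <- replicate(n_boot, {
    xb <- sample(x, replace = TRUE)            # Step 3: resample with replacement
    optim(mle, negll, x = xb,                  # Step 4: re-estimate on the resample
          method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par
  })
  bias <- rowMeans(boot_mles) - mle            # Step 6: bootstrap bias estimate
  mle - bias                                   # bootstrap-bias-corrected estimators
}

Because each call resamples at random, repeated runs on the same data give slightly different corrections, which is the practical difference from the jackknife noted in Chapter 1.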

3.2 Simulation results

Having described the iterations of the two numerical experiments, the Monte Carlo experiment and the bootstrap experiment, we report their findings in the present section. The simulation results are shown in separate tables for each distribution. To make the results easy to compare, the results of both the Monte Carlo simulation and the bootstrap simulation for the same distribution are displayed in one table. In addition, since we are interested in the sensitivity of the results to the parameters of each distribution in the Generalized Rayleigh distribution family, results computed under different parameter values for the same distribution are also displayed in one table. We begin with the Rayleigh distribution. The values of the parameter are determined as in Tables 3.1 and 3.2 for the Monte Carlo experiment and the bootstrap experiment. The analytic bias-adjusted estimates obtained in the Monte Carlo experiment are computed from a total of 200,000 replications, and are denoted θ̃. The bootstrap bias-adjusted estimates obtained in the bootstrap experiment are computed from a total of 200,000 replications with 1,000 resamples in each replication, and are denoted θ̌. In addition, the MLE is denoted θ̂. Recall that the parameter k in the Rayleigh distribution equals zero, as in (1.2). Several joint cumulants of the derivatives, and derivatives of cumulants, would then be undefined; for example, the first-order partial derivative with respect to k in (2.6) is undefined when k equals zero. Therefore, we cannot apply (2.10) and (2.11) to compute the bias-corrected estimator θ̃ for the Rayleigh distribution. A special derivation, explained in Appendix A, is used to calculate the bias-corrected estimator in this case, and these special equations are applied in the Monte Carlo simulation for the Rayleigh distribution. Table 3.3 shows the percentage biases and MSEs (in square brackets) of the MLE, the analytic bias-adjusted estimator and the bootstrap bias-adjusted estimator for the Rayleigh distribution.

[Table 3.3: Percentage biases and MSEs of the Rayleigh distribution, for the MLE θ̂, the analytic bias-adjusted estimator θ̃, and the bootstrap bias-adjusted estimator θ̌. Panels: (a) λ = 0.5 (θ = 2); (b) λ = 1 (θ = 0.5), with n = 20, 30, 50, 100, 200; (c) λ = 2 (θ = 0.125); (d) λ = 4 (θ = 0.03125). Panels (a), (c) and (d) use n = 50.]

[Table 3.4: Percentage biases and MSEs of the Maxwell distribution (k = 0.5), for k̂, k̃, ǩ and θ̂, θ̃, θ̌. Panels: (a) λ = 0.5 (θ = 2); (b) λ = 1 (θ = 0.5), with n = 20, 30, 50, 100, 200; (c) λ = 2 (θ = 0.125). Panels (a) and (c) use n = 50.]

[Table 3.5: Percentage biases and MSEs of the Half-Normal distribution (k = -0.5), for k̂, k̃, ǩ and θ̂, θ̃, θ̌. Panels: (a) σ = 0.5 (θ = 2); (b) σ = 1 (θ = 0.5), with n = 20, 30, 50, 100, 200; (c) σ = 2 (θ = 0.125); (d) σ = 3 (θ = 1/18). Panels (a), (c) and (d) use n = 50.]

[Table 3.6: Percentage biases and MSEs of the Chi distribution, for k̂, k̃, ǩ and θ̂, θ̃, θ̌. Panels: (a) τ = 0.5 (θ = 2); (b) τ = 1 (θ = 0.5), with n = 20, 30, 50, 100, 200 for a = 1; (c) τ = 2 (θ = 0.125); (d) τ = 3 (θ = 1/18). Each panel reports both a = 1 (k = -0.5) and a = 3 (k = 0.5), with n = 50 except as noted.]

The percentage biases are the percentage ratios of the biases computed from the simulations to the parameter values specified in Tables 3.1 or 3.2, where the biases are the differences between the average bias-adjusted estimates computed from the simulations and the parameter values specified in Tables 3.1 or 3.2. The percentage MSEs are the percentage ratios of the MSEs computed from the simulations to the squared parameter values specified in Tables 3.1 or 3.2; i.e., 100 × (MSE/k²) or 100 × (MSE/θ²). In Table 3.3, we can see that the percentage biases of the MLEs are much larger than those of both the analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates, θ̃ and θ̌, even though all of the percentage biases of the MLEs for the Rayleigh distribution are less than 1 percent. The percentage biases of both the analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates are generally smaller by at least one order of magnitude. In case (b), the absolute percentage biases of the MLEs decline monotonically as the sample size increases, reflecting the consistency of the MLE. The analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates perform well here too: in the cases listed in Table 3.3, their percentage biases are smaller than those of the MLEs by at least one order of magnitude. On the other hand, the percentage MSEs reported in Table 3.3 are very close to each other across estimators. We can see that the percentage MSEs of both the analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates are a little larger than those of the MLEs. Recalling the decomposition MSE = variance + bias², the dramatic decrease in bias discussed in the last paragraph comes at the expense of increased variance for the analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates. For the remaining three distributions in the Generalized Rayleigh distribution

family, the values of the parameters are also determined as in Tables 3.1 and 3.2, and the procedures for the Monte Carlo experiment and the bootstrap experiment follow the steps set out in Section 3.1. The corresponding MLEs, analytic bias-adjusted estimates and bootstrap bias-adjusted estimates are again denoted k̂, k̃ and ǩ (θ̂, θ̃ and θ̌). Tables 3.4, 3.5 and 3.6 display the percentage biases and MSEs of the MLEs, of the bias-adjusted estimators calculated from (2.10) and (2.11), and of the bias-adjusted estimators calculated by the bootstrap method, for the Maxwell, Half-Normal, and Chi distributions. Across Tables 3.4, 3.5 and 3.6, we reach conclusions similar to those for the Rayleigh distribution. First, the analytic bias correction and the bootstrap bias correction perform extremely well in all cases, in most cases reducing the percentage biases by at least an order of magnitude; the biases of the analytic bias-adjusted estimates, in particular, decline by as much as two orders of magnitude in most cases. Second, these gains come at the cost of increases in variance, as evidenced by the very small differences in the percentage MSEs. Even though the MSEs of both the analytic bias-adjusted estimates and the bootstrap bias-adjusted estimates are a little less than those of the MLEs in most cases, the variance must be increasing, since the bias is negligible in percentage terms. Among these three distributions, the percentage bias of the MLE k̂ in the Maxwell distribution with sample size 20 is the largest, at around 46 percent. The biases of the analytic bias-adjusted estimates fall by three orders of magnitude for k̃ with sample size 50, while they fall by two orders of magnitude for every parameter value for θ̃. In Table 3.4 case (b), as the sample size increases from 20 to 200, the percentage biases of θ̂ are more stable than those of k̂. Those of k̃ are much more stable than those of k̂, since the percentage biases of k̃ are smaller by at least two orders of magnitude. In addition, those of θ̃ are slightly more stable than those of θ̂.


More information

Stat 451 Lecture Notes Numerical Integration

Stat 451 Lecture Notes Numerical Integration Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Bias of the Maximum Likelihood Estimators of the Two-Parameter Gamma Distribution Revisited

Bias of the Maximum Likelihood Estimators of the Two-Parameter Gamma Distribution Revisited Econometrics Woring Paper EWP0908 ISSN 1485-6441 Department of Economics Bias of the Maximum Lielihood Estimators of the Two-Parameter Gamma Distribution Revisited David E. Giles Department of Economics,

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes: Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

More information

STAT 514 Solutions to Assignment #6

STAT 514 Solutions to Assignment #6 STAT 514 Solutions to Assignment #6 Question 1: Suppose that X 1,..., X n are a simple random sample from a Weibull distribution with density function f θ x) = θcx c 1 exp{ θx c }I{x > 0} for some fixed

More information

Inferences about Parameters of Trivariate Normal Distribution with Missing Data

Inferences about Parameters of Trivariate Normal Distribution with Missing Data Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing

More information

A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances

A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances Available online at ijims.ms.tku.edu.tw/list.asp International Journal of Information and Management Sciences 20 (2009), 243-253 A Simulation Comparison Study for Estimating the Process Capability Index

More information

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

3.3 Estimator quality, confidence sets and bootstrapping

3.3 Estimator quality, confidence sets and bootstrapping Estimator quality, confidence sets and bootstrapping 109 3.3 Estimator quality, confidence sets and bootstrapping A comparison of two estimators is always a matter of comparing their respective distributions.

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Guy Lebanon February 19, 2011 Maximum likelihood estimation is the most popular general purpose method for obtaining estimating a distribution from a finite sample. It was

More information

Masters Comprehensive Examination Department of Statistics, University of Florida

Masters Comprehensive Examination Department of Statistics, University of Florida Masters Comprehensive Examination Department of Statistics, University of Florida May 6, 003, 8:00 am - :00 noon Instructions: You have four hours to answer questions in this examination You must show

More information

A better way to bootstrap pairs

A better way to bootstrap pairs A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

inferences on stress-strength reliability from lindley distributions

inferences on stress-strength reliability from lindley distributions inferences on stress-strength reliability from lindley distributions D.K. Al-Mutairi, M.E. Ghitany & Debasis Kundu Abstract This paper deals with the estimation of the stress-strength parameter R = P (Y

More information

School of Education, Culture and Communication Division of Applied Mathematics

School of Education, Culture and Communication Division of Applied Mathematics School of Education, Culture and Communication Division of Applied Mathematics MASTER THESIS IN MATHEMATICS / APPLIED MATHEMATICS Estimation and Testing the Quotient of Two Models by Marko Dimitrov Masterarbete

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Estimation MLE-Pandemic data MLE-Financial crisis data Evaluating estimators. Estimation. September 24, STAT 151 Class 6 Slide 1

Estimation MLE-Pandemic data MLE-Financial crisis data Evaluating estimators. Estimation. September 24, STAT 151 Class 6 Slide 1 Estimation September 24, 2018 STAT 151 Class 6 Slide 1 Pandemic data Treatment outcome, X, from n = 100 patients in a pandemic: 1 = recovered and 0 = not recovered 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0 1 1 1

More information

A TEST OF FIT FOR THE GENERALIZED PARETO DISTRIBUTION BASED ON TRANSFORMS

A TEST OF FIT FOR THE GENERALIZED PARETO DISTRIBUTION BASED ON TRANSFORMS A TEST OF FIT FOR THE GENERALIZED PARETO DISTRIBUTION BASED ON TRANSFORMS Dimitrios Konstantinides, Simos G. Meintanis Department of Statistics and Acturial Science, University of the Aegean, Karlovassi,

More information

Theory of Statistics.

Theory of Statistics. Theory of Statistics. Homework V February 5, 00. MT 8.7.c When σ is known, ˆµ = X is an unbiased estimator for µ. If you can show that its variance attains the Cramer-Rao lower bound, then no other unbiased

More information

On The Mutation Parameter of Ewens Sampling. Formula

On The Mutation Parameter of Ewens Sampling. Formula On The Mutation Parameter of Ewens Sampling Formula ON THE MUTATION PARAMETER OF EWENS SAMPLING FORMULA BY BENEDICT MIN-OO, B.Sc. a thesis submitted to the department of mathematics & statistics and the

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #27 Estimation-I Today, I will introduce the problem of

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model

Jackknife Empirical Likelihood for the Variance in the Linear Regression Model Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics Summer 7-25-2013 Jackknife Empirical Likelihood for the Variance in the Linear

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

The formal relationship between analytic and bootstrap approaches to parametric inference

The formal relationship between analytic and bootstrap approaches to parametric inference The formal relationship between analytic and bootstrap approaches to parametric inference T.J. DiCiccio Cornell University, Ithaca, NY 14853, U.S.A. T.A. Kuffner Washington University in St. Louis, St.

More information

STAT 830 Non-parametric Inference Basics

STAT 830 Non-parametric Inference Basics STAT 830 Non-parametric Inference Basics Richard Lockhart Simon Fraser University STAT 801=830 Fall 2012 Richard Lockhart (Simon Fraser University)STAT 830 Non-parametric Inference Basics STAT 801=830

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Chapters 9. Properties of Point Estimators

Chapters 9. Properties of Point Estimators Chapters 9. Properties of Point Estimators Recap Target parameter, or population parameter θ. Population distribution f(x; θ). { probability function, discrete case f(x; θ) = density, continuous case The

More information

MGR-815. Notes for the MGR-815 course. 12 June School of Superior Technology. Professor Zbigniew Dziong

MGR-815. Notes for the MGR-815 course. 12 June School of Superior Technology. Professor Zbigniew Dziong Modeling, Estimation and Control, for Telecommunication Networks Notes for the MGR-815 course 12 June 2010 School of Superior Technology Professor Zbigniew Dziong 1 Table of Contents Preface 5 1. Example

More information

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June :45 11:45 Two hours MATH20802 To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER STATISTICAL METHODS 21 June 2010 9:45 11:45 Answer any FOUR of the questions. University-approved

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

Completeness. On the other hand, the distribution of an ancillary statistic doesn t depend on θ at all.

Completeness. On the other hand, the distribution of an ancillary statistic doesn t depend on θ at all. Completeness A minimal sufficient statistic achieves the maximum amount of data reduction while retaining all the information the sample has concerning θ. On the other hand, the distribution of an ancillary

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

ECE 275A Homework 7 Solutions

ECE 275A Homework 7 Solutions ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Fisher Information & Efficiency

Fisher Information & Efficiency Fisher Information & Efficiency Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Introduction Let f(x θ) be the pdf of X for θ Θ; at times we will also consider a

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar

More information

Statistics 135 Fall 2007 Midterm Exam

Statistics 135 Fall 2007 Midterm Exam Name: Student ID Number: Statistics 135 Fall 007 Midterm Exam Ignore the finite population correction in all relevant problems. The exam is closed book, but some possibly useful facts about probability

More information

i=1 h n (ˆθ n ) = 0. (2)

i=1 h n (ˆθ n ) = 0. (2) Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating

More information

Introduction to Simple Linear Regression

Introduction to Simple Linear Regression Introduction to Simple Linear Regression Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Introduction to Simple Linear Regression 1 / 68 About me Faculty in the Department

More information