2.2.2 Comparing estimators


Estimators can be compared through their mean square errors. If they are unbiased, this is equivalent to comparing their variances. In many applications, we try to find an unbiased estimator which has minimum variance, or at least low variance. We can compare the variances of several unbiased estimators, but how do we know whether we have found an estimator with the lowest variance among all unbiased estimators? The answer is given by the Cramér-Rao lower bound (CRLB).

The CRLB for unbiased estimators of $\phi = g(\vartheta)$, when there is a single parameter $\vartheta$, is
$$\mathrm{CRLB}(\phi) = \frac{\left\{ \dfrac{dg(\vartheta)}{d\vartheta} \right\}^2}{E\left\{ -\dfrac{d^2 \log f(Y;\vartheta)}{d\vartheta^2} \right\}},$$
where $Y = (Y_1,\ldots,Y_n)^T$ and $f(y;\vartheta)$ is the joint density function (pdf) of $Y_1,\ldots,Y_n$ when $Y$ is continuous. For discrete random variables we replace the pdf with a pmf. The quantity $E\{-d^2 \log f(Y;\vartheta)/d\vartheta^2\}$ is called the Fisher information.

When there are $p$ parameters, $\vartheta = (\vartheta_1,\ldots,\vartheta_p)^T$, and $\phi = g(\vartheta)$, the CRLB is
$$\mathrm{CRLB}(\phi) = \left( \frac{\partial g}{\partial \vartheta_1}, \ldots, \frac{\partial g}{\partial \vartheta_p} \right) M^{-1} \begin{pmatrix} \frac{\partial g}{\partial \vartheta_1} \\ \vdots \\ \frac{\partial g}{\partial \vartheta_p} \end{pmatrix},$$
where the matrix $M$ has $(i,j)$th element $E\{-\partial^2 \log f(Y;\vartheta)/\partial\vartheta_i \partial\vartheta_j\}$ and is called the Fisher information matrix.

For the following result, we need to assume that the sample space does not depend on $\vartheta_1,\ldots,\vartheta_p$, and that the differentiability and continuity conditions hold for all $\vartheta \in \Theta$, so that the order of integration and differentiation can be interchanged, and the Fisher information is positive and finite. Problems which satisfy these conditions are called regular.

Theorem 2.4 (Cramér-Rao inequality). For any unbiased estimator $T(Y)$ of $\phi$, if the estimation problem is regular then
$$\mathrm{var}\{T(Y)\} \ge \mathrm{CRLB}(\phi).$$
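Before turning to the proof, the following Monte Carlo sketch (an illustration added here, not part of the notes; the parameter values and the choice of the sample median as a competing estimator are assumptions) compares two unbiased estimators of a normal mean by their estimated bias, variance and mean square error, and reports the CRLB $\sigma^2/n$ for $\mu$ for reference.

```python
# A minimal sketch: comparing estimators of a normal mean by estimated MSE.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 20, 100_000  # assumed values for the demo

samples = rng.normal(mu, sigma, size=(reps, n))
estimators = {"sample mean": samples.mean(axis=1),
              "sample median": np.median(samples, axis=1)}

for name, t in estimators.items():
    print(f"{name}: bias ~ {t.mean() - mu:.4f}, "
          f"variance ~ {t.var():.4f}, MSE ~ {np.mean((t - mu) ** 2):.4f}")

# Both estimators are (essentially) unbiased, so comparing MSEs reduces to
# comparing variances; the sample mean attains the bound, the median does not.
print("CRLB for mu:", sigma ** 2 / n)
```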

Proof. For simplicity, we only prove the result in the one-parameter, continuous case. The proof uses the result that, for any two random variables $U$ and $V$, their correlation coefficient $\rho(U,V)$ lies between $-1$ and $+1$, that is, $\{\rho(U,V)\}^2 \le 1$, which may be written as
$$\mathrm{var}(U) \ge \frac{\{\mathrm{cov}(U,V)\}^2}{\mathrm{var}(V)}. \tag{2.2}$$

Let $U = T(Y)$ and $V = d\log f(Y;\vartheta)/d\vartheta$. Then we may write
$$\begin{aligned}
E(V) &= E\left\{ \frac{d\log f(Y;\vartheta)}{d\vartheta} \right\}
= E\left\{ \frac{1}{f(Y;\vartheta)} \frac{df(Y;\vartheta)}{d\vartheta} \right\} \qquad \text{(derivative of a logarithm)} \\
&= \int \cdots \int \frac{1}{f(y;\vartheta)} \frac{df(y;\vartheta)}{d\vartheta}\, f(y;\vartheta)\, dy_1 \cdots dy_n
= \int \cdots \int \frac{df(y;\vartheta)}{d\vartheta}\, dy_1 \cdots dy_n \\
&= \frac{d}{d\vartheta} \int \cdots \int f(y;\vartheta)\, dy_1 \cdots dy_n
= \frac{d}{d\vartheta}(1) = 0.
\end{aligned}$$

Next, since $E(V) = 0$, we have
$$\begin{aligned}
\mathrm{cov}(U,V) = E(UV) &= E\left\{ T(Y) \frac{d\log f(Y;\vartheta)}{d\vartheta} \right\}
= E\left\{ T(Y) \frac{1}{f(Y;\vartheta)} \frac{df(Y;\vartheta)}{d\vartheta} \right\} \\
&= \int \cdots \int T(y) \frac{df(y;\vartheta)}{d\vartheta}\, dy_1 \cdots dy_n
= \frac{d}{d\vartheta} \int \cdots \int T(y) f(y;\vartheta)\, dy_1 \cdots dy_n \\
&= \frac{d}{d\vartheta} E\{T(Y)\} = \frac{dg(\vartheta)}{d\vartheta}.
\end{aligned}$$

Finally, again since $E(V) = 0$, we have
$$\mathrm{var}(V) = E(V^2) = E\left[\left\{\frac{d\log f(Y;\vartheta)}{d\vartheta}\right\}^2\right]
= E\left\{ \frac{d\log f(Y;\vartheta)}{d\vartheta} \frac{1}{f(Y;\vartheta)} \frac{df(Y;\vartheta)}{d\vartheta} \right\}
= \int \cdots \int \frac{d\log f(y;\vartheta)}{d\vartheta} \frac{df(y;\vartheta)}{d\vartheta}\, dy_1 \cdots dy_n.$$
Now, since we can write
$$\frac{d^2 f}{d\vartheta^2} = \frac{d}{d\vartheta}\left( f \frac{d\log f}{d\vartheta} \right)
= \frac{df}{d\vartheta} \frac{d\log f}{d\vartheta} + f \frac{d^2 \log f}{d\vartheta^2},$$
we obtain
$$\begin{aligned}
\mathrm{var}(V) &= \int \cdots \int \frac{d^2 f(y;\vartheta)}{d\vartheta^2}\, dy_1 \cdots dy_n
- \int \cdots \int f(y;\vartheta) \frac{d^2 \log f(y;\vartheta)}{d\vartheta^2}\, dy_1 \cdots dy_n \\
&= \frac{d^2}{d\vartheta^2}(1) + E\left\{ -\frac{d^2 \log f(Y;\vartheta)}{d\vartheta^2} \right\}
= E\left\{ -\frac{d^2 \log f(Y;\vartheta)}{d\vartheta^2} \right\}.
\end{aligned}$$
The required result then follows immediately from (2.2). $\square$
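The three facts established in the proof, $E(V) = 0$, $\mathrm{cov}(T,V) = dg(\vartheta)/d\vartheta$ and $\mathrm{var}(V)$ equal to the Fisher information, can be checked numerically. Below is a minimal simulation sketch (parameter values are assumptions for the demonstration) for the Poisson($\lambda$) model with $T = \bar{Y}$ and $g(\lambda) = \lambda$, anticipating the example that follows.

```python
# A quick numerical check of the identities used in the Cramer-Rao proof.
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 3.0, 10, 200_000  # assumed values for the demo

y = rng.poisson(lam, size=(reps, n))
v = y.sum(axis=1) / lam - n      # score V = d log f / d lambda = sum(y)/lambda - n
t = y.mean(axis=1)               # T(Y) = Ybar, unbiased for g(lambda) = lambda

print("E(V)      ~", v.mean())             # should be close to 0
print("cov(T, V) ~", np.cov(t, v)[0, 1])   # should be close to dg/dlambda = 1
print("var(V)    ~", v.var(), " vs Fisher information n/lambda =", n / lam)
```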

Example 2.12. Suppose that $Y_1,\ldots,Y_n$ are independent Poisson($\lambda$) random variables. Let $g(\lambda) = \lambda$; then $dg/d\lambda = 1$. Here we have discrete random variables whose joint pmf can be written as the product
$$P(Y_1 = y_1,\ldots,Y_n = y_n;\lambda) = \prod_{i=1}^n \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} = \frac{\lambda^{\sum_i y_i}\, e^{-n\lambda}}{\prod_{i=1}^n y_i!},$$
and so, taking logs, we obtain
$$\log P = \Big(\sum_{i=1}^n y_i\Big) \log\lambda - n\lambda - \log \prod_{i=1}^n y_i!.$$
Thus, we have
$$\frac{d\log P}{d\lambda} = \frac{\sum_{i=1}^n y_i}{\lambda} - n \qquad\text{and}\qquad \frac{d^2\log P}{d\lambda^2} = -\frac{\sum_{i=1}^n y_i}{\lambda^2}.$$
Hence, the Fisher information is
$$E\left( -\frac{d^2\log P}{d\lambda^2} \right) = \frac{1}{\lambda^2} E\Big(\sum_{i=1}^n Y_i\Big) = \frac{1}{\lambda^2} \sum_{i=1}^n E(Y_i) = \frac{1}{\lambda^2}\, n\lambda = \frac{n}{\lambda}$$
and $\mathrm{CRLB}(\lambda) = \lambda/n$. $\square$

The efficiency of an asymptotically unbiased estimator $T(Y)$ of $\phi$ is
$$\mathrm{eff}(T) = \lim_{n\to\infty} \frac{\mathrm{CRLB}(\phi)}{\mathrm{var}\{T(Y)\}}.$$
An asymptotically unbiased estimator $T(Y)$ of $\phi$ is said to be efficient if its efficiency is equal to one. In this case, for large $n$, $\mathrm{var}\{T(Y)\}$ is approximately equal to the CRLB.

Exercise 2.5. Suppose that $Y_1, Y_2, \ldots, Y_n$ is an independent sample from a Bernoulli($p$) distribution. Calculate $\mathrm{CRLB}(p)$ and check whether $\bar{Y}$ is an unbiased estimator of $p$ with minimum variance.

Exercise 2.6. Suppose that $Y_1, Y_2, \ldots, Y_n$ is an independent sample from $N(\mu,\sigma^2)$. Derive the Fisher information matrix $M$ for $\vartheta = (\mu,\sigma^2)^T$ and find the Cramér-Rao lower bound for an unbiased estimator of $g(\vartheta) = \mu + \sigma^2$.
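To connect Example 2.12 with the definition of efficiency, a short simulation sketch (parameter values assumed for illustration) confirms that the empirical variance of $\bar{Y}$ matches $\mathrm{CRLB}(\lambda) = \lambda/n$, so $\bar{Y}$ is efficient.

```python
# Checking that var(Ybar) attains CRLB(lambda) = lambda/n for Poisson data.
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 4.0, 25, 200_000  # assumed values for the demo

ybar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
print("var(Ybar) ~", ybar.var())   # empirical variance of the sample mean
print("CRLB(lam) =", lam / n)      # 0.16; Ybar is unbiased and attains the bound
```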

Minimum Variance Unbiased Estimators

If an unbiased estimator has variance equal to the CRLB, it must have the minimum variance amongst all unbiased estimators. We call it the minimum variance unbiased estimator (MVUE) of $\phi$.

Sufficiency is a powerful property in finding unbiased, minimum variance estimators. If $T(Y)$ is an unbiased estimator of $\vartheta$ and $S$ is a statistic sufficient for $\vartheta$, then there is a function of $S$ that is also an unbiased estimator of $\vartheta$ and has no larger variance than the variance of $T(Y)$. The following theorem formalizes this statement.

Theorem 2.5 (Rao-Blackwell theorem). Let $Y = (Y_1, Y_2, \ldots, Y_n)^T$ be a random sample, let $S = (S_1,\ldots,S_p)^T$ be jointly sufficient statistics for $\vartheta = (\vartheta_1,\ldots,\vartheta_p)^T$, and let $T(Y)$, which is not a function of $S$, be an unbiased estimator of $\phi = g(\vartheta)$. Then $U = E(T \mid S)$ is a statistic such that

(a) $E(U) = \phi$, so that $U$ is an unbiased estimator of $\phi$, and

(b) $\mathrm{var}(U) < \mathrm{var}(T)$.

Proof. First, we note that $U$ is a statistic. Indeed, since $S$ are jointly sufficient for $\vartheta$, the conditional distribution of $Y$ given $S$ does not depend on the parameters, and so the conditional distribution of a function $T(Y)$ given $S$ does not depend on $\vartheta$ either. Thus, $U = E(T \mid S)$ is a function of the random sample only, not a function of $\vartheta$; therefore it is a statistic.

Next, we will use the known facts about conditional expectation and conditional variance given in an earlier exercise. Since $T$ is an unbiased estimator of $\phi$, we have
$$E(U) = E[E(T \mid S)] = E(T) = \phi.$$
So $U$ is also an unbiased estimator of $\phi$, which proves (a).

Finally, we get
$$\mathrm{var}(T) = \mathrm{var}[E(T \mid S)] + E[\mathrm{var}(T \mid S)] = \mathrm{var}(U) + E[\mathrm{var}(T \mid S)].$$
However, since $T$ is not a function of $S$, we have $\mathrm{var}(T \mid S) > 0$; thus $E[\mathrm{var}(T \mid S)] > 0$, and hence (b) is proved. $\square$

This means that, if we have an unbiased estimator $T$ of $\phi$ which is not a function of the sufficient statistics, we can always find an unbiased estimator which has smaller variance, namely $U = E(T \mid S_1,\ldots,S_p)$, which is a function of $S$. We thus have the following result.

Corollary 2.1. MVUEs must be functions of sufficient statistics.

Example 2.13. Suppose that $Y_1,\ldots,Y_n$ are independent Poisson($\lambda$) random variables. Then $T = Y_i$, for any $i = 1,\ldots,n$, is an unbiased estimator of $\lambda$. Also, $S = \sum_{i=1}^n Y_i$ is a sufficient statistic for $\lambda$, and $T$ is not a function of $S$. Hence, a better unbiased estimator is given by any of $E(Y_1 \mid \sum_{i=1}^n Y_i), \ldots, E(Y_n \mid \sum_{i=1}^n Y_i)$. Now, since $E(Y_1 \mid \sum_i Y_i) = \cdots = E(Y_n \mid \sum_i Y_i)$ and
$$E\Big(Y_1 \,\Big|\, \sum_i Y_i\Big) + E\Big(Y_2 \,\Big|\, \sum_i Y_i\Big) + \cdots + E\Big(Y_n \,\Big|\, \sum_i Y_i\Big) = E\Big[\sum_i Y_i \,\Big|\, \sum_i Y_i\Big] = \sum_i Y_i,$$
we have
$$n\, E\Big(Y_1 \,\Big|\, \sum_i Y_i\Big) = \sum_i Y_i.$$
Hence $U = E(Y_1 \mid \sum_{i=1}^n Y_i) = \bar{Y}$ is a better unbiased estimator of $\lambda$ than $Y_1$ or than any other single $Y_i$. In fact, $\bar{Y}$ is a MVUE of $\lambda$, as its variance is equal to the Cramér-Rao lower bound for $\lambda$ (see Example 2.12).
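A small simulation sketch of the Rao-Blackwellization in this example (parameter values assumed for illustration): starting from $T = Y_1$ and conditioning on $S = \sum_i Y_i$ reduces the variance from about $\lambda$ to about $\lambda/n$.

```python
# Rao-Blackwellization for Poisson data: T = Y1 versus U = E(Y1 | S) = Ybar.
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 15, 200_000  # assumed values for the demo

y = rng.poisson(lam, size=(reps, n))
t = y[:, 0]          # T = Y1: unbiased for lambda, not a function of S
u = y.mean(axis=1)   # U = E(T | S) = Ybar, by the symmetry argument above

print("E(T) ~", t.mean(), "  var(T) ~", t.var())   # var(T) ~ lambda = 2
print("E(U) ~", u.mean(), "  var(U) ~", u.var())   # var(U) ~ lambda/n, the CRLB
```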

Complete sufficient statistics

$T(Y)$ is a function of the random sample $Y = (Y_1, Y_2, \ldots, Y_n)$ and so it is a random variable as well. Hence, we may ask about the distribution of $T(Y)$. For example, assume that $\sigma^2$ is known and equal to $\sigma_o^2$. Then, to make inference about $\mu$, we may reduce the random sample to its mean. We know that
$$T(Y) = \bar{Y} \sim N\Big( \mu, \frac{\sigma_o^2}{n} \Big) \qquad\text{if } Y_i \overset{\text{iid}}{\sim} N(\mu, \sigma_o^2),$$
and we may write
$$f_T(t;\mu) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma_o}\, e^{-n(t-\mu)^2/(2\sigma_o^2)}.$$
Due to this data reduction we can make inference about $\mu$ based on the distribution of $\bar{Y}$ only, rather than on the multivariate distribution of the whole random sample $Y$.

A minimal sufficient statistic reduces the data maximally while retaining all the information about the parameter $\vartheta$. We would also like such a statistic to be independent of any so-called ancillary functions of the random sample, whose distributions do not depend on the parameter of interest. Such an independent statistic is called complete. First, we introduce the notion of a complete family of distributions.

Definition 2.8. A family of distributions $\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$ defined on a common space $\mathcal{Y}$ is called complete if, for any real measurable function $h(Y)$,
$$E[h(Y)] = 0 \quad\text{implies that}\quad P_\vartheta\big( h(Y) = 0 \big) = 1 \text{ for all } \vartheta \in \Theta.$$

Note: $P_\vartheta\big( h(Y) = 0 \big) = 1$ can also be written as $P_\vartheta\big( h(Y) \neq 0 \big) = 0$, which means that the function $h(Y)$ may have non-zero values only on a set $B \subset \mathcal{Y}$ such that $P(Y \in B) = 0$. We then say that $h(Y) = 0$ almost surely in $\mathcal{Y}$.

Definition 2.9. A statistic $T(Y)$ is called complete for the family $\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$ on $\mathcal{Y}$ if the family of probability distributions $\mathcal{P}_T = \{P_{\vartheta,T} : \vartheta \in \Theta\}$ is complete for all $\vartheta$, that is, $E[h(T)] = 0$ implies that $P\{h(T) = 0\} = 1$.

Example 2.14. Let $Y = (Y_1,\ldots,Y_n)$ be a random sample from a family of Bernoulli($p$) distributions for $0 < p < 1$. We will show that $T(Y) = \sum_{i=1}^n Y_i$ is a complete sufficient statistic for $p$.

Sufficiency. The pmf of each $Y_i$ is $P(Y_i = y_i) = p^{y_i}(1-p)^{1-y_i}$, and the joint pmf for $Y$ can be factorized as follows:
$$P(Y = y) = \prod_{i=1}^n p^{y_i}(1-p)^{1-y_i} = p^{\sum_i y_i}(1-p)^{n - \sum_i y_i} \cdot 1.$$
Hence, $T(Y) = \sum_{i=1}^n Y_i$ is a sufficient statistic for $p$.
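Before turning to completeness, sufficiency can also be seen numerically: conditionally on $S = \sum_i Y_i$, the data carry no further information about $p$. A minimal sketch (the sample size, the conditioning value and the two $p$ values are assumptions for the demonstration):

```python
# Given S, the conditional distribution of the data does not depend on p.
import numpy as np

rng = np.random.default_rng(4)
n, s_fixed, reps = 8, 3, 400_000  # assumed values for the demo

for p in (0.2, 0.7):
    y = rng.binomial(1, p, size=(reps, n))
    given_s = y[y.sum(axis=1) == s_fixed]   # keep only samples with S = 3
    print(f"p = {p}: P(Y1 = 1 | S = {s_fixed}) ~ {given_s[:, 0].mean():.4f}")

# Both estimates are close to s/n = 3/8 = 0.375, whatever the value of p.
```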

Completeness. Now, we know that a sum of independent Bernoulli rvs has a binomial distribution, i.e., $T \sim \mathrm{Bin}(n,p)$ for $0 < p < 1$, with $t = 0, 1, \ldots, n$. Let $h(t)$ be such that $E[h(T)] = 0$. Then
$$0 = E[h(T)] = \sum_{t=0}^n h(t) \binom{n}{t} p^t (1-p)^{n-t} = (1-p)^n \sum_{t=0}^n h(t) \binom{n}{t} \left( \frac{p}{1-p} \right)^t.$$
The factor $(1-p)^n \neq 0$ for any $p \in (0,1)$. Thus, it must be that
$$0 = \sum_{t=0}^n h(t) \binom{n}{t} \left( \frac{p}{1-p} \right)^t = \sum_{t=0}^n h(t) \binom{n}{t} r^t$$
for all $r = \frac{p}{1-p} > 0$. The last expression is a polynomial of degree $n$ in $r$. For the polynomial to be zero for all $r$, the coefficients $h(t)\binom{n}{t}$ must all be zero. This means that $h(t) = 0$ for all $t \in \{0,1,\ldots,n\}$. Since $P(T = t,\; t \in \{0,1,\ldots,n\}) = 1$, it follows that $P\{h(T) = 0\} = 1$ for all $p$. Hence, $T$ is a complete statistic for $p$. $\square$
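The polynomial argument above can also be viewed through linear algebra: a degree-$n$ polynomial vanishing at $n+1$ distinct points is identically zero, so evaluating $E[h(T)]$ at $n+1$ distinct values of $p$ gives a square linear system in $h(0),\ldots,h(n)$ with a nonsingular coefficient matrix. A brief sketch of this check (the choice of $n$ and the grid of $p$ values are arbitrary):

```python
# Completeness of T ~ Bin(n, p): the only h with E[h(T)] = 0 for all p is h = 0.
import numpy as np
from math import comb

n = 5
ps = np.linspace(0.1, 0.9, n + 1)   # n + 1 distinct values of p
ts = np.arange(n + 1)

# A[i, t] = C(n, t) p_i^t (1 - p_i)^(n - t), so (A @ h)[i] = E_{p_i}[h(T)]
A = np.array([[comb(n, t) * p**t * (1 - p)**(n - t) for t in ts] for p in ps])

print("rank of A:", np.linalg.matrix_rank(A))   # full rank, n + 1 = 6
# A is nonsingular, so A @ h = 0 forces h(t) = 0 for every t in {0, ..., n},
# in line with the polynomial-coefficient argument above.
```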

The following theorem gives a connection between complete and minimal sufficient statistics.

Theorem 2.6. If $T(Y)$ is a complete sufficient statistic for a family of distributions with parameter $\vartheta$, then $T(Y)$ is a minimal sufficient statistic for the family.

Exercise 2.7. Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a Poisson($\lambda$) distribution. Show that $T(Y) = \sum_{i=1}^n Y_i$ is a complete sufficient statistic for $\lambda$.

The following theorem establishes the minimum variance property of complete sufficient statistics.

Theorem 2.7 (Lehmann-Scheffé theorem). Let $Y = (Y_1, Y_2, \ldots, Y_n)^T$ be a random sample. If $S(Y)$ is a jointly complete sufficient statistic and $T(Y)$ is an unbiased estimator of $\phi = g(\vartheta)$, then $U = E[T \mid S]$ is, with probability 1, a unique MVUE of $\phi$.

Proof. First, to prove that $U$ is a MVUE of $g(\vartheta)$, we show that whatever unbiased estimator $T(Y)$ we take, we obtain the same $E[T \mid S]$, i.e., the same $U$. Then, by the Rao-Blackwell theorem, part (b), $U$ must be a MVUE of $g(\vartheta)$.

Suppose that $T(Y)$ and $T^*(Y)$ are any two unbiased estimators of $g(\vartheta)$. Let
$$U = E[T \mid S], \qquad U^* = E[T^* \mid S].$$
Then we have
$$E\{U - U^*\} = E\{E[T \mid S] - E[T^* \mid S]\} = E(T) - E(T^*) = g(\vartheta) - g(\vartheta) = 0.$$
Hence, by completeness of $S(Y)$, we get
$$P\big[ U\big(S(Y)\big) = U^*\big(S(Y)\big) \big] = 1 \qquad \text{for all } \vartheta.$$
This proves the first part of the theorem.

Now, we will show uniqueness. Suppose that $U$ and $T^*$ are two MVUEs of $g(\vartheta)$. If $T^*$ is a function of the sufficient statistic $S(Y)$, then, as shown above, it must be equal to $U$. If $T^*$ is not a function of $S(Y)$, then $\mathrm{var}(U) < \mathrm{var}(T^*)$, hence $T^*$ cannot be a MVUE. Hence, $U$ is a unique MVUE of $g(\vartheta)$. $\square$

Note: the Lehmann-Scheffé theorem may be used to construct the MVUE of $g(\vartheta)$ by two methods. Both are based on a complete sufficient statistic $S(Y)$.

Method 1: If we can find a function of $S = S(Y)$, say $U(S)$, such that $E[U(S)] = g(\vartheta)$, then $U(S)$ is the unique MVUE of $g(\vartheta)$.

Method 2: If we can find any unbiased estimator $T = T(Y)$ of $g(\vartheta)$, then $U(S) = E[T \mid S]$ is the unique MVUE of $g(\vartheta)$.

Example 2.15 (Method 1). Let $Y_i \overset{\text{iid}}{\sim}$ Bernoulli($p$), $i = 1,\ldots,n$. Earlier we showed that $\sum_{i=1}^n Y_i$ is a complete sufficient statistic for $p$; denote it by $S(Y)$.

$\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i = \frac{1}{n} S(Y)$ is an unbiased estimator of $p$; hence, as a function of a complete sufficient statistic, it is the unique MVUE of $p$.

Now, let $g(p) = \mathrm{var}(Y_i) = p(1-p)$. The sample variance $\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2$ is an unbiased estimator of $g(p)$. It is in fact a function of the complete sufficient statistic $S(Y) = \sum_{i=1}^n Y_i$. Hence, it is the unique MVUE of $g(p) = p(1-p)$.

Exercise 2.8. Suppose that $Y = (Y_1,\ldots,Y_n)^T$ is a random sample from a Poisson($\lambda$) distribution. Find a MVUE of $\phi = \lambda^2$.
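As a numerical check on the Method 1 example above (a sketch with assumed parameter values, not part of the notes): for 0/1 data the sample variance simplifies to $S(n-S)/\{n(n-1)\}$, an explicit function of the complete sufficient statistic $S = \sum_i Y_i$, since $\sum_i (Y_i - \bar{Y})^2 = S - S^2/n$ when $Y_i^2 = Y_i$. Its Monte Carlo mean is close to $p(1-p)$.

```python
# The Bernoulli sample variance written as a function U(S) of S = sum(Y_i).
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.3, 12, 200_000  # assumed values for the demo

y = rng.binomial(1, p, size=(reps, n))
s = y.sum(axis=1)

samp_var = y.var(axis=1, ddof=1)          # (1/(n-1)) * sum (Y_i - Ybar)^2
u_of_s = s * (n - s) / (n * (n - 1))      # the same statistic written as U(S)

print("max |sample variance - U(S)| :", np.abs(samp_var - u_of_s).max())  # ~ 0
print("E[U(S)] ~", u_of_s.mean(), "  vs  p(1 - p) =", p * (1 - p))
```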
