Chapter 7 - Section 8, Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 3 rd

Scott Powers

References
Chapter 7 - Section 8, Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 3rd Edition, Addison-Wesley, Boston.
Chapter 5 - Section 1.3, Bernard W. Lindgren, Statistical Theory, 3rd Edition, MacMillan, New York.

Properties of the Score Equations

The expected value of the score equations is zero. Since the probability of the entire sample space is one, we have

$$\int f(X)\,dX = 1,$$

where the integral denotes an $n$-dimensional integral over the sample space of $X$. Note that $f(X)$ is a function of $\theta$. Differentiating with respect to the vector $\theta$ gives

$$\frac{\partial}{\partial\theta}\int f(X)\,dX = 0.$$

Assuming that the order of differentiation and integration may be reversed,

$$\int \frac{\partial f(X)}{\partial\theta}\,dX = 0.$$

(The region of integration should not be a function of $\theta$. See Amemiya for details.) The chain rule gives

$$\frac{\partial \ln f(X)}{\partial\theta} = [f(X)]^{-1}\,\frac{\partial f(X)}{\partial\theta},$$

and substituting into the previous equation gives

$$\int \frac{\partial \ln f(X)}{\partial\theta}\, f(X)\,dX = 0.$$

Since $\partial \ln f(X)/\partial\theta = \partial \ln L(\theta)/\partial\theta$,

$$\int \frac{\partial \ln L(\theta)}{\partial\theta}\, f(X)\,dX = 0,$$

which states that $E_X[S(\theta)] = 0$.
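As a concrete check of $E_X[S(\theta)] = 0$, the sketch below (an illustration, not part of the original notes) takes a single Bernoulli($p$) observation, for which the integral over the sample space becomes a sum over $\{0, 1\}$. The function names are hypothetical; exact rational arithmetic from Python's `fractions` module avoids rounding noise.

```python
from fractions import Fraction

def bernoulli_score(x, p):
    """Score d ln f(x; p)/dp for one Bernoulli observation x in {0, 1}."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def expected_score(p):
    """E_X[S(p)], computed exactly by summing over the sample space {0, 1}."""
    return sum(bernoulli_score(x, p) * (p if x == 1 else 1 - p) for x in (0, 1))

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(3, 4)):
    print(p, expected_score(p))   # the expected score is 0 for every p
```

Because the score of an independent sample is the sum of the per-observation scores, the same result carries over to any sample size $n$.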

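The Information Matrix

The information matrix is the variance-covariance matrix of the score. Since $E_X[S(\theta)] = 0$,

$$I(\theta) = E_X\left\{\frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\right\}$$

by definition. The information matrix may also be expressed in terms of the Hessian matrix of $\ln L(\theta)$. Since the mean score is zero, we have

$$\int \frac{\partial \ln L(\theta)}{\partial\theta}\, f(X)\,dX = 0.$$

Differentiating with respect to $\theta'$, again assuming that the order of differentiation and integration may be reversed, and applying the product rule gives the $(k\times k)$ matrix equality

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\, f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial f(X)}{\partial\theta'}\,dX = 0.$$

Since $\partial \ln f(X)/\partial\theta' = [f(X)]^{-1}\,\partial f(X)/\partial\theta'$,

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\, f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln f(X)}{\partial\theta'}\, f(X)\,dX = 0,$$

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\, f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\, f(X)\,dX = 0.$$

In terms of expectations,

$$E_X\left\{\frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\right\} = -E_X\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right].$$

Hence we have an alternative expression for the information matrix:

$$I(\theta) = -E_X\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right] = -E_X[H(\theta)].$$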
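The identity $E_X[S S'] = -E_X[H]$ can be verified exactly for one Bernoulli($p$) observation. The sketch below (illustrative only; the helper names are hypothetical) computes both sides by summing over $\{0, 1\}$; for $p = 1/4$ each side equals $1/[p(1-p)] = 16/3$.

```python
from fractions import Fraction

def score(x, p):
    # d ln f(x; p)/dp for one Bernoulli observation
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def hessian(x, p):
    # d^2 ln f(x; p)/dp^2 for one Bernoulli observation
    return -(Fraction(x) / p**2 + Fraction(1 - x) / (1 - p)**2)

def expect(g, p):
    # E_X[g(X, p)] by summing over the sample space {0, 1}
    return sum(g(x, p) * (p if x == 1 else 1 - p) for x in (0, 1))

p = Fraction(1, 4)
outer = expect(lambda x, q: score(x, q) ** 2, p)   # E[S^2]
neg_hess = -expect(hessian, p)                      # -E[H]
print(outer, neg_hess, 1 / (p * (1 - p)))           # 16/3 16/3 16/3
```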

The Cramer-Rao Bound

The CR bound establishes a lower bound for the variance of unbiased estimators. Letting $t(X)$ denote an arbitrary unbiased estimator of the parameter $\theta$, we know that $E_X[t(X)] = \theta$ for any sample size $n$ and any valid $\theta$. In the continuous case we have

$$\int t(X)\, f(X)\,dX = \theta.$$

Differentiating with respect to $\theta'$ gives

$$\int t(X)\,\frac{\partial f(X)}{\partial\theta'}\,dX = I_k,$$

$$\int t(X)\,\frac{\partial \ln f(X)}{\partial\theta'}\, f(X)\,dX = I_k,$$

$$\int t(X)\,\frac{\partial \ln L(\theta)}{\partial\theta'}\, f(X)\,dX = I_k.$$

This states that

$$E_X[t(X)\,S(\theta)'] = I_k.$$

Since the mean score is a zero vector, $\mathrm{Cov}[t(X), S(\theta)] = E_X[t(X)S(\theta)'] - E_X[t(X)]\,E_X[S(\theta)]' = I_k$; that is, the covariance matrix of any unbiased estimator and the score is an identity matrix. Thus the covariance matrix of the stacked vector

$$\begin{bmatrix} t(X) \\ S(\theta) \end{bmatrix} \quad\text{is}\quad \begin{bmatrix} \Sigma_t & I_k \\ I_k & I(\theta) \end{bmatrix},$$

where $\Sigma_t$ denotes the covariance matrix of the estimator $t(X)$. If we denote the covariance matrix of the stacked vector by $C$, then $z'Cz \ge 0$ for arbitrary $z \ne 0$, since any covariance matrix is positive semi-definite. This inequality must hold for an arbitrary non-zero $2k$-vector $z$, including

$$z = \begin{bmatrix} w \\ -I(\theta)^{-1} w \end{bmatrix},$$

where $w$ is an arbitrary non-zero $k$-vector. For this choice of $z$, expanding the quadratic form gives $z'Cz = w'\Sigma_t w - 2w'I(\theta)^{-1}w + w'I(\theta)^{-1}w$, so the inequality above reduces to

$$z'Cz = w'\left[\Sigma_t - I(\theta)^{-1}\right]w \ge 0 \quad\text{for } w \ne 0.$$

That is, the difference between the covariance matrix of an unbiased estimator $t(X)$ and the inverse of the information matrix is a positive semi-definite matrix. Since the diagonal elements of a positive semi-definite matrix must be non-negative, the variance of an unbiased estimator is not less than the corresponding diagonal element of the inverse of the information matrix.

Theorem (Properties of ML Estimates)

Let $X$ represent a random sample from a population with joint density $f(X)$ of known form (not necessarily normal). Then, subject to certain regularity conditions, ML estimators are consistent, asymptotically efficient, and asymptotically normal. (Most notable among the regularity conditions, the sample space of $X$ must not depend on $\theta$, and the joint density must be differentiable with respect to $\theta$. Details may be found in Econometrics, Peter Schmidt, 1976, Marcel Dekker, New York.)
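The key algebraic step in the Cramer-Rao argument, $z'Cz = w'[\Sigma_t - I(\theta)^{-1}]w$ for $z = [w;\ -I(\theta)^{-1}w]$, can be checked numerically. The sketch below is an illustration under stated assumptions: the blocks are random matrices (an SPD stand-in for $I(\theta)$ and an arbitrary symmetric stand-in for $\Sigma_t$), and it uses NumPy. The identity is pure algebra, so it holds for any such blocks; only the sign of the result depends on $C$ being a genuine covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3

# Hypothetical blocks: an SPD matrix standing in for I(theta)
# and a symmetric PSD matrix standing in for Sigma_t.
A = rng.normal(size=(k, k))
info = A @ A.T + k * np.eye(k)
B = rng.normal(size=(k, k))
sigma_t = B @ B.T

# Covariance matrix C of the stacked vector [t(X); S(theta)]
C = np.block([[sigma_t, np.eye(k)],
              [np.eye(k), info]])

w = rng.normal(size=k)
info_inv = np.linalg.inv(info)
z = np.concatenate([w, -info_inv @ w])   # z = [w; -I(theta)^{-1} w]

lhs = z @ C @ z
rhs = w @ (sigma_t - info_inv) @ w
print(np.isclose(lhs, rhs))              # the two quadratic forms agree
```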

Application

Consider a sequence of independent Bernoulli trials. Since the observations are statistically independent, the joint density is the product of the marginal densities:

$$f(X) = \prod_{i=1}^{n} f(x_i).$$

The log-likelihood function is thus

$$\ln L(p) = \sum_{i=1}^{n}\left[x_i \ln p + (1-x_i)\ln(1-p)\right],$$

and the score equation is

$$\frac{\partial \ln L(p)}{\partial p} = \sum_{i=1}^{n}\left[\frac{x_i}{p} - \frac{1-x_i}{1-p}\right].$$

The ML estimator solves

$$S(\hat p) = \sum_{i=1}^{n}\left[\frac{x_i}{\hat p} - \frac{1-x_i}{1-\hat p}\right] = 0.$$

Multiplying both sides by $\hat p(1-\hat p)$ gives

$$\sum_{i=1}^{n}\left[x_i(1-\hat p) - (1-x_i)\hat p\right] = 0,$$

$$\sum_{i=1}^{n} x_i - \hat p\sum_{i=1}^{n} x_i - n\hat p + \hat p\sum_{i=1}^{n} x_i = 0,$$

$$\sum_{i=1}^{n} x_i - n\hat p = 0.$$

Thus $\hat p = \sum_{i=1}^{n} x_i / n = \bar X$, and the ML estimator of $p$ is just the sample mean. We have seen that $\bar X$ has mean $\mu_X$ and variance $\sigma_X^2/n$, regardless of the underlying distribution. For the Bernoulli trial, $\mu_X = p$ and $\sigma_X^2 = p(1-p)$. Consequently, $E(\hat p) = E(\bar X) = p$ and $V(\hat p) = V(\bar X) = p(1-p)/n$. In particular, the ML estimator $\hat p$ is an unbiased estimator of $p$.
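To see that the root of the score equation is indeed the sample mean, the sketch below solves $S(\hat p) = 0$ numerically and compares the result with $\bar X$. The sample is hypothetical, and bisection is simply a convenient root-finder chosen here (it works because $S(p)$ is strictly decreasing on $(0, 1)$); in practice the closed form makes numerical solution unnecessary.

```python
def score(p, xs):
    """Bernoulli score S(p) = sum[x_i/p - (1 - x_i)/(1 - p)]."""
    return sum(x / p - (1 - x) / (1 - p) for x in xs)

def solve_score(xs, lo=1e-9, hi=1 - 1e-9, tol=1e-12):
    """Root of S(p) on (0, 1) by bisection.

    S(p) is strictly decreasing, so the root is unique whenever the
    sample contains at least one 0 and at least one 1.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, xs) > 0:
            lo = mid   # S is still positive: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # hypothetical sample
p_hat = solve_score(xs)
print(round(p_hat, 6), sum(xs) / len(xs))   # 0.7 0.7
```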

The Hessian matrix (a scalar in this case) is

$$\frac{\partial^2 \ln L(p)}{\partial p^2} = -\left[\sum_{i=1}^{n}\frac{x_i}{p^2} + \sum_{i=1}^{n}\frac{1-x_i}{(1-p)^2}\right].$$

Since $E\left[\sum_{i=1}^{n} x_i\right] = np$, the information matrix is

$$E[-H(p)] = \frac{np}{p^2} + \frac{n(1-p)}{(1-p)^2} = \frac{n}{p} + \frac{n}{1-p} = \frac{n(1-p)}{p(1-p)} + \frac{np}{p(1-p)} = \frac{n}{p(1-p)}.$$

The CR bound for unbiased estimators of $p$ is thus $p(1-p)/n$. Since $\hat p$ is an unbiased estimator of $p$ and its variance attains the CR bound, $\hat p$ is efficient.

Application

Assume that the $X_i$ are iid $N(\mu, \sigma^2)$; that is, $X \sim N(\mu, \sigma^2 I_n)$. (Note that we are using the same notation to denote the scalar parameter $\mu$ and the vector of common means $\mu$.) The joint density of the random vector $X$ is

$$f(X) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left[-\tfrac12 (X-\mu)'(\sigma^2)^{-1}(X-\mu)\right].$$

The log-likelihood function is thus

$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \tfrac12 (X-\mu)'(\sigma^2)^{-1}(X-\mu).$$

Note that $\mu$ enters the log-likelihood function only through the last term. In order to find the score equations, we will need the derivative of $(X-\mu)'(X-\mu)$ with respect to $\mu$. This is done most easily by recognizing that

$$(X-\mu)'(X-\mu) = \sum_{i=1}^{n}(x_i-\mu)^2.$$

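Consequently,

$$\frac{\partial (X-\mu)'(X-\mu)}{\partial\mu} = -2\sum_{i=1}^{n}(x_i-\mu).$$

The first score equation is thus

$$\frac{\partial \ln L(\mu,\sigma^2)}{\partial\mu} = -\tfrac12(\sigma^2)^{-1}\left[-2\sum_{i=1}^{n}(x_i-\mu)\right] = (\sigma^2)^{-1}\sum_{i=1}^{n}(x_i-\mu),$$

and the second score equation is

$$\frac{\partial \ln L(\mu,\sigma^2)}{\partial\sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \tfrac12(\sigma^2)^{-2}(X-\mu)'(X-\mu) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(X-\mu)'(X-\mu).$$

The ML estimators solve $S(\hat\theta) = 0$. For this problem we have

$$\frac{1}{\hat\sigma^2}\sum_{i=1}^{n}(x_i-\hat\mu) = 0 \quad\text{and}\quad -\frac{n}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}(X-\hat\mu)'(X-\hat\mu) = 0.$$

The first score equation may be solved for $\hat\mu = \sum_{i=1}^{n} x_i/n = \bar X$. Given $\hat\mu$, the second score equation may be solved for $\hat\sigma^2 = (X-\hat\mu)'(X-\hat\mu)/n$.

At this point it is convenient to find the Hessian matrix, the matrix of second partials and cross-partials of the log-likelihood function. The first diagonal element is

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial\mu^2} = -\frac{n}{\sigma^2}.$$

The off-diagonal element is

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial\mu\,\partial\sigma^2} = -(\sigma^2)^{-2}\sum_{i=1}^{n}(x_i-\mu) = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu).$$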

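The second diagonal element is

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial(\sigma^2)^2} = \frac{n}{2}(\sigma^2)^{-2} - (\sigma^2)^{-3}(X-\mu)'(X-\mu) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(X-\mu)'(X-\mu).$$

The Hessian matrix is thus

$$H(\theta) = \begin{bmatrix} -\dfrac{n}{\sigma^2} & -\dfrac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu) \\[2ex] -\dfrac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu) & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(X-\mu)'(X-\mu) \end{bmatrix}.$$

The information matrix is $-E_X[H(\theta)]$. Note that $E(X_i-\mu) = 0$ and $E[(X-\mu)'(X-\mu)] = n\sigma^2$. Thus the information matrix for this problem is

$$I(\theta) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\[2ex] 0 & -\dfrac{n}{2\sigma^4} + \dfrac{n\sigma^2}{\sigma^6} \end{bmatrix},$$

which reduces to

$$I(\theta) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\[2ex] 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}.$$

The CR bound is the inverse of the information matrix. Since the information matrix for this problem is diagonal, the CR bound is simply

$$I(\theta)^{-1} = \begin{bmatrix} \dfrac{\sigma^2}{n} & 0 \\[2ex] 0 & \dfrac{2\sigma^4}{n} \end{bmatrix}.$$

We saw earlier that $\hat\mu = \bar X \sim N(\mu, \sigma^2/n)$. Since $\hat\mu$ is unbiased for $\mu$ and its variance attains the CR bound, $\hat\mu$ is efficient. We will see later that $\hat\sigma^2 = (X-\hat\mu)'(X-\hat\mu)/n$ has mean $[(n-1)/n]\sigma^2 \ne \sigma^2$; being biased, it cannot be efficient.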
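A small Monte Carlo experiment makes these results tangible: the variance of $\hat\mu$ should be close to the first diagonal element of the CR bound, and the mean of $\hat\sigma^2$ should be close to $[(n-1)/n]\sigma^2$ rather than $\sigma^2$. The sketch below is illustrative only; the parameter values, replication count, seed, and helper names are hypothetical choices, and it relies on NumPy.

```python
import numpy as np

def information(sigma2, n):
    """Information matrix I(theta) for theta = (mu, sigma^2), iid normal sample."""
    return np.array([[n / sigma2, 0.0],
                     [0.0, n / (2.0 * sigma2 ** 2)]])

def cr_bound(sigma2, n):
    """Cramer-Rao bound: the inverse of the information matrix."""
    return np.linalg.inv(information(sigma2, n))

# Hypothetical simulation settings (illustrative, not from the notes).
rng = np.random.default_rng(1)
mu, sigma2, n, reps = 2.0, 4.0, 10, 200_000
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

mu_hats = samples.mean(axis=1)      # mu_hat = X-bar for each replication
sigma2_hats = samples.var(axis=1)   # default ddof=0 divides by n: the ML estimator

print(cr_bound(sigma2, n))                        # diagonal: sigma2/n and 2*sigma2**2/n
print(mu_hats.var(), sigma2 / n)                  # Monte Carlo variance vs. CR bound (0.4)
print(sigma2_hats.mean(), (n - 1) / n * sigma2)   # Monte Carlo mean vs. (n-1)/n * sigma^2 (3.6)
```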
