Solution: Midterm Examination I. STAT 4, Spring 0. Prof. Prem K. Goel.

1. [20 points] Let X_1, X_2, ..., X_n be a random sample of size n from a Gamma population with β = 2 and α = ν/2, where ν is the unknown parameter.

(a) [6 points] Find the Method of Moments estimator of ν, and show that it is unbiased for ν.

Note that the Gamma distribution with β = 2 and α = ν/2 is the same as the Chi-square distribution with ν degrees of freedom, for which E(X) = ν and Var(X) = 2ν (from Appendix C.3). Therefore the MOM estimator ν̂_MOM of ν is equal to the first sample moment X̄, and its expectation is E(X̄) = (1/n) Σ_{i=1}^n E(X_i) = E(X); i.e., X̄ is unbiased for the population mean, whenever that mean exists. Hence ν̂_MOM = X̄ is an unbiased estimator of ν.

(b) [4 points] Find the variance of the MOM estimator.

If Var(X) = σ^2, it is known that Var(X̄) = σ^2/n. As noted in part (a) above, σ^2 = 2ν. Therefore, Var(X̄) = 2ν/n.

(c) [10 points] Find the Maximum Likelihood Estimator of ν.

Given ν, the joint density of X_1, X_2, ..., X_n is

    f(x_1, x_2, ..., x_n | ν) = ∏_{i=1}^n [1 / (2^(ν/2) Γ(ν/2))] x_i^(ν/2 − 1) e^(−x_i/2).    (1)

Therefore, the log-likelihood function of ν is

    ln L(ν) = −(nν/2) ln 2 − n ln Γ(ν/2) + (ν/2 − 1) Σ_{i=1}^n ln x_i − (1/2) Σ_{i=1}^n x_i.

On taking the derivative of this function and setting it equal to zero, we get

    (d/dν) ln L(ν) = −(n/2) ln 2 − (n/2) [Γ′(α)/Γ(α)] + (1/2) Σ_{i=1}^n ln x_i = 0,

where α = ν/2. This expression can be simplified into the following equation:

    Γ′(α)/Γ(α) = (1/n) Σ_{i=1}^n ln(x_i/2).    (2)

The MLE ν̂ = 2α̂ is obtained by solving (2) numerically for α. For a complete solution, we would check that this solution is a maximum by evaluating the second derivative of the log-likelihood function at ν̂, i.e., at the solution of the above equation. However, the second derivative of the log-likelihood function is equal to the second derivative of −n ln Γ(ν/2) with respect to ν, since the remaining terms are linear in ν. If you look into advanced calculus books, you will find that the log-gamma function ln Γ(α) is known to be convex, i.e., its second derivative is positive. Therefore, (d²/dν²) ln L(ν) < 0, and the critical point is indeed a maximum. For those who are anxious to go even deeper: some of these books give the second derivative of the log-gamma function as Σ_{i=0}^∞ 1/(α + i)^2, which is obviously positive.

(d) [BONUS QUESTION: 2 points] Based on your understanding of the MLE and the MOM for this problem, is the MLE an unbiased estimator of ν? YES/NO. Any explanation!

The bonus question was supposed to be a bit of a challenge for some students to think outside the box. The answer is NO: the MLE is not unbiased. If your answer was NO, you got one bonus point. (Big DEAL!) For the explanation, a simple statement would be that the MOM estimator of α = ν/2 equals half the arithmetic mean of X_1, X_2, ..., X_n, which is unbiased. In contrast, the MLE α̂ is a solution of (2), whose right-hand side equals the natural log of the geometric mean of the X_i/2. It is known that the geometric mean of any collection of non-negative numbers is at most their arithmetic mean. Of course, one still needs to find the solution of (2) for α̂; at this point, recognizing this connection is plenty.

Those who answered YES mostly gave the explanation that the MOM is a one-to-one function of the MLE. Note that the arithmetic mean is NOT a one-to-one function of the geometric mean: for a fixed value of the arithmetic mean of n numbers, the geometric mean can take more than one value. Try it out for several sets of two numbers that have the same arithmetic mean; their geometric means will not all be equal.

2. [10 points] Let X_1, X_2 be a random sample of two Bernoulli trials with parameter p. Show that X_1 + X_2 is a sufficient statistic for p.
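Before turning to Problem 2: the numerical root-finding required for equation (2) in Problem 1(c), and the geometric-vs-arithmetic-mean comparison behind the bonus part, can both be sketched in a few lines. This is an illustrative stdlib-only sketch, not part of the exam: the finite-difference digamma, the bisection bracket, and the simulated chi-square data are all assumptions of the sketch.

```python
import math
import random

def digamma(a, h=1e-5):
    # psi(a) = d/da ln Gamma(a), approximated by a central
    # difference on the stdlib's exact log-gamma function.
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def chisq_mle_nu(xs, lo=1e-6, hi=1000.0, tol=1e-10):
    # Solve equation (2), psi(alpha) = (1/n) * sum(ln(x_i / 2)),
    # by bisection (psi is strictly increasing), then return nu = 2*alpha.
    c = sum(math.log(x / 2) for x in xs) / len(xs)
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if digamma(mid) < c:
            lo = mid
        else:
            hi = mid
    return 2 * ((lo + hi) / 2)

random.seed(1)
nu_true = 6
# A chi-square(nu) draw is a sum of nu squared standard normals.
xs = [sum(random.gauss(0, 1) ** 2 for _ in range(nu_true)) for _ in range(2000)]

nu_mom = sum(xs) / len(xs)    # MOM estimate: the sample mean (part (a))
nu_mle = chisq_mle_nu(xs)     # MLE: numerical root of equation (2)

# Bonus part: the geometric mean never exceeds the arithmetic mean.
am = sum(xs) / len(xs)
gm = math.exp(sum(math.log(x) for x in xs) / len(xs))
print(nu_mom, nu_mle, gm <= am)
```

With a large simulated sample both estimates land near the true ν, and the geometric mean is strictly below the arithmetic mean, as the bonus discussion anticipates.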
Explain why the conditional distribution of X_1 − X_2 given X_1 + X_2 does not depend on p. The first part of this problem is a simpler version of Example 10.10 for n = 2, as well as of an assigned homework problem, Exercise 10.43, with (n_1 = 1, n_2 = 1). The joint distribution of X_1, X_2 is given by

    f(x_1, x_2; p) = f(x_1; p) f(x_2; p) = p^(x_1) (1 − p)^(1 − x_1) · p^(x_2) (1 − p)^(1 − x_2) = p^(x_1 + x_2) (1 − p)^(2 − (x_1 + x_2)).
By the factorization theorem (Theorem 10.4), X_1 + X_2 is a sufficient statistic for p because g(x_1 + x_2; p) = p^(x_1 + x_2) (1 − p)^(2 − (x_1 + x_2)) and h(x_1, x_2) = 1.

The second part simply follows from the definition of a sufficient statistic, which says that the conditional distribution of the data X_1, X_2, given the sufficient statistic X_1 + X_2, does not depend on the parameter p. Therefore, the conditional distribution of any function of the data, e.g., X_1 − X_2, given the sufficient statistic X_1 + X_2, does not depend on p. Instead, you could just say that the conditional distribution of the data given the sufficient statistic is proportional to f(x_1, x_2; p) / g(x_1 + x_2; p) = h(x_1, x_2) = 1, which does not depend on p; therefore the conditional distribution of any function of the data, e.g., X_1 − X_2, given the sufficient statistic, does not depend on p.

If you wish, another way to solve this part of the problem is to obtain the conditional probability distribution of X_1 − X_2 given X_1 + X_2 directly, using simple probability calculations, as follows. Note that there are only four possible outcomes, with probability distribution f(1, 1) = p^2, f(1, 0) = f(0, 1) = p(1 − p), f(0, 0) = (1 − p)^2. Given that X_1 + X_2 = 2, only the outcome (1, 1) satisfies this condition, for which P(X_1 − X_2 = 0) = 1. Given that X_1 + X_2 = 0, only the outcome (0, 0) satisfies this condition, for which P(X_1 − X_2 = 0) = 1. Given that X_1 + X_2 = 1, two outcomes, (1, 0) and (0, 1), satisfy this condition, leading to X_1 − X_2 = 1 and X_1 − X_2 = −1, respectively. However, each of these two outcomes has the same probability p(1 − p). Therefore, conditional on X_1 + X_2 = 1, P(X_1 − X_2 = 1) = P(X_1 − X_2 = −1) = 1/2. Note that these conditional distributions of X_1 − X_2, given a value of the sufficient statistic X_1 + X_2, do not depend on p, but only on the value of the sufficient statistic X_1 + X_2 itself.

3. [15 points] Let Y_1, Y_2, ..., Y_n be a random sample of size n from a Poisson population with unknown parameter λ.

(a) [10 points] Show that the sample mean Ȳ is the minimum variance unbiased estimator of λ.
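As a numerical check on the Problem 2 enumeration above, the conditional distribution of X_1 − X_2 given X_1 + X_2 can be computed by brute force for any p; the function name and the chosen values of p below are illustrative assumptions.

```python
from itertools import product

def conditional_diff_dist(p):
    # Joint pmf of two Bernoulli(p) trials, then the conditional
    # distribution of X1 - X2 within each level of the statistic X1 + X2.
    joint = {(x1, x2): (p ** x1) * ((1 - p) ** (1 - x1)) *
                       (p ** x2) * ((1 - p) ** (1 - x2))
             for x1, x2 in product([0, 1], repeat=2)}
    cond = {}
    for s in (0, 1, 2):
        total = sum(pr for (x1, x2), pr in joint.items() if x1 + x2 == s)
        cond[s] = {x1 - x2: pr / total
                   for (x1, x2), pr in joint.items() if x1 + x2 == s}
    return cond

# Sufficiency demands the conditional distributions be identical for every p.
d1 = conditional_diff_dist(0.3)
d2 = conditional_diff_dist(0.8)
print(d1)
```

For each value of p the output matches the hand calculation: a point mass at 0 when the sum is 0 or 2, and probability 1/2 on each of ±1 when the sum is 1.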
It is known from the method of moments that the sample mean Ȳ is an unbiased estimator of the population mean, with variance σ^2/n. For the Poisson distribution, E(Y_i) = λ and σ^2 = λ (Appendix B.7). Hence Ȳ is an unbiased estimator of λ, with variance equal to λ/n.
For showing the minimum variance property of Ȳ, the Fisher information can be derived as follows. From Appendix B.7, the probability distribution f(y | λ) of Y is

    f(y | λ) = (λ^y / y!) e^(−λ),
    ln f(y | λ) = −ln(y!) + y ln λ − λ,
    (∂/∂λ) ln f(y | λ) = y/λ − 1 = (y − λ)/λ,
    E[ ((∂/∂λ) ln f(Y | λ))^2 ] = E[(Y − λ)^2] / λ^2 = λ/λ^2 = 1/λ.

From the Cramér-Rao inequality, the lowest possible value of the variance of any unbiased estimator of λ is

    1 / (n × Fisher information) = λ/n.    (3)

However, as shown earlier,

    Var(Ȳ) = λ/n.    (4)

Since (3) and (4) are identical, Ȳ is an unbiased estimator of λ whose variance attains the minimum possible variance. Thus, it is a minimum variance unbiased estimator of λ by Theorem 10.2.

(b) [5 points] Consider λ̂_1 = (Y_1 + 2Y_n)/3 as another unbiased estimator of λ. Find the efficiency of λ̂_1 relative to Ȳ.

In other versions of this problem, the alternate estimator was λ̂_2 = (Y_1 + Y_n)/2. So we will work with a general estimator of the form λ̂ = (a_1 Y_1 + a_2 Y_n)/(a_1 + a_2). Note that E(λ̂) = E[(a_1 Y_1 + a_2 Y_n)/(a_1 + a_2)] = [a_1 E(Y_1) + a_2 E(Y_n)]/(a_1 + a_2). Since E(Y_i) = λ, the right-hand side of this expression equals λ. Hence λ̂ is unbiased for all values of the coefficients a_i. Now, since Y_1, Y_2, ..., Y_n are independent random variables, it follows from Corollary 4.3 that

    Var(λ̂) = [a_1^2 Var(Y_1) + a_2^2 Var(Y_n)] / (a_1 + a_2)^2.

However, since Var(Y_i) = λ, this expression simplifies to Var(λ̂) = λ (a_1^2 + a_2^2)/(a_1 + a_2)^2. Therefore, the efficiency of λ̂ relative to Ȳ is given by

    Var(Ȳ)/Var(λ̂) = (λ/n) ÷ [λ (a_1^2 + a_2^2)/(a_1 + a_2)^2] = (a_1 + a_2)^2 / [n (a_1^2 + a_2^2)].

For the special cases in the two versions of this problem, your solution would use the specific values of the two coefficients from the beginning, ending up with: a_1 = 1, a_2 = 2 gives E = 9/(5n), and a_1 = 1, a_2 = 1 gives E = 2/n.
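The efficiency formula above can be verified with exact rational arithmetic. Since λ and n cancel out of the ratio, the illustrative helper below returns n times the efficiency; the function name is an assumption of this sketch.

```python
from fractions import Fraction

def relative_efficiency_times_n(a1, a2):
    # Efficiency of (a1*Y1 + a2*Yn)/(a1 + a2) relative to Ybar is
    # (a1 + a2)^2 / (n * (a1^2 + a2^2)); return it multiplied by n.
    return Fraction((a1 + a2) ** 2, a1 ** 2 + a2 ** 2)

print(relative_efficiency_times_n(1, 2))  # version with (Y1 + 2*Yn)/3
print(relative_efficiency_times_n(1, 1))  # version with (Y1 + Yn)/2
```

The two special cases reproduce the exam's answers 9/(5n) and 2/n.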
4. [15 points] Let X_1, X_2, ..., X_n be a random sample of size n from a Uniform distribution on the interval (0, θ), with unknown parameter θ. Let X_(n) = max(X_1, X_2, ..., X_n). It was shown in class that the MLE of θ is equal to X_(n). Using the methods of Chapter 8, it is known that the sampling distribution of U = X_(n)/θ is given by

    g_n(u) = n u^(n−1), 0 < u < 1, and 0 otherwise.

Show that the MLE is an asymptotically unbiased and consistent estimator of θ.

The simplest approach to solving the first part is to recognize that the sampling density of the random variable U is a Beta distribution, with α = n, β = 1. It follows from Appendix C.2 that

    E(U) = α/(α + β) = n/(n + 1),    Var(U) = αβ / [(α + β)^2 (α + β + 1)] = n / [(n + 1)^2 (n + 2)].

If one doesn't recognize the connection with the Beta distribution, one must be able to perform simple integration to obtain

    E(U) = ∫_0^1 n u^n du = n/(n + 1),    E(U^2) = ∫_0^1 n u^(n+1) du = n/(n + 2).

Given these expected values, it is easy to find the mean and variance of the random variable U. Now, E(X_(n)) = θ E(U) = θ n/(n + 1). Therefore,

    B(θ) = Bias(X_(n)) = E(X_(n) − θ) = −θ/(n + 1).

Clearly, B(θ) → 0 as n → ∞. Hence, the MLE X_(n) is asymptotically unbiased.

Given the distribution of U, the easiest approach to proving consistency of X_(n) is to directly evaluate the probability P(|X_(n) − θ| < ε) = P(|U − 1| < ε/θ), which does require being able to integrate u^(n−1). Since U takes values in the interval (0, 1),

    P(|U − 1| < ε/θ) = P(1 − ε/θ < U < 1) = ∫_{1−ε/θ}^1 n u^(n−1) du = 1 − (1 − ε/θ)^n.

Note that for a small value of ε, 0 < 1 − ε/θ < 1. Now, as n → ∞, (1 − ε/θ)^n → 0. Therefore, P(|X_(n) − θ| < ε) → 1. Hence, X_(n) is a consistent estimator of θ.

If one wants to avoid doing integration, one can use Chebyshev's inequality,

    P(|X_(n) − θ| > ε) ≤ MSE(X_(n)) / ε^2,

where MSE(X_(n)) = E[(X_(n) − θ)^2] is the Mean Squared Error of X_(n). Now, using the mean and variance of U obtained above,

    MSE(X_(n)) = Var(X_(n)) + B^2(θ) = θ^2 n / [(n + 1)^2 (n + 2)] + θ^2/(n + 1)^2 = 2θ^2 / [(n + 1)(n + 2)].

As n → ∞, MSE(X_(n)) converges to 0.
Therefore, by Chebyshev's inequality, P(|X_(n) − θ| > ε) → 0 as n → ∞. Hence, X_(n) is a consistent estimator of θ.
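The closed-form bias, coverage probability, and MSE derived in Problem 4 can be tabulated to watch the limits directly; the chosen values of θ and ε below are arbitrary illustrative assumptions.

```python
def bias(n, theta):
    # E(X_(n)) - theta = -theta / (n + 1), from E(U) = n/(n+1).
    return -theta / (n + 1)

def prob_within_eps(n, theta, eps):
    # P(|X_(n) - theta| < eps) = 1 - (1 - eps/theta)^n, for 0 < eps < theta.
    return 1 - (1 - eps / theta) ** n

def mse(n, theta):
    # Var(X_(n)) + bias^2 = 2*theta^2 / ((n + 1)*(n + 2)).
    return 2 * theta ** 2 / ((n + 1) * (n + 2))

theta, eps = 5.0, 0.1
for n in (10, 100, 1000):
    print(n, bias(n, theta), prob_within_eps(n, theta, eps), mse(n, theta))
```

As n grows, the bias and MSE shrink toward 0 while the coverage probability climbs toward 1, which is exactly the asymptotic unbiasedness and consistency shown above.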
5. [20 points] Chronic anterior compartment syndrome is a condition characterized by exercise-induced pain in the lower leg. Swelling and impaired nerve and muscle function also accompany the pain, which is relieved by rest. Susan Beckham and colleagues conducted an experiment involving ten healthy runners and ten healthy cyclists to determine whether pressure measurements within the anterior muscle compartment differ between runners and cyclists under two conditions, namely resting and after 80% maximal O₂ consumption. You can assume that the measurements are normally distributed and that, under each condition, the variance of the measurements is the same for the two groups. The means and standard deviations of the data, compartment pressure (in millimeters of mercury), for the two groups under these two conditions are given in the following table:

                                    Runners           Cyclists
    Condition                       Mean     S        Mean     S
    Resting                         14.5     3.92     11.1     3.98
    80% maximal O₂ consumption      12.2     3.49     11.5     4.95

(a) [8 points] Construct a 95% confidence interval for the difference in the mean compartment pressures between Runners and Cyclists under the resting condition.

Note that in this problem, n_1 = 10 and n_2 = 10 are both < 30, and we can assume normality of the measurements from the two populations, as well as equal variances, so we can apply Theorem 11.5. First, we compute the pooled estimate of the variance, which can be simplified because n_1 = n_2 = 10:

    s_p^2 = [(n_1 − 1) s_1^2 + (n_2 − 1) s_2^2] / (n_1 + n_2 − 2) = (s_1^2 + s_2^2)/2 = (3.92^2 + 3.98^2)/2 = 15.6034.    (5)

Hence s_p = √15.6034 = 3.95, with ν = 10 + 10 − 2 = 18 degrees of freedom. If you were asked to find the 95% confidence interval, you must use the cut-off point t_{0.025,18} = 2.101. However, if you were asked to find the 90% confidence interval, you must use the cut-off point t_{0.05,18} = 1.734. The 100(1 − α)% CI for the difference μ_1 − μ_2 is given by

    (x̄_1 − x̄_2) ± t_{α/2,18} s_p √(1/n_1 + 1/n_2) = (14.5 − 11.1) ± t_{α/2,18} (3.95) √(1/10 + 1/10).
Therefore, the 95% confidence interval for the difference in the mean compartment pressures between Runners and Cyclists under the resting condition is 3.4 ± 3.71, i.e., (−0.31 mm, 7.11 mm). Similarly, the 90% CI for this difference is 3.4 ± 3.06, i.e., (0.34 mm, 6.46 mm).
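The part (a) intervals can be reproduced directly from the summary statistics. In this sketch the t cut-offs are the table values quoted above, and the function name is an illustrative assumption.

```python
import math

def pooled_ci(mean1, mean2, s1, s2, n1, n2, t_crit):
    # Two-sample pooled-variance t interval for mu1 - mu2.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Resting condition, 95% CI; t_{0.025,18} = 2.101 from a t table.
lo95, hi95 = pooled_ci(14.5, 11.1, 3.92, 3.98, 10, 10, 2.101)
# Resting condition, 90% CI; t_{0.05,18} = 1.734.
lo90, hi90 = pooled_ci(14.5, 11.1, 3.92, 3.98, 10, 10, 1.734)
print(round(lo95, 2), round(hi95, 2))
print(round(lo90, 2), round(hi90, 2))
```

The computed endpoints agree with the intervals stated above to rounding.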
(b) [8 points] Construct a 90% confidence interval for the difference in the mean compartment pressures between Runners and Cyclists under the 80% maximal O₂ condition.

Under the 80% maximal O₂ condition, you were asked to calculate either a 90% CI or a 95% CI. In addition, the cyclists' standard deviation S was either 3.95 or 4.95, so the pooled estimate of variance is slightly different in the two versions. Furthermore, as in part (a), the simplified formula for the pooled variance can be used. In the first version, s_p^2 = (3.49^2 + 3.95^2)/2 = 13.89, i.e., s_p = √13.89 = 3.73. In the second version, s_p^2 = (3.49^2 + 4.95^2)/2 = 18.34, i.e., s_p = √18.34 = 4.28.

Note: The pooled variance s_p^2 is a weighted average of the two sample variances, so s_p must lie between the two sample standard deviations. If your calculations produce a value of s_p that is either less than the smaller SD or more than the larger SD, you should not use that value; check your calculations.

Now, under this condition, x̄_1 − x̄_2 = 12.2 − 11.5 = 0.7. On substituting the value of s_p for each version of the data, and the appropriate cut-off point, into the 100(1 − α)% CI for the difference in the mean compartment pressures between Runners and Cyclists under the 80% maximal O₂ consumption condition,

    (x̄_1 − x̄_2) ± t_{α/2,18} s_p √(2/10),

the following confidence intervals are obtained:

    Version 1, 90% CI: 0.7 ± (1.734)(3.73)√0.2, i.e., (−2.19, 3.59).
    Version 1, 95% CI: 0.7 ± (2.101)(3.73)√0.2, i.e., (−2.80, 4.20).
    Version 2, 90% CI: 0.7 ± (1.734)(4.28)√0.2, i.e., (−2.62, 4.02).
    Version 2, 95% CI: 0.7 ± (2.101)(4.28)√0.2, i.e., (−3.32, 4.72).

(c) [4 points] Consider the intervals constructed in parts (a) and (b) above. How would you interpret the results that you obtained?

For the CIs under the Resting condition, at each level of confidence the lower limit of the CI is close to zero (for the 90% CI, it is just above zero, and for the 95% CI, it is just below zero). For this condition, I am x% confident that the true difference in mean compartment pressure is almost the same, or slightly higher, for runners than for cyclists.
Since all four confidence intervals for the difference of mean compartment pressures under the 80% maximal O₂ consumption condition contain 0, at each level of confidence the two groups have similar mean compartment pressures, and the two versions of the data do not affect the conclusion. The joint interpretation of both confidence intervals on your version of the exam will depend slightly on which of the four possible combinations of confidence levels you had.
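All four part (b) intervals can be reproduced the same way as in part (a); the sketch below applies the pooled-variance t interval with each version's cyclist standard deviation, using the table cut-offs quoted in the solution.

```python
import math

def pooled_ci(mean1, mean2, s1, s2, n1, n2, t_crit):
    # Two-sample pooled-variance t interval for mu1 - mu2,
    # as in part (a) of Problem 5.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return mean1 - mean2 - margin, mean1 - mean2 + margin

T90, T95 = 1.734, 2.101  # t cut-offs with 18 degrees of freedom
for label, s_cyclists in (("version 1", 3.95), ("version 2", 4.95)):
    for conf, t in (("90%", T90), ("95%", T95)):
        lo, hi = pooled_ci(12.2, 11.5, 3.49, s_cyclists, 10, 10, t)
        contains_zero = lo < 0 < hi
        print(label, conf, round(lo, 2), round(hi, 2), contains_zero)
```

Every interval printed contains 0, which is the basis for the interpretation in part (c).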