STAT25: Solutions for Homework 2

Due: Wednesday, Feb 4.

1. (10 pt) Suppose we take one observation, $X$, from the discrete distribution

$$\begin{array}{c|ccccc}
x & -2 & -1 & 0 & 1 & 2 \\ \hline
\Pr(X = x \mid \theta) & (1-\theta)/4 & \theta/12 & 1/2 & (3-\theta)/12 & \theta/4
\end{array} \qquad 0 \le \theta \le 1.$$

Find an unbiased estimator of $\theta$. Obtain the maximum likelihood estimator (MLE) $\hat\theta(x)$ of $\theta$ and show that it is not unique. Is any choice of MLE unbiased?

(a) Let $T(2) = 4$ and $T(x) = 0$ when $x \ne 2$. Then
$$E[T(X)] = 4 \cdot \frac{\theta}{4} + 0 \cdot \Big(\frac{1-\theta}{4} + \frac{\theta}{12} + \frac{1}{2} + \frac{3-\theta}{12}\Big) = \theta,$$
so $T$ is an unbiased estimator of $\theta$.

(b) Denote the MLE $\hat\theta$. Since there is only one observation:
$$\hat\theta(-2) = \arg\max_\theta\, (1-\theta)/4 = 0$$
$$\hat\theta(-1) = \arg\max_\theta\, \theta/12 = 1$$
$$\hat\theta(0) = \arg\max_\theta\, 1/2 = \text{any } c \in [0, 1]$$
$$\hat\theta(1) = \arg\max_\theta\, (3-\theta)/12 = 0$$
$$\hat\theta(2) = \arg\max_\theta\, \theta/4 = 1$$
Since $\hat\theta(0)$ can be multiple values, the MLE is not unique, and every choice of MLE is biased, since
$$E[\hat\theta] = 0 \cdot \frac{1-\theta}{4} + 1 \cdot \frac{\theta}{12} + \hat\theta(0) \cdot \frac{1}{2} + 0 \cdot \frac{3-\theta}{12} + 1 \cdot \frac{\theta}{4} = \frac{\theta}{3} + \frac{\hat\theta(0)}{2},$$
which cannot equal $\theta$ for all $\theta \in [0,1]$ for any fixed choice of $\hat\theta(0)$.
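As a quick numerical sanity check (an addition to the original solution, with the arbitrary choice $\hat\theta(0) = 1/2$), the sketch below verifies that $E[T(X)] = \theta$ while $E[\hat\theta] = \theta/3 + 1/4 \ne \theta$:

```python
import numpy as np

def pmf(theta):
    # Probabilities of x = -2, -1, 0, 1, 2 from problem 1.
    return np.array([(1 - theta) / 4, theta / 12, 1 / 2,
                     (3 - theta) / 12, theta / 4])

T = np.array([0, 0, 0, 0, 4])        # unbiased estimator: T(2) = 4, else 0
mle = np.array([0, 1, 0.5, 0, 1])    # one MLE, choosing theta_hat(0) = 1/2

for theta in np.linspace(0, 1, 5):
    p = pmf(theta)
    assert np.isclose(p.sum(), 1.0)  # a valid probability distribution
    print(f"theta={theta:.2f}  E[T]={p @ T:.4f}  E[mle]={p @ mle:.4f}")
```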
2. (10 pt) Bickel & Doksum pg 53, problem 2.3.8

(a) It is enough to show that the negative log likelihood function $l(\theta) = -\sum_i \log f(x_i \mid \theta)$, for the density $f(x \mid \theta) = c_\alpha \exp(-|x-\theta|^\alpha)$ with $x, \theta \in \mathbb{R}^p$ and $\alpha > 1$, is a strictly convex function of $\theta \in \mathbb{R}^p$. Since it is the sum of the negative log likelihoods for the individual $X_j$, and a sum of strictly convex functions is strictly convex, it is enough to consider a single observation, $n = 1$. To show the negative log likelihood is strictly convex it is enough to show that its Hessian, the matrix $H_{ij} = \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, l(\theta)$ of second derivatives, is positive-definite everywhere (except possibly at $\theta = \hat\theta = x$). Let us compute the necessary derivatives:
$$l(\theta) = -\log c_\alpha + \Big(\sum_i (x_i - \theta_i)^2\Big)^{\alpha/2}$$
$$\frac{\partial}{\partial\theta_k}\, l(\theta) = -\alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-2)/2} (x_k - \theta_k)$$
$$\frac{\partial^2}{\partial\theta_k^2}\, l(\theta) = \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-2)/2} + \alpha(\alpha-2) \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} (x_k - \theta_k)^2 = \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} \Big[\sum_{i=1}^p (x_i - \theta_i)^2 + (\alpha-2)(x_k - \theta_k)^2\Big]$$
$$\frac{\partial^2}{\partial\theta_i\,\partial\theta_k}\, l(\theta) = \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} (\alpha-2)(x_i - \theta_i)(x_k - \theta_k), \qquad i \ne k.$$
The Hessian $H = \alpha\,|x-\theta|^{\alpha-4}\, A$ is the product of a constant positive factor $\alpha\,|x-\theta|^{\alpha-4}$ and a matrix $A$ whose on- and off-diagonal entries are:
$$A_{kk} = \sum_{i=1}^p (x_i - \theta_i)^2 + (\alpha-2)(x_k - \theta_k)^2, \qquad A_{kj} = (\alpha-2)(x_k - \theta_k)(x_j - \theta_j).$$
If we introduce the notation $\Delta = (x - \theta) \in \mathbb{R}^p$ for the vector with components $\Delta_i = (x_i - \theta_i)$, and $I_p$ for the $p \times p$ identity matrix, we can write $A$ in the form
$$A = |\Delta|^2\, I_p + (\alpha-2)\, \Delta\Delta'.$$
The matrix $A$ satisfies $A\Delta = \lambda\Delta$ with eigenvalue $\lambda = |\Delta|^2(\alpha-1)$, strictly positive since $\alpha > 1$ (except at $\Delta = 0$, i.e., $\theta = \hat\theta = x$, which is okay). The other eigenvectors are orthogonal to $\Delta$, all with eigenvalues $\lambda = |\Delta|^2$, which are also strictly positive. Thus $A$ is positive-definite and so is the Hessian, $H = \alpha\,|x-\theta|^{\alpha-4}\, A$.
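The positive-definiteness claim is easy to spot-check numerically. Below is a small sketch (my addition; the dimension, $\alpha$, and test points are arbitrary) that builds $A$ and $H$ as above and confirms $A\Delta = (\alpha-1)|\Delta|^2\,\Delta$ and that all eigenvalues of $H$ are strictly positive:

```python
import numpy as np

rng = np.random.default_rng(0)
p, alpha = 4, 1.5                    # any dimension p and any alpha > 1

for _ in range(5):
    x, theta = rng.normal(size=p), rng.normal(size=p)
    d = x - theta                    # Delta = x - theta
    s = d @ d                        # |Delta|^2
    A = s * np.eye(p) + (alpha - 2) * np.outer(d, d)
    H = alpha * s ** ((alpha - 4) / 2) * A
    # Delta is an eigenvector of A with eigenvalue (alpha - 1)|Delta|^2:
    assert np.allclose(A @ d, (alpha - 1) * s * d)
    # All eigenvalues of the (symmetric) Hessian are strictly positive:
    assert np.linalg.eigvalsh(H).min() > 0
print("Hessian positive-definite at all test points.")
```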
(b) First consider the case $\alpha = 1$ in dimension $p = 1$, with $n = 2m$ even. WLOG order the data $x_1 \le x_2 \le \cdots \le x_n$. The log likelihood function is given by
$$l(\theta) = n \log c(\alpha) - \sum_i |x_i - \theta|,$$
a continuous function whose derivative does not exist at the data points $\theta \in \{x_i\}$ and which elsewhere satisfies
$$\frac{d}{d\theta}\, l(\theta) = -\frac{d}{d\theta} \sum_i |x_i - \theta| = \Big[\sum_{x_i < \theta} (-1)\Big] + \Big[\sum_{x_i > \theta} (+1)\Big] = \#\{x_i > \theta\} - \#\{x_i < \theta\}$$
(note that the derivative of $|x - \theta|$ is $-1$ on the interval $\theta \in (-\infty, x)$, $+1$ on the interval $\theta \in (x, \infty)$, and undefined at the point $\theta = x$). Thus $l(\theta)$ is increasing when $\theta < x_m$, when more than half the $\{x_i\}$ exceed $\theta$, and is decreasing when $\theta > x_{m+1}$, when fewer than half the $\{x_i\}$ exceed $\theta$. In the interval $x_m < \theta < x_{m+1}$ the derivative is zero, so $l(\theta)$ is constant there and equal to its maximum value; any point of $[x_m, x_{m+1}]$ is an MLE, so the MLE is not unique. In case $n = 2m+1$ is odd, the same argument shows that $l(\theta)$ achieves a unique maximum at the median $x_{m+1}$. In dimension $p \ge 2$ a similar argument holds, only $\hat\theta$ now should be any value in the "median rectangle" (or median "block") where each of its components is a median of the corresponding components of the $x_i$'s.

3. (10 pt) Let $(X_1, \ldots, X_n)$ be a random sample from the uniform distribution on the interval $(\theta - \frac12, \theta + \frac12)$, where $\theta \in \mathbb{R}$ is unknown. Let $X_{(j)}$ be the $j$th order statistic.

(a) Show that $(X_{(1)} + X_{(n)})/2$ is strongly consistent for $\theta$, i.e., that $\lim_{n\to\infty} (X_{(1)} + X_{(n)})/2 = \theta$ a.s.

(b) Show that $\bar X_n := (X_{(1)} + \cdots + X_{(n)})/n$ is $L^2$ consistent.

solution: (a) For any $\epsilon \in (0, 1)$,
$$P\big(|X_{(1)} - (\theta - \tfrac12)| > \epsilon\big) = P\big(X_{(1)} > \epsilon + (\theta - \tfrac12)\big) = \big[P\big(X_1 > \epsilon + (\theta - \tfrac12)\big)\big]^n = (1 - \epsilon)^n
and
$$P\big(|X_{(n)} - (\theta + \tfrac12)| > \epsilon\big) = P\big(X_{(n)} < -\epsilon + (\theta + \tfrac12)\big) = \big[P\big(X_1 < -\epsilon + (\theta + \tfrac12)\big)\big]^n = (1 - \epsilon)^n.$$
Since $\sum_n (1-\epsilon)^n < \infty$, by the corollary of the Borel-Cantelli lemma we conclude that $\lim_{n\to\infty} X_{(1)} = \theta - \frac12$ a.s. and $\lim_{n\to\infty} X_{(n)} = \theta + \frac12$ a.s. Hence $\lim_{n\to\infty} (X_{(1)} + X_{(n)})/2 = \theta$ a.s.

(b) $\bar X_n := (X_{(1)} + \cdots + X_{(n)})/n = (X_1 + \cdots + X_n)/n$. Since $E(X_1) = \theta$, we have $E[\bar X_n] = \theta$ and
$$Var[\bar X_n] = E[(\bar X_n - \theta)^2] = \frac{1}{n}\, E[(X_1 - \theta)^2] = \frac{1}{12n} \to 0.$$
Therefore $\bar X_n$ is $L^2$ consistent.

4. (10 pt) Let $\{X_i\} \stackrel{iid}{\sim} No(\mu, \sigma^2)$ be a random sample from the normal distribution with unknown mean $\mu \in \mathbb{R}$ and known variance $\sigma^2 > 0$. For fixed $t \ne 0$, find the Uniform Minimum-Variance Unbiased Estimator (UMVUE) of $e^{t\mu}$ and show that its variance is larger than the Cramér-Rao lower bound. Show that the ratio of its variance to the Cramér-Rao lower bound converges to $1$ as $n \to \infty$.

(a) The sample mean $\bar X$ is complete and sufficient for $\mu$. Since $E[e^{t\bar X}] = e^{\mu t + \sigma^2 t^2/(2n)}$, the UMVUE of $e^{t\mu}$ is $T(X) = e^{-\sigma^2 t^2/(2n) + t\bar X}$. The Fisher information is $I(\mu) = n/\sigma^2$, so the Cramér-Rao lower bound is
$$\Big(\frac{d}{d\mu}\, e^{t\mu}\Big)^2 \Big/ I(\mu) = \sigma^2 t^2 e^{2t\mu}/n.$$
On the other hand,
$$Var(T) = e^{-\sigma^2 t^2/n}\, E e^{2t\bar X} - e^{2t\mu} = \big(e^{\sigma^2 t^2/n} - 1\big)\, e^{2t\mu} > \frac{\sigma^2 t^2}{n}\, e^{2t\mu},$$
the Cramér-Rao lower bound, since $e^x - 1 > x$ for $x > 0$.

(b) The ratio of the variance of the UMVUE over the Cramér-Rao lower bound is $(e^{\sigma^2 t^2/n} - 1)/(\sigma^2 t^2/n)$, which converges to $1$ as $n \to \infty$, since $\lim_{x\to 0} (e^x - 1)/x = 1$.
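A short numerical sketch (my addition; the parameter values are arbitrary) illustrating both results: the midrange of problem 3 homes in on $\theta$, and the exact ratio $Var(T)/\text{CR bound} = (e^x - 1)/x$ with $x = \sigma^2 t^2/n$ decreases to $1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem 3(a): the midrange (X_(1) + X_(n))/2 converges to theta.
theta = 0.25
for n in [10, 100, 10000]:
    x = rng.uniform(theta - 0.5, theta + 0.5, size=n)
    print(f"n={n:6d}  midrange = {(x.min() + x.max()) / 2:+.5f}")

# Problem 4(b): Var(T)/CR bound = (e^x - 1)/x with x = sigma^2 t^2 / n -> 0.
sigma, t = 2.0, 0.7
for n in [1, 10, 100, 10000]:
    x = sigma**2 * t**2 / n
    print(f"n={n:6d}  Var(T)/CR = {np.expm1(x) / x:.6f}")  # -> 1 as n grows
```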
5. (10 pt) Let $X$ have density function $f(x \mid \theta)$, $x \in \mathbb{R}^d$, and let $\theta$ have (proper) prior density $\pi(\theta)$. Let $\delta^\pi(x)$ denote the Bayes estimate of $\theta$ under $\pi(\theta)$ for squared-error loss (i.e., $\delta^\pi(x) := E[\theta \mid X = x]$) and suppose $\delta^\pi$ has finite Bayes risk $r(\pi, \delta^\pi) = E[|\delta^\pi(X) - \theta|^2]$. Show that for any other estimator $\delta(x)$,
$$r(\pi, \delta) - r(\pi, \delta^\pi) = \int \big(\delta(x) - \delta^\pi(x)\big)^2 f(x)\, dx,$$
where $f(x)$ is the marginal density of $X$. If $f(x \mid \theta)$ is normal $No(\theta, 1)$, consider the collection of estimators of the form $\delta(x) := cx + d$. Show that whenever $0 \le c < 1$ these estimators are all proper Bayes estimators, and hence admissible [Hint: Find a prior $\pi$ for which $\delta^\pi(X) = cX + d$]. Show that if $c > 1$ the resulting estimator is inadmissible.

(a)
$$r(\pi, \delta) - r(\pi, \delta^\pi) = \int\!\!\int \big[(\theta - \delta(x))^2 - (\theta - \delta^\pi(x))^2\big] f(x \mid \theta)\, \pi(\theta)\, dx\, d\theta$$
$$= \int\!\!\int \big[(\theta - \delta(x))^2 - (\theta - \delta^\pi(x))^2\big] \pi(\theta \mid x)\, f(x)\, d\theta\, dx$$
$$= \int f(x) \Big[\int \big[(\theta - \delta(x))^2 - (\theta - \delta^\pi(x))^2\big] \pi(\theta \mid x)\, d\theta\Big]\, dx$$
$$= \int f(x) \big[\delta^2(x) - \delta^\pi(x)^2 + 2\delta^\pi(x)^2 - 2\delta(x)\,\delta^\pi(x)\big]\, dx \qquad \text{(using } E[\theta \mid x] = \delta^\pi(x))$$
$$= \int \big(\delta(x) - \delta^\pi(x)\big)^2 f(x)\, dx.$$

(b) Take $\pi(\theta) = No(\mu, \tau^2)$, where $\tau^2$ is the prior variance. Then
$$E[\theta \mid x] = \frac{\tau^2}{\tau^2 + 1}\, x + \frac{1}{\tau^2 + 1}\, \mu.$$
Let $c = \frac{\tau^2}{\tau^2 + 1}$ and $d = \frac{\mu}{\tau^2 + 1}$. Whenever $0 \le c < 1$ we can always solve for the prior parameters, $\tau^2 = c/(1-c)$ and $\mu = d/(1-c)$, so $cx + d$ is a Bayes estimator with proper prior $No(\mu, \tau^2)$.

(c) When $c > 1$,
$$MSE(cX + d) = R(\theta, \delta_c) = (c\theta + d - \theta)^2 + c^2\, Var(X) > 1,$$
while the MSE for $\delta_0(X) = X$ is $1$, so the estimator $\delta_0(X) = X$ is uniformly better than $cX + d$. Similarly, when $c = 1$ and $d \ne 0$, $X$ is uniformly better than $X + d$.
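A tiny check of the algebra in (b) (my addition; $c$, $d$, and the test points are arbitrary): recover $(\mu, \tau^2)$ from $(c, d)$ and confirm the posterior mean is exactly $cx + d$.

```python
import numpy as np

def posterior_mean(x, mu, tau2):
    # Posterior mean of theta when X | theta ~ No(theta, 1), theta ~ No(mu, tau2).
    return tau2 / (tau2 + 1) * x + mu / (tau2 + 1)

c, d = 0.8, 0.5                        # any 0 <= c < 1 and real d
tau2, mu = c / (1 - c), d / (1 - c)    # prior parameters recovering c x + d
for x in np.linspace(-3, 3, 7):
    assert np.isclose(posterior_mean(x, mu, tau2), c * x + d)
print("delta(x) = c x + d is the posterior mean under No(mu, tau2).")
```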
6. (10 pt) Bickel & Doksum pg 97, problem 3.2.1

(a) One way to show this is to notice that it is a limiting case of homework problem 1.2.4 from the first homework set, taking the limit as $\tau^2$ goes to infinity. Then the posterior distribution of $\mu$ approaches a normal distribution with mean $\bar x$ and variance $\sigma^2/n$. If we were bad and interpreted a frequentist confidence interval for $\mu$ in a Bayesian way, we would be implicitly assuming that our prior knowledge of $\mu$ was utterly nil, that all real values were equally likely in our minds (a questionable assumption, since it is not a proper prior).

Here is a slightly more detailed development of the solution. We know that if $X_j \stackrel{iid}{\sim} No(\theta, \sigma^2)$, then the sufficient statistic $\bar X \sim No(\theta, \sigma^2/n)$. So the likelihood function for $\theta$ is given by:
$$l(\theta) = f(\bar x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2/n}}\, \exp\Big(-\frac{(\bar x - \theta)^2}{2\sigma^2/n}\Big).$$
With the flat prior $\pi(\theta) \propto 1$, the posterior distribution of $\theta$ given $X$ is:
$$\pi(\theta \mid X) \propto \pi(\theta)\, f(\bar x \mid \theta) \propto \frac{1}{\sqrt{2\pi\sigma^2/n}}\, \exp\Big(-\frac{(\bar x - \theta)^2}{2\sigma^2/n}\Big).$$
This function, which came about because it was the pdf of $\bar x$, is also the kernel of a normal density for $\theta$, and it is easy to see that the mean is $\bar x$ and the variance is $\sigma^2/n$. So $\theta \mid X \sim No(\bar x, \sigma^2/n)$. Under squared error loss, the Bayes estimator is the posterior mean, which in this case is $\bar x$.

7. (10 pt) Bickel & Doksum pg 97, problem 3.2.4

(a) We begin with the odds $\lambda = \theta/(1-\theta)$. If we take the prior on $\lambda$ to be the improper prior $\pi_\lambda(\lambda) \equiv 1$, we find the induced prior on $\theta$ as follows:
$$\pi_\theta(\theta) = \pi_\lambda\Big(\frac{\theta}{1-\theta}\Big)\, \Big|\frac{d}{d\theta}\, \frac{\theta}{1-\theta}\Big| = \frac{1}{(1-\theta)^2}.$$
Using this prior for $\theta$, we find the posterior distribution of $\theta$ (with $S$ the number of successes in $n$ Bernoulli trials) thus:
$$\pi(\theta \mid X) \propto \pi(\theta)\, \pi(X \mid \theta) \propto \frac{1}{(1-\theta)^2}\, \theta^S (1-\theta)^{n-S} = \theta^S (1-\theta)^{n-S-2}.$$
This is the kernel of a $Be(S+1, n-S-1)$ density, so long as the conditions are met for such a density to exist, which are that $S + 1 > 0$ and $n - S - 1 > 0$. The first of these will always be true, but the second will only be true when $n - S > 1$, i.e., when at least two failures are observed. (Should we have $n - S = 1$ or $n - S = 0$, then the expression $\theta^S (1-\theta)^{n-S-2}$ would not have a finite integral, so the posterior distribution on $\theta$ would be improper, a decided no-no.)

Under the squared error loss function, the Bayes estimator is the posterior mean of $\theta$, which for a $Be(S+1, n-S-1)$ density is known to be $\frac{S+1}{(S+1) + (n-S-1)} = \frac{S+1}{n}$.

Note that some students calculated the Bayes rule for $\lambda$ instead of for $\theta$, which I also marked correct. The Bayes rule for $\lambda$ is $\frac{S+1}{n-S-2}$, which exists when $S < n - 1$ and is finite when $S < n - 2$.
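A small numerical check (my addition, with hypothetical counts $n = 10$, $S = 4$) of both posterior means, using scipy's Beta distribution:

```python
from scipy.stats import beta

n, S = 10, 4                      # hypothetical counts with n - S > 2
post = beta(S + 1, n - S - 1)     # posterior Be(S+1, n-S-1) for theta

# Posterior mean of theta equals (S+1)/n:
print(post.mean(), (S + 1) / n)

# Posterior mean of lambda = theta/(1-theta) equals (S+1)/(n-S-2):
lam_mean = post.expect(lambda t: t / (1 - t))   # numerical integration
print(lam_mean, (S + 1) / (n - S - 2))
```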