STAT215: Solutions for Homework 2


Due: Wednesday, Feb 4

1. (10 pt) Suppose we take one observation, $X$, from the discrete distribution

$$
\begin{array}{c|ccccc}
x & -2 & -1 & 0 & 1 & 2 \\ \hline
\Pr(X = x \mid \theta) & (1-\theta)/4 & \theta/12 & 1/2 & (3-\theta)/12 & \theta/4
\end{array}
\qquad 0 \le \theta \le 1.
$$

Find an unbiased estimator of $\theta$. Obtain the maximum likelihood estimator (MLE) $\hat\theta(x)$ of $\theta$ and show that it is not unique. Is any choice of MLE unbiased?

(a) Let $T(2) = 4$ and $T(x) = 0$ when $x \ne 2$. Then
$$
E[T(X)] = 4 \cdot \frac{\theta}{4} + 0 \cdot \Big( \frac{1-\theta}{4} + \frac{\theta}{12} + \frac{1}{2} + \frac{3-\theta}{12} \Big) = \theta,
$$
so $T$ is an unbiased estimator of $\theta$.

(b) Denote the MLE by $\hat\theta$. Since there is only one observation,
$$
\hat\theta(-2) = \arg\max_\theta \frac{1-\theta}{4} = 0, \qquad
\hat\theta(-1) = \arg\max_\theta \frac{\theta}{12} = 1, \qquad
\hat\theta(0) = \arg\max_\theta \frac{1}{2} = \text{any } c \in [0,1],
$$
$$
\hat\theta(1) = \arg\max_\theta \frac{3-\theta}{12} = 0, \qquad
\hat\theta(2) = \arg\max_\theta \frac{\theta}{4} = 1.
$$
Since $\hat\theta(0)$ can be any value in $[0,1]$, the MLE is not unique, and every choice of MLE is biased, since
$$
E[\hat\theta] = 0 \cdot \frac{1-\theta}{4} + 1 \cdot \frac{\theta}{12} + \hat\theta(0) \cdot \frac{1}{2} + 0 \cdot \frac{3-\theta}{12} + 1 \cdot \frac{\theta}{4}
= \frac{\theta}{3} + \frac{\hat\theta(0)}{2},
$$
which cannot equal $\theta$ for all $\theta \in [0,1]$, no matter which constant value $\hat\theta(0)$ is chosen.

2. (10 pt) Bickel & Doksum pg 53, problem 2.3.8
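The unbiasedness of $T$ and the bias of every MLE are easy to confirm numerically. The sketch below is not part of the graded solution; it is a quick sanity check assuming NumPy is available, and it picks $\hat\theta(0) = 1/2$ as one concrete choice of MLE.

```python
import numpy as np

# One-observation model on x = -2, -1, 0, 1, 2, for 0 <= theta <= 1.
def probs(theta):
    return np.array([(1 - theta) / 4, theta / 12, 1 / 2, (3 - theta) / 12, theta / 4])

T = np.array([0, 0, 0, 0, 4])        # unbiased estimator from part (a): T(2) = 4, else 0
mle = np.array([0, 1, 0.5, 0, 1])    # one choice of MLE, taking theta_hat(0) = 1/2

for theta in [0.0, 0.25, 0.5, 0.75, 1.0]:
    p = probs(theta)
    assert abs(p.sum() - 1) < 1e-12                  # the probabilities sum to 1
    print(theta, (T * p).sum(), (mle * p).sum())     # E[T(X)] = theta; E[MLE] = theta/3 + 1/4
```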

(a) It is enough to show that the negative log-likelihood $\ell(\theta) = -\sum_j \log f(x_j \mid \theta)$ is a strictly convex function of $\theta \in \mathbb{R}^p$. Since it is the sum of the negative log-likelihoods of the individual observations, and a sum of strictly convex functions is strictly convex, it is enough to consider a single observation, $n = 1$. To show the negative log-likelihood is strictly convex it is enough to show that its Hessian, the matrix $H_{ij} = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\ell(\theta)$ of second derivatives, is positive-definite everywhere (except possibly at $\hat\theta = x$). For a single observation
$$
\ell(\theta) = -\log c_\alpha + \Big(\sum_i (x_i - \theta_i)^2\Big)^{\alpha/2},
$$
so let's compute the necessary derivatives:
$$
\frac{\partial}{\partial\theta_k} \ell(\theta) = -\alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-2)/2} (x_k - \theta_k),
$$
$$
\frac{\partial^2}{\partial\theta_k^2} \ell(\theta)
= \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-2)/2} + \alpha(\alpha-2) \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} (x_k - \theta_k)^2
= \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} \Big[\sum_i (x_i - \theta_i)^2 + (\alpha-2)(x_k - \theta_k)^2\Big],
$$
$$
\frac{\partial^2}{\partial\theta_j \partial\theta_k} \ell(\theta)
= \alpha \Big(\sum_i (x_i - \theta_i)^2\Big)^{(\alpha-4)/2} (\alpha-2)(x_j - \theta_j)(x_k - \theta_k), \qquad j \ne k.
$$
The Hessian $H = \alpha\,|x-\theta|^{\alpha-4} A$ is the product of a strictly positive constant factor $\alpha\,|x-\theta|^{\alpha-4}$ and a matrix $A$ whose on- and off-diagonal entries are
$$
A_{kk} = \sum_i (x_i - \theta_i)^2 + (\alpha-2)(x_k - \theta_k)^2, \qquad
A_{kj} = (\alpha-2)(x_k - \theta_k)(x_j - \theta_j), \quad k \ne j.
$$
If we introduce the notation $\Delta := (x - \theta) \in \mathbb{R}^p$ for the vector with components $\Delta_i = (x_i - \theta_i)$, and $I_p$ for the $p \times p$ identity matrix, we can write $A$ in the form
$$
A = |\Delta|^2 I_p + (\alpha - 2)\, \Delta \Delta^\top.
$$
The matrix $A$ satisfies $A\Delta = \lambda\Delta$ with eigenvalue $\lambda = |\Delta|^2 (\alpha - 1)$, strictly positive since $\alpha > 1$ (except at $\Delta = 0$, i.e., $\theta = \hat\theta = x$, which is okay). The other eigenvectors are orthogonal to $\Delta$, all with eigenvalue $\lambda = |\Delta|^2$, which is also strictly positive. Thus $A$ is positive-definite and so is the Hessian $H = \alpha\,|\Delta|^{\alpha-4} A$.
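The eigenvalue claim for $A$ can be checked numerically. This is only an illustration, not part of the original argument; it assumes NumPy, and the dimension $p = 4$ and exponent $\alpha = 1.5$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, alpha = 4, 1.5                        # any dimension p and any alpha > 1
delta = rng.normal(size=p)               # Delta = x - theta, away from the MLE
norm2 = delta @ delta                    # |Delta|^2

A = norm2 * np.eye(p) + (alpha - 2) * np.outer(delta, delta)
eigvals = np.sort(np.linalg.eigvalsh(A))

# Predicted spectrum: (alpha - 1)|Delta|^2 once, |Delta|^2 with multiplicity p - 1.
predicted = np.sort(np.r_[(alpha - 1) * norm2, np.full(p - 1, norm2)])
print(eigvals)
print(predicted)                         # the two spectra agree and are strictly positive
```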

(b) First consider the case $\alpha = 1$ in dimension $p = 1$, with $n = 2m$ even. WLOG order the data $x_1 \le x_2 \le \cdots \le x_n$. The log-likelihood function is given by
$$
l(\theta) = n \log c(\alpha) - \sum_i |x_i - \theta|,
$$
a continuous function whose derivative does not exist at the data points $\theta \in \{x_i\}$ and which elsewhere satisfies
$$
\frac{d}{d\theta} l(\theta) = -\frac{d}{d\theta} \sum_i |x_i - \theta|
= -\Big[ \sum_{x_i < \theta} (+1) + \sum_{x_i > \theta} (-1) \Big]
= \#\{x_i > \theta\} - \#\{x_i < \theta\}
$$
(note that the derivative of $|x - \theta|$ with respect to $\theta$ is $-1$ on the interval $\theta \in (-\infty, x)$, $+1$ on the interval $\theta \in (x, \infty)$, and undefined at the point $\theta = x$). Thus $l(\theta)$ is increasing when $\theta < x_m$, where more than half the $\{x_i\}$ exceed $\theta$, and is decreasing when $\theta > x_{m+1}$, where fewer than half the $\{x_i\}$ exceed $\theta$. In the interval $x_m < \theta < x_{m+1}$ the derivative is zero, so $l(\theta)$ is constant there and equal to its maximum value; any point of that interval is an MLE, so the MLE is not unique. In case $n = 2m+1$ is odd, the same argument shows that $l(\theta)$ achieves a unique maximum at the median $x_{m+1}$. In dimension $p > 1$ a similar argument holds, only $\hat\theta$ now should be any value in the median rectangle (or "median block") in which each component is a median of the corresponding components of the $x_i$'s.

3. (10 pt) Let $(X_1, \dots, X_n)$ be a random sample from the uniform distribution on the interval $(\theta - \frac12,\ \theta + \frac12)$, where $\theta \in \mathbb{R}$ is unknown. Let $X_{(j)}$ be the $j$th order statistic.

(a) Show that $(X_{(1)} + X_{(n)})/2$ is strongly consistent for $\theta$, i.e., that $\lim_{n\to\infty} (X_{(1)} + X_{(n)})/2 = \theta$ a.s.

(b) Show that $\bar X_n := (X_{(1)} + \cdots + X_{(n)})/n$ is $L_2$ consistent.

Solution:

(a) For any $\epsilon \in (0, 1)$,
$$
P\big(|X_{(1)} - (\theta - \tfrac12)| > \epsilon\big)
= P\big(X_{(1)} > \epsilon + \theta - \tfrac12\big)
= \big[P\big(X_1 > \epsilon + \theta - \tfrac12\big)\big]^n
= (1 - \epsilon)^n
$$

and
$$
P\big(|X_{(n)} - (\theta + \tfrac12)| > \epsilon\big)
= P\big(X_{(n)} < -\epsilon + \theta + \tfrac12\big)
= \big[P\big(X_1 < -\epsilon + \theta + \tfrac12\big)\big]^n
= (1 - \epsilon)^n.
$$
Since $\sum_n (1-\epsilon)^n < \infty$, by the corollary of the Borel-Cantelli lemma we conclude that $\lim_n X_{(1)} = \theta - \frac12$ a.s. and $\lim_n X_{(n)} = \theta + \frac12$ a.s. Hence $\lim_n (X_{(1)} + X_{(n)})/2 = \theta$ a.s.

(b) $\bar X_n := (X_{(1)} + \cdots + X_{(n)})/n = (X_1 + \cdots + X_n)/n$. Since $E(X_1) = \theta$, we have $E[\bar X_n] = \theta$ and
$$
\mathrm{Var}[\bar X_n] = E[(\bar X_n - \theta)^2] = \frac{1}{n} E[(X_1 - \theta)^2] = \frac{1}{12n} \to 0.
$$
Therefore $\bar X_n$ is $L_2$ consistent.

4. (10 pt) Let $\{X_i\} \overset{iid}{\sim} \mathrm{No}(\mu, \sigma^2)$ be a random sample from the normal distribution with unknown mean $\mu \in \mathbb{R}$ and known variance $\sigma^2 > 0$. For fixed $t \ne 0$, find the Uniform Minimum-Variance Unbiased Estimator (UMVUE) of $e^{t\mu}$ and show that its variance is larger than the Cramér-Rao lower bound. Show that the ratio of its variance to the Cramér-Rao lower bound converges to $1$ as $n \to \infty$.

(a) The sample mean $\bar X$ is complete and sufficient for $\mu$. Since $E[e^{t\bar X}] = e^{\mu t + \sigma^2 t^2/(2n)}$, the UMVUE of $e^{t\mu}$ is $T(X) = e^{-\sigma^2 t^2/(2n) + t\bar X}$. The Fisher information is $I(\mu) = n/\sigma^2$, so the Cramér-Rao lower bound is
$$
\Big(\frac{d}{d\mu} e^{t\mu}\Big)^2 \Big/ I(\mu) = \sigma^2 t^2 e^{2t\mu} / n.
$$
On the other hand,
$$
\mathrm{Var}(T) = e^{-\sigma^2 t^2/n}\, E[e^{2t\bar X}] - e^{2t\mu}
= \big(e^{\sigma^2 t^2/n} - 1\big)\, e^{2t\mu}
> \frac{\sigma^2 t^2}{n}\, e^{2t\mu},
$$
the Cramér-Rao lower bound (using $e^x - 1 > x$ for $x > 0$).

(b) The ratio of the variance of the UMVUE over the Cramér-Rao lower bound is $\big(e^{\sigma^2 t^2/n} - 1\big)\big/\big(\sigma^2 t^2/n\big)$, which converges to $1$ as $n \to \infty$, since $\lim_{x\to 0}(e^x - 1)/x = 1$.
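The ratio in part (b) can be tabulated directly. The sketch below is not part of the original solution; it assumes NumPy, and the values $\sigma = 2$, $t = 1$ are chosen purely for illustration.

```python
import numpy as np

sigma, t = 2.0, 1.0                  # illustrative values only; any sigma > 0 and t != 0 work
for n in [1, 10, 100, 1000, 10000]:
    x = sigma**2 * t**2 / n          # variance and bound share the common factor e^{2 t mu}
    ratio = np.expm1(x) / x          # (e^x - 1) / x, computed stably with expm1
    print(n, ratio)                  # always > 1, and decreasing toward 1 as n grows
```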

5. (10 pt) Let $X$ have density function $f(x \mid \theta)$, $x \in \mathbb{R}^d$, and let $\theta$ have (proper) prior density $\pi(\theta)$. Let $\delta_\pi(x)$ denote the Bayes estimate of $\theta$ under $\pi(\theta)$ for squared-error loss (i.e., $\delta_\pi(x) := E[\theta \mid X = x]$) and suppose $\delta_\pi$ has finite Bayes risk $r(\pi, \delta_\pi) = E[|\delta_\pi(X) - \theta|^2]$. Show that for any other estimator $\delta(x)$,
$$
r(\pi, \delta) - r(\pi, \delta_\pi) = \int \big(\delta(x) - \delta_\pi(x)\big)^2 f(x)\, dx,
$$
where $f(x)$ is the marginal density of $X$. If $f(x \mid \theta)$ is normal $\mathrm{No}(\theta, 1)$, consider the collection of estimators of the form $\delta(x) := cx + d$. Show that whenever $0 \le c < 1$ these estimators are all proper Bayes estimators, and hence admissible [Hint: Find a prior $\pi$ for which $\delta_\pi(X) = cX + d$]. Show that if $c > 1$ the resulting estimator is inadmissible.

(a)
$$
\begin{aligned}
r(\pi, \delta) - r(\pi, \delta_\pi)
&= \int\!\!\int \big[(\theta - \delta(x))^2 - (\theta - \delta_\pi(x))^2\big] f(x \mid \theta)\, \pi(\theta)\, dx\, d\theta \\
&= \int\!\!\int \big[(\theta - \delta(x))^2 - (\theta - \delta_\pi(x))^2\big] \pi(\theta \mid x)\, f(x)\, d\theta\, dx \\
&= \int f(x) \Big[ \int \big[(\theta - \delta(x))^2 - (\theta - \delta_\pi(x))^2\big] \pi(\theta \mid x)\, d\theta \Big] dx \\
&= \int f(x) \big[ \delta^2(x) - \delta_\pi^2(x) + 2\delta_\pi^2(x) - 2\delta(x)\delta_\pi(x) \big] dx \\
&= \int \big(\delta(x) - \delta_\pi(x)\big)^2 f(x)\, dx,
\end{aligned}
$$
where the fourth line uses $\int \theta\, \pi(\theta \mid x)\, d\theta = \delta_\pi(x)$.

(b) Take $\pi(\theta) = \mathrm{No}(\mu, \tau^2)$, with mean $\mu$ and variance $\tau^2$. Then
$$
E[\theta \mid x] = \frac{\tau^2}{\tau^2 + 1}\, x + \frac{1}{\tau^2 + 1}\, \mu.
$$
Let $c = \tau^2/(\tau^2 + 1)$ and $d = \mu/(\tau^2 + 1)$. Whenever $0 \le c < 1$ we can always find such a $\mu$ and $\tau^2$ (namely $\tau^2 = c/(1-c)$ and $\mu = d/(1-c)$), so $cx + d$ is a Bayes estimator with respect to the proper prior $\mathrm{No}(\mu, \tau^2)$, and hence admissible.

(c) When $c > 1$,
$$
\mathrm{MSE}(cX + d) = R(\theta, \delta_c) = (c\theta + d - \theta)^2 + c^2\, \mathrm{Var}(X) \ge c^2 > 1,
$$
while the MSE of $\delta_0(X) = X$ is $1$, so the estimator $\delta_0(X) = X$ is uniformly better than $cX + d$, which is therefore inadmissible. Similarly, when $c = 1$ and $d \ne 0$, $X$ is uniformly better than $X + d$.
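Part (c) is just a comparison of two quadratic risk functions, which is easy to see numerically. The sketch below is not part of the original solution; it assumes NumPy, and $c = 1.2$, $d = 0.3$ are an arbitrary illustrative choice with $c > 1$.

```python
import numpy as np

c, d = 1.2, 0.3                                    # any c > 1 gives an inadmissible rule
thetas = np.linspace(-5, 5, 101)
risk_cd = ((c - 1) * thetas + d) ** 2 + c ** 2     # R(theta, cX + d); Var(X) = 1 here
risk_x = np.ones_like(thetas)                      # R(theta, X) = 1 for every theta
print(bool(np.all(risk_cd > risk_x)))              # True: X dominates cX + d uniformly
```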

6. (10 pt) Bickel & Doksum pg 97, problem 3.2.

(a) One way to show this is to notice that it is a limiting case of problem 1.2.4 from the first homework set, taking the limit as $\tau^2$ goes to infinity. Then the posterior distribution of $\mu$ approaches a normal distribution with mean $\bar x$ and variance $\sigma^2/n$. If we were bad and interpreted a frequentist confidence interval for $\mu$ in a Bayesian way, we would be implicitly assuming that our prior knowledge of $\mu$ was utterly nil, that all real values were equally likely in our minds (a questionable assumption, since that is not a proper prior).

Here is a slightly more detailed development of the solution. We know that if $X_j \overset{iid}{\sim} \mathrm{No}(\theta, \sigma^2)$, then the sufficient statistic $\bar X \sim \mathrm{No}(\theta, \sigma^2/n)$. So the likelihood function for $\theta$ is given by
$$
l(\theta) = f(\bar x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\Big( -\frac{(\bar x - \theta)^2}{2\sigma^2/n} \Big).
$$
With the flat (improper) prior $\pi(\theta) \propto 1$, the posterior distribution of $\theta$ given $X$ is
$$
\pi(\theta \mid X) \propto \pi(\theta)\, f(\bar x \mid \theta) \propto \frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\Big( -\frac{(\bar x - \theta)^2}{2\sigma^2/n} \Big).
$$
This function, which came about because it was the pdf of $\bar x$, is also the kernel of a normal density for $\theta$, and it is easy to see that the mean is $\bar x$ and the variance is $\sigma^2/n$. So $\theta \mid X \sim \mathrm{No}(\bar x, \sigma^2/n)$. Under squared-error loss, the Bayes estimator is the posterior mean, which in this case is $\bar x$.

7. (10 pt) Bickel & Doksum pg 97, problem 3.2.4

(a) We begin with the odds $\lambda = \theta/(1 - \theta)$. If we take the prior on $\lambda$ to be the improper flat prior $\pi_\lambda(\lambda) \propto 1$, we find the induced prior on $\theta$ as follows:
$$
\pi_\theta(\theta) = \pi_\lambda\Big( \frac{\theta}{1-\theta} \Big) \left| \frac{d}{d\theta}\, \frac{\theta}{1-\theta} \right| = \frac{1}{(1-\theta)^2} = (1 - \theta)^{-2}.
$$
Using this prior for $\theta$, we find the posterior distribution of $\theta$ (with $S$ the number of successes in the $n$ Bernoulli trials) thus:
$$
\pi(\theta \mid X) \propto \pi(\theta)\, \pi(X \mid \theta) = (1-\theta)^{-2} \big( \theta^S (1-\theta)^{n-S} \big) = \theta^S (1-\theta)^{n-S-2}.
$$

This is the kernel of a $\mathrm{Be}(S+1,\ n-S-1)$ density, so long as the conditions are met for such a density to exist, which are that $S + 1 > 0$ and $n - S - 1 > 0$. The first of these will always be true, but the second will only be true when $n - S > 1$, i.e., when at least two failures are observed. (Should we have $n - S = 1$ or $n - S = 0$, then the expression $\theta^S (1-\theta)^{n-S-2}$ would not have a finite integral, so the posterior distribution on $\theta$ would be improper, a decided no-no.)

Under the squared-error loss function, the Bayes estimator is the posterior mean of $\theta$, which for a $\mathrm{Be}(S+1,\ n-S-1)$ density is known to be
$$
\frac{S+1}{(S+1) + (n-S-1)} = \frac{S+1}{n}.
$$

Note that some students calculated the Bayes rule for $\lambda$ instead of for $\theta$, which I also marked correct. The Bayes rule for $\lambda$ is $\dfrac{S+1}{n-S-2}$, which exists when $S < n - 1$ and is finite when $S < n - 2$.
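As a cross-check of 7(a), one can integrate against the normalized kernel $\theta^S(1-\theta)^{n-S-2}$ and compare with the closed-form answers. The sketch below is not part of the original solution; it assumes SciPy, and the data $n = 10$, $S = 4$ are made up for illustration.

```python
from scipy import integrate
from scipy.stats import beta

n, S = 10, 4                                      # illustrative data: 4 successes in 10 trials
kernel = lambda t: t**S * (1 - t)**(n - S - 2)    # posterior kernel under the flat prior on lambda
Z, _ = integrate.quad(kernel, 0, 1)               # normalizing constant

mean_theta, _ = integrate.quad(lambda t: t * kernel(t) / Z, 0, 1)
mean_lambda, _ = integrate.quad(lambda t: t / (1 - t) * kernel(t) / Z, 0, 1)

print(mean_theta, (S + 1) / n)                    # both 0.5:  posterior mean of theta = (S+1)/n
print(mean_lambda, (S + 1) / (n - S - 2))         # both 1.25: Bayes rule for lambda
print(beta(S + 1, n - S - 1).mean())              # Be(5, 5) mean, also 0.5
```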