Evaluating the Performance of Estimators (Section 7.3)

Example: Suppose we observe $X_1, \ldots, X_n$ iid $N(\theta, \sigma_0^2)$, with $\sigma_0^2$ known, and wish to estimate $\theta$. Two possible estimators are $\hat\theta = \bar X$ (the sample mean) and $\hat\theta = M$ (the sample median). Which is better? How should we measure performance? Some possibilities:

1. Compare $E|\bar X - \theta|$ with $E|M - \theta|$.
2. Compare $E(\bar X - \theta)^2$ with $E(M - \theta)^2$.
3. Compare $E L(\theta, \bar X)$ with $E L(\theta, M)$, where $L(\cdot,\cdot)$ is an appropriate loss function: the value $L(\theta, a)$ is some measure of the loss incurred when the true value is $\theta$ and our estimate is $a$. Examples:
   absolute error loss: $L(\theta, a) = |a - \theta|$
   squared error loss: $L(\theta, a) = (a - \theta)^2$
   "game show" loss: $L(\theta, a) = I(|a - \theta| > c)$
   Stein's loss (for $\theta, a > 0$): $L(\theta, a) = \frac{a}{\theta} - 1 - \log\!\left(\frac{a}{\theta}\right)$

Historically, estimators have been most frequently compared using the Mean Squared Error, $\mathrm{MSE}(\theta) = E_\theta(\hat\theta - \theta)^2$. This is because the MSE can often be calculated or approximated (for large samples), and it has nice mathematical properties.
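To make the comparison concrete, here is a minimal Monte Carlo sketch (Python/NumPy; the sample size, $\sigma_0$, $\theta$, and replication count are arbitrary illustrative choices, not taken from the notes) estimating the squared-error risk of the sample mean and the sample median under the normal model above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma0, theta, reps = 25, 1.0, 0.0, 100_000   # arbitrary illustrative values

x = rng.normal(theta, sigma0, size=(reps, n))
mean_est = x.mean(axis=1)                 # theta-hat = sample mean
median_est = np.median(x, axis=1)         # theta-hat = sample median

# Monte Carlo estimates of MSE(theta) = E_theta (theta-hat - theta)^2
print("MSE of sample mean  :", np.mean((mean_est - theta) ** 2))    # ~ sigma0^2/n = 0.04
print("MSE of sample median:", np.mean((median_est - theta) ** 2))  # roughly (pi/2)*sigma0^2/n for large n
```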

Admissible and Inadmissible Estimators

Let $W = W(X)$ be an estimator of $\tau = \tau(\theta)$. Define $\mathrm{MSE}_W(\theta) = E_\theta(W - \tau(\theta))^2$. An estimator $W$ is inadmissible (w.r.t. squared error loss) if there exists another estimator $V = V(X)$ such that
$$\mathrm{MSE}_V(\theta) \le \mathrm{MSE}_W(\theta) \quad \forall\, \theta \in \Theta$$
with strict inequality for at least one value of $\theta$. (An estimator is inadmissible if there is another estimator that beats it.) An estimator which is not inadmissible is called admissible. (Draw some pictures.)

Note: An admissible estimator may actually be very bad. An inadmissible estimator can sometimes be pretty good.

Note: If we are using a loss function $L(\tau, a)$, we define inadmissible and admissible estimators in the same way, replacing the MSE by the more general notion of a risk function $R(\theta, W) = E_\theta L(\tau(\theta), W(X))$.

Examples: Again, suppose we observe $X_1, \ldots, X_n$ iid $N(\theta, \sigma_0^2)$, with $\sigma_0^2$ known, and wish to estimate $\theta$. Consider the estimator $W_0$ that always estimates $\theta$ by 0, regardless of the data $X$. This is a very bad estimator, but it is admissible because it is great when $\theta = 0$. No non-degenerate estimator $V$ can possibly beat $W_0$, since it would have to satisfy
$$\mathrm{MSE}_V(0) \le \mathrm{MSE}_{W_0}(0) \;\Longrightarrow\; E_0(V - 0)^2 \le E_0(W_0 - 0)^2 \;\Longrightarrow\; E_0 V^2 \le 0 \;\Longrightarrow\; P_0(V = 0) = 1 \;\Longrightarrow\; V \equiv 0.$$
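A small sketch of the two risk functions in this example, using the exact formulas $\mathrm{MSE}_{\bar X}(\theta) = \sigma_0^2/n$ and $\mathrm{MSE}_{W_0}(\theta) = \theta^2$ (the grid and constants below are arbitrary choices for illustration):

```python
import numpy as np

# Exact MSE curves under N(theta, sigma0^2) with sigma0 known (illustrative values).
n, sigma0 = 25, 1.0
theta_grid = np.linspace(-2, 2, 9)

mse_mean = np.full_like(theta_grid, sigma0**2 / n)  # MSE of X-bar: sigma0^2/n, flat in theta
mse_zero = theta_grid**2                            # MSE of W0 = 0: bias^2 = theta^2, variance 0

for t, m1, m2 in zip(theta_grid, mse_mean, mse_zero):
    print(f"theta={t:+.2f}  MSE(X-bar)={m1:.4f}  MSE(W0)={m2:.4f}")
# W0 wins only in a small neighborhood of theta = 0; X-bar wins everywhere else,
# so neither estimator dominates the other over the whole parameter space.
```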

Now consider the estimator $M$ = sample median. We show later that the sample mean $\bar X$ has a uniformly smaller MSE than $M$, so that $M$ is inadmissible. (The two MSE functions are constant in $\theta$, i.e., flat.) However, $M$ is not a bad estimate of $\theta$, and might be used if there were doubts about the normality assumption (perhaps the true distribution has thicker tails) or concern about outliers.

Bias, Variance, and MSE (for an estimator $W$ of $\tau(\theta)$)

$$\mathrm{Bias}_W(\theta) = E_\theta W - \tau(\theta), \qquad \mathrm{Var}_W(\theta) = E_\theta(W - E_\theta W)^2 = \mathrm{Var}_\theta(W).$$

Fact: $\mathrm{MSE}_W(\theta) = \mathrm{Bias}_W^2(\theta) + \mathrm{Var}_W(\theta)$, or in brief, MSE = Bias$^2$ + Var.

Proof: For any rv $Y$ with finite second moment, we know $EY^2 = (EY)^2 + \mathrm{Var}(Y)$. Taking $Y = W - \tau$ leads to MSE = Bias$^2$ + Var, since $\mathrm{Var}(W - \tau) = \mathrm{Var}(W)$.

Terminology: An estimator $W$ with $\mathrm{Bias}_W(\theta) \equiv 0$, that is, $E_\theta W = \tau(\theta)$ for all $\theta \in \Theta$, is said to be unbiased. If not, it is biased. For an unbiased estimator, MSE = Var.
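As a numerical check of the decomposition, here is a sketch (illustrative values only) using the biased variance estimator $SS/n$, which reappears below as the MLE of $\sigma^2$; the Monte Carlo bias$^2$ + variance should match the Monte Carlo MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 10, 5.0, 2.0, 200_000     # illustrative values, not from the notes

x = rng.normal(mu, sigma, size=(reps, n))
w = x.var(axis=1, ddof=0)                      # the (biased) estimator SS/n of sigma^2

bias = w.mean() - sigma**2                     # Monte Carlo Bias_W
var = w.var()                                  # Monte Carlo Var_W
mse = np.mean((w - sigma**2) ** 2)             # Monte Carlo MSE_W

print(bias**2 + var, mse)                      # the two numbers agree (up to Monte Carlo error)
```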

Which is better? (according to MSE)

(Setup: $X_1, \ldots, X_n$ iid Bernoulli($\theta$); the MLE of $\theta$ is the sample proportion, and the Bayes estimate, e.g. the posterior mean under a beta prior, has the form $p\,\hat\theta_{\mathrm{MLE}} + (1-p)\,a$, a weighted average of the MLE and a prior guess $a$ with weight $0 < p < 1$.)

Answer: Neither dominates the other.
$$\mathrm{MSE}_{\mathrm{Bayes}}(a) = p^2\,\frac{a(1-a)}{n} < \frac{a(1-a)}{n} = \mathrm{MSE}_{\mathrm{MLE}}(a),$$
$$\mathrm{MSE}_{\mathrm{Bayes}}(0) = (1-p)^2 a^2 > 0 = \mathrm{MSE}_{\mathrm{MLE}}(0),$$
and similarly, $\mathrm{MSE}_{\mathrm{Bayes}}(1) > \mathrm{MSE}_{\mathrm{MLE}}(1)$. Thus the Bayes estimate is superior in a neighborhood of $\theta = a$, and the MLE is superior near $\theta = 0$ and $\theta = 1$.

Note: both $\mathrm{MSE}_{\mathrm{Bayes}}(\theta)$ and $\mathrm{MSE}_{\mathrm{MLE}}(\theta)$ are parabolas (quadratic functions of $\theta$).

Note: Regarding (in)admissibility, the above remarks prove nothing. But it can be shown that both the Bayes estimate and the MLE are admissible here. Typically, Bayes estimates (with proper priors) are admissible.
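Assuming the weighted-average form above, the bias-variance decomposition gives the full curves $\mathrm{MSE}_{\mathrm{MLE}}(\theta) = \theta(1-\theta)/n$ and $\mathrm{MSE}_{\mathrm{Bayes}}(\theta) = p^2\,\theta(1-\theta)/n + (1-p)^2(a-\theta)^2$. The sketch below (arbitrary $n$, $p$, $a$) tabulates them to show that neither dominates:

```python
import numpy as np

# Exact MSE curves for the Bernoulli problem (illustrative choices of n, p, a).
n, p, a = 20, 0.8, 0.3          # p = weight on the MLE, a = prior guess
theta = np.linspace(0, 1, 11)

mse_mle = theta * (1 - theta) / n
mse_bayes = p**2 * theta * (1 - theta) / n + (1 - p)**2 * (a - theta)**2

for t, m1, m2 in zip(theta, mse_mle, mse_bayes):
    winner = "Bayes" if m2 < m1 else "MLE"
    print(f"theta={t:.1f}  MSE_MLE={m1:.4f}  MSE_Bayes={m2:.4f}  smaller: {winner}")
# The Bayes estimate wins in a neighborhood of theta = a; the MLE wins near the endpoints.
```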

Proof: (Here $S = S(X)$ is an unbiased estimator of $\tau(\theta)$, $T = T(X)$ is a sufficient statistic, and $S^* = E(S \mid T)$ is the Rao-Blackwellized estimator; we show $S^*$ is unbiased with MSE no larger than that of $S$.)

Recall: For any rv's $X, Y$ with $EX^2 < \infty$, we have
$$EX = E(E(X \mid Y)), \qquad \mathrm{Var}(X) = E(\mathrm{Var}(X \mid Y)) + \mathrm{Var}(E(X \mid Y)).$$

Now apply these facts:
$$E_\theta[S^*(X)] = E_\theta[E_\theta(S(X) \mid T(X))] = E_\theta S(X) = \tau(\theta),$$
$$E_\theta(S - \tau)^2 = \mathrm{Var}_\theta(S) = \underbrace{E[\mathrm{Var}(S \mid T)]}_{\ge\, 0} + \underbrace{\mathrm{Var}[E(S \mid T)]}_{=\, \mathrm{Var}(S^*)} \;\ge\; \mathrm{Var}(S^*) = E_\theta(S^* - \tau(\theta))^2.$$

Equality can occur only when $E_\theta \mathrm{Var}(S \mid T) = 0$. But
$$E_\theta \mathrm{Var}(S \mid T) = E_\theta\{E[(S - E(S \mid T))^2 \mid T]\} = E_\theta\{(S - E(S \mid T))^2\} = E_\theta\{(S - S^*)^2\},$$
which equals 0 iff $P_\theta(S = S^*) = 1$. Arguing more loosely, $E_\theta \mathrm{Var}(S \mid T) = 0 \Rightarrow \mathrm{Var}(S \mid T) = 0 \Rightarrow S$ is a function of $T$ $\Rightarrow S = E(S \mid T) = S^*$.
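As a sanity check on the theorem, a small simulation sketch in the Bernoulli model (arbitrary values): $S = X_1$ is unbiased for $p$, $T = \sum_i X_i$ is sufficient, and by symmetry $E(X_1 \mid T) = T/n$; both estimators are unbiased, but the Rao-Blackwellized one has much smaller variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 15, 0.3, 200_000            # illustrative values

x = rng.binomial(1, p, size=(reps, n))
s = x[:, 0]                              # crude unbiased estimator S = X_1
s_star = x.sum(axis=1) / n               # S* = E(X_1 | T) = T/n, the Rao-Blackwellized estimator

print("mean of S :", s.mean(), " variance:", s.var())          # ~ p and p(1-p)
print("mean of S*:", s_star.mean(), " variance:", s_star.var())  # ~ p and p(1-p)/n
```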

Example: Observe $X_1, \ldots, X_n$ iid Bernoulli($p$).

Find the UMVUE of $p$: $T = \sum_i X_i$ is a CSS and $E(T/n) = p$. Since $T/n$ is an unbiased estimator of $p$ which is a function of the CSS $T$, it is the UMVUE.

Find the best unbiased estimator of $p^2$:
$$E\left(\frac{T(T-1)}{n(n-1)}\right) = p^2.$$
(This can be found indirectly using Rao-Blackwell; for a direct argument, see below.) Since $T(T-1)/(n(n-1))$ is an unbiased estimator of $p^2$ which is a function of the CSS $T$, it is the UMVUE.

Checking unbiasedness:
$$E\,T(T-1) = E(T^2) - ET = \mathrm{Var}(T) + (ET)^2 - ET = np(1-p) + (np)^2 - np = n(n-1)p^2.$$

Comment: Estimating a parameter by its UMVUE is another approach to estimation, but not a very good one. Often, no unbiased estimator exists, or the only one that exists is bad.
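A direct numerical check of the unbiasedness computation, summing the estimator against the Binomial($n$, $p$) pmf (illustrative $n$ and $p$; uses SciPy):

```python
import numpy as np
from scipy.stats import binom

n, p = 12, 0.4                              # illustrative values
t = np.arange(n + 1)
pmf = binom.pmf(t, n, p)

umvue_p2 = t * (t - 1) / (n * (n - 1))      # UMVUE of p^2 as a function of T
print(np.sum(umvue_p2 * pmf), p**2)         # both equal 0.16: the estimator is exactly unbiased
```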

Example: Observe $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$ unknown. Here $T = (\bar x, s^2)$ is a CSS. (Recall the derivation: $T$ is a 1-1 function of the natural SS for a 2pef.)

Estimation of $\tau(\mu, \sigma^2) = \mu$: $\bar x$ is unbiased ($E\bar x = \mu$) and a function of $T$, so $\bar x$ is the UMVUE. The MLE of $\theta$ is $\hat\theta = \left(\bar x,\ n^{-1}\sum_i (X_i - \bar x)^2\right)$, so the invariance principle says the MLE of $\mu$ is $\tau(\hat\theta) = \bar x$. The MOM estimate is also $\bar x$, since $E\bar x = \mu$.

Note: For estimating $\mu$, the MLE, MOM, and UMVUE all agree on $\bar X$. But the Bayes estimate is different.

What about the sample median $M$? $M$ is an unbiased estimator of $\mu$. (Proof?) But it is not a function of the CSS $T$. Thus Rao-Blackwellizing $M$ leads to the UMVUE (which we know is $\bar x$), which has a strictly smaller variance than $M$. Thus:
$$E(M \mid T) = \bar x \quad \text{and} \quad \mathrm{Var}(M) > \mathrm{Var}(\bar x) = \sigma^2/n.$$

Estimating $\tau(\mu, \sigma^2) = \sigma^2$: Let $SS = \sum_i (X_i - \bar x)^2$. Then $s^2 = SS/(n-1)$ is an unbiased estimator of $\sigma^2$ and a function of the CSS $T$; therefore $s^2$ is the UMVUE. By the invariance principle, the MLE of $\sigma^2$ is $SS/n$. This is slightly biased.
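A simulation sketch (arbitrary values) illustrating several of these claims at once: $s^2$ is unbiased for $\sigma^2$, $SS/n$ is biased low, and the sample median has larger variance than $\bar x$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma, reps = 11, 2.0, 1.5, 200_000    # illustrative values (odd n, so the median is a data point)

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
med = np.median(x, axis=1)
s2 = x.var(axis=1, ddof=1)       # SS/(n-1), the UMVUE of sigma^2
mle = x.var(axis=1, ddof=0)      # SS/n, the MLE of sigma^2

print("E s^2  ~", s2.mean(), " (true sigma^2 =", sigma**2, ")")   # unbiased
print("E SS/n ~", mle.mean())                                     # slightly too small (bias -sigma^2/n)
print("Var(xbar) ~", xbar.var(), " vs sigma^2/n =", sigma**2 / n)
print("Var(M)    ~", med.var(), " -- strictly larger than Var(xbar)")
```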

Estimation of $\tau(\mu, \sigma^2) = \mu^2$: The MLE of $\mu^2$ is $(\bar x)^2$ by invariance of MLE's. But $\bar x^2$ is biased for $\mu^2$:
$$E(\bar x^2) = \mathrm{Var}(\bar x) + (E\bar x)^2 = \frac{\sigma^2}{n} + \mu^2 > \mu^2.$$
An unbiased estimate of $\mu^2$ is $W \equiv \bar x^2 - \dfrac{s^2}{n}$:
$$E\left(\bar x^2 - \frac{s^2}{n}\right) = \left(\frac{\sigma^2}{n} + \mu^2\right) - \frac{\sigma^2}{n} = \mu^2.$$
Subtracting $s^2/n$ removes (or corrects for) the bias in the MLE. $W$ is the UMVUE since it is unbiased and a function of $T$.

Which is better: $\bar x^2$ or $W$? For $n > 3$, $W$ has slightly smaller MSE than $\bar x^2$. (Verify?) Thus $\bar x^2$ is inadmissible for $n > 3$ (but is a perfectly reasonable estimator).

But $W$ is also inadmissible because it sometimes takes on impossible values: $\mu^2 \ge 0$, but $W$ can be negative! $P(W < 0)$ is positive and will be sizeable when $\mu$ is small ($\approx 1/2$ when $\mu = 0$). A better estimate is clearly $W^+ = \max(W, 0)$. Whenever $W^+ \ne W$, we know $W^+$ is closer to the true value of $\mu^2$. More formally,
$$E(W - \mu^2)^2 - E(W^+ - \mu^2)^2 = E\left[(W - \mu^2)^2 - (W^+ - \mu^2)^2\right] = E\Big[\underbrace{\{(W - \mu^2)^2 - (0 - \mu^2)^2\}}_{\text{always } \ge 0 \text{ and sometimes } > 0}\, I(W < 0)\Big] > 0.$$
But $W^+$ is biased! Oh, well.
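A sketch comparing the three estimators of $\mu^2$ (arbitrary values; $\mu$ deliberately chosen near 0 so that $P(W < 0)$ is appreciable):

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, sigma, reps = 10, 0.3, 1.0, 200_000     # illustrative values

x = rng.normal(mu, sigma, size=(reps, n))
xbar2 = x.mean(axis=1) ** 2                    # MLE of mu^2 (biased upward)
w = xbar2 - x.var(axis=1, ddof=1) / n          # W = xbar^2 - s^2/n, the UMVUE
w_plus = np.maximum(w, 0)                      # W+ = max(W, 0): biased, but never negative

for name, est in [("xbar^2", xbar2), ("W", w), ("W+", w_plus)]:
    print(f"{name:7s} mean={est.mean():+.4f}  MSE={np.mean((est - mu**2) ** 2):.5f}")
print("P(W < 0) ~", np.mean(w < 0))
```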

No unbiased estimator of $\mu^2$ exists which does not take on negative values.

Fact: There are situations where there are no unbiased estimators at all (and hence no UMVUE exists).

Example: Observe $X_1, \ldots, X_n$ iid Poisson($\lambda$). There exists no unbiased estimator of $1/\lambda$.

We know $T = \sum_{i=1}^n X_i$ is a CSS. Suppose $S = S(X)$ satisfies $ES = 1/\lambda$ for all $\lambda > 0$. Then $\psi(T) = E(S \mid T)$ also satisfies $E\psi(T) = 1/\lambda$ for all $\lambda > 0$. This implies $E[\lambda \psi(T) - 1] \equiv 0$, so that (multiplying by $n$) $E[n\lambda \psi(T) - n] \equiv 0$. Now apply the following Lemma to $T \sim$ Poisson($n\lambda$).

Lemma: (See Theorem 3.6.8(a), on page 126) If $Y \sim$ Poisson($\lambda$), then $E[\lambda g(Y)] = E[Y g(Y-1)]$.

Thus $E[T\psi(T-1) - n] \equiv 0$, so $P\{T\psi(T-1) - n = 0\} = 1$ by completeness of $T$. Thus $T\psi(T-1) = n$, which forces $\psi(T) = \dfrac{n}{T+1}$. But
$$E\left(\frac{n}{T+1}\right) = \frac{1}{\lambda}\left(1 - e^{-n\lambda}\right) \ne \frac{1}{\lambda}.$$
Contradiction!
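The final expectation can be verified numerically; the sketch below (illustrative $n$ and $\lambda$, SciPy for the Poisson pmf) truncates the series at a point where the tail is negligible:

```python
import numpy as np
from scipy.stats import poisson

n, lam = 5, 0.7                                  # illustrative values
m = n * lam                                      # T ~ Poisson(n*lambda)

t = np.arange(0, 200)                            # truncate the series; the tail is negligible here
lhs = np.sum(n / (t + 1) * poisson.pmf(t, m))    # E[ n/(T+1) ]
rhs = (1 - np.exp(-m)) / lam                     # (1 - e^{-n lambda}) / lambda
print(lhs, rhs, 1 / lam)                         # lhs == rhs, and both fall short of 1/lambda
```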

The R-B Theorem plus completeness sometimes gives easy proofs of otherwise difficult facts.

Example: Suppose $X_1, \ldots, X_n$ iid Poisson($\lambda$). Let $\bar x$ and $s^2$ be the sample mean and variance.

Note: $\bar x$ is a 1-1 function of the CSS $T = \sum X_i$ and is therefore also a CSS.
Note: $E\bar x = EX_i = \lambda$ and $Es^2 = \mathrm{Var}(X_i) = \lambda$.

Since $\bar x$ is unbiased for $\lambda$ and a function of the CSS $\bar x$, we know it is best unbiased. But $s^2$ is also unbiased for $\lambda$. Since $\bar x$ is a CSS, $E(s^2 \mid \bar x)$ must be the best unbiased estimator, which is $\bar x$. We conclude that
$$E(s^2 \mid \bar x) = \bar x.$$
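A quick empirical illustration of this conclusion (a sketch with arbitrary values): conditioning on $T$ and averaging $s^2$ within each observed value of $T$ should reproduce $\bar x = T/n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam, reps = 8, 1.3, 400_000                  # illustrative values

x = rng.poisson(lam, size=(reps, n))
t = x.sum(axis=1)                               # CSS T = sum of X_i, so xbar = T/n
s2 = x.var(axis=1, ddof=1)

# Average s^2 within each observed value of T: this estimates E(s^2 | T) = E(s^2 | xbar).
for tval in range(5, 16):
    mask = t == tval
    if mask.sum() > 1000:                       # only report well-populated values of T
        print(f"T={tval:2d}  xbar={tval/n:.3f}  average s^2 given T ~ {s2[mask].mean():.3f}")
```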