Unbiased Estimation. February 7-12, 2008

We begin with a sample $X = (X_1, \ldots, X_n)$ of random variables chosen according to one of a family of probabilities $P_\theta$, where $\theta$ is an element of the parameter space $\Theta$. For random variables, we shall use the term density function to refer to both continuous and discrete random variables. Thus, to each $\theta \in \Theta$, there exists a density function which we denote
$$f(x|\theta).$$
Example 1 (Parametric families of densities).
1. Binomial random variables with known number of trials $n$ but unknown success probability parameter $\theta$ have density
$$f(x|\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}.$$
2. Normal random variables with known variance $\sigma_0^2$ but unknown mean $\mu$ have density
$$f(x|\mu) = \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right).$$
3. Normal random variables with unknown mean $\mu$ and variance $\sigma^2$ have density
$$f(x|\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
Definition 2. A statistic is a function of the random variable that does not depend on any unknown parameter.
The goal of estimation is to determine which of the $P_\theta$ is the source of the data $X$. In this case the action space $A$ is the same as the parameter space, and the estimator is the decision function $d: \text{data} \to \Theta$.
Example 3. If $X = (X_1, \ldots, X_n)$ are independent $\mathrm{Ber}(\theta)$ random variables, then a simple choice for estimating $\theta$ is
$$d(x_1, \ldots, x_n) = \frac{1}{n}(x_1 + \cdots + x_n) = \bar{x}.$$
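To make Example 3 concrete, here is a minimal simulation sketch; it is not part of the original notes, assumes numpy is available, and the values of theta and n are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3   # true success probability (unknown to the statistician)
n = 1000      # sample size

# Draw X = (X_1, ..., X_n), independent Ber(theta) random variables.
x = rng.binomial(1, theta, size=n)

# The estimator d(x_1, ..., x_n) = (x_1 + ... + x_n)/n = x-bar.
d = x.mean()
print(f"estimate d(x) = {d:.3f}, true theta = {theta}")
```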

1 Unbiased Estimators

Definition 4. A statistic $d$ is called an unbiased estimator for a function of the parameter $g(\theta)$ provided that for every $\theta \in \Theta$,
$$E_\theta d(X) = g(\theta).$$
Any estimator that is not unbiased is called biased. If the image of $g(\theta)$ lies in a vector space, then the bias is
$$b_d(\theta) = E_\theta d(X) - g(\theta).$$
Exercise 5. If $X_1, \ldots, X_n$ form a simple random sample with unknown finite mean $\mu$, then $\bar{X}$ is an unbiased estimator of $\mu$. If the $X_i$ have variance $\sigma^2$, then $\mathrm{Var}(\bar{X}) = \sigma^2/n$.
If we choose the quadratic loss function $L(\theta, a) = (a - \theta)^2$, then the corresponding risk function is
$$R_2(g(\theta), d) = E_\theta[(d(X) - g(\theta))^2] = E_\theta[(d(X) - E_\theta d(X) + b_d(\theta))^2]$$
$$= E_\theta[(d(X) - E_\theta d(X))^2] + 2 b_d(\theta)\, E_\theta[d(X) - E_\theta d(X)] + b_d(\theta)^2 = \mathrm{Var}_\theta(d(X)) + b_d(\theta)^2,$$
since the middle term vanishes. Note that the risk is the variance for an unbiased estimator, and the bias adds to the risk.
In the example above, with $d(X) = \bar{X}$,
$$E_\theta \bar{X} = \frac{1}{n}(\theta + \cdots + \theta) = \theta.$$
Thus, $\bar{x}$ is an unbiased estimator for $\theta$. In addition,
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}(\theta(1-\theta) + \cdots + \theta(1-\theta)) = \frac{\theta(1-\theta)}{n}.$$
Example 6. If, in addition, the simple random sample has unknown finite variance $\sigma^2$, then we can consider the sample variance
$$S^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$
To find the mean of $S^2$, we begin with the identity
$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n \left((X_i - \bar{X}) + (\bar{X} - \mu)\right)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^n (X_i - \bar{X}) + n(\bar{X} - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,$$
since $\sum_{i=1}^n (X_i - \bar{X}) = 0$.
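The decomposition of quadratic risk into variance plus squared bias can be checked numerically. The following Monte Carlo sketch (again assuming numpy; reps and the parameter values are illustrative) does this for $d(X) = \bar{X}$ in the Bernoulli case:

```python
import numpy as np

rng = np.random.default_rng(1)

theta, n, reps = 0.3, 50, 200_000

# Replicate d(X) = X-bar over many Ber(theta) samples of size n.
xbar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)

risk = np.mean((xbar - theta) ** 2)                      # E_theta[(d(X) - theta)^2]
decomposition = xbar.var() + (xbar.mean() - theta) ** 2  # Var_theta(d(X)) + b_d(theta)^2

print(f"risk              {risk:.6f}")
print(f"variance + bias^2 {decomposition:.6f}")
print(f"theory theta(1-theta)/n = {theta * (1 - theta) / n:.6f}")
```

Since $\bar{X}$ is unbiased, all three printed numbers should agree up to Monte Carlo error.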

Then,
$$E S^2 = E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar{X} - \mu)^2\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2.$$
Thus,
$$E\left[\frac{n}{n-1} S^2\right] = \sigma^2,$$
and
$$\frac{n}{n-1} S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
is an unbiased estimator for $\sigma^2$.
Definition 7. An unbiased estimator $d$ is a uniformly minimum variance unbiased estimator (UMVUE) if $d(X)$ has finite variance for every value $\theta$ of the parameter and, for every unbiased estimator $\tilde{d}$,
$$\mathrm{Var}_\theta\, d(X) \le \mathrm{Var}_\theta\, \tilde{d}(X).$$
The efficiency of an unbiased estimator $\tilde{d}$ is
$$e(\tilde{d}) = \frac{\mathrm{Var}_\theta\, d(X)}{\mathrm{Var}_\theta\, \tilde{d}(X)}.$$
Thus, the efficiency is between 0 and 1.

2 Cramér-Rao Bound

First, we will review a bit on correlation. For two random variables $Y$ and $Z$, the correlation
$$\rho(Y, Z) = \frac{\mathrm{Cov}(Y, Z)}{\sqrt{\mathrm{Var}(Y)\,\mathrm{Var}(Z)}}. \qquad (1)$$
The correlation takes values $-1 \le \rho(Y, Z) \le 1$ and takes the extreme values $\pm 1$ if and only if $Y$ and $Z$ are linearly related, i.e., $Z = aY + b$ for some constants $a$ and $b$. Consequently,
$$\mathrm{Cov}(Y, Z)^2 \le \mathrm{Var}(Y)\,\mathrm{Var}(Z).$$
If the random variable $Z$ has mean zero, then $\mathrm{Cov}(Y, Z) = E[YZ]$ and
$$E[YZ]^2 \le \mathrm{Var}(Y)\,\mathrm{Var}(Z) = \mathrm{Var}(Y)\, E Z^2. \qquad (2)$$
We begin with data $X = (X_1, \ldots, X_n)$ drawn from an unknown probability $P_\theta$. The parameter space is $\Theta \subset \mathbb{R}$. Denote the joint density of these random variables by $f(x|\theta)$, where $x = (x_1, \ldots, x_n)$.
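A short simulation makes the $n$ versus $n-1$ divisor visible. This sketch (assuming numpy, with arbitrary illustrative parameters) compares the two versions of the sample variance:

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, n, reps = 0.0, 2.0, 10, 100_000
x = rng.normal(mu, sigma, size=(reps, n))

s2_biased   = x.var(axis=1, ddof=0)  # divisor n:   E[S^2] = (n-1)/n * sigma^2
s2_unbiased = x.var(axis=1, ddof=1)  # divisor n-1: E[S^2] = sigma^2

print(f"divisor n   : {s2_biased.mean():.4f}  (theory {(n - 1) / n * sigma**2:.4f})")
print(f"divisor n-1 : {s2_unbiased.mean():.4f}  (theory {sigma**2:.4f})")
```

With $n = 10$ the bias factor $(n-1)/n = 0.9$ is large enough to show up clearly in the first line.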

In the case that the data come from a simple random sample, the joint density is the product of the marginal densities:
$$f(x|\theta) = f(x_1|\theta) \cdots f(x_n|\theta). \qquad (3)$$
For continuous random variables, we have
$$1 = \int_{\mathbb{R}^n} f(x|\theta)\, dx. \qquad (4)$$
Now, let $d$ be an unbiased estimator of $g(\theta)$; then
$$g(\theta) = E_\theta d(X) = \int_{\mathbb{R}^n} d(x) f(x|\theta)\, dx. \qquad (5)$$
If the functions in (4) and (5) are differentiable with respect to the parameter $\theta$ and we can pass the derivative through the integral, then
$$0 = \int_{\mathbb{R}^n} \frac{\partial f(x|\theta)}{\partial\theta}\, dx = \int_{\mathbb{R}^n} \frac{\partial \ln f(x|\theta)}{\partial\theta} f(x|\theta)\, dx = E_\theta\left[\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \qquad (6)$$
From a similar calculation,
$$g'(\theta) = E_\theta\left[d(X) \frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \qquad (7)$$
Now, return to the review on correlation with $Y = d(X)$ and the score function $Z = \partial \ln f(X|\theta)/\partial\theta$. Then, by equation (6), $EZ = 0$, and from equations (7) and (2), we find that
$$g'(\theta)^2 = E_\theta\left[d(X) \frac{\partial \ln f(X|\theta)}{\partial\theta}\right]^2 \le \mathrm{Var}_\theta(d(X))\, E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right],$$
or
$$\mathrm{Var}_\theta(d(X)) \ge \frac{g'(\theta)^2}{I(\theta)}, \qquad (8)$$
where
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right]$$
is called the Fisher information. Equation (8), called the Cramér-Rao lower bound or the information inequality, states that (in the case $g(\theta) = \theta$) the lower bound for the variance of an unbiased estimator is the reciprocal of the Fisher information. In other words, the higher the information, the lower the possible value of the variance of an unbiased estimator.
If we return to the case of a simple random sample, then
$$\ln f(x|\theta) = \ln f(x_1|\theta) + \cdots + \ln f(x_n|\theta).$$
Also, the random variables $\{\partial \ln f(X_k|\theta)/\partial\theta;\ 1 \le k \le n\}$ are independent and have the same distribution. Thus, the Fisher information of the sample is $n$ times that of a single observation:
$$I_n(\theta) = n\, E_\theta\left[\left(\frac{\partial \ln f(X_1|\theta)}{\partial\theta}\right)^2\right].$$
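One way to see equations (6) and (8) at work is to simulate the score function. The sketch below is our construction, not from the notes; it borrows the Bernoulli score worked out in Example 8 below and checks that its mean is 0 and that its second moment matches $I(\theta) = 1/(\theta(1-\theta))$:

```python
import numpy as np

rng = np.random.default_rng(3)

theta, reps = 0.3, 1_000_000

# Score of one Bernoulli observation: (x - theta) / (theta * (1 - theta)).
x = rng.binomial(1, theta, size=reps)
score = (x - theta) / (theta * (1 - theta))

fisher = 1 / (theta * (1 - theta))
print(f"E[score]   = {score.mean():+.4f}  (theory 0, equation (6))")
print(f"E[score^2] = {np.mean(score**2):.4f}  (theory I(theta) = {fisher:.4f})")
```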

Example 8. For independent Bernoulli random variables with unknown success probability $\theta$,
$$\ln f(x|\theta) = x \ln\theta + (1-x)\ln(1-\theta),$$
$$\frac{\partial}{\partial\theta} \ln f(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x - \theta}{\theta(1-\theta)},$$
$$E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X|\theta)\right)^2\right] = \frac{1}{\theta^2(1-\theta)^2}\, E[(X-\theta)^2] = \frac{1}{\theta(1-\theta)},$$
and the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\theta(1-\theta)/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\theta\, d(X) = \frac{\theta(1-\theta)}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
Example 9. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$,
$$\ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2},$$
$$\frac{\partial}{\partial\mu} \ln f(x|\mu) = \frac{x - \mu}{\sigma_0^2},$$
and
$$E_\mu\left[\left(\frac{\partial}{\partial\mu} \ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4}\, E[(X-\mu)^2] = \frac{1}{\sigma_0^2}.$$
Again, the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\sigma_0^2/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\mu\, d(X) = \frac{\sigma_0^2}{n},$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.
Recall that for the correlation to be $\pm 1$, the estimator $d(X)$ and the score function $\partial \ln f(X|\theta)/\partial\theta$ must be linearly related with probability 1:
$$\frac{\partial}{\partial\theta} \ln f(x|\theta) = a(\theta) d(x) + b(\theta).$$
After integrating, we obtain
$$f(x|\theta) = c(\theta) h(x) \exp(\pi(\theta) d(x)). \qquad (9)$$
We shall call density functions satisfying equation (9) an exponential family with natural parameter $\pi(\theta)$.
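As a numerical companion to Example 8, this sketch (illustrative parameters, assuming numpy) compares the sampling variance of $\bar{x}$ with the Cramér-Rao bound $\theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(4)

theta, n, reps = 0.3, 25, 200_000

# Sampling distribution of x-bar over many Bernoulli samples of size n.
xbar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)

crlb = theta * (1 - theta) / n  # 1 / I_n(theta)
print(f"Var(x-bar)       = {xbar.var():.6f}")
print(f"Cramer-Rao bound = {crlb:.6f}")
```

The two numbers should agree up to Monte Carlo error, reflecting that $\bar{x}$ attains the bound.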

Example 10 (Poisson random variables).
$$f(x|\lambda) = \frac{\lambda^x}{x!} e^{-\lambda} = \frac{e^{-\lambda}}{x!} \exp(x \ln\lambda).$$
Thus, Poisson random variables form an exponential family. The score function is
$$\frac{\partial}{\partial\lambda} \ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(x \ln\lambda - \ln x! - \lambda) = \frac{x}{\lambda} - 1.$$
The Fisher information is
$$I(\lambda) = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{1}{\lambda^2}\, E_\lambda[(X - \lambda)^2] = \frac{1}{\lambda}.$$
If $X$ is $\mathrm{Pois}(\lambda)$, then $E_\lambda X = \mathrm{Var}_\lambda(X) = \lambda$. For a simple random sample having $n$ observations, both $\bar{X}$ and $\sum_{k=1}^n (X_k - \bar{X})^2/(n-1)$ are unbiased estimators of $\lambda$. However, $\mathrm{Var}_\lambda(\bar{X}) = \lambda/n$, and $d(x) = \bar{x}$ has efficiency 1. This could have been predicted. The density of $n$ independent observations is
$$f(x|\lambda) = e^{-n\lambda}\, \frac{\lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!},$$
and so the score function is
$$\frac{\partial}{\partial\lambda} \ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(-n\lambda + n\bar{x}\ln\lambda) = -n + \frac{n\bar{x}}{\lambda},$$
showing that the estimator and the score function are linearly related.
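The efficiency claim can be checked by simulation. The sketch below (our parameter choices, assuming numpy) compares the two unbiased estimators of $\lambda$ and estimates the efficiency of the sample variance:

```python
import numpy as np

rng = np.random.default_rng(5)

lam, n, reps = 4.0, 20, 100_000
x = rng.poisson(lam, size=(reps, n))

xbar = x.mean(axis=1)        # unbiased for lambda, attains the bound lambda/n
s2 = x.var(axis=1, ddof=1)   # also unbiased for lambda, but with larger variance

print(f"E[x-bar] = {xbar.mean():.3f},  Var(x-bar) = {xbar.var():.4f}  (lambda/n = {lam / n:.4f})")
print(f"E[S^2]   = {s2.mean():.3f},  Var(S^2)   = {s2.var():.4f}")
print(f"estimated efficiency of S^2: {xbar.var() / s2.var():.3f}")
```

Both estimators should print means near $\lambda = 4$, but the sample variance has a much larger sampling variance, so its estimated efficiency comes out well below 1.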