Biostatistics 602 - Statistical Inference. Lecture 16: Evaluation of Bayes Estimator. Hyun Min Kang. March 14th, 2013.

Last Lecture

- What is a Bayes estimator?
- Is a Bayes estimator the best unbiased estimator?
- Compared to other estimators, what are the advantages of a Bayes estimator?
- What is a conjugate family? What are the conjugate families of the Binomial, Poisson, and Normal distributions?

Recap - Bayes Estimator

- $\theta$: parameter
- $\pi(\theta)$: prior distribution
- $x \mid \theta \sim f(x \mid \theta)$: sampling distribution
- Posterior distribution of $\theta \mid x$ (Bayes' rule):

$$\pi(\theta \mid x) = \frac{\text{Joint}}{\text{Marginal}} = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}, \qquad m(x) = \int f(x \mid \theta)\,\pi(\theta)\,d\theta$$

- The Bayes estimator of $\theta$ is $E(\theta \mid x) = \int \theta\,\pi(\theta \mid x)\,d\theta$.

Recap - Example

- $X_1, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, with prior $\pi(p) \sim \text{Beta}(\alpha, \beta)$.
- Prior guess: $\hat{p} = \frac{\alpha}{\alpha+\beta}$.
- Posterior distribution: $\pi(p \mid x) \sim \text{Beta}\left(\sum x_i + \alpha,\; n - \sum x_i + \beta\right)$.
- Bayes estimator:

$$\hat{p} = \frac{\alpha + \sum x_i}{\alpha + \beta + n} = \frac{\sum x_i}{n} \cdot \frac{n}{\alpha + \beta + n} + \frac{\alpha}{\alpha + \beta} \cdot \frac{\alpha + \beta}{\alpha + \beta + n},$$

a weighted average of the sample mean and the prior mean.
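To make the recap concrete, here is a minimal numerical sketch (not from the lecture; the seed, sample size, and hyperparameters $\alpha = \beta = 2$ are made up for illustration) that computes the Beta posterior for simulated Bernoulli data and checks that the Bayes estimator equals the stated weighted average of the sample mean and the prior mean:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta_, n, p_true = 2.0, 2.0, 50, 0.3   # hypothetical prior and truth
x = rng.binomial(1, p_true, size=n)           # X_1,...,X_n ~ Bernoulli(p)

s = x.sum()                                   # sufficient statistic: sum(x_i)
post_a, post_b = s + alpha, n - s + beta_     # Beta(sum x_i + a, n - sum x_i + b)

bayes = (alpha + s) / (alpha + beta_ + n)     # posterior mean = Bayes estimator
# weighted average of sample mean and prior mean, as in the slide
w = n / (alpha + beta_ + n)
weighted = w * x.mean() + (1 - w) * alpha / (alpha + beta_)
assert np.isclose(bayes, weighted)
print(f"posterior Beta({post_a:.0f}, {post_b:.0f}), Bayes estimate = {bayes:.4f}")
```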

Loss Function Optimality - Loss Function

- Let $L(\theta, \hat\theta)$ be a function of $\theta$ and $\hat\theta$, where $\hat\theta$ is an estimator.
- If $\hat\theta = \theta$, the estimator makes a correct decision and the loss is 0.
- If $\hat\theta \neq \theta$, it makes a mistake and the loss is not 0.
- Squared error loss: $L(\theta, \hat\theta) = (\hat\theta - \theta)^2$. The mean squared error (MSE) is defined as $\text{MSE}(\hat\theta) = E(\hat\theta - \theta)^2$, i.e., the average loss $E\,L(\theta, \hat\theta)$, which is the expectation of the loss if $\hat\theta$ is used to estimate $\theta$.
- Absolute error loss: $L(\theta, \hat\theta) = |\hat\theta - \theta|$.
- A loss that penalizes overestimation more than underestimation:

$$L(\theta, \hat\theta) = (\hat\theta - \theta)^2\, I(\hat\theta < \theta) + 10\,(\hat\theta - \theta)^2\, I(\hat\theta \geq \theta)$$

Risk Function - Average Loss

- $R(\theta, \hat\theta) = E_\theta\, L(\theta, \hat\theta(X))$.
- If $L(\theta, \hat\theta) = (\hat\theta - \theta)^2$, then $R(\theta, \hat\theta)$ is the MSE.
- An estimator with smaller $R(\theta, \hat\theta)$ is preferred.
- Definition: the Bayes risk is the average risk across all values of $\theta$, given the prior $\pi(\theta)$: $\int R(\theta, \hat\theta)\,\pi(\theta)\,d\theta$.
- The Bayes rule with respect to a prior $\pi$ is the optimal estimator with respect to the Bayes risk, i.e., the one that minimizes the Bayes risk.
- Alternative form of the Bayes risk:

$$\begin{aligned}
\int R(\theta, \hat\theta)\,\pi(\theta)\,d\theta
&= \int \left[\int f(x \mid \theta)\, L(\theta, \hat\theta(x))\, dx\right] \pi(\theta)\, d\theta \\
&= \int\!\!\int L(\theta, \hat\theta(x))\, f(x \mid \theta)\,\pi(\theta)\, dx\, d\theta \\
&= \int\!\!\int L(\theta, \hat\theta(x))\, \pi(\theta \mid x)\, m(x)\, dx\, d\theta \\
&= \int \left[\int L(\theta, \hat\theta(x))\, \pi(\theta \mid x)\, d\theta\right] m(x)\, dx
\end{aligned}$$
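As a sketch of how the risk function compares estimators in practice (my illustration, not part of the slides; the setting $\alpha = \beta = 2$, $n = 20$, $p = 0.5$ is made up), a Monte Carlo average of the squared error loss approximates $R(p, \hat{p}) = \text{MSE}$ at a fixed $p$, here for the MLE $\bar{X}$ versus the Beta(2, 2) Bayes estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta_, n, p = 2.0, 2.0, 20, 0.5        # hypothetical setting
reps = 100_000

s = rng.binomial(n, p, size=reps)             # sum of n Bernoulli(p) per replicate
mle = s / n                                   # sample mean estimator
bayes = (s + alpha) / (n + alpha + beta_)     # Bayes estimator (posterior mean)

# Monte Carlo estimate of the risk R(p, p_hat) under squared error loss
print("risk of MLE  :", np.mean((mle - p) ** 2))
print("risk of Bayes:", np.mean((bayes - p) ** 2))
```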

Posterior Expected Loss

- The posterior expected loss is defined as $\int L(\theta, \hat\theta(x))\,\pi(\theta \mid x)\,d\theta$.
- An alternative definition of the Bayes rule estimator is the estimator that minimizes the posterior expected loss.

Bayes Estimator Based on Squared Error Loss

- $L(\theta, \hat\theta) = (\hat\theta - \theta)^2$.
- Posterior expected loss: $\int (\theta - \hat\theta)^2\,\pi(\theta \mid x)\,d\theta = E\left[(\theta - \hat\theta)^2 \mid x\right]$.
- So the goal is to minimize $E\left[(\theta - \hat\theta)^2 \mid x\right]$:

$$\begin{aligned}
E\left[(\theta - \hat\theta)^2 \mid x\right]
&= E\left[\left(\theta - E(\theta \mid x) + E(\theta \mid x) - \hat\theta\right)^2 \,\Big|\, x\right] \\
&= E\left[\left(\theta - E(\theta \mid x)\right)^2 \mid x\right] + \left(E(\theta \mid x) - \hat\theta\right)^2
\end{aligned}$$

(the cross term vanishes because $E[\theta - E(\theta \mid x) \mid x] = 0$), which is minimized when $\hat\theta = E(\theta \mid x)$.

Summary So Far

- Loss function: $L(\theta, \hat\theta)$, e.g. $(\hat\theta - \theta)^2$ or $|\hat\theta - \theta|$.
- Risk function: $R(\theta, \hat\theta)$ is the average of $L(\theta, \hat\theta)$ across all $x$. For squared error loss, the risk function is the same as the MSE.
- Bayes risk: the average risk across all $\theta$, based on the prior of $\theta$; alternatively, the average posterior expected loss across all $x$.
- Bayes estimator: $\hat\theta = E(\theta \mid x)$. Under squared error loss, minimizing the Bayes risk is equivalent to minimizing the posterior expected loss.

Bayes Estimator Based on Absolute Error Loss

- Suppose $L(\theta, \hat\theta) = |\theta - \hat\theta|$. The posterior expected loss is

$$E\left[\,|\theta - \hat\theta| \mid x\,\right] = \int |\theta - \hat\theta(x)|\,\pi(\theta \mid x)\,d\theta
= \int_{-\infty}^{\hat\theta} (\hat\theta - \theta)\,\pi(\theta \mid x)\,d\theta + \int_{\hat\theta}^{\infty} (\theta - \hat\theta)\,\pi(\theta \mid x)\,d\theta$$

- Setting $\frac{\partial}{\partial \hat\theta}\, E\left[L(\theta, \hat\theta(x)) \mid x\right] = 0$ shows that $\hat\theta$ is the posterior median.
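A small numerical check of the two minimizers (illustrative only; the Beta(5, 3) posterior is an arbitrary choice): evaluating the posterior expected loss on a grid shows that squared error loss is minimized near the posterior mean and absolute error loss near the posterior median:

```python
import numpy as np
from scipy import stats

post = stats.beta(5, 3)                       # hypothetical posterior pi(theta | x)
theta = np.linspace(0.001, 0.999, 2000)       # integration grid over theta
dens = post.pdf(theta)

cand = np.linspace(0.01, 0.99, 999)           # candidate estimates theta_hat
# posterior expected losses, approximated by trapezoidal integration
sq_loss = [np.trapz((theta - c) ** 2 * dens, theta) for c in cand]
abs_loss = [np.trapz(np.abs(theta - c) * dens, theta) for c in cand]

print("argmin squared loss :", cand[np.argmin(sq_loss)], "~ posterior mean  :", post.mean())
print("argmin absolute loss:", cand[np.argmin(abs_loss)], "~ posterior median:", post.median())
```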

Two Bayes Rules

- Consider a point estimation problem for a real-valued parameter $\theta$.
- For squared error loss, the posterior expected loss is $\int (\theta - \hat\theta)^2\,\pi(\theta \mid x)\,d\theta = E\left[(\theta - \hat\theta)^2 \mid x\right]$. This expected value is minimized by $\hat\theta = E(\theta \mid x)$, so the Bayes rule estimator is the mean of the posterior distribution.
- For absolute error loss, the posterior expected loss is $E\left(|\theta - \hat\theta| \mid x\right)$. As shown previously, this is minimized by choosing $\hat\theta$ as the median of $\pi(\theta \mid x)$.

Example

- $X_1, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, $\pi(p) \sim \text{Beta}(\alpha, \beta)$. The posterior distribution follows $\text{Beta}\left(\sum x_i + \alpha,\; n - \sum x_i + \beta\right)$.
- The Bayes estimator that minimizes the posterior expected squared error loss is the posterior mean $\hat{p} = \frac{\sum x_i + \alpha}{\alpha + \beta + n}$.
- The Bayes estimator that minimizes the posterior expected absolute error loss is the posterior median $\hat\theta$, which solves

$$\int_0^{\hat\theta} \frac{\Gamma(\alpha + \beta + n)}{\Gamma\left(\sum x_i + \alpha\right)\,\Gamma\left(n - \sum x_i + \beta\right)}\; p^{\sum x_i + \alpha - 1}\,(1 - p)^{n - \sum x_i + \beta - 1}\,dp = \frac{1}{2}$$

Asymptotic Evaluation of Point Estimators

- The behaviors of an estimator as the sample size approaches infinity are known as its asymptotic properties.
- Definition: let $W_n = W_n(X_1, \dots, X_n) = W_n(X)$ be a sequence of estimators of $\tau(\theta)$. We say $W_n$ is consistent for estimating $\tau(\theta)$ if $W_n \overset{P}{\to} \tau(\theta)$ under $P_\theta$ for every $\theta$.
- $W_n \overset{P}{\to} \tau(\theta)$ (converges in probability to $\tau(\theta)$) means that, given any $\epsilon > 0$,

$$\lim_{n \to \infty} P(|W_n - \tau(\theta)| \geq \epsilon) = 0, \qquad \text{equivalently} \qquad \lim_{n \to \infty} P(|W_n - \tau(\theta)| < \epsilon) = 1$$

- Since $|W_n - \tau(\theta)| < \epsilon$ can be read as "$W_n$ is close to $\tau(\theta)$", consistency means the probability that $W_n$ is close to $\tau(\theta)$ approaches 1 as $n$ goes to $\infty$.

Tools for Proving Consistency

- Use the definition directly (complicated).
- Chebychev's Inequality:

$$P(|W_n - \tau(\theta)| \geq \epsilon) = P\left((W_n - \tau(\theta))^2 \geq \epsilon^2\right) \leq \frac{E(W_n - \tau(\theta))^2}{\epsilon^2} = \frac{\text{MSE}(W_n)}{\epsilon^2} = \frac{\text{Bias}^2(W_n) + \text{Var}(W_n)}{\epsilon^2}$$

- So it suffices to show that both $\text{Bias}(W_n)$ and $\text{Var}(W_n)$ converge to zero.
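For the Bernoulli example, both Bayes rules can be computed directly from the Beta posterior; this sketch (with made-up data and an arbitrary Beta(1, 1) prior) obtains the posterior median from the Beta quantile function rather than solving the integral equation by hand:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, beta_, n = 1.0, 1.0, 30                # hypothetical uniform prior
x = rng.binomial(1, 0.7, size=n)              # simulated Bernoulli data

post = stats.beta(x.sum() + alpha, n - x.sum() + beta_)
print("posterior mean   (squared error Bayes rule):", post.mean())
print("posterior median (absolute error Bayes rule):", post.ppf(0.5))
```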

Theorems for Consistency

Theorem 10.1.3: if $W_n$ is a sequence of estimators of $\tau(\theta)$ satisfying, for all $\theta$,

- $\lim_{n \to \infty} \text{Bias}(W_n) = 0$
- $\lim_{n \to \infty} \text{Var}(W_n) = 0$

then $W_n$ is consistent for $\tau(\theta)$.

Weak Law of Large Numbers

Theorem 5.5.2: let $X_1, \dots, X_n$ be iid random variables with $E(X) = \mu$ and $\text{Var}(X) = \sigma^2 < \infty$. Then $\bar{X}_n$ converges in probability to $\mu$, i.e. $\bar{X}_n \overset{P}{\to} \mu$.

Consistent Sequence of Estimators

Theorem 10.1.5: let $W_n$ be a consistent sequence of estimators of $\tau(\theta)$, and let $a_1, a_2, \dots$ and $b_1, b_2, \dots$ be sequences of constants satisfying

1. $\lim_{n \to \infty} a_n = 1$
2. $\lim_{n \to \infty} b_n = 0$

Then $U_n = a_n W_n + b_n$ is also a consistent sequence of estimators of $\tau(\theta)$.

Continuous Mapping Theorem: if $W_n$ is consistent for $\theta$ and $g$ is a continuous function, then $g(W_n)$ is consistent for $g(\theta)$.

Example

Problem: $X_1, \dots, X_n$ are iid samples from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$.

1. Show that $\bar{X}_n$ is consistent for $\mu$.
2. Show that $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is consistent for $\sigma^2$.
3. Show that $\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is consistent for $\sigma^2$.
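The consistency claims in this example are easy to visualize by simulation (an illustrative sketch, not from the slides; the normal distribution with $\mu = 1$, $\sigma^2 = 4$ is an arbitrary choice): as $n$ grows, $\bar{X}_n$ and both variance estimators settle near the true values:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 4.0                          # hypothetical true mean and variance

for n in (10, 100, 10_000, 1_000_000):
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    xbar = x.mean()
    s2_n = np.sum((x - xbar) ** 2) / n         # 1/n version
    s2_n1 = np.sum((x - xbar) ** 2) / (n - 1)  # 1/(n-1) version
    print(f"n={n:>8}: xbar={xbar:.4f}  (1/n)SS={s2_n:.4f}  (1/(n-1))SS={s2_n1:.4f}")
```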

Example - Solution

Proof that $\bar{X}_n$ is consistent for $\mu$: by the law of large numbers, $\bar{X}_n$ is consistent for $\mu$. Alternatively:

- $\text{Bias}(\bar{X}_n) = E(\bar{X}_n) - \mu = \mu - \mu = 0$
- $\text{Var}(\bar{X}_n) = \text{Var}\left(\frac{\sum_{i=1}^n X_i}{n}\right) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i) = \frac{\sigma^2}{n}$, so $\lim_{n \to \infty} \text{Var}(\bar{X}_n) = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0$
- By Theorem 10.1.3, $\bar{X}_n$ is consistent for $\mu$.

Solution - Consistency for $\sigma^2$

$$\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i^2 - 2 X_i \bar{X}_n + \bar{X}_n^2\right) = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}_n^2$$

- By the law of large numbers, $\frac{1}{n}\sum_{i=1}^n X_i^2 \overset{P}{\to} E(X^2) = \mu^2 + \sigma^2$.
- Note that $\bar{X}_n^2$ is a function of $\bar{X}_n$. Define $g(x) = x^2$, which is a continuous function; then $g(\bar{X}_n) = \bar{X}_n^2$ is consistent for $\mu^2$.
- Therefore $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 \overset{P}{\to} (\mu^2 + \sigma^2) - \mu^2 = \sigma^2$.

Solution - Consistency for $\sigma^2$ (cont'd)

- From the previous slide, $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is consistent for $\sigma^2$.
- Define $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ and $(S'_n)^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. Then $S_n^2 = \frac{n}{n-1}(S'_n)^2$.
- Because $(S'_n)^2$ was shown to be consistent for $\sigma^2$, and $a_n = \frac{n}{n-1} \to 1$ as $n \to \infty$, by Theorem 10.1.5, $S_n^2$ is also consistent for $\sigma^2$.

Example - Exponential Family

Problem: suppose $X_1, \dots, X_n \overset{iid}{\sim} \text{Exponential}(\beta)$.

1. Propose a consistent estimator of the median.
2. Propose a consistent estimator of $P(X \leq c)$, where $c$ is a constant.
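A quick numerical check (my illustration; the sample size and exponential scale are arbitrary) of two steps in the solution: the algebraic identity $\frac{1}{n}\sum (X_i - \bar{X})^2 = \frac{1}{n}\sum X_i^2 - \bar{X}^2$, and the relation $S_n^2 = \frac{n}{n-1}(S'_n)^2$ with $a_n = \frac{n}{n-1} \to 1$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.exponential(scale=2.0, size=n)        # any distribution works here

xbar = x.mean()
lhs = np.sum((x - xbar) ** 2) / n             # (1/n) sum (X_i - xbar)^2
rhs = np.mean(x ** 2) - xbar ** 2             # (1/n) sum X_i^2 - xbar^2
assert np.isclose(lhs, rhs)                   # the algebraic identity

s2 = np.sum((x - xbar) ** 2) / (n - 1)        # S_n^2
assert np.isclose(s2, (n / (n - 1)) * lhs)    # S_n^2 = a_n * (S'_n)^2
print("identities verified; a_n =", n / (n - 1))
```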

Consistent Estimator for the Median

- First, we need to express the median in terms of the parameter $\beta$:

$$\int_0^m \frac{1}{\beta} e^{-x/\beta}\,dx = \left[-e^{-x/\beta}\right]_0^m = 1 - e^{-m/\beta} = \frac{1}{2} \quad\Longrightarrow\quad \text{median } m = \beta \log 2$$

- By the law of large numbers, $\bar{X}_n$ is consistent for $E(X) = \beta$.
- Applying the continuous mapping theorem to $g(x) = x \log 2$, $g(\bar{X}_n) = \bar{X}_n \log 2$ is consistent for $g(\beta) = \beta \log 2$, the median.

Consistent Estimator of $P(X \leq c)$

$$P(X \leq c) = \int_0^c \frac{1}{\beta} e^{-x/\beta}\,dx = 1 - e^{-c/\beta}$$

- $\bar{X}_n$ is consistent for $\beta$, and $1 - e^{-c/\beta}$ is a continuous function of $\beta$. By the continuous mapping theorem, $g(\bar{X}_n) = 1 - e^{-c/\bar{X}_n}$ is consistent for $P(X \leq c) = 1 - e^{-c/\beta} = g(\beta)$.

Consistent Estimator of $P(X \leq c)$ - Alternative Method

- Define $Y_i = I(X_i \leq c)$. Then $Y_i \overset{iid}{\sim} \text{Bernoulli}(p)$ where $p = P(X \leq c)$.
- $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^n Y_i = \frac{1}{n}\sum_{i=1}^n I(X_i \leq c)$ is consistent for $p$ by the law of large numbers.

Summary

- Today: loss functions and the Law of Large Numbers.
- Next lecture: Central Limit Theorem, Slutsky's Theorem, Delta Method.
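Both proposed estimators are easy to check by simulation; this sketch ($\beta = 2$ and $c = 1$ are made-up values) compares the plug-in estimator $1 - e^{-c/\bar{X}}$ with the indicator-average estimator, and $\bar{X}\log 2$ with the true median $\beta \log 2$:

```python
import numpy as np

rng = np.random.default_rng(5)
beta_, c, n = 2.0, 1.0, 100_000                # hypothetical parameter values
x = rng.exponential(scale=beta_, size=n)       # numpy's scale is the mean beta

median_hat = x.mean() * np.log(2)              # continuous mapping estimator
plug_in = 1 - np.exp(-c / x.mean())            # g(xbar) = 1 - exp(-c/xbar)
indicator = np.mean(x <= c)                    # (1/n) sum I(X_i <= c)

print("median:  estimate", median_hat, " truth", beta_ * np.log(2))
print("P(X<=c): plug-in", plug_in, " indicator", indicator,
      " truth", 1 - np.exp(-c / beta_))
```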