Estimation September 24, 2018 STAT 151 Class 6 Slide 1

Pandemic data

Treatment outcome, X, from n = 100 patients in a pandemic: 1 = recovered and 0 = not recovered

1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 1

A probability model for treatment outcome:

Outcome                Probability
1 (recovers)           p
0 (does not recover)   1 - p

How can we estimate p and 1 - p?

STAT 151 Class 6 Slide 2

Possible solutions

Some assumptions:
- P(success) = p, 0 < p < 1, is the same for every trial, so we can combine all 100 patients to evaluate the drug's efficacy
- The outcomes of the trials are independent of one another, to simplify calculations

A few possible models:

[Figure: three bar charts of P(X) against X = 0, 1, one for each of p = 0.5, p = 0.6, and p = 0.3]

STAT 151 Class 6 Slide 3

Maximum likelihood estimation (MLE)

Key ideas:
(a) The best model for the observed data is the best model for the population
(b) The best model is the most likely explanation of the observed data
(c) (a) and (b) lead to a method called maximum likelihood estimation

Some notation and terminology:
- We draw an independent and identically distributed (iid) sample X_1, X_2, ..., X_n to estimate p
- Each observation X_i is an observation from a probability model, Bernoulli(p)
- We write the PDF of X as f(X | θ) for both discrete and continuous variables, where θ is a generic symbol for the parameter(s)
- For any quantity Q, we use ˆQ to denote its estimate (estimator)

STAT 151 Class 6 Slide 4

MLE (2)

Our data consist of (X_1, ..., X_100) = (1, 1, 1, 0, 0, ..., 0, 1): 60 1s and 40 0s

The probability (likelihood) that X_1 = 1 is p
The likelihood that X_2 = 1 is p
The likelihood that X_3 = 1 is p
The likelihood that X_4 = 0 is 1 - p, etc.

The likelihood that (X_1, ..., X_100) = (1, 1, 1, 0, 0, ..., 0, 1) is

L(p | X_1, ..., X_100) = L(p | X_1) × L(p | X_2) × ... × L(p | X_99) × L(p | X_100)
                       = p × p × p × (1 - p) × ... × (1 - p) × p
                       = p^60 (1 - p)^40

L(p | X_1, ..., X_100) ≡ L(p) is called a likelihood function, and it is a function of p. L(p) can be considered the likelihood of the observed data for a particular value of p

The maximum likelihood estimate (MLE) of p is the value of p that gives the highest likelihood for the observed data

STAT 151 Class 6 Slide 5

Finding MLE method 1

  p      L(p) = p^60 (1 - p)^40
  0.0    0
  0.1    1.4 × 10^-62
  0.2    1.5 × 10^-46
  0.3    2.7 × 10^-38
  0.4    1.8 × 10^-33
  0.5    7.9 × 10^-31
  0.6    5.9 × 10^-30
  0.7    6.2 × 10^-31
  0.8    1.7 × 10^-34
  0.9    1.8 × 10^-43
  1.0    0

STAT 151 Class 6 Slide 6
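The grid search in this table is easy to reproduce; below is a minimal Python sketch (the helper name `likelihood` and the grid are ours, not from the slides):

```python
import numpy as np

# Likelihood of the pandemic data: 60 recoveries and 40 non-recoveries.
def likelihood(p):
    return p**60 * (1 - p)**40

# Evaluate L(p) on the same grid as the table; the largest value is at p = 0.6.
for p in np.arange(0, 1.05, 0.1):
    print(f"p = {p:.1f}   L(p) = {likelihood(p):.1e}")
```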

MLE [Ronald Aylmer (R.A.) Fisher, 1890-1962]

For iid X_1, ..., X_n with PDF f(x | θ), the likelihood of θ is:

L(θ) = L(θ | X_1, ..., X_n) = L(θ | X_1) × ... × L(θ | X_n)
     = f(X_1 | θ) × ... × f(X_n | θ) = ∏_{i=1}^n f(X_i | θ)

L(θ) is the likelihood of observing X_1, ..., X_n for a particular θ

The MLE of θ is the value ˆθ that gives the highest likelihood for the data, among all possible values of θ

The MLE is usually obtained by maximizing log L(θ) ≡ l(θ). Since the logarithmic function is a monotone function of its argument, maximizing the likelihood and maximizing the log-likelihood yield the same ˆθ

When possible, it is best to draw a figure of L(θ) or l(θ)

STAT 151 Class 6 Slide 7

Likelihood vs. log-likelihood (finding MLE method 2)

[Figure: left panel plots the likelihood p^60 (1 - p)^40 against p; right panel plots the log-likelihood 60 log(p) + 40 log(1 - p) against p. Both curves peak at the same value of p]

STAT 151 Class 6 Slide 8
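The two panels are straightforward to redraw; a matplotlib sketch, with styling choices of our own:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 400)            # avoid log(0) at the endpoints
L = p**60 * (1 - p)**40                       # likelihood
logL = 60 * np.log(p) + 40 * np.log(1 - p)    # log-likelihood

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(p, L);    ax1.set_xlabel("p"); ax1.set_ylabel("L(p)")
ax2.plot(p, logL); ax2.set_xlabel("p"); ax2.set_ylabel("l(p)")
for ax in (ax1, ax2):
    ax.axvline(0.6, linestyle="dashed")       # both curves peak at p = 0.6
plt.tight_layout()
plt.show()
```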

Finding MLE method 3

X_1, ..., X_n iid Bernoulli(p), so f(X_i | p) = p^{X_i} (1 - p)^{1 - X_i}

L(p) = p^60 (1 - p)^40 = ∏_{i=1}^n p^{X_i} (1 - p)^{1 - X_i},   n = 100

l(p) = log L(p) = log ∏_{i=1}^n p^{X_i} (1 - p)^{1 - X_i} = ∑_{i=1}^n [X_i log p + (1 - X_i) log(1 - p)]

The MLE ˆp is the value that maximises the log-likelihood, so it solves dl(p)/dp |_{p=ˆp} ≡ dl(ˆp)/dp = 0:

∑_{i=1}^n [X_i/ˆp - (1 - X_i)/(1 - ˆp)] = 0
(1 - ˆp) ∑_{i=1}^n X_i - ˆp ∑_{i=1}^n (1 - X_i) = 0
∑_{i=1}^n X_i - ˆp n = 0
ˆp = X̄ = 60/100 = 0.6

STAT 151 Class 6 Slide 9
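The closed-form answer can be checked numerically by minimizing the negative log-likelihood (optimizers minimize by default, so we flip the sign); a sketch using scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Negative log-likelihood for a Bernoulli sample with 60 ones out of n = 100.
def neg_loglik(p):
    return -(60 * np.log(p) + 40 * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # approximately 0.6, matching the closed-form MLE
```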

Financial crises data

[Timeline, 1980-2040: '82 Mexican, '84 S&L, '87 Black Monday, '91 Commercial RE, '97-'98 Asian/LTCM, '00 Dotcom, '07 Subprime, '12 Euro; crises per decade (Data): 3, 3, 2, 1, with future decades unknown]

X = # crises per unit time, e.g., a decade; possible values for X: 0, 1, 2, ...

Assume crises occur (i) independently and (ii) at a constant rate. A (probability) model for the # of random events over time is Poisson(λ), where λ > 0 is the rate of crises per unit time

How can we use the data X_1, X_2, X_3, X_4 = 3, 3, 2, 1 to learn about λ? Which Poisson model is best for the data, and what is the best λ?

STAT 151 Class 6 Slide 10

Financial crises data (2)

Original data (X_1, X_2, X_3, X_4) = (3, 3, 2, 1); let's ignore X_4 for now

# crises (X) in n = 3 decades: (X_1, X_2, X_3) = (3, 3, 2)

The likelihood that the first observation is 3 is (λ^3/3!) e^{-λ}
The likelihood that the second observation is 3 is (λ^3/3!) e^{-λ}
The likelihood that the third observation is 2 is (λ^2/2!) e^{-λ}

The likelihood of (X_1, X_2, X_3) = (3, 3, 2) for a particular λ is

L(λ) = (λ^3/3!) e^{-λ} × (λ^3/3!) e^{-λ} × (λ^2/2!) e^{-λ} = (λ^{3+3+2}/(3! 3! 2!)) e^{-3λ}

Which value of λ makes the observed data most probable?

STAT 151 Class 6 Slide 11

Financial crises data (3)

  λ       L(λ) = (λ^{3+3+2}/(3! 3! 2!)) e^{-3λ}
  2.50    0.011721
  2.55    0.011821
  2.60    0.011884
  2.65    0.011912
  2.70    0.011907
  2.75    0.011869
  2.80    0.011799
  2.85    0.011701

STAT 151 Class 6 Slide 12
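As with the Bernoulli example, the table can be reproduced numerically; a sketch (names ours):

```python
import numpy as np
from math import factorial

const = factorial(3) * factorial(3) * factorial(2)   # 3! 3! 2! = 72

# L(lambda) for the three uncensored decades with counts (3, 3, 2).
def likelihood(lam):
    return lam**(3 + 3 + 2) * np.exp(-3 * lam) / const

# Same grid as the table; the maximum lies between 2.65 and 2.70.
for lam in np.arange(2.50, 2.90, 0.05):
    print(f"lambda = {lam:.2f}   L = {likelihood(lam):.6f}")
```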

Financial crises data (4)

[Figure: left panel plots the likelihood L(λ) = (λ^{3+3+2}/(6 × 6 × 2)) e^{-3λ} against λ; right panel plots the log-likelihood 8 log(λ) - log(6 × 6 × 2) - 3λ against λ]

STAT 151 Class 6 Slide 13

Financial crises data (5)

L(λ) = ∏_{i=1}^3 f(X_i | λ) = ∏_{i=1}^3 (λ^{X_i}/X_i!) e^{-λ}

l(λ) = ∑_{i=1}^3 {X_i log(λ) - λ - log(X_i!)} = log(λ) ∑_{i=1}^3 X_i - 3λ - ∑_{i=1}^3 log(X_i!)

Let ˆλ be the MLE of λ; then ˆλ is determined as follows:

d/dλ l(ˆλ) = (1/ˆλ) ∑_{i=1}^3 X_i - 3 = 0   =>   ˆλ = (∑_{i=1}^3 X_i)/3 = X̄ = 8/3

STAT 151 Class 6 Slide 14
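A quick numeric sanity check that the derivative of l is zero at ˆλ = X̄ (variable names are ours):

```python
import numpy as np

x = np.array([3, 3, 2])
lam_hat = x.mean()                     # closed-form MLE: 8/3
deriv = x.sum() / lam_hat - len(x)     # dl/d(lambda) evaluated at lam_hat
print(lam_hat, deriv)                  # 2.666..., 0.0
```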

Invariance property: Financial crises data (6)

The MLE of λ, the average # crises in a decade, is ˆλ = X̄ = 8/3

Other characteristics of X might be of interest:

(a) Average time between crises, E(T) = 1/λ (recall the link to Exp(λ)):
ˆE(T) = 1/ˆλ = 1/(8/3) = 0.375 decades, or 3.75 years

(b) Probability of no crises in the next decade, P(X = 0):
P(X = 0) = (λ^0/0!) e^{-λ} = e^{-λ}, so ˆP(X = 0) = e^{-ˆλ} = e^{-8/3} ≈ 0.07

If ˆθ is the MLE of θ, then for any function g(θ), the MLE of g(θ) is g(ˆθ). This is called the invariance property of the MLE

STAT 151 Class 6 Slide 15
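By invariance, plugging ˆλ into g gives the MLE of g(λ), so both estimates above are one-liners:

```python
import numpy as np

lam_hat = 8 / 3
print(1 / lam_hat)       # estimated E(T): 0.375 decades, i.e., 3.75 years
print(np.exp(-lam_hat))  # estimated P(X = 0): about 0.07
```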

Special cases: Financial crises data (7)

Original data (X_1, X_2, X_3, X_4) = (3, 3, 2, 1); X_4 is an observation from 2010-2017 = 0.7 decade, so X_4 is called censored. Assuming censoring is random, X_4 ~ Poisson(0.7λ)

The likelihood that the first observation is 3 is (λ^3/3!) e^{-λ}
The likelihood that the second observation is 3 is (λ^3/3!) e^{-λ}
The likelihood that the third observation is 2 is (λ^2/2!) e^{-λ}
The likelihood that the fourth observation is 1 is ((0.7λ)^1/1!) e^{-0.7λ}

The likelihood that (X_1, X_2, X_3, X_4) = (3, 3, 2, 1) is

L(λ) = (λ^3/3!) e^{-λ} × (λ^3/3!) e^{-λ} × (λ^2/2!) e^{-λ} × ((0.7λ)^1/1!) e^{-0.7λ} = (0.7 λ^{3+3+2+1}/(3! 3! 2! 1!)) e^{-3.7λ},

which is the new likelihood

STAT 151 Class 6 Slide 16

Financial crises data (8)

L(λ) = (0.7 λ^{3+3+2+1}/(3! 3! 2! 1!)) e^{-3.7λ}

l(λ) = log[L(λ)] = log(0.7) + 9 log(λ) - 3.7λ - log(3! 3! 2! 1!)

Let ˆλ be the MLE of λ; then ˆλ is determined as follows:

d/dλ l(ˆλ) = 9/ˆλ - 3.7 = 0   =>   ˆλ = 9/3.7 (total # events divided by total time)

STAT 151 Class 6 Slide 17
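The censored likelihood can also be maximized numerically as a check. Additive constants such as log(0.7) and log(3! 3! 2! 1!) do not change the maximizer, so the sketch below keeps only the terms involving λ:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Negative log-likelihood, up to additive constants.
def neg_loglik(lam):
    return -(9 * np.log(lam) - 3.7 * lam)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10), method="bounded")
print(res.x, 9 / 3.7)  # both approximately 2.432: total # events / total time
```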

Pandemic data (2)

MLE suggests estimating p using ˆp = X̄ = (∑_{i=1}^n X_i)/n

ˆp = X̄ is called an estimator because it can be applied to any sample X_1, ..., X_n

Our sample (X_1, ..., X_100) = (1, 1, 1, 0, 0, ..., 0, 1) gives

ˆp = (1 + 1 + 1 + 0 + 0 + ... + 0 + 1)/100 = 60/100

so our estimate of p is 0.6

An estimate is the value of an estimator applied to a particular sample

STAT 151 Class 6 Slide 18

Estimate vs. estimator

Using a sample (X_1, ..., X_100) = (1, 1, 1, 0, 0, ..., 0, 1), our estimate of p is ˆp = X̄ = 0.6

Our estimate comes from a sample. Its sampling error,

estimate - parameter = 0.6 - p,

is unknown and not estimable, since p is unknown. We therefore study the performance of the estimator that produces our estimate:

One sample: ˆp - p = ?
Many samples:
  E(ˆp - p) = average sampling error = Bias
  E[{ˆp - E(ˆp)}^2] = differences in estimates between samples = Variance
  E[(ˆp - p)^2] = average distance of estimates to p = MSE

STAT 151 Class 6 Slide 19

Bias: average sampling error

For an estimator ˆθ of θ, the bias is the average sampling error using ˆθ over different samples of size n:

bias(ˆθ) = E(ˆθ - θ)

An estimator is unbiased if bias(ˆθ) = 0; otherwise, it is biased. A biased estimator systematically overestimates or underestimates θ

Some estimators may be biased when the sample size n is small, but bias(ˆθ) → 0 for large values of n. Such estimators are called consistent. In practice, it is often sufficient to look for a consistent rather than an unbiased estimator

[Figure: estimates ˆθ from different samples; for an unbiased estimator they centre on θ, for a biased estimator they centre away from θ]

STAT 151 Class 6 Slide 20

Variance: does the estimate vary much with the sample?

The variance measures how much ˆθ varies when estimating the same θ using different samples of size n:

var(ˆθ) = E[{ˆθ - E(ˆθ)}^2]

Recall that var(ˆθ) is the sampling variation. A large var(ˆθ) suggests the estimator's estimate of the (same) unknown θ varies a lot with the sample chosen, so an estimator with a large variance is bad

[Figure: estimates ˆθ from different samples under small and large variance, centred at E(ˆθ)]

Note that the reference point is E(ˆθ), not θ, so an estimator with a small variance does not guarantee that its estimate will be close to the unknown θ

STAT 151 Class 6 Slide 21

Mean squared error (MSE): is our estimate close to the unknown θ?

The mean squared error (MSE) measures, on average, the distance between the estimate and θ:

MSE(ˆθ) = E[(ˆθ - θ)^2] = {bias(ˆθ)}^2 + var(ˆθ)

For an unbiased estimator, bias(ˆθ) = 0 and MSE(ˆθ) = var(ˆθ), for all n
For a consistent estimator, bias(ˆθ) → 0 and MSE(ˆθ) → var(ˆθ), for large n
For consistent or unbiased estimators, variance is therefore the best measure of performance. We illustrate the concept using unbiased estimators, so MSE(ˆθ) = var(ˆθ)

[Figure: estimates ˆθ from different samples under small and large MSE, centred at θ]

An estimator with lower MSE has a higher chance of producing an estimate close to θ

STAT 151 Class 6 Slide 22
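Bias, variance, and MSE are all "many samples" quantities, so they can be approximated by simulation. A Monte Carlo sketch for ˆp = X̄, with the true p set (hypothetically) to 0.6:

```python
import numpy as np

rng = np.random.default_rng(151)
p_true, n, reps = 0.6, 100, 100_000

# Each replicate: n Bernoulli(p) outcomes, summarized by p-hat = X-bar.
p_hats = rng.binomial(n, p_true, size=reps) / n

bias = p_hats.mean() - p_true
var = p_hats.var()
mse = np.mean((p_hats - p_true)**2)
print(bias, var, mse)  # bias near 0, so MSE ~ var ~ p(1 - p)/n = 0.0024
```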

Bias vs. Variance: Financial crises data (9)

Two estimators of λ:
(a) ˆλ = (X_1 + X_2 + X_3)/3
(b) ˆλ* = (X_1 + X_2 + X_3 + X_4)/3.7
Both are MLEs; which is better?

(1) Recall that for Y ~ Poisson(µ), E(Y) = var(Y) = µ

(2) The total # of events in 3 + t decades is Poisson((3 + t)λ), so for an estimator of the form (total # events)/(3 + t):

E[(X_1 + X_2 + X_3 + X_4)/(3 + t)] = E(X_1 + X_2 + X_3 + X_4)/(3 + t) = (3 + t)λ/(3 + t) = λ, so Bias = 0

var[(X_1 + X_2 + X_3 + X_4)/(3 + t)] = var(X_1 + X_2 + X_3 + X_4)/(3 + t)^2 = (3 + t)λ/(3 + t)^2 = λ/(3 + t), t > 0

(3) The variance λ/(3 + t) decreases with a larger t: (a) has variance λ/3 and (b) has variance λ/3.7, so (b) is better

(4) Using a larger sample is better (c.f. Class 7)

STAT 151 Class 6 Slide 23
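A simulation confirms the algebra: both estimators are unbiased, but (b) has the smaller variance. A sketch with λ fixed (hypothetically) at 8/3:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, reps = 8 / 3, 100_000

x123 = rng.poisson(3 * lam, size=reps)    # total # events in the 3 full decades
x4 = rng.poisson(0.7 * lam, size=reps)    # # events in the 0.7 censored decade

est_a = x123 / 3                          # estimator (a)
est_b = (x123 + x4) / 3.7                 # estimator (b)

print(est_a.mean(), est_b.mean())         # both near lambda = 2.667 (unbiased)
print(est_a.var(), lam / 3)               # about 0.889
print(est_b.var(), lam / 3.7)             # about 0.721, so (b) is better
```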

Summary

Consistent or unbiased estimators are desirable

Among consistent or unbiased estimators, the estimator with the smallest variance is efficient

Under most circumstances, as the sample size increases (asymptotically), if ˆθ is the MLE and ˆθ* is any other unbiased estimator of θ, then

var(ˆθ) ≤ var(ˆθ*)

i.e., ˆθ is at least as good as ˆθ*, so the MLE is efficient

Invariance: if we are interested in estimating any function of θ, say g(θ), the following also holds:

var[g(ˆθ)] ≤ var[g(ˆθ*)]

i.e., g(ˆθ) is at least as good as g(ˆθ*)

STAT 151 Class 6 Slide 24