
STAT 135 Lab 3: Asymptotic MLE and the Method of Moments
Rebecca Barter
February 9, 2015

Maximum likelihood estimation (a reminder)

Maximum likelihood estimation

Suppose that we have a sample, X_1, X_2, ..., X_n, where the X_i are IID. The maximum likelihood estimator for θ is a single value that estimates the true value θ_0, obtained by maximizing the likelihood function with respect to θ, i.e. by finding the value of θ that maximizes the likelihood of observing the given data.

How do we write down the likelihood function? The (non-rigorous) idea:

lik(θ) = P(X_1 = x_1, ..., X_n = x_n) = P(X_1 = x_1) ⋯ P(X_n = x_n) = ∏_{i=1}^{n} f_θ(X_i)

Maximum likelihood estimation

What is the likelihood function? The likelihood function, lik(θ), is a function of θ which gives the probability of observing our sample under different values of θ.

How do we find the value of θ that maximizes the likelihood function?
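
To make this concrete, here is a minimal Python sketch (an added illustration, not part of the original lab; the simulated Bernoulli sample and the use of scipy's minimize_scalar are arbitrary choices): it writes down a Bernoulli log-likelihood and maximizes it numerically, then compares the result to the closed-form MLE.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=200)  # simulated IID Bernoulli(p0 = 0.3) sample (illustrative)

def neg_log_lik(p):
    # log lik(p) = sum_i log f_p(x_i) = sum_i [x_i log p + (1 - x_i) log(1 - p)]
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximizing the likelihood = minimizing its negative over 0 < p < 1
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE: ", res.x)
print("closed-form MLE:", x.mean())  # for Bernoulli, the MLE is the sample mean
```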

Maximum likelihood estimation: Asymptotic results

Asymptotic results: what happens when our sample size, n, gets really large (n → ∞)?

It turns out that the MLE has some very nice asymptotic results:

1. Consistency: as n → ∞, our ML estimate, θ̂_ML,n, gets closer and closer to the true value θ_0.

2. Normality: as n → ∞, the distribution of our ML estimate, θ̂_ML,n, tends to the normal distribution (with what mean and variance?).

1. Consistency

An estimate, θ̂_n, of θ_0 is called consistent if:

θ̂_n →_p θ_0 as n → ∞

where θ̂_n →_p θ_0 technically means that, for all ε > 0,

P(|θ̂_n − θ_0| > ε) → 0 as n → ∞

But you don't need to worry about that right now... just think of it as "as n gets really large, the probability that θ̂_n differs from θ_0 becomes increasingly small".

1. Consistency

The MLE, θ̂_ML,n, is a consistent estimator for the parameter, θ, that it is estimating, so that

θ̂_ML,n →_p θ_0 as n → ∞

This nice property also implies that the MLE is asymptotically unbiased:

E(θ̂_ML,n) → θ_0 as n → ∞
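
To see consistency in action, here is a small Monte Carlo sketch (my own illustration; the true p_0, the tolerance ε, and the number of repetitions are arbitrary choices): it estimates P(|p̂_n − p_0| > ε) for the Bernoulli MLE p̂_n = X̄_n at increasing sample sizes and shows it shrinking toward 0.

```python
import numpy as np

rng = np.random.default_rng(1)
p0, eps, reps = 0.3, 0.05, 5000  # true parameter, tolerance, Monte Carlo repetitions

for n in [10, 100, 1000, 10000]:
    # Each p_hat is the mean of one Bernoulli(p0) sample of size n;
    # drawing the number of successes directly from Binomial(n, p0) is equivalent.
    p_hat = rng.binomial(n, p0, size=reps) / n
    prob = np.mean(np.abs(p_hat - p0) > eps)  # Monte Carlo estimate of P(|p_hat - p0| > eps)
    print(f"n = {n:5d}: P(|p_hat - p0| > {eps}) ≈ {prob:.3f}")
```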

2. Normality

An estimate, θ̂_n, of θ is called asymptotically normal if, as n → ∞, we have that

θ̂_n → N(μ_{θ_0}, σ²_{θ_0})

where θ_0 is the true value of the parameter θ. What else have we seen with this property?

2. Normality

It turns out that our ML estimate, θ̂_ML,n, of θ is asymptotically normal: as n → ∞, we have that

θ̂_ML,n → N(μ_{θ_0}, σ²_{θ_0})

We want to find out: what are μ_{θ_0} and σ²_{θ_0}?

2. Normality

First, here is a fun definition of Fisher Information:

I(θ_0) = E[ (∂/∂θ log f_θ(X))² ]|_{θ=θ_0}

or alternatively,

I(θ_0) = −E[ ∂²/∂θ² log f_θ(X) ]|_{θ=θ_0}

What the #$%& does that mean? (we will soon find that the asymptotic variance is related to this quantity)

2. Normality

Fisher Information:

I(θ_0) = −E[ ∂²/∂θ² log f_θ(X) ]|_{θ=θ_0}

Wikipedia says that Fisher information is "a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ upon which the probability of X depends".

2. Normality (example)

Recall last week we showed that if we have a sample X_1, X_2, ..., X_n where X_i ∼ Bernoulli(p_0) for each i = 1, 2, ..., n, then

p̂_MLE = X̄_n

What is the Fisher information for X_i?

I(p_0) = −E[ ∂²/∂p² log f_p(X) ]|_{p=p_0}

2. Normality (example)

X_i ∼ Bernoulli(p_0) for each i = 1, 2, ..., n. What is the Fisher information for X?

f_p(X) = p^X (1 − p)^{1−X}

so log f_p(X) = X log p + (1 − X) log(1 − p), and ∂²/∂p² log f_p(X) = −X/p² − (1 − X)/(1 − p)². Taking expectations at p = p_0 (where E[X] = p_0) gives

I(p_0) = −E[ ∂²/∂p² log f_p(X) ]|_{p=p_0} = 1/p_0 + 1/(1 − p_0) = 1 / (p_0(1 − p_0))
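
As a quick symbolic sanity check (an added aside using sympy, not part of the original slides), both forms of the Fisher information definition can be verified to give 1/(p_0(1 − p_0)) for the Bernoulli model:

```python
import sympy as sp

p, x = sp.symbols("p x", positive=True)
log_f = x * sp.log(p) + (1 - x) * sp.log(1 - p)  # log f_p(x) for Bernoulli(p)

def E(expr):
    # Expectation over x in {0, 1}: E[g(X)] = (1 - p) * g(0) + p * g(1)
    return sp.simplify((1 - p) * expr.subs(x, 0) + p * expr.subs(x, 1))

score_form = E(sp.diff(log_f, p) ** 2)    # E[(d/dp log f_p(X))^2]
hessian_form = -E(sp.diff(log_f, p, 2))   # -E[d^2/dp^2 log f_p(X)]

print(sp.simplify(score_form - 1 / (p * (1 - p))))    # 0: matches 1/(p(1 - p))
print(sp.simplify(hessian_form - 1 / (p * (1 - p))))  # 0: matches 1/(p(1 - p))
```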

2. Normality

It turns out that our ML estimate, θ̂_ML,n, of θ is asymptotically normal: as n → ∞, we have that

θ̂_ML,n → N(μ_{θ_0}, σ²_{θ_0})

We want to find out: what are μ_{θ_0} and σ²_{θ_0}?

Any ideas as to what μ_{θ_0} might be? (Hint: what is the asymptotic expected value of θ̂_ML,n?)

2. Normality

Our ML estimate, θ̂_ML,n, of θ is asymptotically normal: as n → ∞, we have that

θ̂_ML,n → N(μ_{θ_0}, σ²_{θ_0})

The consistency of θ̂_ML,n tells us that θ̂_ML,n →_p θ_0, so as n → ∞,

E(θ̂_ML,n) → E(θ_0) = θ_0

Thus the asymptotic mean of the MLE is given by μ_{θ_0} = θ_0.

2. Normality

Our ML estimate, θ̂_ML,n, of θ is asymptotically normal: as n → ∞, we have that

θ̂_ML,n → N(μ_{θ_0}, σ²_{θ_0})

The asymptotic variance of the MLE is given by

σ²_{θ_0} = 1 / (n I(θ_0))

2. Normality

So in summary, we have: our ML estimate, θ̂_ML,n, of θ is asymptotically normal: as n → ∞, we have that

θ̂_ML,n → N(θ_0, 1 / (n I(θ_0)))

(example)

For large samples, the ML estimate of θ is approximately normally distributed:

θ̂_ML,n ≈ N(θ_0, 1 / (n I(θ_0)))

For our X_i ∼ Bernoulli(p_0), i = 1, ..., n example, recall:

I(p_0) = 1 / (p_0(1 − p_0)) and p̂_ML = X̄_n

Thus, when X_i ∼ Bernoulli(p_0), for large n

p̂_ML = X̄_n ≈ N(p_0, p_0(1 − p_0)/n)

Why do we believe this result? How else could we have obtained it?
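
One way to see why this is believable (an added note, not from the original slides): p̂_ML = X̄_n is just a sample mean, and each X_i has mean p_0 and variance p_0(1 − p_0), so the Central Limit Theorem directly gives X̄_n ≈ N(p_0, p_0(1 − p_0)/n) for large n; this matches the general MLE result because I(p_0) = 1/(p_0(1 − p_0)).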

Exercise

(exercise)

In class, you showed that if we have a sample X_i ∼ Poisson(λ_0), the MLE of λ is

λ̂_ML = X̄_n = (1/n) ∑_{i=1}^{n} X_i

1. What is the asymptotic distribution of λ̂_ML? (You will need to calculate the asymptotic mean and variance of λ̂_ML.)

2. Generate N = 10000 samples, each of size n = 1000, from the Poisson(3) distribution.

3. For each sample, calculate the ML estimate of λ. Plot a histogram of the ML estimates.

4. Calculate the variance of your ML estimates, and show that this is close to the asymptotic value derived in part 1. (One possible code sketch follows below.)
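
One possible Python sketch of the simulation in parts 2-4 (an added illustration; the lab expects you to write your own code, and the asymptotic variance λ_0/n used below is exactly the quantity part 1 asks you to derive):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
N, n, lam0 = 10000, 1000, 3.0

# The MLE for each sample is its mean; since the sum of n IID Poisson(lam0)
# draws is Poisson(n * lam0), each sample mean can be drawn in one step.
lam_hat = rng.poisson(n * lam0, size=N) / n

plt.hist(lam_hat, bins=50, density=True)
plt.xlabel("ML estimate of lambda")
plt.title("Sampling distribution of the Poisson MLE (n = 1000)")
plt.show()

print("empirical variance: ", lam_hat.var())
print("asymptotic variance:", lam0 / n)  # 1/(n I(lam0)) with I(lam0) = 1/lam0
```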

Method of Moments (MOM) (An alternative to MLE)

Method of Moments

Maximum likelihood estimator for θ: calculate a single value which estimates the true value of θ by maximizing the likelihood function with respect to θ.

Method of moments estimator for θ: equate the theoretical moments (which are functions of θ) to the empirical (sample) moments, and solve the resulting equations for θ.

Suppose X follows some distribution. The kth moment of the distribution is defined to be

μ_k = E[X^k] = g_k(θ)

which will be some function of θ.

Method of Moments

MOM works by equating the theoretical moments (which will be a function of θ) to the empirical moments.

Moment           Theoretical moment    Empirical moment
first moment     E[X]                  (1/n) ∑_{i=1}^{n} X_i
second moment    E[X²]                 (1/n) ∑_{i=1}^{n} X_i²
third moment     E[X³]                 (1/n) ∑_{i=1}^{n} X_i³

Method of Moments

MOM is perhaps best described by example. Suppose that X ∼ Bernoulli(p). Then the first moment is given by

E[X] = 0 · P(X = 0) + 1 · P(X = 1) = p

Moreover, we can estimate E[X] by taking a sample X_1, ..., X_n and calculating the sample mean:

X̄ = (1/n) ∑_{i=1}^{n} X_i

We approximate the first theoretical moment, E[X], by the first empirical moment, X̄, i.e.

p̂_MOM = X̄

which is the same as the MLE! (note that this is not always the case...)

Exercise

Exercise: Question 43, Chapter 8 (page 320) from Rice

The file gamma-arrivals contains a set of gamma-ray data consisting of the times between arrivals (interarrival times) of 3,935 photons (units are seconds). The gamma density can be written as

f_{α,β}(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}

1. Make a histogram of the interarrival times. Does it appear that a gamma distribution would be a plausible model?

2. Fit the parameters by the method of moments and by maximum likelihood. How do the estimates compare? (Hint: the MLE for α has no closed-form solution; use α̂_MLE = 1.) One possible code sketch follows below.

3. Plot the two fitted gamma densities on top of the histogram. Do the fits look reasonable?
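
A possible sketch of part 2 in Python (an added illustration, not the lab's official solution: it assumes the interarrival times have been loaded into a NumPy array named times from the gamma-arrivals file, whose exact path and format are not specified here, and it uses scipy's generic numerical fitter for the MLE rather than the simplification in the hint):

```python
import numpy as np
from scipy import stats

times = np.loadtxt("gamma-arrivals.txt")  # hypothetical file name/format for the lab's data

# Method of moments: equate E[X] = alpha/beta and Var(X) = alpha/beta^2
# to the sample mean and (plug-in) sample variance, then solve for alpha and beta.
xbar, s2 = times.mean(), times.var()
alpha_mom = xbar**2 / s2
beta_mom = xbar / s2

# Maximum likelihood: no closed form for alpha, so use scipy's numerical fit
# (loc fixed at 0 so the fitted density matches the two-parameter form above).
alpha_mle, _, scale_mle = stats.gamma.fit(times, floc=0)
beta_mle = 1.0 / scale_mle

print("MOM estimates:", alpha_mom, beta_mom)
print("MLE estimates:", alpha_mle, beta_mle)
```

For part 3, each fitted density can be overlaid on the histogram with stats.gamma.pdf(grid, alpha, scale=1/beta) evaluated at the corresponding pair of estimates.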