Parameter Estimation

Consider a sample of T observations on a random variable Y. This generates the random variables (y_1, y_2, ..., y_T). A random sample is a sample (y_1, y_2, ..., y_T) where the random variables y_t, t = 1, ..., T, are independently and identically distributed (often written i.i.d.). This means that the joint probability density function of the sample is

f(y_1, y_2, ..., y_T) = f(y_1) f(y_2) f(y_3) ... f(y_T),

where f(y_t) is the marginal probability density function of y_t, t = 1, 2, ..., T.

Estimators

Consider the estimation of a (K × 1) parameter vector θ based on a sample (y_1, y_2, ..., y_T). An estimator of the (K × 1) parameter vector θ is a function θ^e(y_1, y_2, ..., y_T), where y_t is the t-th random variable in the sample. Since it is a function of random variables, an estimator θ^e(y_1, y_2, ..., y_T) is itself a vector of random variables. An estimate of the (K × 1) parameter vector θ is given by θ^e(y_1, y_2, ..., y_T), where y_t is the observed value of the random variable y_t, t = 1, 2, ..., T. Since the y_t's are observed, the estimate θ^e(y_1, y_2, ..., y_T) is not a vector of random variables. The observed values y_t depend on the sample, implying that the estimate θ^e(y_1, y_2, ..., y_T) takes a specific value for a given sample but varies across samples. How do we identify the function θ^e(·)?

The Method of Moments

Consider the r-th sample moment of the random variable Y,

μ^e_r = Σ_t y_t^r / T,   r = 1, 2, ..., K.

Assume that the K population moments of Y are related to the (K × 1) parameter vector θ by the known functions

μ_r = h_r(θ),   r = 1, 2, ..., K.

The method of moments consists in equating the sample moments μ^e_r with the true moments μ_r and solving the resulting system of K equations, μ^e_r = h_r(θ), r = 1, ..., K, for θ. The resulting estimator is the method of moments estimator.

Let (y_1, y_2, ..., y_T) be a sample where y_t is distributed with mean β and variance σ², t = 1, 2, ..., T, or y_t ~ (β, σ²). Therefore θ = (β, σ²). We have E(Y) = μ_1 = β, and E[(Y - β)²] = E(Y²) - [E(Y)]² = μ_2 - (μ_1)² = σ². Then, equating the sample moments to the population moments gives

μ^e_1 = Σ_t y_t / T = β
μ^e_2 = Σ_t y_t² / T = σ² + β².

Solving these two equations for θ = (β, σ²) gives the method of moments estimator of the mean β,

β_m = Σ_t y_t / T,

and of the variance σ²,

σ²_m = (Σ_t y_t² / T) - (β_m)² = Σ_t (y_t - β_m)² / T.
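The two-moment calculation above is easy to reproduce numerically. The sketch below, in Python with NumPy, simulates a sample and solves the two moment equations for (β, σ²); the sample size, the true parameter values, and the random seed are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500                               # sample size (illustrative)
beta_true, sigma2_true = 2.0, 4.0     # true mean and variance (illustrative)
y = rng.normal(beta_true, np.sqrt(sigma2_true), size=T)

# First two sample moments
m1 = np.mean(y)                       # mu^e_1 = sum(y_t) / T
m2 = np.mean(y**2)                    # mu^e_2 = sum(y_t^2) / T

# Solve mu^e_1 = beta, mu^e_2 = sigma^2 + beta^2 for (beta, sigma^2)
beta_m = m1
sigma2_m = m2 - m1**2                 # equivalently: np.mean((y - beta_m)**2)

print(beta_m, sigma2_m)               # close to (2.0, 4.0) for large T
```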

The Maximum Likelihood Method

Let the joint probability density function of the sample (y_1, y_2, ..., y_T) be f(y_1, y_2, ..., y_T | θ), where θ is a (K × 1) vector of unknown parameters that belongs to the parameter space Ω, θ ∈ Ω. Define the likelihood function of the sample as

l(θ | y_1, y_2, ..., y_T) = f(y_1, y_2, ..., y_T | θ).

The maximum likelihood estimator is the value θ^e_l that solves the following maximization problem:

Max_θ {l(θ | Y_1, Y_2, ..., Y_T): θ ∈ Ω}.

Define the log-likelihood function of the sample as L = ln[l(θ | y_1, y_2, ..., y_T)]. Since the logarithmic function is monotonic, l(·) and ln[l(·)] attain their maxima at the same value of θ. As a result, it will often be convenient to define the maximum likelihood estimator of θ as the value θ^e_l that solves

Max_θ L = ln[l(θ | Y_1, Y_2, ..., Y_T)], θ ∈ Ω.

Let (y_1, y_2, ..., y_T) be a random sample, where y_t ~ N(β, σ²), θ = (β, σ²), σ² > 0. Thus the probability density function of y_t is

f(y_t | β, σ²) = exp[-(1/2)(y_t - β)²/σ²] / [2π σ²]^(1/2),

and

l(β, σ² | y_1, y_2, ..., y_T) = f(y_1, y_2, ..., y_T | θ) = Π_t f(y_t | β, σ²).

It follows that the log-likelihood function of the sample is

L = ln[l(β, σ² | Y_1, Y_2, ..., Y_T)]
  = ln[Π_t f(Y_t | β, σ²)]
  = ln[Π_t exp[-(1/2)(Y_t - β)²/σ²] / [2π σ²]^(1/2)]
  = Σ_{t=1}^T [-(1/2)(Y_t - β)²/σ² - (1/2) ln(2π σ²)]
  = -(T/2) ln(2π) - (T/2) ln(σ²) - (1/2) Σ_{t=1}^T (Y_t - β)²/σ².

Note that L is a concave function of β and has a unique interior maximum in θ = (β, σ²). The first-order conditions for a maximum with respect to θ = (β, σ²) are:

∂L/∂β = [(Σ_t y_t) - T β]/σ² = 0
∂L/∂σ² = -T/(2σ²) + [Σ_t (y_t - β)²]/(2σ⁴) = 0.

Solving these two equations for θ = (β, σ²) gives the maximum likelihood estimator of the mean β,

β_l = Σ_t y_t / T,

and of the variance σ²,

σ²_l = Σ_t (y_t - β_l)² / T.

The maximum likelihood method requires knowing the probability distribution of the y_t's. (Other methods can be less demanding.) In this case, β_l = β_m and σ²_l = σ²_m, i.e. the method of moments and the maximum likelihood method give identical estimators for θ = (β, σ²).
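As a check on this derivation, the log-likelihood can also be maximized numerically and the result compared with the closed-form estimators β_l and σ²_l. The sketch below uses scipy.optimize.minimize on the negative log-likelihood; the simulated data, the choice of optimizer, and the starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
T = 500
y = rng.normal(2.0, 2.0, size=T)      # simulated sample, true beta = 2, sigma^2 = 4

def neg_log_likelihood(params):
    beta, log_sigma2 = params         # optimize ln(sigma^2) to keep sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    return -np.sum(norm.logpdf(y, loc=beta, scale=np.sqrt(sigma2)))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
beta_hat, sigma2_hat = res.x[0], np.exp(res.x[1])

# Closed-form maximum likelihood estimators derived above
beta_l = np.mean(y)
sigma2_l = np.mean((y - beta_l)**2)

print(beta_hat, beta_l)               # numerically identical
print(sigma2_hat, sigma2_l)
```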

Least-Squares Method

Assume that we know functions h_t(θ) satisfying E(y_t) = h_t(θ), t = 1, 2, ..., T, where θ is a (K × 1) vector of parameters, θ ∈ Ω. Define the error term e_t = y_t - h_t(θ). It follows that e_t is a random variable (since it is a function of the random variable y_t) and has mean zero: E(e_t) = E(y_t) - h_t(θ) = 0. Define the error sum of squares, S:

S(y_1, y_2, ..., y_T, θ) = Σ_t [y_t - h_t(θ)]².

The least squares estimator of θ is the value θ^e_s that solves the following minimization problem:

Min_θ {S(y_1, y_2, ..., y_T, θ): θ ∈ Ω}.

Let (y_1, y_2, ..., y_T) be a sample where y_t is distributed with mean β and some finite variance, t = 1, 2, ..., T. Given E(y_t) = β, let h_t(β) = β. Then the least squares estimator of β is obtained by minimizing S = Σ_t (y_t - β)². Note that S is a convex function of β. The first-order necessary condition for a minimum of S is

∂S/∂β = -2 Σ_t (y_t - β) = 0.

Solving this equation for β gives the least squares estimator of β,

β_s = Σ_t y_t / T.

Note: In this case, β_l = β_m = β_s, i.e. the method of moments, the maximum likelihood method, and the least squares method all give identical estimators of the mean β.
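A minimal numerical sketch of this least squares problem: minimizing S(β) for simulated data recovers the sample mean, as the first-order condition predicts (the data-generating values and the use of SciPy's scalar minimizer are illustrative assumptions).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
T = 200
y = rng.normal(2.0, 1.5, size=T)      # any distribution with finite variance would do

# Error sum of squares for the constant-mean model h_t(beta) = beta
S = lambda beta: np.sum((y - beta)**2)

res = minimize_scalar(S)              # S is convex in beta, so the minimum is unique
print(res.x, np.mean(y))              # least squares estimate equals the sample mean
```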

Properties of Estimators

Again, we consider an estimator θ^e(y_1, y_2, ..., y_T) of a (K × 1) parameter vector θ based on a sample (y_1, y_2, ..., y_T).

Finite sample properties (based on a sample of given size T)

Unbiased estimator

An estimator θ^e(y_1, y_2, ..., y_T) of θ is unbiased if E(θ^e) = θ. If E(θ^e) ≠ θ, then the estimator θ^e is said to be biased, its bias being [E(θ^e) - θ] ≠ 0.

Efficient estimator

An estimator θ^e(y_1, y_2, ..., y_T) of θ is efficient if it is unbiased and if it has the smallest possible variance among all unbiased estimators.

Cramer-Rao Lower Bound

An unbiased estimator θ^e is efficient if its variance satisfies

V(θ^e) = {-E[∂²L(θ)/∂θ²]}^(-1) = I(θ)^(-1),

where L(θ) = ln[l(θ | Y_1, Y_2, ..., Y_T)] is the log-likelihood function of the sample, I(θ) = -E[∂²L(θ)/∂θ²] is a (K × K) matrix called the information matrix, and {-E[∂²L(θ)/∂θ²]}^(-1) = I(θ)^(-1) is called the Cramer-Rao lower bound. Note: this requires knowing the probability distribution of the Y_t's.

Best Linear Unbiased Estimator (BLUE)

An estimator θ^e is best linear unbiased if it is linear, i.e. θ^e = Σ_t a_t Y_t for some constants a_t; if it is unbiased, i.e. E(θ^e) = θ; and if it has the smallest variance among all linear unbiased estimators. This does not require knowing the probability distribution of the y_t's.

Let (y_1, y_2, ..., y_T) be a random sample of size T, where y_t ~ (β, σ²), t = 1, 2, ..., T. By definition of the mean, we have E(y_t) = β. Consider the following estimator of the mean β:

β^e = Σ_t y_t / T.

We have

E(β^e) = E(Σ_t y_t / T) = Σ_t E(y_t)/T = Tβ/T = β.

It follows that β^e = Σ_t y_t / T is an unbiased estimator of β. The variance of β^e is

V(β^e) = V(Σ_t y_t / T)
       = (1/T²) Σ_t V(y_t)   (this uses the independence of the y_t's in a random sample)
       = (1/T²) T σ²,   since V(y_t) = σ², t = 1, 2, ..., T,
       = σ²/T.

Noting that the estimator β^e = Σ_t y_t / T is linear, it can be shown to have the smallest variance among all linear unbiased estimators. Thus, β^e is the best linear unbiased estimator (BLUE) of β.

If we know that y_t is normally distributed (i.e., y_t ~ N(β, σ²)), then it can be shown that

I(θ) = -E[∂²L/∂θ²] = [ T/σ²      0     ]
                     [   0    T/(2σ⁴) ],   where θ = (β, σ²).

It follows that the Cramer-Rao lower bound is

I(θ)^(-1) = [ σ²/T     0    ]
            [   0    2σ⁴/T ].

Since the variance of β^e, V(β^e) = σ²/T, equals the corresponding diagonal element of the Cramer-Rao lower bound, it follows that β^e is efficient. This means that, under the normality assumption, β^e has the smallest variance among all unbiased estimators (whether they are linear or not). Since β^e = β_m = β_l = β_s = Σ_t y_t / T, it follows that the estimator of the mean β obtained from the method of moments, the maximum likelihood method, or the least squares method is unbiased, BLUE, and efficient under a normal distribution.
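These finite-sample claims can be illustrated with a small Monte Carlo sketch: across repeated samples, the average of β^e should be close to β and its variance close to σ²/T, the Cramer-Rao bound for β under normality (the replication count, sample size, and parameter values below are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
T, R = 50, 100_000                 # sample size and number of Monte Carlo replications
beta, sigma2 = 2.0, 4.0

samples = rng.normal(beta, np.sqrt(sigma2), size=(R, T))
beta_e = samples.mean(axis=1)      # beta^e for each replication

print(beta_e.mean())               # ~ 2.0   (unbiasedness: E(beta^e) = beta)
print(beta_e.var())                # ~ 0.08  (V(beta^e) = sigma^2/T = 4/50)
```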

Let (Y_1, Y_2, ..., Y_T) be a random sample of size T, where Y_t is distributed with mean β and variance σ², t = 1, 2, ..., T. Consider the following estimator of the variance σ²:

(σ²)^e = Σ_t (Y_t - β^e)² / T,   where β^e = Σ_t Y_t / T.

Since (σ²)^e = σ²_m = σ²_l, the estimator (σ²)^e is identical to both the method of moments estimator and the maximum likelihood estimator (under normality) of σ². Note that

E[(σ²)^e] = E[(1/T) Σ_t (Y_t - β^e)²]
          = E[(1/T) Σ_t [(Y_t - β) - (β^e - β)]²]
          = E[(1/T) Σ_t [(Y_t - β)² + (β^e - β)² - 2 (Y_t - β)(β^e - β)]]
          = E[Σ_t (Y_t - β)²/T + (β^e - β)² - 2 (β^e - β) Σ_t (Y_t - β)/T]
          = E[Σ_t (Y_t - β)²/T - (β^e - β)²],   since Σ_t (Y_t - β)/T = β^e - β,
          = Σ_t E(Y_t - β)²/T - E(β^e - β)²
          = σ² - V(β^e),   since σ² = E(Y_t - β)² and E(β^e - β)² = V(β^e),
          = σ² - σ²/T,   since V(β^e) = σ²/T,
          = σ² (T-1)/T.

It follows that E[(σ²)^e] = [(T-1)/T] σ² < σ². This implies that (σ²)^e = σ²_m = σ²_l is a biased estimator of the variance σ². Thus, estimating the variance σ² by either the method of moments or the maximum likelihood method gives a biased estimator. This suggests that an unbiased estimator of the variance σ² is

σ²_u = Σ_t (Y_t - β^e)² / (T-1),   where β^e = Σ_t Y_t / T.

The estimator σ²_u is unbiased since σ²_u = (σ²)^e [T/(T-1)], which implies E(σ²_u) = E[(σ²)^e] [T/(T-1)] = σ².
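The bias factor (T-1)/T shows up clearly in a simulation. With T = 10 and σ² = 4, the divide-by-T estimator should average about 3.6 across replications, while the divide-by-(T-1) estimator should average about 4 (the sample size, parameter values, and replication count below are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
T, R = 10, 200_000
beta, sigma2 = 2.0, 4.0

samples = rng.normal(beta, np.sqrt(sigma2), size=(R, T))
beta_e = samples.mean(axis=1, keepdims=True)

sigma2_e = np.mean((samples - beta_e)**2, axis=1)           # divide by T (MoM / ML)
sigma2_u = np.sum((samples - beta_e)**2, axis=1) / (T - 1)  # divide by T-1 (unbiased)

print(sigma2_e.mean())   # ~ 3.6 = (T-1)/T * sigma^2, i.e. biased downward
print(sigma2_u.mean())   # ~ 4.0, i.e. unbiased
```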

Asymptotic Properties (the sample size T becomes large, T approaches infinity)

Again, we consider an estimator θ^e(y_1, y_2, ..., y_T) of a (K × 1) parameter vector θ based on a sample of size T.

Consistent estimator

An estimator θ^e of θ is said to be consistent if

lim_{T→∞} P(|θ^e - θ| < ε) = 1,

where ε is an arbitrarily small positive number. Equivalently, the estimator θ^e is consistent if it converges in probability to the constant θ, where θ is said to be the probability limit of θ^e:

plim θ^e = θ.

Note: Sufficient conditions for θ^e to be a consistent estimator of θ are that lim_{T→∞} E(θ^e) = θ (i.e. θ^e is asymptotically unbiased) and lim_{T→∞} V(θ^e) = 0.

Let (y_1, y_2, ..., y_T) be a sample where y_t has mean β and variance σ². Consider the estimator β^e = Σ_t y_t / T of β. We have shown that the estimator β^e has mean β and variance σ²/T. We know that the estimator β^e is unbiased (i.e., E(β^e) = β) for any sample size T. It is thus also asymptotically unbiased (as T becomes large). In addition, its variance V(β^e) = σ²/T clearly goes to zero as T becomes large. Thus, β^e = Σ_t y_t / T is a consistent estimator of β.

Central Limit Theorem

Let (y_1, y_2, ..., y_T) be a random sample where y_t ~ (β, σ²), t = 1, 2, ..., T. Let β^e = Σ_t y_t / T. Then, as T → ∞, T^(1/2) (β^e - β) converges in distribution to a N(0, σ²) random variable:

T^(1/2) (β^e - β) →d N(0, σ²).

Implications: This result holds for any distribution of the y_t's. The Central Limit Theorem says that, if the sample size T is reasonably large, then T^(1/2) (β^e - β) is approximately normally distributed with mean 0 and variance σ². Equivalently stated, when T is reasonably large, β^e is approximately normally distributed: β^e ≈ N(β, σ²/T). Note that this result is consistent with our earlier results that β^e has mean β and variance σ²/T. What is new here is the asymptotic normality of β^e (or of T^(1/2)(β^e - β)) for any distribution of the y_t's.
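A simulation sketch of the Central Limit Theorem using a clearly non-normal parent distribution (an exponential), to emphasize that the result does not rely on normality of the y_t's; the distribution, sample size, and replication count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T, R = 200, 100_000
beta = 1.0                          # mean of an Exponential(1); variance sigma^2 = 1

samples = rng.exponential(scale=beta, size=(R, T))
z = np.sqrt(T) * (samples.mean(axis=1) - beta)    # sqrt(T) * (beta^e - beta)

print(z.mean(), z.var())            # ~ 0 and ~ sigma^2 = 1
# The standardized statistic is approximately N(0, sigma^2); e.g. about 95% of
# draws fall within 1.96 * sigma of zero:
print(np.mean(np.abs(z) < 1.96))    # ~ 0.95
```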

Asymptotic Efficiency

An estimator θ^e of θ is said to be asymptotically efficient if it is consistent and if it has the smallest possible asymptotic variance among all consistent estimators. An estimator θ^e of θ is asymptotically efficient if it satisfies

T^(1/2) (θ^e - θ) →d N(0, lim_{T→∞} [(1/T) I(θ)]^(-1)),

where I(θ) = -E[∂²L/∂θ²] is the information matrix defined above.

Under fairly general conditions, the maximum likelihood estimator θ_l of θ is
- Consistent
- Asymptotically Normal
- Asymptotically Unbiased
- Asymptotically Efficient

Let (y_1, y_2, ..., y_T) be a random sample of size T, where y_t ~ N(β, σ²), t = 1, 2, ..., T. The maximum likelihood estimator θ^e_l = (β_l, σ²_l) of θ = (β, σ²) is β_l = Σ_t y_t / T and σ²_l = (1/T) Σ_t (y_t - β_l)². From the above results, the maximum likelihood estimator θ^e_l = (β_l, σ²_l) of θ is consistent, asymptotically normal, and asymptotically efficient. In addition, we have seen that the information matrix is

I(θ) = -E[∂²L/∂θ²] = [ T/σ²      0     ]
                     [   0    T/(2σ⁴) ].

It follows that the asymptotic distribution of θ^e_l = (β_l, σ²_l) is

T^(1/2) (θ^e_l - θ) →d N(0, lim_{T→∞} [(1/T) I(θ)]^(-1)) = N( 0, [ σ²    0  ]
                                                                  [  0   2σ⁴ ] ).

This shows that the asymptotic variance of θ^e_l is V(β_l) ≈ σ²/T (which is identical to the one derived earlier) and V(σ²_l) ≈ 2σ⁴/T (which is a new result).
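A closing simulation sketch of the new result V(σ²_l) ≈ 2σ⁴/T: across replications of a normal sample, the variance of the maximum likelihood estimator of σ² should be close to 2σ⁴/T (the sample size, parameter values, and replication count are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(6)
T, R = 200, 50_000
beta, sigma2 = 2.0, 4.0

samples = rng.normal(beta, np.sqrt(sigma2), size=(R, T))
beta_l = samples.mean(axis=1, keepdims=True)
sigma2_l = np.mean((samples - beta_l)**2, axis=1)   # ML estimator of sigma^2

print(sigma2_l.var())             # ~ 2 * sigma2**2 / T = 2 * 16 / 200 = 0.16
print(2 * sigma2**2 / T)          # asymptotic variance from the information matrix
```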