
References

Chapter 7 - Section 8, Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 3rd Edition, Addison-Wesley, Boston.
Chapter 5 - Section 1.3, Bernard W. Lindgren, Statistical Theory, 3rd Edition, MacMillan, New York.

Properties of the Score Equations

The expected value of the score equations is zero. Since the probability of the entire sample space is one, we have

$$\int f(X)\,dX = 1$$

where the integral denotes an $n$-dimensional integral over the sample space of $X$. Note that $f(X)$ is a function of $\theta$. Differentiating with respect to the vector $\theta$ gives

$$\frac{\partial}{\partial\theta}\int f(X)\,dX = 0$$

Assuming that the order of differentiation and integration may be reversed,

$$\int \frac{\partial f(X)}{\partial\theta}\,dX = 0$$

(The region of integration should not be a function of $\theta$. See Amemiya for details.) The chain rule gives

$$\frac{\partial \ln f(X)}{\partial\theta} = [f(X)]^{-1}\,\frac{\partial f(X)}{\partial\theta}$$

Substituting into the previous equation gives

$$\int \frac{\partial \ln f(X)}{\partial\theta}\,f(X)\,dX = 0$$

Since $\partial \ln f(X)/\partial\theta = \partial \ln L(\theta)/\partial\theta$,

$$\int \frac{\partial \ln L(\theta)}{\partial\theta}\,f(X)\,dX = 0$$

which states $E_X[S(\theta)] = 0$.
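As a quick numerical check of this zero-mean property (a sketch added here for illustration, not part of the original notes; the Bernoulli model, the helper function, the sample size, and the seed are arbitrary choices), one can simulate many samples, evaluate the score at the true parameter, and verify that its average is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_true, reps = 50, 0.3, 20_000   # illustrative values

def bernoulli_score(x, p):
    """Score dlnL/dp = sum_i [x_i/p - (1 - x_i)/(1 - p)] for an iid Bernoulli sample."""
    return np.sum(x / p - (1 - x) / (1 - p))

scores = [bernoulli_score(rng.binomial(1, p_true, size=n), p_true) for _ in range(reps)]
print(np.mean(scores))   # close to 0, consistent with E_X[S(theta)] = 0
```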

The Information Matrix

The information matrix is the variance-covariance matrix of the score. Since $E_X[S(\theta)] = 0$,

$$I(\theta) = E_X\!\left[\frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\right]$$

by definition. The information matrix may also be expressed in terms of the Hessian matrix of $\ln L(\theta)$. Since the mean score is zero, we have

$$\int \frac{\partial \ln L(\theta)}{\partial\theta}\,f(X)\,dX = 0$$

Differentiating with respect to $\theta'$, again assuming the order of differentiation and integration may be reversed, and applying the product rule, gives the $(k \times k)$ matrix equality

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\,f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial f(X)}{\partial\theta'}\,dX = 0$$

Since $\partial \ln f(X)/\partial\theta = [f(X)]^{-1}\,\partial f(X)/\partial\theta$,

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\,f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln f(X)}{\partial\theta'}\,f(X)\,dX = 0$$

$$\int \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\,f(X)\,dX + \int \frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\,f(X)\,dX = 0$$

In terms of expectations,

$$E_X\!\left[\frac{\partial \ln L(\theta)}{\partial\theta}\,\frac{\partial \ln L(\theta)}{\partial\theta'}\right] = -\,E_X\!\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right]$$

Hence, we have an alternative expression for the information matrix:

$$I(\theta) = -\,E_X\!\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right] = -\,E_X[H(\theta)]$$
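The equality of the outer-product and Hessian forms of the information matrix can also be checked by simulation. The sketch below (again my own illustration, using a scalar Bernoulli parameter rather than a vector $\theta$; all numerical values are arbitrary) compares Monte Carlo estimates of $E_X[S(p)^2]$ and $-E_X[H(p)]$; both approximate $n/[p(1-p)]$, the Bernoulli information derived in the first application below.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 50, 0.3, 20_000   # illustrative values

def score(x, p):
    # dlnL/dp for an iid Bernoulli sample
    return np.sum(x / p - (1 - x) / (1 - p))

def hessian(x, p):
    # d^2 lnL/dp^2 for an iid Bernoulli sample
    return -np.sum(x / p**2 + (1 - x) / (1 - p)**2)

samples = rng.binomial(1, p, size=(reps, n))
outer = np.mean([score(x, p)**2 for x in samples])       # E[S(p)^2]
neg_hess = -np.mean([hessian(x, p) for x in samples])    # -E[H(p)]
print(outer, neg_hess, n / (p * (1 - p)))                # all approximately equal
```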

The Cramer-Rao Bound

The CR bound establishes a lower bound for the variance of unbiased estimators. Letting $t(X)$ denote an arbitrary unbiased estimator of the parameter $\theta$, we know that $E_X[t(X)] = \theta$ for any sample size $n$ and any valid $\theta$. In the continuous case we have

$$\int t(X)\,f(X)\,dX = \theta$$

Differentiating with respect to $\theta'$ gives

$$\int t(X)\,\frac{\partial f(X)}{\partial\theta'}\,dX = I_k$$

$$\int t(X)\,\frac{\partial \ln f(X)}{\partial\theta'}\,f(X)\,dX = I_k$$

$$\int t(X)\,\frac{\partial \ln L(\theta)}{\partial\theta'}\,f(X)\,dX = I_k$$

This states that

$$E_X[t(X)\,S(\theta)'] = I_k$$

Since the mean score is a zero vector, the covariance matrix of any unbiased estimator and the score is an identity matrix. Thus, the covariance matrix of the stacked vector

$$\begin{pmatrix} t(X) \\ S(\theta) \end{pmatrix} \quad\text{is}\quad \begin{pmatrix} \Sigma_t & I_k \\ I_k & I(\theta) \end{pmatrix}$$

where $\Sigma_t$ denotes the covariance matrix of the estimator $t(X)$. If we denote the covariance matrix of the stacked vector by $C$, then $Z'CZ \ge 0$ for arbitrary $Z \ne 0$, since any covariance matrix is positive semi-definite. This inequality must hold for an arbitrary non-zero $2k$-vector $Z$, including

$$Z = \begin{pmatrix} W \\ -I(\theta)^{-1}W \end{pmatrix}$$

where $W$ is an arbitrary non-zero $k$-vector. For this choice of $Z$, the inequality above reduces to

$$Z'CZ = W'\left[\Sigma_t - I(\theta)^{-1}\right]W \ge 0 \quad\text{for } W \ne 0.$$

That is, the difference between the covariance matrix of an unbiased estimator $t(X)$ and the inverse of the information matrix is a positive semi-definite matrix. Since the diagonal elements of a positive semi-definite matrix must be non-negative, the variance of an unbiased estimator is not less than the corresponding diagonal element of the inverse of the information matrix.

Theorem (Properties of ML Estimates)

Let $X$ represent a random sample from a population with joint density $f(X)$ of known form (not necessarily normal). Then, subject to certain regularity conditions, ML estimators are consistent, asymptotically efficient, and asymptotically normal. (Most notable among the regularity conditions: the sample space of $X$ must not depend on $\theta$, and the joint density must be differentiable with respect to $\theta$. Details may be found in Econometrics, Peter Schmidt, 1976, Marcel Dekker, New York.)
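Before turning to the applications, here is a small Monte Carlo illustration of the bound (my own sketch, not from the original notes; the numerical settings are arbitrary). For a normal sample, both the sample mean and the sample median are unbiased estimators of $\mu$; the mean attains the CR bound $\sigma^2/n$ (derived in the second application below), while the median's variance exceeds it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma, reps = 25, 1.0, 2.0, 40_000   # illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
var_mean = np.var(samples.mean(axis=1))           # sample mean: attains the bound
var_median = np.var(np.median(samples, axis=1))   # sample median: unbiased, above the bound
print(var_mean, sigma**2 / n, var_median)         # var_mean ~ sigma^2/n <= var_median
```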

Application

Consider an SI (statistically independent) sequence of Bernoulli trials. Since the observations are statistically independent, the joint density is the product of the marginal densities:

$$f(X) = \prod_{i=1}^{n} f(X_i)$$

The log-likelihood function is thus

$$\ln L(p) = \sum_{i=1}^{n}\left[X_i \ln(p) + (1 - X_i)\ln(1 - p)\right]$$

and the score equation is

$$\frac{\partial \ln L(p)}{\partial p} = \sum_{i=1}^{n}\left[\frac{X_i}{p} - \frac{1 - X_i}{1 - p}\right]$$

The ML estimator solves

$$S(\hat p) = \sum_{i=1}^{n}\left[\frac{X_i}{\hat p} - \frac{1 - X_i}{1 - \hat p}\right] = 0$$

Multiplying both sides by $\hat p(1 - \hat p)$ gives

$$\sum_{i=1}^{n}\left[X_i(1 - \hat p) - (1 - X_i)\hat p\right] = 0$$

$$\sum_{i=1}^{n} X_i - \hat p \sum_{i=1}^{n} X_i - n\hat p + \hat p \sum_{i=1}^{n} X_i = 0$$

$$\sum_{i=1}^{n} X_i - n\hat p = 0$$

Thus $\hat p = \sum_{i=1}^{n} X_i / n = \bar X$, and the ML estimator of $p$ is just the sample mean. We have seen that $\bar X$ has mean $\mu_X$ and variance $\sigma_X^2/n$, regardless of the underlying distribution. For the Bernoulli trial, $\mu_X = p$ and $\sigma_X^2 = p(1-p)$. Consequently, $E(\hat p) = E(\bar X) = p$ and $V(\hat p) = V(\bar X) = p(1-p)/n$. This implies that the ML estimator $\hat p$ is an unbiased estimator of $p$.
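A short simulation (my own sketch; the specific $n$, $p$, and seed are arbitrary) confirms that $\hat p = \bar X$ is unbiased and that its sampling variance matches $p(1-p)/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 40, 0.3, 50_000   # illustrative values

p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)   # the MLE is the sample mean
print(p_hat.mean(), p)                 # E(p_hat) ~ p, so p_hat is unbiased
print(p_hat.var(), p * (1 - p) / n)    # V(p_hat) ~ p(1-p)/n
```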

The Hessian matrix (a scalar in this case) is

$$\frac{\partial^2 \ln L(p)}{\partial p^2} = -\left[\sum_{i=1}^{n}\frac{X_i}{p^2} + \sum_{i=1}^{n}\frac{1 - X_i}{(1-p)^2}\right]$$

Since $E\left[\sum_{i=1}^{n} X_i\right] = np$, the information matrix is

$$E[-H(p)] = \frac{np}{p^2} + \frac{n(1-p)}{(1-p)^2} = \frac{n}{p} + \frac{n}{1-p} = \frac{n(1-p)}{p(1-p)} + \frac{np}{p(1-p)} = \frac{n}{p(1-p)}$$

The CR bound for unbiased estimators of $p$ is thus $p(1-p)/n$. Since $\hat p$ is an unbiased estimator of $p$ and has variance that meets the CR bound, $\hat p$ is efficient.

Application

Assume that the $X_i$ are iid $N(\mu, \sigma^2)$. That is, $X \sim N(\mu, \sigma^2 I_n)$. (Note that we are using the same notation to denote the scalar parameter $\mu$ and the vector of common means $\mu$.) The joint density of the random vector $X$ is

$$f(X) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\!\left[-\tfrac{1}{2}(X-\mu)'(\sigma^2)^{-1}(X-\mu)\right]$$

The log-likelihood function is thus

$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2}(X-\mu)'(\sigma^2)^{-1}(X-\mu)$$

Note that $\mu$ enters the log-likelihood function only through the last term. In order to find the score equations, we will need the vector of partial derivatives of $(X-\mu)'(X-\mu)$ with respect to $\mu$. This is done most easily by recognizing that

$$(X-\mu)'(X-\mu) = \sum_{i=1}^{n}(X_i - \mu)^2$$

Consequently,

$$\frac{\partial (X-\mu)'(X-\mu)}{\partial \mu} = -2\sum_{i=1}^{n}(X_i - \mu)$$

The first score equation is thus

$$\frac{\partial \ln L(\mu,\sigma^2)}{\partial \mu} = -\frac{1}{2}(\sigma^2)^{-1}\left[-2\sum_{i=1}^{n}(X_i - \mu)\right] = (\sigma^2)^{-1}\sum_{i=1}^{n}(X_i - \mu)$$

and the second score equation is

$$\frac{\partial \ln L(\mu,\sigma^2)}{\partial \sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(X-\mu)'(X-\mu) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(X-\mu)'(X-\mu)$$

The ML estimators solve $S(\hat\theta) = 0$. For this problem we have

$$\frac{1}{\hat\sigma^2}\sum_{i=1}^{n}(X_i - \hat\mu) = 0$$

and

$$-\frac{n}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}(X-\hat\mu)'(X-\hat\mu) = 0$$

The first score equation may be solved for $\hat\mu = \sum_{i=1}^{n} X_i/n = \bar X$. Given $\hat\mu$, the final score equation may be solved for $\hat\sigma^2 = (X-\hat\mu)'(X-\hat\mu)/n$.

At this point, it is convenient to find the Hessian matrix, the matrix of second partials and cross-partials of the log-likelihood function. The first diagonal element is given by

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial \mu^2} = -\frac{n}{\sigma^2}$$

The off-diagonal element is given by

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial \mu\,\partial \sigma^2} = -(\sigma^2)^{-2}\sum_{i=1}^{n}(X_i - \mu) = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(X_i - \mu)$$

The second diagonal element is

$$\frac{\partial^2 \ln L(\mu,\sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2}(\sigma^2)^{-2} - (\sigma^2)^{-3}(X-\mu)'(X-\mu) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(X-\mu)'(X-\mu)$$

The Hessian matrix is thus given by

$$\begin{pmatrix} -\dfrac{n}{\sigma^2} & -\dfrac{1}{\sigma^4}\displaystyle\sum_{i=1}^{n}(X_i - \mu) \\[2ex] -\dfrac{1}{\sigma^4}\displaystyle\sum_{i=1}^{n}(X_i - \mu) & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(X-\mu)'(X-\mu) \end{pmatrix}$$

The information matrix is $-E_X[H(\theta)]$. Note that $E(X_i - \mu) = 0$ and $E[(X-\mu)'(X-\mu)] = n\sigma^2$. Thus, the information matrix for this problem is

$$\begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\[2ex] 0 & -\dfrac{n}{2\sigma^4} + \dfrac{n\sigma^2}{\sigma^6} \end{pmatrix}$$

which reduces to

$$\begin{pmatrix} \dfrac{n}{\sigma^2} & 0 \\[2ex] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}$$

The CR bound is the inverse of the information matrix. Since the information matrix for this problem is block diagonal, the CR bound is simply

$$\begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[2ex] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$

We saw earlier that $\hat\mu = \bar X \sim N(\mu, \sigma^2/n)$. Since $\hat\mu$ is unbiased for $\mu$ and has variance that meets the CR bound, $\hat\mu$ is efficient. We will see later that $\hat\sigma^2 = (X-\hat\mu)'(X-\hat\mu)/n$ has mean $[(n-1)/n]\sigma^2 \ne \sigma^2$, and consequently, cannot be efficient.
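The normal-sample results can be checked numerically in the same spirit (my own sketch, not part of the original notes; the parameter values and seed are arbitrary). The code computes the ML estimators $\hat\mu$ and $\hat\sigma^2$ over many simulated samples, compares the variance of $\hat\mu$ with the CR bound $\sigma^2/n$, and exhibits the downward bias $E(\hat\sigma^2) = [(n-1)/n]\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu, sigma2, reps = 20, 1.0, 4.0, 50_000   # illustrative values

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_hat = x.mean(axis=1)                               # ML estimator of mu
sig2_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1)  # ML estimator of sigma^2

print(mu_hat.var(), sigma2 / n)                # variance of mu_hat meets the CR bound sigma^2/n
print(sig2_hat.mean(), (n - 1) / n * sigma2)   # E(sig2_hat) = [(n-1)/n] sigma^2: biased downward
```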