Delta Method. Example: Method of Moments for the Exponential Distribution, $f(x; \lambda) = \lambda e^{-\lambda x} I(x > 0)$


Delta Method

Often estimators are functions of other random variables, for example in the method of moments. Such functions of random variables can sometimes inherit a normal approximation from the underlying random variables. Earlier in the course we obtained a result in which a continuous function of a sequence of consistent estimators also inherits the property of being a consistent estimator. The delta method gives a normal approximation (a central limit type result, that is, convergence in distribution to a normal distribution) for a continuous and differentiable function of a sequence of r.v.s that already has a normal limit in distribution.

Example: Method of Moments for the Exponential Distribution. Let $X_i$, $i = 1, 2, \ldots, n$, be iid exponential($\lambda$) with pdf
$$f(x; \lambda) = \lambda e^{-\lambda x} I(x > 0).$$
The first moment is then $\mu_1(\lambda) = 1/\lambda$, so the method of moments estimator is
$$\hat\lambda_n = \frac{1}{\bar X_n}.$$
Notice this is of the form $\hat\lambda_n = g(\bar X_n)$, where $g : \mathbb{R}^+ \to \mathbb{R}^+$ with $g(x) = 1/x$.

Theorem 1 (Delta Method). Suppose $\bar X_n$ has an asymptotic normal distribution, that is,
$$\sqrt{n}\,(\bar X_n - \mu) \to N(0, \gamma^2)$$
in distribution as $n \to \infty$. Suppose $g$ is a function that is continuous and also has a derivative $g'$ at $\mu$, and that $g'(\mu) \neq 0$. Then
$$\sqrt{n}\,\big(g(\bar X_n) - g(\mu)\big) \to N\big(0, (g'(\mu))^2 \gamma^2\big)$$
in distribution as $n \to \infty$.
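Before turning to the remarks and the proof, it may help to see the theorem numerically. The following is a minimal Monte Carlo sketch (an addition to the notes; NumPy is assumed, and `lam`, `n`, `reps` are arbitrary illustrative choices) checking Theorem 1 in the exponential example, where the predicted limit variance is $(g'(\mu))^2 \gamma^2 = \lambda^4 \cdot (1/\lambda^2) = \lambda^2$, a computation carried out in full below.

```python
import numpy as np

# Monte Carlo check of Theorem 1 for the exponential example: g(x) = 1/x,
# mu = 1/lambda, gamma^2 = Var(X_i) = 1/lambda^2, so the theorem predicts
# a limit variance of (g'(mu))^2 * gamma^2 = lambda^4 / lambda^2 = lambda^2.
rng = np.random.default_rng(0)
lam, n, reps = 2.0, 500, 10_000               # arbitrary illustrative choices

xbar = rng.exponential(scale=1/lam, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (1/xbar - lam)               # sqrt(n) (g(Xbar_n) - g(mu))

print("simulated variance:", z.var())         # close to lambda^2 = 4
print("predicted variance:", lam**2)
```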

Remark: The condition $g'(\mu) \neq 0$ is actually only needed so that
$$\frac{\sqrt{n}\,\big(g(\bar X_n) - g(\mu)\big)}{g'(\mu)\,\gamma} \to N(0, 1)$$
in distribution as $n \to \infty$.

Remark: Theorem 1 is called the delta method.

Proof (Outline). The first order Taylor approximation of $g$ about the point $\mu$, evaluated at the random variable $\bar X_n$, is
$$g(\bar X_n) \approx g(\mu) + g'(\mu)\big(\bar X_n - \mu\big).$$
Subtracting $g(\mu)$ from both sides and multiplying by $\sqrt{n}$ gives
$$\sqrt{n}\,\big(g(\bar X_n) - g(\mu)\big) \approx g'(\mu)\,\sqrt{n}\,\big(\bar X_n - \mu\big) \to N\big(0, (g'(\mu))^2 \gamma^2\big).$$

Remark: A more careful study of Taylor's formula with remainder is needed to justify all steps in this approximation. For our purposes in this course these details are not needed.

Remark: Notice this delta method is an extension of the idea used earlier to approximate moments, specifically to approximate means and variances.

Example Continued: For the exponential example we have
$$\sqrt{n}\left(\bar X_n - \frac{1}{\lambda}\right) \to N\left(0, \frac{1}{\lambda^2}\right).$$
Using the function $g(x) = 1/x$ expanded about $\mu = 1/\lambda$, and noting
$$\hat\lambda_n = g(\bar X_n) = \frac{1}{\bar X_n} \quad \text{and} \quad g(\mu) = \lambda,$$
we thus have, as an application of the delta method (Theorem 1),
$$\sqrt{n}\,\big(\hat\lambda_n - \lambda\big) \approx -\frac{1}{(1/\lambda)^2}\,\sqrt{n}\left(\bar X_n - \frac{1}{\lambda}\right) \to N\left(0, \lambda^4 \cdot \frac{1}{\lambda^2}\right) = N(0, \lambda^2).$$
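The Taylor remainder glossed over in the proof outline can also be examined numerically. This added sketch (NumPy assumed; the sample sizes are illustrative) compares $g(\bar X_n)$ with its linearization $g(\mu) + g'(\mu)(\bar X_n - \mu)$ for the exponential example; the scaled remainder shrinks with $n$, which is why it disappears in the limit.

```python
import numpy as np

# Numerical look at the Taylor step in the proof outline: the remainder
# g(Xbar_n) - [g(mu) + g'(mu)(Xbar_n - mu)] is O_p(1/n), so it still
# vanishes after multiplying by sqrt(n).  Values are illustrative.
rng = np.random.default_rng(1)
lam, mu = 2.0, 0.5                            # mu = 1/lambda

for n in (100, 1_000, 10_000):
    xbar = rng.exponential(scale=mu, size=(1_000, n)).mean(axis=1)
    remainder = 1/xbar - (lam - lam**2 * (xbar - mu))
    print(n, np.mean(np.abs(np.sqrt(n) * remainder)))   # shrinks ~ 1/sqrt(n)
```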

In our discussion of the Central Limit Theorem (CLT) we showed that, for iid sampling from a distribution with finite second moments,
$$\frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma}$$
converges in distribution to $N(0, 1)$. The student should refer back to that section of the text and the notes to review the notation, the conditions, and the statement of the CLT. We also discussed, but without proof, that one can replace the population variance $\sigma^2$ with the sample variance $S_n^2$ (that is, replace $\sigma$ by $S_n$ in the denominator) and still obtain convergence in distribution to $N(0, 1)$. This is due to $S_n^2$ being a consistent estimator of $\sigma^2$, that is, $S_n^2 \to \sigma^2$ in probability as $n \to \infty$. One can equally replace $\sigma$ in the denominator with any other consistent estimator. Therefore in the exponential example above we have two random variables that may be used,
$$Z_{1,n} = \frac{\sqrt{n}\left(\bar X_n - \frac{1}{\lambda}\right)}{1/\hat\lambda_n} \qquad \text{and} \qquad Z_{2,n} = \frac{\sqrt{n}\left(\bar X_n - \frac{1}{\lambda}\right)}{S_n},$$
both of which converge in distribution to $N(0, 1)$. Either may be used to construct an approximate or asymptotic confidence interval for $\mu = E(X)$. This will be discussed later in the course.
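As a preview of that construction, here is a small added sketch (SciPy assumed for the normal quantile; the rate and sample size are illustrative) of the approximate 95% interval for $\mu = E(X)$ based on $Z_{1,n}$, using the fact that for the exponential $\sigma = 1/\lambda$, so $1/\hat\lambda_n = \bar X_n$ is a consistent estimator of $\sigma$.

```python
import numpy as np
from scipy import stats

# Approximate 95% confidence interval for mu = E(X) based on Z_{1,n}:
# the estimated standard deviation of Xbar_n is xbar / sqrt(n).
rng = np.random.default_rng(2)
lam, n = 2.0, 200
x = rng.exponential(scale=1/lam, size=n)      # one simulated sample

xbar = x.mean()
z = stats.norm.ppf(0.975)
half = z * xbar / np.sqrt(n)
print("approx. 95% CI for mu:", (xbar - half, xbar + half))
```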

There is also a delta method for random vectors. It is described in the same fashion. For simplicity of writing, and since we only apply it in our course in dimension 2, we write it in this form. For this we need a preliminary theorem.

Theorem 2. Suppose $X_i$, $i \geq 1$, are iid with finite 4-th moments. Write
$$\hat\mu_{1,n} = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{and} \quad \hat\mu_{2,n} = \frac{1}{n}\sum_{i=1}^n X_i^2,$$
and let $\mu_k$ be the $k$-th population moment. Then
$$\sqrt{n}\,\big(\hat\mu_{1,n} - \mu_1,\; \hat\mu_{2,n} - \mu_2\big) \to \mathrm{BVN}(0, A)$$
in distribution as $n \to \infty$, where BVN means bivariate normal, here with mean vector 0 and variance (or covariance) matrix
$$A = \begin{pmatrix} \mu_2 - \mu_1^2 & \mu_3 - \mu_1\mu_2 \\ \mu_3 - \mu_1\mu_2 & \mu_4 - \mu_2^2 \end{pmatrix}.$$

Remark: This is a consequence of a multivariate central limit theorem, and is not proven in the course.

In the theorem above the matrix $A$ is a variance matrix. What are some properties of this matrix that we can use below? Suppose that $b_1, b_2$ are two constants. Set $b = (b_1, b_2)$, a 1 by 2 matrix (row vector). Suppose also that $(X, Y)^T$ is a 2 by 1 matrix (column vector) with components the r.v.s $X, Y$ and variance matrix $A$. Then
$$b \begin{pmatrix} X \\ Y \end{pmatrix} = b_1 X + b_2 Y,$$
and
$$\operatorname{Var}(b_1 X + b_2 Y) = b_1^2 \operatorname{Var}(X) + 2 b_1 b_2 \operatorname{Cov}(X, Y) + b_2^2 \operatorname{Var}(Y) = b A b^T.$$

The student should write out the matrix multiplication and verify this. Similarly, if $B$ is a matrix of appropriate size, then $W = B(X, Y)^T$ is a random vector. In this case the random vector $W$ has variance matrix $BAB^T$. End of Remark
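To make Theorem 2 concrete, the following added sketch (NumPy assumed; `lam`, `n`, `reps` are arbitrary illustrative choices) builds $A$ for the exponential($\lambda$) case, where $\mu_k = k!/\lambda^k$, and checks it against the simulated covariance matrix of $\sqrt{n}\,(\hat\mu_{1,n} - \mu_1, \hat\mu_{2,n} - \mu_2)$.

```python
import numpy as np
from math import factorial

# Build A from the population moments of exponential(lambda), where
# mu_k = k!/lambda^k, and compare with the simulated covariance of
# sqrt(n) (muhat_{1,n} - mu_1, muhat_{2,n} - mu_2).
rng = np.random.default_rng(3)
lam, n, reps = 2.0, 1_000, 5_000              # arbitrary illustrative choices

mu = {k: factorial(k) / lam**k for k in (1, 2, 3, 4)}
A = np.array([[mu[2] - mu[1]**2,    mu[3] - mu[1]*mu[2]],
              [mu[3] - mu[1]*mu[2], mu[4] - mu[2]**2]])

x = rng.exponential(scale=1/lam, size=(reps, n))
dev = np.sqrt(n) * np.stack([x.mean(axis=1) - mu[1],
                             (x**2).mean(axis=1) - mu[2]], axis=1)

print("theoretical A:\n", A)
print("simulated covariance:\n", np.cov(dev.T))
```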

Theorem 3 (below) is the delta method applied to a function of $(\hat\mu_{1,n}, \hat\mu_{2,n})$. We state this rather than the general delta method to avoid more complicated notation. The idea is the same as used in Theorem 1, but is based on working with bivariate normal distributions, and more generally with multivariate normal distributions.

Theorem 3. Suppose the conditions of Theorem 2 hold. Suppose $g$ is a function of two variables (taking values in $\mathbb{R}$ or $\mathbb{R}^2$) that is continuous and also has a derivative $g'$ at $(\mu_1, \mu_2)$, and that $g'(\mu_1, \mu_2)$ is nonzero. Then
$$\sqrt{n}\,\big(g(\hat\mu_{1,n}, \hat\mu_{2,n}) - g(\mu_1, \mu_2)\big) \to N\big(0,\; g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T\big).$$

Remark: If $g$ maps pairs into $\mathbb{R}$, then we interpret $g'(\mu_1, \mu_2)$ as a row vector of length 2. In this case $g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T$ is a $(1 \times 2)(2 \times 2)(2 \times 1) = 1 \times 1$ matrix, that is, a real number. If $g$ maps pairs into $\mathbb{R}^2$, then we interpret $g'(\mu_1, \mu_2)$ as a 2 by 2 matrix, and $g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T$ is the product of the 2 by 2 matrix $g'$, the 2 by 2 matrix $A$, and the 2 by 2 matrix $(g')^T$ (the transpose of the first matrix). Since variance matrices are positive definite, this resulting matrix is also positive definite, that is, it is a variance matrix. End of Remark

For an application of this result, see the rainfall data example and the method of moments for that example. Part of this example is discussed in more detail later in this handout. In that example we use the following fact: for a bivariate normal distribution the marginal distribution of each component is normal. For example, the first component has a normal distribution with variance given by the (1,1) component of the variance matrix.
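The sandwich product in Theorem 3 is easy to compute in code. This added sketch uses the hypothetical choice $g(x, y) = y - x^2$ (not an example from the notes), so that $g(\hat\mu_{1,n}, \hat\mu_{2,n})$ is the plug-in estimator of $\operatorname{Var}(X)$, with $g'(\mu_1, \mu_2) = (-2\mu_1, 1)$ as the row vector.

```python
import numpy as np
from math import factorial

# Sandwich variance g'(mu) A g'(mu)^T from Theorem 3, illustrated with the
# hypothetical choice g(x, y) = y - x**2, so that g(muhat_1, muhat_2) is
# the plug-in estimator of Var(X).  Exponential(lambda) moments again.
lam = 2.0
mu = {k: factorial(k) / lam**k for k in (1, 2, 3, 4)}
A = np.array([[mu[2] - mu[1]**2,    mu[3] - mu[1]*mu[2]],
              [mu[3] - mu[1]*mu[2], mu[4] - mu[2]**2]])

grad = np.array([-2 * mu[1], 1.0])            # g'(mu_1, mu_2) as a row vector
avar = grad @ A @ grad                        # (1x2)(2x2)(2x1) = scalar
print("limit variance of sqrt(n)(g(muhat) - g(mu)):", avar)
```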

When we discussed the central limit theorem (CLT) we stated, without proof, that one can replace the population variance $\sigma^2$ with a consistent estimator of $\sigma^2$, in that case the sample variance $S_n^2$, and still retain the convergence in distribution to $N(0,1)$. This same property carries over more generally. In our delta method the corresponding result allows one to replace (the first two for $g(\bar X_n)$, and the last two for $g(\hat\mu_{1,n}, \hat\mu_{2,n})$):

$\sigma^2 = \operatorname{Var}(X)$ by $S_n^2$;

$g'(\mu)$ by $g'(\bar X_n)$;

the matrix $A = \operatorname{Var}\big((X, X^2)\big)$ by
$$\hat A_n = \begin{pmatrix} \hat\mu_{2,n} - \hat\mu_{1,n}^2 & \hat\mu_{3,n} - \hat\mu_{1,n}\hat\mu_{2,n} \\ \hat\mu_{3,n} - \hat\mu_{1,n}\hat\mu_{2,n} & \hat\mu_{4,n} - \hat\mu_{2,n}^2 \end{pmatrix};$$

$g'(\mu_1, \mu_2)$ by $g'(\hat\mu_{1,n}, \hat\mu_{2,n})$.

We do not write this in a technically proper manner in this course.
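The plug-in matrix $\hat A_n$ is a direct computation from the first four sample moments, as in this added sketch (NumPy assumed; the helper name `a_hat` and the sample are illustrative).

```python
import numpy as np

# Plug-in estimate A_hat_n built from the first four sample moments, as
# listed above; x can be any 1-D array of observations.
def a_hat(x):
    m = [np.mean(x**k) for k in (1, 2, 3, 4)]     # muhat_1, ..., muhat_4
    return np.array([[m[1] - m[0]**2,   m[2] - m[0]*m[1]],
                     [m[2] - m[0]*m[1], m[3] - m[1]**2]])

rng = np.random.default_rng(4)
print(a_hat(rng.exponential(scale=0.5, size=1_000)))  # illustrative sample
```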

Example: Delta Method applied to the Gamma method of moments estimator.

Recall the method of moments estimators for the Gamma parameters are
$$\tilde\lambda_n = \frac{\hat\mu_{1,n}}{\hat\mu_{2,n} - \hat\mu_{1,n}^2}, \qquad \tilde\alpha_n = \tilde\lambda_n \hat\mu_{1,n} = \frac{\hat\mu_{1,n}^2}{\hat\mu_{2,n} - \hat\mu_{1,n}^2}.$$
Thus we have
$$\tilde\lambda_n = g(\hat\mu_{1,n}, \hat\mu_{2,n}),$$
where the function $g$ is given by
$$g(x, y) = \frac{x}{y - x^2}.$$
Thus we find
$$\frac{\partial g}{\partial x} = \frac{y + x^2}{(y - x^2)^2}, \qquad \frac{\partial g}{\partial y} = -\frac{x}{(y - x^2)^2}.$$
Thus the matrix (or row vector) of partial derivatives of $g$ is
$$g'(x, y) = \frac{\partial g(x, y)}{\partial (x, y)} = \left( \frac{y + x^2}{(y - x^2)^2},\; -\frac{x}{(y - x^2)^2} \right).$$
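As a quick check (an added sketch, assuming SymPy is available), these partial derivatives can be verified symbolically:

```python
import sympy as sp

# Symbolic check of the partial derivatives of g(x, y) = x / (y - x**2)
# used in the Gamma example.
x, y = sp.symbols("x y", positive=True)
g = x / (y - x**2)

print(sp.simplify(sp.diff(g, x)))   # expect (y + x**2) / (y - x**2)**2
print(sp.simplify(sp.diff(g, y)))   # expect -x / (y - x**2)**2
```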

Aside: If one writes the partial derivative as a column vector,
$$g'(x, y) = \begin{pmatrix} \dfrac{y + x^2}{(y - x^2)^2} \\[6pt] -\dfrac{x}{(y - x^2)^2} \end{pmatrix},$$
then one has to be careful to use the appropriate transposes in the matrix multiplications above. It does not matter as long as one is consistent. In this handout we will use the row vector. End of Aside

Recall $\lambda = g(\mu_1, \mu_2)$, by the construction of our method of moments estimator. The first order Taylor formula approximation for $\tilde\lambda_n$ is then
$$\tilde\lambda_n - \lambda = g(\hat\mu_{1,n}, \hat\mu_{2,n}) - g(\mu_1, \mu_2) \approx g'(\mu_1, \mu_2) \begin{pmatrix} \hat\mu_{1,n} - \mu_1 \\ \hat\mu_{2,n} - \mu_2 \end{pmatrix}.$$
Then
$$\sqrt{n}\,(\tilde\lambda_n - \lambda) \approx g'(\mu_1, \mu_2)\, \sqrt{n} \begin{pmatrix} \hat\mu_{1,n} - \mu_1 \\ \hat\mu_{2,n} - \mu_2 \end{pmatrix} \Rightarrow g'(\mu_1, \mu_2)\, W$$
as $n \to \infty$, where $W$ has the bivariate normal distribution with mean vector 0 and covariance matrix $A$ (see the earlier section). The symbol $\Rightarrow$ is used here as shorthand for convergence in distribution. Thus
$$\sqrt{n}\,(\tilde\lambda_n - \lambda) \Rightarrow g'(\mu_1, \mu_2)\, W \sim N\big(0,\; g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T\big).$$
In particular we have that, for large $n$, the r.v. $\sqrt{n}\,(\tilde\lambda_n - \lambda)$ has an approximate normal distribution. Thus our estimator has an asymptotic normal distribution approximation. In particular we can use this to construct confidence intervals for $\lambda$.
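This limit can be checked by simulation. The following added sketch (NumPy assumed; the shape $\alpha$, rate $\lambda$, and sample sizes are arbitrary illustrative values) compares the simulated variance of $\sqrt{n}\,(\tilde\lambda_n - \lambda)$ with $g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T$ computed from the Gamma moments $\mu_k = \alpha(\alpha+1)\cdots(\alpha+k-1)/\lambda^k$.

```python
import numpy as np

# Monte Carlo check for the Gamma example: compare the simulated variance
# of sqrt(n)(lambda_tilde - lambda) with g' A g'^T, using the Gamma moments
# mu_k = alpha (alpha+1) ... (alpha+k-1) / lambda^k.  All values illustrative.
rng = np.random.default_rng(5)
alpha, lam, n, reps = 3.0, 2.0, 1_000, 4_000

mu = [np.prod(alpha + np.arange(k)) / lam**k for k in (1, 2, 3, 4)]
A = np.array([[mu[1] - mu[0]**2,    mu[2] - mu[0]*mu[1]],
              [mu[2] - mu[0]*mu[1], mu[3] - mu[1]**2]])
grad = np.array([(mu[1] + mu[0]**2) / (mu[1] - mu[0]**2)**2,
                 -mu[0] / (mu[1] - mu[0]**2)**2])

x = rng.gamma(shape=alpha, scale=1/lam, size=(reps, n))   # rate lam -> scale 1/lam
m1, m2 = x.mean(axis=1), (x**2).mean(axis=1)
lam_tilde = m1 / (m2 - m1**2)

print("simulated:", (np.sqrt(n) * (lam_tilde - lam)).var())
print("predicted:", grad @ A @ grad)
```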

There are a few additional ideas needed to make use of the delta method, Theorem 3, in practice. Recall that it gives a limit normal distribution
$$\sqrt{n}\,\big(g(\hat\mu_{1,n}, \hat\mu_{2,n}) - g(\mu_1, \mu_2)\big) \to N\big(0,\; g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T\big),$$
so that
$$g'(\mu_1, \mu_2)\, A\, g'(\mu_1, \mu_2)^T \tag{1}$$
is the variance of the limiting normal distribution. We also need a sample estimate of this; it will play the role of the sample variance in our simpler one-dimensional standardized sample mean
$$t = \frac{\sqrt{n}\,(\bar X - \mu)}{S}. \tag{2}$$
If the $X_i$ are iid normal then (2) has a Student's $t$ distribution. If not, but the sample size $n$ is large, then (2) has approximately a standard normal distribution. In (1) we estimate $A$ by using the sample moments,
$$\hat A = \begin{pmatrix} \hat\mu_2 - \hat\mu_1^2 & \hat\mu_3 - \hat\mu_1\hat\mu_2 \\ \hat\mu_3 - \hat\mu_1\hat\mu_2 & \hat\mu_4 - \hat\mu_2^2 \end{pmatrix},$$
and estimate $g'(\mu_1, \mu_2)$ by also using the sample moments, that is, by $g'(\hat\mu_1, \hat\mu_2)$. Here the estimate of the $k$-th sample moment with data $x_1, x_2, \ldots, x_n$ is
$$\hat\mu_k = \frac{1}{n} \sum_{i=1}^n x_i^k.$$
Earlier, when we needed to keep track of the sample size in our notation, we used $\hat\mu_{k,n}$; as we noted earlier, we drop the extra subscript $n$ except where we need to pay attention to the sample size.
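Putting the pieces together, here is an end-to-end added sketch (NumPy and SciPy assumed; the simulated data set and all names are illustrative) that computes $\tilde\lambda$, the plug-in $\hat A$ and $g'(\hat\mu_1, \hat\mu_2)$, and an approximate 95% confidence interval for $\lambda$ in the Gamma example.

```python
import numpy as np
from scipy import stats

# End-to-end sketch: method of moments estimate of lambda for a Gamma
# sample plus an approximate 95% confidence interval using the plug-in
# versions of g' and A.  The simulated data set is purely illustrative.
rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=0.5, size=500)    # true rate lambda = 2

m = [np.mean(x**k) for k in (1, 2, 3, 4)]        # muhat_1, ..., muhat_4
lam_tilde = m[0] / (m[1] - m[0]**2)              # g(muhat_1, muhat_2)

A_hat = np.array([[m[1] - m[0]**2,   m[2] - m[0]*m[1]],
                  [m[2] - m[0]*m[1], m[3] - m[1]**2]])
grad_hat = np.array([(m[1] + m[0]**2) / (m[1] - m[0]**2)**2,
                     -m[0] / (m[1] - m[0]**2)**2])

se = np.sqrt(grad_hat @ A_hat @ grad_hat / len(x))   # plug-in standard error
z = stats.norm.ppf(0.975)
print("lambda_tilde:", lam_tilde)
print("approx. 95% CI:", (lam_tilde - z * se, lam_tilde + z * se))
```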