Introduction to Estimation Methods for Time Series models Lecture 2


Fulvio Corsi, SNS Pisa

Estimators: Large Sample Properties

Purposes:
- study the behavior of $\hat\theta_n$ as $n \to \infty$
- approximate the unknown finite-sample distribution of $\hat\theta_n$

Since $\hat\theta_n$ has a distribution for each $n$, how do we define $\hat\theta_n \to \theta_0$?

Convergence in Probability (plim): the random variable $\hat\theta_n$ converges in probability to a constant $\theta_0$ if, for all $\epsilon > 0$,
$$\lim_{n\to\infty} P\left(|\hat\theta_n - \theta_0| < \epsilon\right) = 1.$$
If $\operatorname{plim}\hat\theta_n = \theta_0$ the estimator is consistent.

Convergence in Quadratic Mean:
$$\lim_{n\to\infty} E(\hat\theta_n - \theta_0)^2 = \lim_{n\to\infty} \mathrm{MSE}[\hat\theta_n] = 0.$$
Since $\mathrm{MSE}[\hat\theta_n] = \mathrm{Var}[\hat\theta_n] + \mathrm{Bias}[\hat\theta_n]^2$, convergence in quadratic mean is equivalent to
$$\lim_{n\to\infty} \mathrm{Bias}[\hat\theta_n] = 0 \quad\text{and}\quad \lim_{n\to\infty} \mathrm{Var}[\hat\theta_n] = 0.$$
Convergence in Quadratic Mean $\Rightarrow$ Convergence in Probability.
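A minimal simulation sketch (not part of the slides) illustrating convergence in probability: for the sample mean of i.i.d. draws, $P(|\hat\theta_n - \theta_0| < \epsilon)$ approaches 1 as $n$ grows. Sample sizes and the tolerance $\epsilon$ are arbitrary choices.

```python
import numpy as np

# Sketch: check P(|theta_hat_n - theta_0| < eps) for the sample mean as n grows.
rng = np.random.default_rng(0)
theta_0, eps, n_rep = 1.0, 0.1, 1000

for n in (10, 100, 1000, 10000):
    # n_rep replications of the sample mean of n Exponential(theta_0) draws
    theta_hat = rng.exponential(theta_0, size=(n_rep, n)).mean(axis=1)
    prob = np.mean(np.abs(theta_hat - theta_0) < eps)
    print(f"n={n:6d}  P(|theta_hat - theta_0| < {eps}) ~ {prob:.3f}")
```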

Consistency [figure omitted in transcription]

OLS without Normality: Large Sample Theory

When H.4 (Normality) is violated, OLS is still unbiased and BLUE; however, confidence intervals and test statistics are not valid.

Assumptions for the large sample theory:
A.1 $E[x_i \epsilon_i] = 0$: regressors uncorrelated with errors (weaker than strict exogeneity)
A.2 $\lim_{n\to\infty} \frac{1}{n} X'X = Q$ positive definite

Since
$$\hat\beta = \beta + (X'X)^{-1}X'\epsilon = \beta + \left(\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\sum_{i=1}^n x_i \epsilon_i\right)$$

Consistency (convergence in probability): $\operatorname{plim}\hat\beta_n = \beta$, since
$$\hat\beta_n = \beta + \underbrace{\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}}_{\to\, Q^{-1}\ (A.2)} \underbrace{\left(\frac{1}{n}\sum_{i=1}^n x_i \epsilon_i\right)}_{\to\, 0\ (A.1+\mathrm{LLN})} \;\to\; \beta$$

Asymptotic Normality:
$$\sqrt n(\hat\beta_n - \beta) = \underbrace{\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}}_{\to\, Q^{-1}\ (A.2)} \underbrace{\left(\frac{1}{\sqrt n}\sum_{i=1}^n x_i \epsilon_i\right)}_{\to\, N(0,\sigma^2 Q)\ (A.2+\mathrm{CLT})} \;\to\; N\left(0,\, Q^{-1}(\sigma^2 Q)Q^{-1}\right) = N(0,\sigma^2 Q^{-1})$$

Hence $\hat\beta_n \overset{a}{\sim} N\left(\beta,\sigma^2(X'X)^{-1}\right)$, as in the Normal case (H.4), but only asymptotically.

NLS (the idea)

The Nonlinear Regression model is $y_i = h(x_i,\beta) + \epsilon_i$ with $E[\epsilon_i \mid x_i] = 0$.

Nonlinear Least Squares (NLS) estimator:
$$\hat\beta_{NLS} = \arg\min_\beta \sum_{i=1}^n \epsilon_i^2 = \arg\min_\beta \sum_{i=1}^n \left(y_i - h(x_i,\beta)\right)^2$$

FOC: the NLS estimator $\hat\beta_{NLS}$ satisfies
$$\sum_{i=1}^n \left[y_i - h(x_i,\hat\beta_{NLS})\right]\frac{\partial h(x_i,\hat\beta_{NLS})}{\partial\beta} = 0$$

In general, no closed-form solution $\Rightarrow$ numerical minimization.
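A minimal sketch (not from the slides) of NLS by numerical minimization, for an illustrative nonlinear mean $h(x,\beta) = \beta_0 e^{\beta_1 x}$; the data-generating values and starting point are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch: NLS for h(x, b) = b0 * exp(b1 * x), minimizing the sum of squared residuals.
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=200)
beta_true = np.array([1.5, -0.8])
y = beta_true[0] * np.exp(beta_true[1] * x) + 0.1 * rng.standard_normal(200)

def residuals(beta, x, y):
    return y - beta[0] * np.exp(beta[1] * x)   # eps_i(beta) = y_i - h(x_i, beta)

fit = least_squares(residuals, x0=np.array([1.0, 0.0]), args=(x, y))
print("beta_NLS:", fit.x)
```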

Maximum Likelihood

Basic, strong assumption: the distribution of the data is known up to $\theta$.

Likelihood function: the joint density of an i.i.d. random sample $(x_1, x_2, \ldots, x_n)$ from $f(x;\theta_0)$ is
$$f(x_1, x_2, \ldots, x_n;\theta) = f(x_1;\theta)\, f(x_2;\theta)\cdots f(x_n;\theta)$$

A different perspective: view the joint density as a function of the parameters $\theta$ (as opposed to the sample):
$$L(\theta \mid X) \equiv f(x_1, x_2, \ldots, x_n;\theta) = \prod_{i=1}^n f(x_i;\theta)$$

It is usually simpler to work with the log of the likelihood:
$$l(\theta \mid x_1, \ldots, x_n) \equiv \ln L(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^n \ln f(x_i;\theta)$$

ML Estimator: given sample data generated from a parametric model, find the parameters that maximize the probability of observing that sample:
$$\hat\theta_{ML} = \arg\max_\theta L(\theta \mid x_1, \ldots, x_n) = \arg\max_\theta l(\theta)$$

F.O.C. (the Score):
$$\frac{\partial l(\theta \mid x_1, \ldots, x_n)}{\partial\theta} = 0$$

Maximum Likelihood: Example

Consider a univariate Normal model:
$$f(y_i;\theta) = N(\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(y_i-\mu)^2}{\sigma^2}\right)$$

The log-likelihood is
$$l(\mu,\sigma^2) = \sum_{i=1}^n \ln f(y_i;\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{i=1}^n \frac{(y_i-\mu)^2}{\sigma^2}$$

and then the scores for the $\mu$ and $\sigma^2$ parameters are
$$\frac{\partial l(\mu,\sigma^2)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i-\mu) = 0$$
$$\frac{\partial l(\mu,\sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i-\mu)^2 = 0$$

Therefore, first solving for $\mu$ and inserting it into the score of $\sigma^2$, we get the ML estimators
$$\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^n y_i \qquad \hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\mu_{ML})^2$$
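A minimal sketch (added for illustration) checking that numerically maximizing this Normal log-likelihood recovers the closed-form estimators; the simulated data and the log-variance parametrization are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize the Normal log-likelihood numerically and compare with
# the closed-form ML estimators (sample mean, 1/n variance).
rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_loglik(params, y):
    mu, log_sigma2 = params            # log-parametrization keeps sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    n = y.size
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2)
                  + np.sum((y - mu) ** 2) / sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
print(res.x[0], np.exp(res.x[1]))      # numerical MLE of (mu, sigma^2)
print(y.mean(), y.var(ddof=0))         # closed-form ML estimators
```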

Maximum Likelihood: Properties

Consistency: $\operatorname{plim}\hat\theta_{ML} = \theta_0$.

Asymptotic normality:
$$\hat\theta_{ML} \overset{a}{\sim} N\left(\theta_0, [I(\theta_0)]^{-1}\right) \quad\text{where}\quad I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'}\right]$$
$I(\theta)$ is the Fisher Information matrix.

Asymptotic efficiency: the MLE has the smallest asymptotic variance, $I(\theta)^{-1}$ being the Cramér-Rao lower bound.

Invariance: if $\hat\theta$ is the MLE of $\theta_0$, and $g(\theta)$ is any (invertible) transformation of $\theta$, then the MLE of $g(\theta_0)$ is $g(\hat\theta)$. Ex: precision parameter $\gamma^2 = 1/\sigma^2 \Rightarrow \hat\gamma^2_{ML} = 1/\hat\sigma^2_{ML}$.

Bottom line: MLE makes the best use of the information (asymptotically).

Maximum Likelihood: Properties of regular densities

Under some regularity conditions (whose role is to allow Taylor approximations and the interchange of differentiation and expectation):

D1: $\ln f(y_i;\theta)$, $g_i = \dfrac{\partial \ln f(y_i;\theta)}{\partial\theta}$, $H_i = \dfrac{\partial^2 \ln f(y_i;\theta)}{\partial\theta\,\partial\theta'}$ form a random sample
D2: $E_0[g_i(\theta_0)] = 0$
D3: $\mathrm{Var}_0[g_i(\theta_0)] = -E_0[H_i(\theta_0)]$

D1 is implied by the assumption that $y_i$, $i = 1,\ldots,n$, is a random sample.

D2 is a consequence of $\int f(y_i;\theta_0)\,dy_i = 1$: differentiating both sides with respect to $\theta_0$,
$$0 = \int \frac{\partial f(y_i;\theta_0)}{\partial\theta_0}\,dy_i = \int \frac{\partial \ln f(y_i;\theta_0)}{\partial\theta_0} f(y_i;\theta_0)\,dy_i = E_0[g_i(\theta_0)]$$

D3 is obtained by differentiating once more w.r.t. $\theta_0$.

D1 (random sample) $\Rightarrow \mathrm{Var}_0\left[\sum_{i=1}^n g_i(\theta_0)\right] = n\,\mathrm{Var}_0[g_i(\theta_0)]$, thus
$$J(\theta_0) \equiv \mathrm{Var}_0\left[\frac{\partial \ln L(\theta_0; y)}{\partial\theta_0}\right] = -E_0\left[\frac{\partial^2 \ln L(\theta_0; y)}{\partial\theta_0\,\partial\theta_0'}\right] \equiv I(\theta_0)$$
(the Information Matrix Equality).

Asymptotic normality of MLE

Being the maximizer of $\ln L$, the MLE satisfies by construction the likelihood equation $g(\hat\theta) = \sum_{i=1}^n g_i(\hat\theta) = 0$.

Define $H(\theta_0) = \dfrac{\partial^2 \ln L(\theta_0;y)}{\partial\theta_0\,\partial\theta_0'} = \sum_{i=1}^n \dfrac{\partial^2 \ln f(y_i;\theta_0)}{\partial\theta_0\,\partial\theta_0'} = \sum_{i=1}^n H_i(\theta_0)$.

1. Take a first-order Taylor expansion of the score $g(\hat\theta)$ around $\theta_0$:
$$g(\hat\theta) = g(\theta_0) + H(\theta_0)(\hat\theta - \theta_0) + R_1 = 0$$

2. Rearrange and scale by $\sqrt n$:
$$\sqrt n(\hat\theta - \theta_0) = -\underbrace{\left(\frac{1}{n}\sum_{i=1}^n H_i(\theta_0)\right)^{-1}}_{\to\, E_0\left[\frac{1}{n}H(\theta_0)\right]^{-1}} \underbrace{\sqrt n\left(\frac{1}{n}\sum_{i=1}^n g_i(\theta_0)\right)}_{\to\, N\left(0,\,\mathrm{Var}_0\left[\frac{1}{\sqrt n}g(\theta_0)\right]\right)} + \underbrace{R_1}_{\to\, 0}$$

3. Use the LLN (on the first term) and the CLT (on the second term):
$$\sqrt n(\hat\theta - \theta_0) \;\to\; \left[-E_0\left(\frac{1}{n}H(\theta_0)\right)\right]^{-1} N\left(0,\,\mathrm{Var}_0\left[\frac{1}{\sqrt n}g(\theta_0)\right]\right) = \left(\frac{1}{n}I(\theta_0)\right)^{-1} N\left(0,\frac{1}{n}J(\theta_0)\right) \sim N\left(0,\; n\, I(\theta_0)^{-1} J(\theta_0) I(\theta_0)^{-1}\right)$$

If the information matrix equality $J(\theta_0) = I(\theta_0)$ holds, then $\hat\theta \overset{a}{\sim} N\left(\theta_0, I(\theta_0)^{-1}\right)$.

Estimating the asymptotic covariance matrix of the MLE

Three asymptotically equivalent estimators of Asy.Var[$\hat\theta$]:

1. Calculate $E_0[H(\theta_0)]$ (very difficult) and evaluate it at $\hat\theta$:
$$\{I(\hat\theta)\}^{-1} = \left\{-E_0\left[\frac{\partial^2 \ln L(\hat\theta;y)}{\partial\hat\theta\,\partial\hat\theta'}\right]\right\}^{-1}$$

2. Calculate $H(\theta_0)$ (still quite difficult) and evaluate it at $\hat\theta$:
$$\{\hat I(\hat\theta)\}^{-1} = \left\{-\frac{\partial^2 \ln L(\hat\theta;y)}{\partial\hat\theta\,\partial\hat\theta'}\right\}^{-1}$$

3. BHHH or OPG estimator (easy): use the information matrix equality $I(\theta_0) = J(\theta_0)$:
$$\{\tilde I(\hat\theta)\}^{-1} = \left\{\widehat{\mathrm{Var}}\left[\frac{\partial \ln L(\hat\theta;y)}{\partial\hat\theta}\right]\right\}^{-1} = \left\{\sum_{i=1}^n g_i(\hat\theta)\, g_i(\hat\theta)'\right\}^{-1}$$

A short sketch of the BHHH/OPG estimator for the Normal example follows below.
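This sketch (an addition, not from the slides) computes the OPG estimator using the analytic per-observation scores of the Normal model and compares the resulting standard errors with the textbook asymptotic values $\sqrt{\sigma^2/n}$ and $\sqrt{2\sigma^4/n}$.

```python
import numpy as np

# Sketch: BHHH / OPG covariance estimator for the Normal-model MLE.
rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.5, size=1000)
mu_hat, sigma2_hat = y.mean(), y.var()          # ML estimates (1/n variance)

# per-observation scores g_i = (d ln f_i / d mu, d ln f_i / d sigma2) at theta_hat
g = np.column_stack([
    (y - mu_hat) / sigma2_hat,
    -0.5 / sigma2_hat + 0.5 * (y - mu_hat) ** 2 / sigma2_hat ** 2,
])
opg = g.T @ g                                    # sum_i g_i g_i'
cov_bhhh = np.linalg.inv(opg)                    # OPG estimate of Asy.Var[theta_hat]
print("BHHH std errors:", np.sqrt(np.diag(cov_bhhh)))
print("asymptotic     :", np.sqrt(sigma2_hat / y.size),
      np.sqrt(2 * sigma2_hat ** 2 / y.size))
```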

Hypothesis testing (idea)

Test of the hypothesis $H_0: c(\theta) = 0$. Three tests, asymptotically equivalent (but not in finite samples):

Likelihood ratio test: if $c(\theta) = 0$ is valid, then imposing it should not lead to a large reduction in the log-likelihood function. Therefore, we base the test on the difference $2(\ln L - \ln L_R) \sim \chi^2_{df}$. Both the unrestricted ($\ln L$) and restricted ($\ln L_R$) ML estimators are required.

Wald test: if $c(\theta) = 0$ is valid, then $c(\hat\theta_{ML}) \approx 0$. Only the unrestricted (ML) estimator is required.

Lagrange multiplier test: if $c(\theta) = 0$ is valid, then the restricted estimator should be near the point that maximizes $\ln L$; therefore, the slope of $\ln L$ should be near zero at the restricted estimator. Only the restricted estimator is required.
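A minimal sketch (illustrative, not from the slides) of the likelihood ratio test for $H_0: \mu = 0$ in the Normal model, using the closed-form restricted and unrestricted MLEs.

```python
import numpy as np
from scipy.stats import chi2

# Sketch: LR test of H0: mu = 0 in the Normal model.
def loglik(y, mu, sigma2):
    n = y.size
    return -0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2)
                   + np.sum((y - mu) ** 2) / sigma2)

rng = np.random.default_rng(4)
y = rng.normal(0.1, 1.0, size=400)

lnL_u = loglik(y, y.mean(), y.var())        # unrestricted MLE
lnL_r = loglik(y, 0.0, np.mean(y ** 2))     # restricted MLE: sigma2 = mean(y^2) when mu = 0
LR = 2 * (lnL_u - lnL_r)
print("LR statistic:", LR, "p-value:", chi2.sf(LR, df=1))
```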

Hypothesis testing [figure omitted in transcription]

Application of MLE: linear regression model

Model: $y_i = x_i'\beta + \epsilon_i$ with $y_i \mid x_i \sim N(x_i'\beta, \sigma^2)$.

Log-likelihood based on $n$ conditionally independent observations:
$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^n \frac{(y_i - x_i'\beta)^2}{\sigma^2} = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$$

Likelihood equations:
$$\frac{\partial \ln L}{\partial\beta} = \frac{X'(y - X\beta)}{\sigma^2} = 0 \qquad \frac{\partial \ln L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{(y - X\beta)'(y - X\beta)}{2\sigma^4} = 0$$

Solving the likelihood equations:
$$\hat\beta_{ML} = (X'X)^{-1}X'y \qquad \hat\sigma^2_{ML} = \frac{e'e}{n}$$

$\hat\beta_{ML} = \hat\beta_{OLS}$ $\Rightarrow$ OLS has all the desirable asymptotic properties of MLE.
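A short sketch (added here, with simulated data as an assumption) of the two closed-form ML estimators: $\hat\beta_{ML}$ equals OLS, while $\hat\sigma^2_{ML}$ divides the residual sum of squares by $n$ rather than $n-k$.

```python
import numpy as np

# Sketch: ML estimates of the linear regression model.
rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + 0.8 * rng.standard_normal(n)

beta_ml = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y  (= OLS)
e = y - X @ beta_ml
sigma2_ml = e @ e / n                         # e'e / n  (note: not n - k)
print(beta_ml, sigma2_ml)
```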

Maximum Likelihood in time series: AR model

In a time series $y_t$ the observations are not i.i.d. It is then very convenient to use the prediction error decomposition of the likelihood:
$$L(y_T, y_{T-1}, \ldots, y_1;\theta) = f(y_T \mid \Omega_{T-1};\theta)\, f(y_{T-1} \mid \Omega_{T-2};\theta)\cdots f(y_1 \mid \Omega_0;\theta)$$

For example, for the AR(1) model $y_t = \phi_1 y_{t-1} + \epsilon_t$ the full log-likelihood can be written as
$$l(\phi) = \underbrace{\ln f_{Y_1}(y_1;\phi)}_{\text{marginal of 1st obs}} + \underbrace{\sum_{t=2}^T \ln f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1};\phi)}_{\text{conditional likelihood}}$$
where, under normality, the conditional part is
$$-\frac{T-1}{2}\log(2\pi) - \frac{T-1}{2}\log\sigma^2 - \frac{1}{2}\sum_{t=2}^T \frac{(y_t - \phi y_{t-1})^2}{\sigma^2}$$

Hence, maximizing the conditional likelihood for $\phi$ is equivalent to minimizing $\sum_{t=2}^T (y_t - \phi y_{t-1})^2$, which is the OLS criterion $\Rightarrow$ OLS = MLE (conditionally).

In general, for an AR(p) process OLS is consistent and, under gaussianity, asymptotically equivalent to MLE $\Rightarrow$ asymptotically efficient.
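A minimal sketch (not from the slides) showing that the conditional Gaussian MLE of an AR(1) coefficient is the OLS regression of $y_t$ on $y_{t-1}$; the simulated process is an assumption.

```python
import numpy as np

# Sketch: conditional MLE of an AR(1) under normality = OLS of y_t on y_{t-1}.
rng = np.random.default_rng(6)
T, phi_true = 1000, 0.7
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + rng.standard_normal()

y_lag, y_cur = y[:-1], y[1:]
phi_hat = (y_lag @ y_cur) / (y_lag @ y_lag)   # OLS slope without intercept
print("phi_hat:", phi_hat)
```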

Maximum Likelihood in time series: ARMA model

For a general ARMA(p,q)
$$Y_t = \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$$
$Y_{t-1}$ is correlated with $\epsilon_{t-1}, \ldots, \epsilon_{t-q}$ $\Rightarrow$ OLS is not consistent $\Rightarrow$ MLE with numerical optimization procedures.

Maximum Likelihood in time series: GARCH model

A GARCH process with gaussian innovations, $r_t \mid \Omega_{t-1} \sim N(\mu_t(\theta), \sigma^2_t(\theta))$, has conditional densities
$$f(r_t \mid \Omega_{t-1};\theta) = \frac{1}{\sigma_t(\theta)\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{(r_t - \mu_t(\theta))^2}{\sigma^2_t(\theta)}\right)$$

Using the prediction error decomposition, the log-likelihood becomes
$$\log L(r_T, r_{T-1}, \ldots, r_1;\theta) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^T \log\sigma^2_t(\theta) - \frac{1}{2}\sum_{t=1}^T \frac{(r_t - \mu_t(\theta))^2}{\sigma^2_t(\theta)}$$

Nonlinear function of $\theta$ $\Rightarrow$ numerical optimization techniques.
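A minimal sketch (an illustration, not the slides' code) of this likelihood for a GARCH(1,1) with zero conditional mean, $\sigma^2_t = \omega + \alpha r_{t-1}^2 + \beta \sigma^2_{t-1}$; the parameter guards, initialization, and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: Gaussian negative log-likelihood of a GARCH(1,1), mu_t = 0 assumed.
def garch11_negloglik(params, r):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                           # crude positivity/stationarity guard
    T = r.size
    sigma2 = np.empty(T)
    sigma2[0] = r.var()                         # initialize with the sample variance
    for t in range(1, T):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + r ** 2 / sigma2)

# usage (r is an array of demeaned returns, starting values are illustrative):
# res = minimize(garch11_negloglik, [0.05, 0.05, 0.9], args=(r,), method="Nelder-Mead")
```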

Quasi Maximum Likelihood

ML requires a complete specification of $f(y_i \mid x_i;\theta)$; usually Normality is assumed. Nevertheless, even if the true distribution is not Normal, assuming Normality still gives consistency and asymptotic normality, provided that the conditional mean and variance processes are correctly specified.

However, the information matrix equality no longer holds, i.e. $J(\theta_0) \neq I(\theta_0)$; hence the covariance matrix of $\hat\theta_{ML}$ is not $I(\theta_0)^{-1} = \left\{-E\left[\frac{\partial^2 l(\theta_0)}{\partial\theta_0\,\partial\theta_0'}\right]\right\}^{-1}$ but
$$\hat\theta_{QML} \overset{a}{\sim} N\left(\theta_0, [I(\theta_0)]^{-1} J(\theta_0) [I(\theta_0)]^{-1}\right) \quad\text{where}\quad J(\theta_0) = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T E\left[\frac{\partial l_t(\theta_0)}{\partial\theta_0}\frac{\partial l_t(\theta_0)}{\partial\theta_0'}\right]$$

The standard errors provided by $\left[\hat I(\theta_0)\right]^{-1} \hat J(\theta_0) \left[\hat I(\theta_0)\right]^{-1}$ are called robust standard errors.
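A minimal sketch (generic and illustrative) of the sandwich covariance $\hat I^{-1}\hat J\hat I^{-1}$, assuming the user supplies the per-observation scores and the sample Hessian of the log-likelihood evaluated at $\hat\theta$; the function and argument names are hypothetical.

```python
import numpy as np

# Sketch: QML "sandwich" covariance from per-observation scores (T x k) and
# the sample Hessian of the log-likelihood (k x k), both evaluated at theta_hat.
def robust_covariance(scores, hessian):
    I_hat = -hessian                      # estimate of I(theta_0), summed over t
    J_hat = scores.T @ scores             # sum_t s_t s_t' estimates J(theta_0), summed over t
    I_inv = np.linalg.inv(I_hat)
    return I_inv @ J_hat @ I_inv          # robust covariance of theta_hat

# robust standard errors: np.sqrt(np.diag(robust_covariance(scores, hessian)))
```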

GMM (idea)

Idea: sample moments $\overset{p}{\to}$ population moments = function(parameters).

A vector function $g(\theta, w_t)$ which under the true value $\theta_0$ satisfies
$$E_0[g(\theta_0, w_t)] = 0$$
is called a set of orthogonality or moment conditions.

Goal: estimate $\theta_0$ from the informational content of the moment conditions $\Rightarrow$ semiparametric approach, i.e. no distributional assumptions!

Replace the population moment conditions with sample moments
$$\hat g_T(\theta) = \frac{1}{T}\sum_{t=1}^T g(\theta, w_t)$$
and minimize, w.r.t. $\theta$, a quadratic form of $\hat g_T(\theta)$ with a certain weighting matrix $W$:
$$\hat\theta_{GMM} = \arg\min_\theta \left(\frac{1}{T}\sum_{t=1}^T g(\theta, w_t)\right)' W \left(\frac{1}{T}\sum_{t=1}^T g(\theta, w_t)\right)$$
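A minimal sketch (added for illustration) of the GMM objective for an exactly identified case, with the hypothetical moment conditions $E[y - \mu] = 0$ and $E[(y-\mu)^2 - \sigma^2] = 0$.

```python
import numpy as np

# Sketch: GMM objective for an exactly identified problem (mean and variance).
def g(theta, y):
    mu, sigma2 = theta
    return np.column_stack([y - mu, (y - mu) ** 2 - sigma2])   # T x 2 moments

def gmm_objective(theta, y, W):
    g_bar = g(theta, y).mean(axis=0)        # sample moment vector g_T(theta)
    return g_bar @ W @ g_bar                # quadratic form g_T' W g_T

# With as many moments as parameters the minimum is zero and W is irrelevant:
# mu_hat = y.mean() and sigma2_hat = y.var() set both sample moments to zero.
```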

GMM: optimal weighting matrix

Exactly identified (Method of Moments, MM): # orthogonality conditions = # parameters $\Rightarrow$ the moment equations are satisfied exactly, i.e. $\frac{1}{T}\sum_{t=1}^T g(\hat\theta, w_t) = 0$ $\Rightarrow$ $W$ is irrelevant.

Overidentified (Generalized MM): # orthogonality conditions > # parameters $\Rightarrow$ $W$ is relevant. The optimal weighting matrix $W$ is the inverse of the asymptotic var-cov matrix of $g(\theta_0, w_t)$:
$$W = \mathrm{Var}[g(\theta_0, w_t)]^{-1}$$
but it depends on the unknown $\theta_0$ $\Rightarrow$ feasible two-step procedure:

Step 1. Use $W = I$ to obtain a consistent estimator $\hat\theta_1$, then estimate
$$\hat\Phi = \frac{1}{T}\sum_{t=1}^T g(\hat\theta_1, w_t)\, g(\hat\theta_1, w_t)'$$

Step 2. Compute the second-step GMM estimator using the weighting matrix $\hat\Phi^{-1}$:
$$\hat\theta_{GMM} = \arg\min_\theta \left(\frac{1}{T}\sum_{t=1}^T g(\theta, w_t)\right)' \hat\Phi^{-1} \left(\frac{1}{T}\sum_{t=1}^T g(\theta, w_t)\right)$$

The two-step estimator $\hat\theta_{GMM}$ is asymptotically efficient in the GMM class.
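A minimal sketch (an illustrative, overidentified example not taken from the slides): for an Exponential distribution with mean $\lambda$, the two conditions $E[y] = \lambda$ and $E[y^2] = 2\lambda^2$ overidentify the single parameter, so the two-step procedure applies.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: two-step GMM with 2 moment conditions for 1 parameter (Exponential mean).
rng = np.random.default_rng(7)
y = rng.exponential(scale=2.0, size=2000)

def g(lam, y):
    return np.column_stack([y - lam, y ** 2 - 2 * lam ** 2])   # T x 2 moments

def objective(lam, y, W):
    g_bar = g(lam, y).mean(axis=0)
    return g_bar @ W @ g_bar

# Step 1: identity weighting matrix -> consistent first-step estimate
lam1 = minimize_scalar(objective, bounds=(0.01, 10), args=(y, np.eye(2)),
                       method="bounded").x
# Step 2: optimal weighting matrix Phi^{-1}, Phi = (1/T) sum_t g_t g_t'
Phi = g(lam1, y).T @ g(lam1, y) / y.size
lam2 = minimize_scalar(objective, bounds=(0.01, 10), args=(y, np.linalg.inv(Phi)),
                       method="bounded").x
print("first step:", lam1, "second step:", lam2)
```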

GMM, OLS and ML

Many estimation methods can be seen as GMM. Examples:

1) OLS is GMM with the orthogonality condition
$$E[x_i \epsilon_i] = E\left[x_i(y_i - x_i'\theta)\right] = 0$$

2) ML is GMM on the score:
$$E\left[\frac{\partial \log f(y_t;\theta)}{\partial\theta}\right] = E[g(\theta, w_t)] = 0$$