Estimation of Dynamic Regression Models

Eduardo Rossi, University of Pavia, 2007

Factorization of the density

DGP: $D_t(x_t \mid \mathcal{X}_{t-1}, d_t; \Psi)$, where $x_t$ represents all the variables in the economy. The econometric analysis will focus on explaining a subset of variables, $y_t$, in terms of the history of the system and of a contemporaneous subset $z_t$, treated as given. Treating $z_t$ as given is motivated by the assumption that $z_t$ causes $y_t$. What is causality in econometrics?

Factorization of the density

$y_t$ and $z_t$ are subsets of $x_t$; $w_t$ collects the variables in $x_t$ that belong to neither $y_t$ nor $z_t$. The joint density factors as
$$D = D_{w,y,z} = D_{w \mid y,z}\, D_{y \mid z}\, D_z \qquad (1)$$
The factorization makes explicit the presence (or absence) of contemporaneous causality and the presence (or absence) of simultaneous relations.

Factorization of the density

If $z_t$ causes $y_t$ while $w_t$ does not, the factor $D_{y \mid z}$ completely represents the stochastic mechanism generating $y_t$. Note that feedback from $y_t$ to $w_t$ is allowed, and no restriction on the relationship between $w_t$ and $z_t$ is required, for this to be true.

Factorization of the density

Denote by
$$\mathcal{W}_{t-1} = \sigma(w_{t-1}, w_{t-2}, \ldots), \quad \mathcal{Y}_{t-1} = \sigma(y_{t-1}, y_{t-2}, \ldots), \quad \mathcal{Z}_{t-1} = \sigma(z_{t-1}, z_{t-2}, \ldots).$$
Assume there exists a partition of $\theta$ into two subvectors $\theta_1 \in \Theta_1$ and $\theta_2 \in \Theta_2$, such that $\Theta = \Theta_1 \times \Theta_2$ and
$$D_{w \mid y,z} = D_{w \mid y,z}(w_t \mid y_t, z_t, \mathcal{W}_{t-1}, \mathcal{Y}_{t-1}, \mathcal{Z}_{t-1}; d_t, \theta_2) \qquad (2)$$
$$D_{y \mid z} = D_{y \mid z}(y_t \mid z_t, \mathcal{Y}_{t-1}, \mathcal{Z}_{t-1}; d_t, \theta_1) \qquad (3)$$
$$D_z = D_z(z_t \mid \mathcal{W}_{t-1}, \mathcal{Y}_{t-1}, \mathcal{Z}_{t-1}; d_t, \theta_2) \qquad (4)$$
$D_{w \mid y,z}$ and $D_z$ must not depend on $\theta_1$. $D_{y \mid z}$ must not depend on $w_{t-j}$ for $j > 0$, in the sense that either conditioning or not conditioning on these variables has the same effect.

Factorization of the density

Under the condition $\Theta = \Theta_1 \times \Theta_2$ the admissible values of $\theta_1$ do not depend on $\theta_2$, so that knowledge of the latter cannot improve inferences about the former. In this case $\theta_1$ and $\theta_2$ are said to be variation free. Under these conditions nothing need be known about the forms of $D_{w \mid y,z}$ and $D_z$ in order to analyze $D_{y \mid z}$, since these do not depend on $\theta_1$. The analysis is conducted conditioning on $z_t$ and marginalizing with respect to $w_t$.

Sequential cut: the separation of $\theta$ into two sets,
$\theta_1$: parameters of interest for the investigation;
$\theta_2$: parameters that are not of interest.

Weak exogeneity

Suppose that $D_{y \mid z}$ depends on a vector $\phi$ of parameters of interest, whose values are the focus of the investigation. To obtain the desired factorization of the DGP, it is only necessary that there exist some parameterization $\theta$ such that (2) holds, with $\theta_1$ and $\theta_2$ variation free, and $\phi = g(\theta_1)$. In this analysis the $y_t$ are called endogenous, and the $z_t$ are called weakly exogenous for $\phi$. Weak exogeneity is a relationship between parameters and variables; it is not a property of variables as such. Without the required cut of the parameters, the factorization (1) is not relevant to the investigation.

Other notions of exogeneity

Exogeneity is sometimes defined in terms of the independence of the variables in question from the disturbances in a model. In the regression model
$$y_t = x_t'\beta + \varepsilon_t, \qquad t = 1, \ldots, T,$$
if $x_t$ is independent of $\varepsilon_{t+j}$ for $j \geq 0$, so that $E[x_t \varepsilon_t] = 0$, then $x_t$ is said to be predetermined. If the independence holds for all $j$, $x_t$ is said to be strictly exogenous.

Setup

Because the variables are related to their own lags in the sequence of observations, it is necessary to introduce conditioning assumptions. Let $\mathcal{I}_t$ denote the set of conditioning variables (the smallest $\sigma$-field of events containing the $\sigma$-fields generated by the conditioning variables). The model is
$$y_t = x_t'\beta + \epsilon_t$$
Assumptions:
1. $E[\epsilon_t \mid \mathcal{I}_t] = 0$ a.s.
2. $E[\epsilon_t^2 \mid \mathcal{I}_t] = \sigma^2$ a.s.

Setup

The set $\mathcal{I}_t$ includes:
- deterministic variables (intercept, seasonal dummies, etc.)
- lagged variables, dated $t-j$, $j > 0$
- current-dated variables that are weakly exogenous for $(\beta, \sigma^2)$

Any Borel-measurable function of variables in $\mathcal{I}_t$ is also in $\mathcal{I}_t$; in particular $\epsilon_{t-j} \in \mathcal{I}_t$ for $j > 0$, since
$$\epsilon_{t-j} = y_{t-j} - x_{t-j}'\beta.$$
An implication of Assumption 1 is that the disturbances must be serially uncorrelated.

Example

Suppose $(y_t, z_t)$ is a vector of variables generated by a dynamic DGP represented by the density factorization
$$D_t(y_t, z_t \mid \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1}; \phi) = D_t(y_t \mid z_t, \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1}; \phi_1)\, D_t(z_t \mid \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1}; \phi_2)$$
with $\mathcal{I}_t = \sigma(z_t) \vee \mathcal{Z}_{t-1} \vee \mathcal{Y}_{t-1}$ and
$$E[y_t \mid z_t, \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1}; \phi_1] = x_t'\beta,$$
where $x_t$ is composed of elements of $z_{t-j}$, $j \geq 0$, and $y_{t-j}$, $j > 0$. If $D_t(y_t \mid z_t, \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1}; \phi_1)$ is Gaussian, then $\phi_1 = (\beta, \sigma^2)$.
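
To make this setup concrete, here is a small simulation sketch (all parameter values and variable names are illustrative assumptions, not from the slides): $z_t$ feeds back on $y_{t-1}$, so it is weakly but not strictly exogenous, yet the conditional model for $y_t$ given $z_t$ and the past can be estimated consistently on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5_000
beta0, beta1, beta2 = 0.5, 1.0, 0.4   # parameters of the conditional model (phi_1)
gamma = 0.3                            # feedback of lagged y on z (part of phi_2)

y = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):
    # marginal model for z_t: depends on y_{t-1}, so z is not strictly exogenous
    z[t] = gamma * y[t - 1] + rng.normal()
    # conditional model for y_t given z_t and the past
    y[t] = beta0 + beta1 * z[t] + beta2 * y[t - 1] + rng.normal()

# estimate the conditional model by OLS, conditioning on z_t and y_{t-1}
X = np.column_stack([np.ones(T - 1), z[1:], y[:-1]])
beta_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print("true:", beta0, beta1, beta2)
print("OLS :", beta_hat)
```

The marginal process for $z_t$ never has to be specified to recover $(\beta, \sigma^2)$; only the conditional factor is used.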

The Method of Maximum Likelihood

Notation: let $X_1^T = [x_1, \ldots, x_T]'$ ($T \times m$), or simply $X$, denote a matrix of random variables, with $x_t \in S \subseteq \mathbb{R}^m$ and $X \in S^T$, where $S^T \subseteq \mathbb{R}^{Tm}$ is the sample space. Supposing the data are continuously distributed, let the joint p.d.f. of these data be denoted by $D(X; \theta_0)$, a member of a family of functions $D(\,\cdot\,; \theta)$, $\theta \in \Theta$, with
$$D(\,\cdot\,; \theta): S^T \to \mathbb{R}$$
representing the density associated with each point in $S^T$, for a given $\theta$.

The Method of Maximum Likelihood

For a given $X \in S^T$,
$$D(X; \,\cdot\,): \Theta \to \mathbb{R}$$
is called the likelihood function, denoted by $L(\,\cdot\,; X)$. Here $X$ is to be thought of as a sample that has been observed, and $L(\theta; X)$ represents the p.d.f. that would be associated with the sample $X$ had it been generated by the data generation process (DGP) with parameters $\theta$.

The Method of Maximum Likelihood

The likelihood function provides the basis for inferences from a sample $X$ about the unknown $\theta$. The maximum likelihood estimator is
$$\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta; X).$$
The sample $X$ is representative of the distribution from which it was drawn, so the value of $\theta$ for which $L$ is largest is "most likely" in the sense of attributing the highest probability density to $X$.

The Method of Maximum Likelihood

Often economic theory specifies only the first two moments of the distribution, and the Gaussian distribution is assumed without any special justification. In this case the estimator is called a quasi-maximum likelihood (QML) estimator.

The Classical Gaussian Regression Model

When the data are independently sampled from a large population, the joint density is merely the product of the marginal densities of the observations. Considering the partition $X_1^T = [y_1^T, Z_1^T]$ (respectively, the first column and the last $m-1$ columns), suppose the joint density can be factored so that the parameters of interest are all in the conditional factor:
$$D(y_1^T, Z_1^T; \theta, \psi) = D_{y \mid Z}(y_1^T \mid Z_1^T; \theta)\, D_Z(Z_1^T; \psi) = \prod_{t=1}^{T} D(y_t \mid z_t; \theta)\, D_Z(Z_1^T; \psi).$$
Under the Gaussianity assumption,
$$D(y_t \mid z_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_t - z_t'\beta)^2}{2\sigma^2} \right\}.$$

The Classical Gaussian Regression Model

The likelihood function is
$$L(\beta, \sigma^2; X) = \left(2\pi\sigma^2\right)^{-T/2} \exp\left\{ -\frac{S(\beta)}{2\sigma^2} \right\}, \qquad S(\beta) = \sum_{t=1}^{T} (y_t - z_t'\beta)^2.$$
The log-likelihood is, up to an additive constant,
$$\log L(\beta, \sigma^2) = -\frac{T}{2}\log\sigma^2 - \frac{S(\beta)}{2\sigma^2}.$$
The MLE of $\beta$ is the OLS estimator
$$\hat{\beta} = (Z_1^{T\prime} Z_1^T)^{-1} Z_1^{T\prime} y_1^T,$$
and the MLE of $\sigma^2$ is $\hat{\sigma}^2 = \hat{\epsilon}'\hat{\epsilon}/T$.
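
As an illustrative check (the setup below is assumed, not part of the slides), maximizing the Gaussian log-likelihood numerically reproduces the OLS estimate of $\beta$ and $\hat{\sigma}^2 = \hat{\epsilon}'\hat{\epsilon}/T$; scipy.optimize is used here only as one convenient optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T, k = 200, 3
Z = rng.normal(size=(T, k))
beta_true = np.array([1.0, -0.5, 2.0])
y = Z @ beta_true + rng.normal(scale=1.5, size=T)

def neg_loglik(params):
    # negative Gaussian log-likelihood, up to an additive constant
    beta, log_sigma2 = params[:k], params[k]
    sigma2 = np.exp(log_sigma2)          # reparameterize to enforce sigma^2 > 0
    resid = y - Z @ beta
    return 0.5 * T * np.log(sigma2) + resid @ resid / (2 * sigma2)

res = minimize(neg_loglik, x0=np.zeros(k + 1), method="BFGS")
beta_ml, sigma2_ml = res.x[:k], np.exp(res.x[k])

beta_ols = np.linalg.lstsq(Z, y, rcond=None)[0]
sigma2_ml_closed = np.sum((y - Z @ beta_ols) ** 2) / T   # MLE divides by T, not T - k
print(beta_ml, beta_ols)            # essentially identical
print(sigma2_ml, sigma2_ml_closed)
```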

Properties of MLE

In general, in the case of independent observations, the log-likelihood contribution of the $t$-th observation is
$$l_t(\theta) = \log D_t(x_t; \theta), \qquad \theta \in \Theta.$$
It is assumed that, for some $\theta_0 \in \mathrm{int}(\Theta)$, $D_t(\,\cdot\,; \theta_0)$ represents, with probability 1, the true probability function of $x_t$.

MLE of the Dynamic Regression Model

The dynamic regression model with the specific conditional Gaussian assumption:
$$D(y_t \mid \mathcal{I}_t; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_t - x_t'\beta)^2}{2\sigma^2} \right\}$$
$$l(\beta, \sigma^2) = \prod_{t=p+1}^{T} D(y_t \mid \mathcal{I}_t; \beta, \sigma^2)$$
where $p$ is the maximum lag on any variable contained in $x_t$. This is an approximation to the likelihood function; it is not a joint density function. Since $z_t$ may depend on lagged values of $y_t$ (weakly but not strongly exogenous), the marginal factors $D(z_t \mid \mathcal{Z}_{t-1}, \mathcal{Y}_{t-1})$ are needed to describe the joint distribution of $(y_{p+1}, \ldots, y_T)$.

MLE of the Dynamic Regression Model

We can regard the maximizers of $\prod_{t=p+1}^{T} D(y_t \mid \mathcal{I}_t; \beta, \sigma^2)$ as ML estimators because the joint density depends on $(\beta, \sigma^2)$ only through the terms in $\prod_{t=p+1}^{T} D(y_t \mid \mathcal{I}_t; \beta, \sigma^2)$. The OLS estimates are asymptotically equivalent to the MLE when the disturbances are Gaussian.
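
The conditioning can be illustrated with a minimal AR(1) sketch (all names and values are assumptions for illustration, not from the slides): the approximate log-likelihood sums the Gaussian conditional densities over $t = p+1, \ldots, T$, and its maximizer coincides with OLS on the lagged regressor.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T, p = 500, 1
beta0, beta1, sigma = 0.2, 0.7, 1.0
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal(scale=sigma)

def neg_cond_loglik(params):
    # minus the sum over t = p+1, ..., T of log D(y_t | I_t; beta, sigma^2)
    b0, b1, log_s2 = params
    s2 = np.exp(log_s2)
    e = y[p:] - b0 - b1 * y[:-p]
    n = T - p
    return 0.5 * n * np.log(2 * np.pi * s2) + e @ e / (2 * s2)

res = minimize(neg_cond_loglik, x0=np.zeros(3), method="BFGS")
print("conditional MLE:", res.x[0], res.x[1], np.exp(res.x[2]))

X = np.column_stack([np.ones(T - p), y[:-p]])
print("OLS            :", np.linalg.lstsq(X, y[p:], rcond=None)[0])
```

Dropping the first $p$ observations is exactly the approximation described above: the initial conditions are ignored rather than modelled.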

Properties of MLE

Given $\{x_t, \mathcal{I}_t\}$, with $\mathcal{I}_t = \sigma(x_t, x_{t-1}, \ldots)$, the log-likelihood contribution for a closed dynamic model, conditioned only on the past, without the factoring-out of weakly exogenous components, is
$$l_t(\theta) = \log D_t(x_t \mid \mathcal{I}_{t-1}; \theta), \qquad \theta \in \Theta,$$
where $\theta_0 \in \Theta$ and $D_t(x_t \mid \mathcal{I}_{t-1}; \theta_0)$ represents, with probability 1, the true conditional probability function of $x_t$. The parameters of interest $\theta$ are confined to $D_t$.

Properties of MLE

Under dependent sampling the log-likelihood is the sum of the $l_t$'s over the sample, plus a term representing the initial conditions. For the asymptotic analysis we can ignore this term: given the assumptions, it is of smaller order as $T \to \infty$.

When the probability function is evaluated at $x_t$,
$$l_t(\theta): \Theta \times \Omega \to \mathbb{R}.$$
For each fixed $\omega \in \Omega$, $l_t(\,\cdot\,): \Theta \to \mathbb{R}$; and for each fixed $\theta$ it is an $\mathcal{I}_t$-measurable random variable.

Properties of MLE

Considering a fixed $x_t$, say $x$, each $l_t(\theta, x)$ is a mapping from $\Theta \times S$ to $\mathbb{R}$ and is an $\mathcal{I}_t$-measurable random variable. The same characterization applies to the various partial derivatives with respect to the elements of $\theta$.

Information Inequality

Let $x$ be continuously distributed with joint density $D(x)$, and let $G(x)$ be another density, $\int_S G(\xi)\, d\xi = 1$. Let $S$ be the support of $D$: $D(x) > 0$ for $x \in S$. Suppose that $D$ and $G$ have the same support, i.e. $G(x) = 0$ if and only if $D(x) = 0$; they are then said to be equivalent. $G$ is an equivalent p.d.f. and can be a candidate to approximate $D$. By Jensen's inequality,
$$E\left[\log\frac{G}{D}\right] = \int_S \log\left(\frac{G(\xi)}{D(\xi)}\right) D(\xi)\, d\xi \leq \log \int_S G(\xi)\, d\xi = 0.$$

Kullback-Leibler information criterion

Since $\log$ is strictly concave, the inequality holds as an equality only when $D(x) = G(x)$ for almost every $x \in S$ (the exceptions form a set of measure 0 in $S$). The quantity $E[\log(G/D)]$, which measures the closeness of $G$ to $D$ over the sample space, is called the Kullback-Leibler information criterion (KLIC). Obvious choices of $G$ include the other members, with $\theta \neq \theta_0$, of the family of densities representing the model.
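
A small numerical illustration (the Gaussian choices below are assumptions, not from the slides): for $D = N(0,1)$ and $G = N(\mu, \sigma^2)$, $E_D[\log(G/D)]$ can be estimated by Monte Carlo and compared with its closed form, minus the Kullback-Leibler divergence of $G$ from $D$; it equals 0 only when $G = D$ and is negative otherwise.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)        # draws from the true density D = N(0, 1)

def expected_log_ratio(mu, sigma):
    """Monte Carlo estimate of E_D[log(G/D)] for G = N(mu, sigma^2)."""
    return np.mean(norm.logpdf(x, mu, sigma) - norm.logpdf(x, 0.0, 1.0))

def exact(mu, sigma):
    # closed form: E_D[log(G/D)] = -KL(D || G) = -(log sigma + (1 + mu^2)/(2 sigma^2) - 1/2)
    return -(np.log(sigma) + (1.0 + mu ** 2) / (2.0 * sigma ** 2) - 0.5)

for mu, sigma in [(0.0, 1.0), (0.5, 1.0), (0.0, 2.0)]:
    print(mu, sigma, expected_log_ratio(mu, sigma), exact(mu, sigma))
```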

Kullback-Leibler information criterion

The information inequality also holds, almost surely, for conditional expectations. With $E[\,\cdot \mid \mathcal{I}_{t-1}] = \int (\,\cdot\,)\, D_t(\xi \mid \mathcal{I}_{t-1}; \theta_0)\, d\xi$,
$$E\left[\log\frac{D_t(x_t \mid \mathcal{I}_{t-1}; \theta)}{D_t(x_t \mid \mathcal{I}_{t-1}; \theta_0)} \,\Big|\, \mathcal{I}_{t-1}\right] \leq \log E\left[\frac{D_t(x_t \mid \mathcal{I}_{t-1}; \theta)}{D_t(x_t \mid \mathcal{I}_{t-1}; \theta_0)} \,\Big|\, \mathcal{I}_{t-1}\right] = 0 \quad \text{a.s.}$$

Identification

$E[L_T(\theta)]$ is maximized at $\theta_0$. Given this result, consistency of the ML estimator follows from the following theorem.

Theorem. If
1. $\Theta$ is compact;
2. $\frac{1}{T} L_T(\theta) \xrightarrow{p} E[L_T(\theta)]$ (a non-stochastic function of $\theta$) uniformly in $\Theta$;
3. $\theta_0 \in \mathrm{int}(\Theta)$ is the unique maximum of $E[L_T(\theta)]$;

then $\hat{\theta}_T \xrightarrow{p} \theta_0$.

Condition 2 can also be stated in the form
$$\sup_{\theta \in \Theta} \left| \frac{1}{T} L_T(\theta) - E[L_T(\theta)] \right| \xrightarrow{p} 0. \qquad (5)$$

Structures

$\theta_1$ and $\theta_2$ are said to be observationally equivalent if $L_T(\theta_1, X) = L_T(\theta_2, X)$ for almost all $X \in S^T$ and all $T \geq 1$. A model is said to be globally (locally) identified if the true structure $\theta_0$ is not observationally equivalent to any other point of $\Theta$ (of an open neighborhood of $\theta_0$).

The KLIC for the complete sample is
$$E_0[L_T(\theta)] - E_0[L_T(\theta_0)],$$
where $E_0[\,\cdot\,]$ denotes the expected value under the true distribution.

Underidentification implies that $\frac{1}{T} L_T(\theta)$ fails the uniqueness requirement of condition 3. Underidentification means that no consistent estimator exists, and the parameters are simply inaccessible to empirical investigation.

Asymptotic Normality

The results hinge on the properties of the gradient of $L_T$ (the score vector) at $\theta_0$. Define the operator
$$E_\theta(\,\cdot \mid \mathcal{I}_{t-1}) = \int (\,\cdot\,)\, D_t(\xi \mid \mathcal{I}_{t-1}; \theta)\, d\xi \qquad (6)$$
representing the conditional expectation of any function of $x_t$ when $\theta$ is the true parameter. Assume that $l_t(\,\cdot\,)$ is twice continuously differentiable with respect to $\theta$ everywhere on $\mathrm{int}(\Theta) \times S$ with probability 1, and that the derivatives are bounded uniformly in $t$.

Lemma.
$$E\left(\frac{\partial l_t}{\partial \theta} \,\Big|\, \mathcal{I}_{t-1}\right)\Big|_{\theta = \theta_0} = 0 \quad \text{a.s.} \qquad (7)$$

Asymptotic Normality

Proof. Given that
$$\frac{\partial l_t}{\partial \theta} = \frac{1}{D_t} \frac{\partial D_t}{\partial \theta},$$
we can write
$$E_\theta\left(\frac{\partial l_t}{\partial \theta} \,\Big|\, \mathcal{I}_{t-1}\right)\Big|_{\theta=\theta_0} = \int \frac{\partial l_t}{\partial \theta}\, D_t(\xi \mid \mathcal{I}_{t-1}; \theta)\, d\xi = \int \frac{\partial D_t(\xi \mid \mathcal{I}_{t-1}; \theta)}{\partial \theta}\, d\xi = \frac{\partial}{\partial \theta} \underbrace{\int D_t(\xi \mid \mathcal{I}_{t-1}; \theta)\, d\xi}_{=1} = 0 \quad \text{a.s.} \qquad (8)$$

Asymptotic Normality

The last step interchanges the order of differentiation and integration, and the equality holds in particular for $\theta = \theta_0$. The adapted sequence $\left\{ \frac{\partial l_t}{\partial \theta}\big|_{\theta_0}, \mathcal{I}_t \right\}$ is a vector martingale difference. Applying the CLT, we have that
$$\frac{1}{\sqrt{T}} \frac{\partial L_T}{\partial \theta}\Big|_{\theta_0} \xrightarrow{D} N(0, I_0) \qquad (9)$$
where
$$I_0 = \lim_{T \to \infty} T^{-1} I_{T0} \qquad (10)$$
and
$$I_{T0} = E\left[ \frac{\partial L_T}{\partial \theta} \frac{\partial L_T}{\partial \theta'} \Big|_{\theta_0} \right] = \sum_{t=1}^{T} E\left[ \frac{\partial l_t}{\partial \theta} \frac{\partial l_t}{\partial \theta'} \Big|_{\theta_0} \right]. \qquad (11)$$

Asymptotic Normality

The matrix $I_{T0}$ is called the information matrix; it can be thought of as measuring the amount of information about $\theta_0$ contained in the sample. $I_0$ is the limiting information matrix.

Theorem.
$$I_{T0} = -E\left[ \frac{\partial^2 L_T}{\partial \theta \partial \theta'} \Big|_{\theta_0} \right] \qquad (12)$$

Asymptotic Normality

Proof. For each $t$, differentiating the identity
$$E_\theta\left( \frac{\partial l_t}{\partial \theta} \,\Big|\, \mathcal{I}_{t-1}\right) = \int \frac{\partial l_t}{\partial \theta}\, D_t(\xi \mid \mathcal{I}_{t-1}; \theta)\, d\xi = 0$$
with respect to $\theta'$ gives
$$0 = \int \left( \frac{\partial^2 l_t}{\partial \theta \partial \theta'} + \frac{\partial l_t}{\partial \theta} \frac{\partial l_t}{\partial \theta'} \right) D_t(\xi \mid \mathcal{I}_{t-1}; \theta)\, d\xi = E_\theta\left( \frac{\partial^2 l_t}{\partial \theta \partial \theta'} \,\Big|\, \mathcal{I}_{t-1}\right) + E_\theta\left( \frac{\partial l_t}{\partial \theta} \frac{\partial l_t}{\partial \theta'} \,\Big|\, \mathcal{I}_{t-1}\right).$$
Summing over $t$ and evaluating at $\theta_0$ yields
$$I_{T0} = -E\left[ \frac{\partial^2 L_T}{\partial \theta \partial \theta'} \Big|_{\theta_0} \right].$$
Finally,
$$\sqrt{T}(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, I_0^{-1}). \qquad (13)$$
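
As an illustrative numerical check of the information-matrix equality (the AR(1) example and all names below are assumptions, not from the slides), the sketch estimates $I_{T0}$ both as the outer product of the per-observation scores and as minus the Hessian of the Gaussian log-likelihood, and forms asymptotic standard errors from either estimate; in a correctly specified model the two should be close.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2_000
beta0, beta1, sigma2 = 0.2, 0.7, 1.0
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))

# conditional MLE of (beta0, beta1, sigma2) for the Gaussian AR(1): OLS + mean squared residual
X = np.column_stack([np.ones(T - 1), y[:-1]])
b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
e = y[1:] - X @ b
n = T - 1
s2 = e @ e / n

# per-observation scores evaluated at the MLE: d l_t / d(beta, sigma^2)
score_beta = X * (e / s2)[:, None]
score_s2 = -0.5 / s2 + e ** 2 / (2 * s2 ** 2)
scores = np.column_stack([score_beta, score_s2])

# outer-product-of-gradients estimate of I_T0
I_opg = scores.T @ scores

# minus-Hessian estimate of I_T0 (analytic second derivatives of the Gaussian log-likelihood)
I_hess = np.zeros((3, 3))
I_hess[:2, :2] = X.T @ X / s2
I_hess[:2, 2] = X.T @ e / s2 ** 2          # ~0 at the MLE by OLS orthogonality
I_hess[2, :2] = I_hess[:2, 2]
I_hess[2, 2] = -n / (2 * s2 ** 2) + (e @ e) / s2 ** 3

# asymptotic standard errors from either estimate: sqrt(diag(I_T0^{-1}))
print("OPG  SEs:", np.sqrt(np.diag(np.linalg.inv(I_opg))))
print("Hess SEs:", np.sqrt(np.diag(np.linalg.inv(I_hess))))
```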