Estimation of Dynamic Regression Models
Eduardo Rossi, University of Pavia, 2007
Factorization of the density
DGP: D_t(x_t | X_{t-1}, d_t; Ψ), where x_t represents all the variables in the economy. The econometric analysis will focus on explaining a subset of variables, y_t, in terms of the history of the system and of a contemporaneous subset z_t, treated as given. Treating z_t as given is motivated by the assumption that z_t causes y_t. What is causality in econometrics?
Eduardo Rossi © Macroeconometria 07
Factorization of the density
Partition x_t into y_t, z_t, and w_t: y_t ⊂ x_t, z_t ⊂ x_t, and w_t collects the variables in x_t that belong to neither y_t nor z_t. Then
D = D_{w,y,z} = D_{w|y,z} D_{y|z} D_z   (1)
This factorization separates the presence (absence) of contemporaneous causality from the presence (absence) of simultaneous relations.
Factorization of the density
If z_t → y_t and w_t ↛ y_t, the factor D_{y|z} completely represents the stochastic mechanism generating y_t. Note that y_t → w_t is allowed, and no restriction on the relationship between w_t and z_t is required, for this to be true.
Factorization of the density
Denote by
W_{t-1} = σ(w_{t-1}, w_{t-2}, ...)
Y_{t-1} = σ(y_{t-1}, y_{t-2}, ...)
Z_{t-1} = σ(z_{t-1}, z_{t-2}, ...)
Assume there exists a partition of θ into two subvectors θ_1 ∈ Θ_1 and θ_2 ∈ Θ_2, such that Θ = Θ_1 × Θ_2 and
D_{w|y,z} = D_{w|y,z}(w_t | y_t, z_t, W_{t-1}, Y_{t-1}, Z_{t-1}; d_t, θ_2)   (2)
D_{y|z} = D_{y|z}(y_t | z_t, Y_{t-1}, Z_{t-1}; d_t, θ_1)   (3)
D_z = D_z(z_t | W_{t-1}, Y_{t-1}, Z_{t-1}; d_t, θ_2)   (4)
D_{w|y,z} and D_z must not depend on θ_1. D_{y|z} must not depend on w_{t-j} for j > 0, in the sense that conditioning or not conditioning on these variables has the same effect.
Factorization of the density
Under the condition Θ = Θ_1 × Θ_2 the admissible values of θ_1 do not depend on θ_2, so that knowledge of the latter cannot improve inferences about the former. In this case θ_1 and θ_2 are said to be variation free. Under these conditions nothing need be known about the forms of D_{w|y,z} and D_z to analyze D_{y|z}, since these do not depend on θ_1. The analysis is conducted conditioning on z_t and marginalizing with respect to w_t.
Sequential cut: the separation of θ into two sets:
θ_1, the parameters of interest for the investigation;
θ_2, the parameters that are not of interest.
Weak exogeneity
Suppose that D_{y|z} depends on a vector φ of parameters of interest, whose values are the focus of the investigation. To obtain the desired factorization of the DGP, it is only necessary that there exist some parameterization θ such that (2)-(4) hold, with θ_1 and θ_2 variation free, and φ = g(θ_1). In this analysis the y_t are called endogenous and the z_t are called weakly exogenous for φ. Weak exogeneity is a relationship between parameters and variables; it is not a property of variables as such. Without the required cut of the parameters, the factorization (1) is not relevant to the investigation.
Other notions of exogeneity
Exogeneity is sometimes defined in terms of the independence of the variables in question from the disturbances in a model. In the regression model
y_t = x_t'β + ε_t,  t = 1, ..., T
if x_t is independent of ε_{t+j} for j ≥ 0, so that E[x_t ε_{t+j}] = 0, then x_t is said to be predetermined. If the independence holds for all j (past as well as future disturbances), x_t is said to be strictly exogenous.
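The distinction can be illustrated with a small simulation. The sketch below (hypothetical example, not from the slides) uses an AR(1) process, where the regressor y_{t-1} is predetermined (uncorrelated with current and future disturbances) but not strictly exogenous (correlated with past disturbances):

```python
import numpy as np

# Illustrative sketch: in y_t = rho*y_{t-1} + eps_t, the regressor
# x_t = y_{t-1} is predetermined but not strictly exogenous.
rng = np.random.default_rng(0)
T, rho = 200_000, 0.5
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + eps[t]

x = y[1:-1]          # regressor y_{t-1} for t = 2, ..., T-1
e_curr = eps[2:]     # contemporaneous disturbance eps_t
e_past = eps[1:-1]   # past disturbance eps_{t-1}

print(np.mean(x * e_curr))  # approx 0: x_t is predetermined
print(np.mean(x * e_past))  # approx Var(eps) = 1: strict exogeneity fails
```

The second sample moment is nonzero because y_{t-1} contains ε_{t-1} by construction; this is exactly why lagged dependent variables are predetermined but never strictly exogenous.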
Setup
Since variables are related to their own lags in the sequence of observations, it is necessary to introduce conditioning assumptions. Let I_t be the set of conditioning variables (the smallest σ-field of events containing the σ-fields generated by the conditioning variables). The model is
y_t = x_t'β + ε_t
Assumptions:
1. E[ε_t | I_t] = 0 a.s.
2. E[ε_t² | I_t] = σ² a.s.
Setup
The set I_t includes:
deterministic variables (intercept, seasonal dummies, etc.);
lagged variables, dated t - j, j > 0;
currently dated variables that are weakly exogenous for (β, σ²).
Any Borel-measurable function of variables in I_t is also in I_t; in particular ε_{t-j} ∈ I_t, since
ε_{t-j} = y_{t-j} - x_{t-j}'β
An implication of Assumption 1 is that the disturbances must be serially uncorrelated.
Example
Suppose (y_t, z_t) is a vector of variables generated by a dynamic DGP represented by the density factorization
D_t(y_t, z_t | Z_{t-1}, Y_{t-1}; φ) = D_t(y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1) D_t(z_t | Z_{t-1}, Y_{t-1}; φ_2)
With I_t = σ(z_t) ∨ Z_{t-1} ∨ Y_{t-1},
E[y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1] = x_t'β
where x_t is composed of elements of z_{t-j}, j ≥ 0, and y_{t-j}, j > 0. If D_t(y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1) is Gaussian, then φ_1 = (β, σ²).
The Method of Maximum Likelihood
Notation: let X_1^T = [x_1, ..., x_T]' (T × m), or simply X, denote a matrix of random variables, with x_t ∈ S ⊆ R^m and X ∈ S^T, where S^T ⊆ R^{Tm} is the sample space. Supposing the data are continuously distributed, let the joint p.d.f. of these data be denoted by D(X; θ_0), a member of a family of functions D(·; θ), θ ∈ Θ, with
D(·; θ): S^T → R
representing the density associated with each point in S^T, for a given θ.
The Method of Maximum Likelihood
For a given X ∈ S^T,
D(X; ·): Θ → R
is called the likelihood function. It is denoted by L(·; X). Here X is to be thought of as a sample that has been observed, and L(θ; X) represents the p.d.f. that would be associated with the sample X had it been generated by the data generation process (DGP) with parameters θ.
The Method of Maximum Likelihood
The likelihood function provides the basis for inferences from a sample X about the unknown θ. The maximum likelihood estimator is
θ̂ = arg max_{θ∈Θ} L(θ; X)
The sample X is representative of the distribution from which it was drawn, so the value of θ for which L is largest is "most likely" in the sense of attributing the highest probability density to X.
The Method of Maximum Likelihood
It may happen that economic theory specifies only the first two moments of the distribution, while the Gaussian distribution is assumed without any special justification. In this case the estimator is called the quasi-maximum likelihood (QML) estimator.
The Classical Gaussian Regression Model
When the data are independently sampled from a large population, the joint density is merely the product of the marginal densities of the observations. Considering the partition X_1^T = [y_1^T, Z_1^T] (respectively, the first column and the last m - 1 columns), suppose the joint density can be factored so that the parameters of interest are all in the conditional factor:
D(y_1^T, Z_1^T; θ, ψ) = D_{y|Z}(y_1^T | Z_1^T; θ) D_Z(Z_1^T; ψ) = [∏_{t=1}^T D(y_t | z_t; θ)] D_Z(Z_1^T; ψ)
Under the Gaussianity assumption,
D(y_t | z_t; θ) = (2πσ²)^{-1/2} exp{ -(y_t - z_t'β)² / (2σ²) }
The Classical Gaussian Regression Model
The likelihood function is
L(β, σ²; X) = (2πσ²)^{-T/2} exp{ -S(β) / (2σ²) }
where S(β) = ∑_{t=1}^T (y_t - z_t'β)². Up to a constant, the log-likelihood is
L(β, σ²) = -(T/2) ln σ² - S(β) / (2σ²)
The MLE of β is the OLS estimator:
β̂ = (Z_1^T' Z_1^T)^{-1} Z_1^T' y_1^T
The MLE of σ² is σ̂² = ε̂'ε̂ / T.
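A minimal numerical sketch of this equivalence, on simulated (hypothetical) data: OLS gives the MLE of β, and SSR/T (with no degrees-of-freedom correction) gives the MLE of σ², in the sense that no perturbation of either raises the Gaussian log-likelihood.

```python
import numpy as np

# Sketch: in the classical Gaussian regression model the MLE of beta
# is OLS and the MLE of sigma^2 is SSR/T.
rng = np.random.default_rng(1)
T, beta0, sigma0 = 5_000, np.array([1.0, -0.5]), 0.8
Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = Z @ beta0 + sigma0 * rng.standard_normal(T)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)  # OLS = MLE of beta
resid = y - Z @ beta_hat
sigma2_hat = resid @ resid / T                # MLE of sigma^2: SSR/T

def loglik(beta, sigma2):
    s = y - Z @ beta
    return -0.5 * T * np.log(2 * np.pi * sigma2) - (s @ s) / (2 * sigma2)

# The analytic MLE beats perturbed parameter values:
assert loglik(beta_hat, sigma2_hat) >= loglik(beta_hat + 0.01, sigma2_hat)
assert loglik(beta_hat, sigma2_hat) >= loglik(beta_hat, 1.1 * sigma2_hat)
```

Note that σ̂² divides by T, not T - k: the MLE is biased in finite samples but asymptotically equivalent to the unbiased estimator.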
Properties of MLE
In general, in the case of independent observations, the log-likelihood for the t-th observation is
l_t(θ) = log D_t(x_t; θ),  θ ∈ Θ
It is assumed that, for some θ_0 ∈ int(Θ), D_t(·; θ_0) represents, with probability 1, the true probability function of x_t.
MLE of the Dynamic Regression Model
The dynamic regression model with the conditional Gaussian assumption:
D(y_t | I_t; β, σ²) = (2πσ²)^{-1/2} exp{ -(y_t - x_t'β)² / (2σ²) }
l(β, σ²) = ∏_{t=p+1}^T D(y_t | I_t; β, σ²)
where p is the maximum lag on any variable contained in x_t. This is an approximation to the likelihood function; it is not a joint density function. Since z_t may depend on lagged values of y_t (weakly but not strongly exogenous), the marginal factors D(z_t | Z_{t-1}, Y_{t-1}) are needed to describe the joint distribution of (y_{p+1}, ..., y_T).
MLE of the Dynamic Regression Model
We can regard the maximizers of ∏_{t=p+1}^T D(y_t | I_t; β, σ²) as ML estimators because the joint density depends on (β, σ²) only through the terms in ∏_{t=p+1}^T D(y_t | I_t; β, σ²). The OLS estimates are asymptotically equivalent to the MLE when the disturbances are Gaussian.
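The conditional (approximate) likelihood can be checked numerically. In this sketch (hypothetical AR(1), not from the slides), the product of conditional Gaussian densities is summed in logs from t = p + 1 onward, σ² is concentrated out, and a grid search confirms that OLS on the lagged regressor maximizes the conditional log-likelihood:

```python
import numpy as np

# Sketch: OLS on the lagged regressor maximizes the conditional
# Gaussian log-likelihood of an AR(1) model (p = 1).
rng = np.random.default_rng(2)
T, rho0 = 2_000, 0.6
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho0 * y[t - 1] + eps[t]

x, yy = y[:-1], y[1:]          # condition on the first observation
rho_ols = (x @ yy) / (x @ x)   # OLS = conditional MLE of rho

def cond_loglik(rho):
    s = yy - rho * x
    sig2 = s @ s / len(s)      # concentrate out sigma^2
    return -0.5 * len(s) * (np.log(2 * np.pi * sig2) + 1)

grid = np.linspace(rho_ols - 0.2, rho_ols + 0.2, 401)
best = grid[np.argmax([cond_loglik(r) for r in grid])]
print(abs(best - rho_ols) < 1e-3)  # grid maximizer agrees with OLS
```

Summation starts at t = p + 1 because the first p observations have no complete conditioning set; their contribution (the initial conditions) is what makes this an approximation to the full likelihood.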
Properties of MLE
Given {x_t, I_t}, I_t = σ(x_t, x_{t-1}, ...), the log-likelihood of a closed dynamic model, conditioned only on the past, without the factoring-out of weakly exogenous components, is
l_t(θ) = ln D_t(x_t | I_{t-1}; θ),  θ ∈ Θ
For some θ_0 ∈ Θ, D_t(x_t | I_{t-1}; θ_0) represents, with probability 1, the true conditional probability function of x_t. The parameters of interest θ are contained in D_t.
Properties of MLE
Under dependent sampling the log-likelihood is the sum of the l_t's over the sample, plus a term representing the initial conditions. For the asymptotic analysis we can ignore this term: given the assumptions, it is of smaller order as T → ∞. When the probability function is evaluated at x_t,
l_t(θ): Θ × Ω → R
For each fixed ω ∈ Ω, l_t(·): Θ → R; and for each fixed θ, l_t(θ) is an I_t-measurable random variable.
Properties of MLE
Considering a fixed x_t, say x, each l_t(θ, x) is a mapping from Θ × S to R and, as a function of ω, is an I_t-measurable random variable. The same characterization applies to the various partial derivatives with respect to the elements of θ.
Information Inequality
Let x be continuously distributed with joint density D(x), and let G(x) be another density:
∫_S G(ξ) dξ = 1
Let S be the support of D, i.e. D(x) > 0 for x ∈ S. Suppose that D and G have the same support:
G(x) = 0 if and only if D(x) = 0
They are then said to be equivalent. G is an equivalent p.d.f. and can be a candidate to approximate D. By Jensen's inequality,
E[log(G/D)] = ∫_S log(G(ξ)/D(ξ)) D(ξ) dξ ≤ log ∫_S G(ξ) dξ = 0
Kullback-Leibler information criterion
Since log is strictly concave, the inequality holds as an equality only in the case where D(x) = G(x) for almost every x ∈ S (the exceptions form a set of measure 0 in S). The quantity E[log(D/G)] = -E[log(G/D)] ≥ 0, which measures the closeness of G to D over the sample space, is called the Kullback-Leibler information criterion (KLIC). Obvious choices of G include the other members, with θ ≠ θ_0, of the family of densities representing the model.
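A Monte Carlo sketch of the information inequality (hypothetical densities, not from the slides): for D = N(0, 1) and G = N(m, s²), the KLIC E_D[log(D/G)] must be nonnegative, and for Gaussians it has the closed form log(s) + (1 + m²)/(2s²) - 1/2, so the simulated value can be checked against it.

```python
import numpy as np

# Sketch: Monte Carlo estimate of the KLIC between D = N(0,1) and
# G = N(m, s^2), compared with the Gaussian closed form.
rng = np.random.default_rng(3)
m, s = 1.0, 2.0
x = rng.standard_normal(500_000)   # draws from D = N(0,1)

log_d = -0.5 * np.log(2 * np.pi) - 0.5 * x**2
log_g = -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)
klic_mc = np.mean(log_d - log_g)   # sample analogue of E_D[log(D/G)]

klic_exact = np.log(s) + (1 + m**2) / (2 * s**2) - 0.5
print(klic_mc, klic_exact)  # both approx 0.443; KLIC >= 0
```

The KLIC is zero only when G coincides with D almost everywhere, which is what makes it usable as a discrepancy measure between candidate parameter values and θ_0.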
Kullback-Leibler information criterion
The information inequality holds, almost surely, for the case of conditional expectations. With E[· | I_{t-1}] = ∫ (·) D_t(ξ | I_{t-1}; θ_0) dξ,
E[ log( D_t(x_t | I_{t-1}; θ) / D_t(x_t | I_{t-1}; θ_0) ) | I_{t-1} ] ≤ log E[ D_t(x_t | I_{t-1}; θ) / D_t(x_t | I_{t-1}; θ_0) | I_{t-1} ] = 0 a.s.
Identification
E[L_T(θ)] is maximized at θ_0. Given this result, consistency of the ML estimator follows from the following theorem.
Theorem. If
1. Θ is compact;
2. T^{-1} L_T(θ) →p E[L_T(θ)] (a non-stochastic function of θ) uniformly in Θ;
3. θ_0 ∈ int(Θ) is the unique maximum of E[L_T(θ)];
then θ̂_T →p θ_0.
Condition 2 can also be stated in the form
sup_{θ∈Θ} | T^{-1} L_T(θ) - E[L_T(θ)] | →p 0   (5)
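Consistency can be visualized in the simplest possible case. This sketch (hypothetical model, not from the slides) takes x_t ~ N(θ_0, 1) i.i.d., for which the MLE of θ_0 is the sample mean, and shows the estimation error shrinking as T grows:

```python
import numpy as np

# Sketch: the MLE of the mean of a Gaussian sample (the sample mean,
# i.e. the maximizer of the Gaussian log-likelihood) converges to
# theta_0 as the sample size grows.
rng = np.random.default_rng(4)
theta0 = 2.0
errors = []
for T in (100, 10_000, 1_000_000):
    x = theta0 + rng.standard_normal(T)
    theta_hat = x.mean()
    errors.append(abs(theta_hat - theta0))
print(errors)  # estimation error shrinks toward 0 as T increases
```

Here all three conditions of the theorem are easy to verify: the average log-likelihood converges uniformly to a quadratic in θ with a unique maximum at θ_0.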
Structures
θ_1 and θ_2 are said to be observationally equivalent if L_T(θ_1, X) = L_T(θ_2, X) for almost all X ∈ S^T and all T ≥ 1. A model is said to be globally (locally) identified if the true structure θ_0 is not observationally equivalent to any other point of Θ (of an open neighborhood of θ_0). The KLIC for the complete sample gives
E_0[L_T(θ)] ≤ E_0[L_T(θ_0)]
where E_0[·] denotes the expected value under the true distribution. Underidentification implies that T^{-1} L_T(θ) fails the uniqueness requirement of condition 3. Underidentification means that no consistent estimator exists, and the parameters are simply inaccessible to empirical investigation.
Asymptotic Normality
The results hinge on the properties of the gradient of L_T (the score vector) at θ_0. Define the operator
E_θ[· | I_{t-1}] = ∫ (·) D_t(ξ | I_{t-1}; θ) dξ   (6)
representing the conditional expectation of any function of x_t when θ is the true parameter. Assume l_t(·) is twice continuously differentiable with respect to θ everywhere on int(Θ) × S with probability 1, and that the derivatives are bounded uniformly in t.
Lemma.
E[ ∂l_t/∂θ | I_{t-1} ] |_{θ=θ_0} = 0 a.s.   (7)
Asymptotic Normality
Proof. Given that
∂l_t/∂θ = (1/D_t) ∂D_t/∂θ
we can write
E_θ[ ∂l_t/∂θ | I_{t-1} ] = ∫ (∂l_t/∂θ) D_t(ξ | I_{t-1}; θ) dξ = ∫ ∂D_t(ξ | I_{t-1}; θ)/∂θ dξ = ∂/∂θ ∫ D_t(ξ | I_{t-1}; θ) dξ = ∂(1)/∂θ = 0 a.s.   (8)
Asymptotic Normality
The last step interchanges the order of differentiation and integration; the equality holds in particular for θ = θ_0.
The adapted sequence { ∂l_t/∂θ |_{θ_0}, I_t } is a vector martingale difference. Applying the CLT, we have
T^{-1/2} ∂L_T/∂θ |_{θ_0} →d N(0, I_0)   (9)
where
I_0 = lim_{T→∞} T^{-1} I_{T0}   (10)
and
I_{T0} = E[ ∂L_T/∂θ ∂L_T/∂θ' |_{θ_0} ] = ∑_{t=1}^T E[ ∂l_t/∂θ ∂l_t/∂θ' |_{θ_0} ]   (11)
Asymptotic Normality
The matrix I_{T0} is called the information matrix, and can be thought of as measuring the amount of information about θ_0 in the sample. I_0 is the limiting information matrix.
Theorem (information matrix equality).
I_{T0} = -E[ ∂²L_T/∂θ∂θ' |_{θ_0} ]   (12)
Asymptotic Normality
Proof. For each t, differentiate the identity E_θ[∂l_t/∂θ | I_{t-1}] = 0 with respect to θ':
0 = ∂/∂θ' ∫ (∂l_t/∂θ) D_t(ξ; θ) dξ = ∫ ( ∂²l_t/∂θ∂θ' + (∂l_t/∂θ)(∂l_t/∂θ') ) D_t(ξ; θ) dξ
so that
E_θ[ (∂l_t/∂θ)(∂l_t/∂θ') | I_{t-1} ] = -E_θ[ ∂²l_t/∂θ∂θ' | I_{t-1} ]
Summing over t and evaluating at θ_0,
I_{T0} = -E[ ∂²L_T/∂θ∂θ' |_{θ_0} ]
Finally,
√T (θ̂ - θ_0) →d N(0, I_0^{-1})   (13)
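The information matrix equality can be checked numerically in a one-parameter case. This sketch (hypothetical model, not from the slides) uses x ~ N(μ, σ²) with unknown μ: the score is (x - μ)/σ², the Hessian of l_t is the constant -1/σ², and the sample average of the squared score should match -E[Hessian] = 1/σ².

```python
import numpy as np

# Sketch: information matrix equality E[score^2] = -E[Hessian]
# for the Gaussian mean model.
rng = np.random.default_rng(5)
mu0, s2 = 0.5, 2.0
x = mu0 + np.sqrt(s2) * rng.standard_normal(1_000_000)

score = (x - mu0) / s2      # d l_t / d mu evaluated at mu0
outer = np.mean(score**2)   # sample analogue of E[score^2]
neg_hess = 1.0 / s2         # -E[d^2 l_t / d mu^2], constant here
print(outer, neg_hess)  # both approx 0.5
```

The equality is what allows the asymptotic variance in (13) to be estimated either from the outer product of scores or from the negative Hessian.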