Estimation of Dynamic Regression Models
Eduardo Rossi, University of Pavia, 2007
Factorization of the density
DGP: D_t(x_t | X_{t-1}, d_t; Ψ), where x_t represents all the variables in the economy. The econometric analysis will focus on explaining a subset of variables, y_t, in terms of the history of the system and of a contemporaneous subset z_t, treated as given. Treating z_t as given is motivated by the assumption that z_t causes y_t. What is causality in econometrics?
Eduardo Rossi © Macroeconometria 07
Factorization of the density
Partition x_t into y_t, z_t, and w_t: y_t ⊂ x_t, z_t ⊂ x_t, and w_t collects the variables in x_t that belong to neither y_t nor z_t. Then
D = D_{w,y,z} = D_{w|y,z} D_{y|z} D_z   (1)
This factorization separates the presence (absence) of contemporaneous causality from the presence (absence) of simultaneous relations.
Factorization of the density
If z_t → y_t and w_t ↛ y_t, the factor D_{y|z} completely represents the stochastic mechanism generating y_t. Note that y_t → w_t is allowed, and no restriction on the relationship between w_t and z_t is required, for this to be true.
Factorization of the density
Denote by
W_{t-1} = σ(w_{t-1}, w_{t-2}, ...)
Y_{t-1} = σ(y_{t-1}, y_{t-2}, ...)
Z_{t-1} = σ(z_{t-1}, z_{t-2}, ...)
Assume there exists a partition of θ into two subvectors θ_1 ∈ Θ_1 and θ_2 ∈ Θ_2, such that Θ = Θ_1 × Θ_2 and
D_{w|y,z} = D_{w|y,z}(w_t | y_t, z_t, W_{t-1}, Y_{t-1}, Z_{t-1}; d_t, θ_2)   (2)
D_{y|z} = D_{y|z}(y_t | z_t, Y_{t-1}, Z_{t-1}; d_t, θ_1)   (3)
D_z = D_z(z_t | W_{t-1}, Y_{t-1}, Z_{t-1}; d_t, θ_2)   (4)
D_{w|y,z} and D_z must not depend on θ_1. D_{y|z} must not depend on w_{t-j} for j > 0, in the sense that conditioning or not conditioning on these variables has the same effect.
Factorization of the density
Under the condition Θ = Θ_1 × Θ_2 the admissible values of θ_1 do not depend on θ_2, so that knowledge of the latter cannot improve inferences about the former. In this case θ_1 and θ_2 are said to be variation free. Under these conditions nothing need be known about the forms of D_{w|y,z} and D_z to analyze D_{y|z}, since these do not depend on θ_1. The analysis is conducted conditioning on z_t and marginalizing with respect to w_t.
Sequential cut: the separation of θ into two sets:
θ_1, the parameters of interest for the investigation;
θ_2, the parameters that are not of interest.
Weak exogeneity
Suppose that D_{y|z} depends on a vector φ of parameters of interest, whose values are the focus of the investigation. To obtain the desired factorization of the DGP, it is only necessary that there exist some parameterization θ such that (2)-(4) hold, with θ_1 and θ_2 variation free, and φ = g(θ_1). In this analysis the y_t are called endogenous and the z_t are called weakly exogenous for φ. Weak exogeneity is a relationship between parameters and variables; it is not a property of variables as such. Without the required cut of the parameters, the factorization (1) is not relevant to the investigation.
Other notions of exogeneity
Exogeneity is sometimes defined in terms of the independence of the variables in question from the disturbances in a model. In the regression model
y_t = x_t'β + ε_t,  t = 1, ..., T
if x_t is independent of ε_{t+j} for j ≥ 0, so that E[x_t ε_{t+j}] = 0, then x_t is said to be predetermined. If the independence holds for all j (past as well as future disturbances), x_t is said to be strictly exogenous.
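The distinction can be illustrated with a small simulation. The sketch below (hypothetical example, not from the slides) uses an AR(1) process, where the regressor y_{t-1} is predetermined (uncorrelated with current and future disturbances) but not strictly exogenous (correlated with past disturbances):

```python
import numpy as np

# Illustrative sketch: in y_t = rho*y_{t-1} + eps_t, the regressor
# x_t = y_{t-1} is predetermined but not strictly exogenous.
rng = np.random.default_rng(0)
T, rho = 200_000, 0.5
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + eps[t]

x = y[1:-1]          # regressor y_{t-1} for t = 2, ..., T-1
e_curr = eps[2:]     # contemporaneous disturbance eps_t
e_past = eps[1:-1]   # past disturbance eps_{t-1}

print(np.mean(x * e_curr))  # approx 0: x_t is predetermined
print(np.mean(x * e_past))  # approx Var(eps) = 1: strict exogeneity fails
```

The second sample moment is nonzero because y_{t-1} contains ε_{t-1} by construction; this is exactly why lagged dependent variables are predetermined but never strictly exogenous.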
Setup
Since variables are related to their own lags in the sequence of observations, it is necessary to introduce conditioning assumptions. Let I_t be the set of conditioning variables (the smallest σ-field of events containing the σ-fields generated by the conditioning variables). The model is
y_t = x_t'β + ε_t
Assumptions:
1. E[ε_t | I_t] = 0 a.s.
2. E[ε_t² | I_t] = σ² a.s.
Setup
The set I_t includes:
deterministic variables (intercept, seasonal dummies, etc.);
lagged variables, dated t - j, j > 0;
currently dated variables that are weakly exogenous for (β, σ²).
Any Borel-measurable function of variables in I_t is also in I_t; in particular ε_{t-j} ∈ I_t, since
ε_{t-j} = y_{t-j} - x_{t-j}'β
An implication of Assumption 1 is that the disturbances must be serially uncorrelated.
Example
Suppose (y_t, z_t) is a vector of variables generated by a dynamic DGP represented by the density factorization
D_t(y_t, z_t | Z_{t-1}, Y_{t-1}; φ) = D_t(y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1) D_t(z_t | Z_{t-1}, Y_{t-1}; φ_2)
With I_t = σ(z_t) ∨ Z_{t-1} ∨ Y_{t-1},
E[y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1] = x_t'β
where x_t is composed of elements of z_{t-j}, j ≥ 0, and y_{t-j}, j > 0. If D_t(y_t | z_t, Z_{t-1}, Y_{t-1}; φ_1) is Gaussian, then φ_1 = (β, σ²).
The Method of Maximum Likelihood
Notation: let X_1^T = [x_1, ..., x_T]' (T × m), or simply X, denote a matrix of random variables, with x_t ∈ S ⊆ R^m and X ∈ S^T, where S^T ⊆ R^{Tm} is the sample space. Supposing the data are continuously distributed, let the joint p.d.f. of these data be denoted by D(X; θ_0), a member of a family of functions D(·; θ), θ ∈ Θ, with
D(·; θ): S^T → R
representing the density associated with each point in S^T, for a given θ.
The Method of Maximum Likelihood
For a given X ∈ S^T,
D(X; ·): Θ → R
is called the likelihood function. It is denoted by L(·; X). Here X is to be thought of as a sample that has been observed, and L(θ; X) represents the p.d.f. that would be associated with the sample X had it been generated by the data generation process (DGP) with parameters θ.
The Method of Maximum Likelihood
The likelihood function provides the basis for inferences from a sample X about the unknown θ. The maximum likelihood estimator is
θ̂ = arg max_{θ∈Θ} L(θ; X)
The sample X is representative of the distribution from which it was drawn, so the value of θ for which L is largest is "most likely" in the sense of attributing the highest probability density to X.
The Method of Maximum Likelihood
It may happen that economic theory specifies only the first two moments of the distribution, while the Gaussian distribution is assumed without any special justification. In this case the estimator is called the quasi-maximum likelihood (QML) estimator.
The Classical Gaussian Regression Model
When the data are independently sampled from a large population, the joint density is merely the product of the marginal densities of the observations. Considering the partition X_1^T = [y_1^T, Z_1^T] (respectively, the first column and the last m - 1 columns), suppose the joint density can be factored so that the parameters of interest are all in the conditional factor:
D(y_1^T, Z_1^T; θ, ψ) = D_{y|Z}(y_1^T | Z_1^T; θ) D_Z(Z_1^T; ψ) = [∏_{t=1}^T D(y_t | z_t; θ)] D_Z(Z_1^T; ψ)
Under the Gaussianity assumption,
D(y_t | z_t; θ) = (2πσ²)^{-1/2} exp{ -(y_t - z_t'β)² / (2σ²) }
The Classical Gaussian Regression Model
The likelihood function is
L(β, σ²; X) = (2πσ²)^{-T/2} exp{ -S(β) / (2σ²) }
where S(β) = ∑_{t=1}^T (y_t - z_t'β)². Up to a constant, the log-likelihood is
L(β, σ²) = -(T/2) ln σ² - S(β) / (2σ²)
The MLE of β is the OLS estimator:
β̂ = (Z_1^T' Z_1^T)^{-1} Z_1^T' y_1^T
The MLE of σ² is σ̂² = ε̂'ε̂ / T.
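A minimal numerical sketch of this equivalence, on simulated (hypothetical) data: OLS gives the MLE of β, and SSR/T (with no degrees-of-freedom correction) gives the MLE of σ², in the sense that no perturbation of either raises the Gaussian log-likelihood.

```python
import numpy as np

# Sketch: in the classical Gaussian regression model the MLE of beta
# is OLS and the MLE of sigma^2 is SSR/T.
rng = np.random.default_rng(1)
T, beta0, sigma0 = 5_000, np.array([1.0, -0.5]), 0.8
Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = Z @ beta0 + sigma0 * rng.standard_normal(T)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)  # OLS = MLE of beta
resid = y - Z @ beta_hat
sigma2_hat = resid @ resid / T                # MLE of sigma^2: SSR/T

def loglik(beta, sigma2):
    s = y - Z @ beta
    return -0.5 * T * np.log(2 * np.pi * sigma2) - (s @ s) / (2 * sigma2)

# The analytic MLE beats perturbed parameter values:
assert loglik(beta_hat, sigma2_hat) >= loglik(beta_hat + 0.01, sigma2_hat)
assert loglik(beta_hat, sigma2_hat) >= loglik(beta_hat, 1.1 * sigma2_hat)
```

Note that σ̂² divides by T, not T - k: the MLE is biased in finite samples but asymptotically equivalent to the unbiased estimator.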
Properties of MLE
In general, in the case of independent observations, the log-likelihood for the t-th observation is
l_t(θ) = log D_t(x_t; θ),  θ ∈ Θ
It is assumed that, for some θ_0 ∈ int(Θ), D_t(·; θ_0) represents, with probability 1, the true probability function of x_t.
MLE of the Dynamic Regression Model
The dynamic regression model with the conditional Gaussian assumption:
D(y_t | I_t; β, σ²) = (2πσ²)^{-1/2} exp{ -(y_t - x_t'β)² / (2σ²) }
l(β, σ²) = ∏_{t=p+1}^T D(y_t | I_t; β, σ²)
where p is the maximum lag on any variable contained in x_t. This is an approximation to the likelihood function; it is not a joint density function. Since z_t may depend on lagged values of y_t (weakly but not strongly exogenous), the marginal factors D(z_t | Z_{t-1}, Y_{t-1}) are needed to describe the joint distribution of (y_{p+1}, ..., y_T).
MLE of the Dynamic Regression Model
We can regard the maximizers of ∏_{t=p+1}^T D(y_t | I_t; β, σ²) as ML estimators because the joint density depends on (β, σ²) only through the terms in ∏_{t=p+1}^T D(y_t | I_t; β, σ²). The OLS estimates are asymptotically equivalent to the MLE when the disturbances are Gaussian.
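The conditional (approximate) likelihood can be checked numerically. In this sketch (hypothetical AR(1), not from the slides), the product of conditional Gaussian densities is summed in logs from t = p + 1 onward, σ² is concentrated out, and a grid search confirms that OLS on the lagged regressor maximizes the conditional log-likelihood:

```python
import numpy as np

# Sketch: OLS on the lagged regressor maximizes the conditional
# Gaussian log-likelihood of an AR(1) model (p = 1).
rng = np.random.default_rng(2)
T, rho0 = 2_000, 0.6
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho0 * y[t - 1] + eps[t]

x, yy = y[:-1], y[1:]          # condition on the first observation
rho_ols = (x @ yy) / (x @ x)   # OLS = conditional MLE of rho

def cond_loglik(rho):
    s = yy - rho * x
    sig2 = s @ s / len(s)      # concentrate out sigma^2
    return -0.5 * len(s) * (np.log(2 * np.pi * sig2) + 1)

grid = np.linspace(rho_ols - 0.2, rho_ols + 0.2, 401)
best = grid[np.argmax([cond_loglik(r) for r in grid])]
print(abs(best - rho_ols) < 1e-3)  # grid maximizer agrees with OLS
```

Summation starts at t = p + 1 because the first p observations have no complete conditioning set; their contribution (the initial conditions) is what makes this an approximation to the full likelihood.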
Properties of MLE
Given {x_t, I_t}, I_t = σ(x_t, x_{t-1}, ...), the log-likelihood of a closed dynamic model, conditioned only on the past, without the factoring-out of weakly exogenous components, is
l_t(θ) = ln D_t(x_t | I_{t-1}; θ),  θ ∈ Θ
For some θ_0 ∈ Θ, D_t(x_t | I_{t-1}; θ_0) represents, with probability 1, the true conditional probability function of x_t. The parameters of interest θ are contained in D_t.
Properties of MLE
Under dependent sampling the log-likelihood is the sum of the l_t's over the sample, plus a term representing the initial conditions. For the asymptotic analysis we can ignore this term: given the assumptions, it is of smaller order as T → ∞. When the probability function is evaluated at x_t,
l_t(θ): Θ × Ω → R
For each fixed ω ∈ Ω, l_t(·): Θ → R; and for each fixed θ, l_t(θ) is an I_t-measurable random variable.
Properties of MLE
Considering a fixed x_t, say x, each l_t(θ, x) is a mapping from Θ × S to R and, as a function of ω, is an I_t-measurable random variable. The same characterization applies to the various partial derivatives with respect to the elements of θ.
Information Inequality
Let x be continuously distributed with joint density D(x), and let G(x) be another density:
∫_S G(ξ) dξ = 1
Let S be the support of D, i.e. D(x) > 0 for x ∈ S. Suppose that D and G have the same support:
G(x) = 0 if and only if D(x) = 0
They are then said to be equivalent. G is an equivalent p.d.f. and can be a candidate to approximate D. By Jensen's inequality,
E[log(G/D)] = ∫_S log(G(ξ)/D(ξ)) D(ξ) dξ ≤ log ∫_S G(ξ) dξ = 0
Kullback-Leibler information criterion
Since log is strictly concave, the inequality holds as an equality only in the case where D(x) = G(x) for almost every x ∈ S (the exceptions form a set of measure 0 in S). The quantity E[log(D/G)] = -E[log(G/D)] ≥ 0, which measures the closeness of G to D over the sample space, is called the Kullback-Leibler information criterion (KLIC). Obvious choices of G include the other members, with θ ≠ θ_0, of the family of densities representing the model.
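A Monte Carlo sketch of the information inequality (hypothetical densities, not from the slides): for D = N(0, 1) and G = N(m, s²), the KLIC E_D[log(D/G)] must be nonnegative, and for Gaussians it has the closed form log(s) + (1 + m²)/(2s²) - 1/2, so the simulated value can be checked against it.

```python
import numpy as np

# Sketch: Monte Carlo estimate of the KLIC between D = N(0,1) and
# G = N(m, s^2), compared with the Gaussian closed form.
rng = np.random.default_rng(3)
m, s = 1.0, 2.0
x = rng.standard_normal(500_000)   # draws from D = N(0,1)

log_d = -0.5 * np.log(2 * np.pi) - 0.5 * x**2
log_g = -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)
klic_mc = np.mean(log_d - log_g)   # sample analogue of E_D[log(D/G)]

klic_exact = np.log(s) + (1 + m**2) / (2 * s**2) - 0.5
print(klic_mc, klic_exact)  # both approx 0.443; KLIC >= 0
```

The KLIC is zero only when G coincides with D almost everywhere, which is what makes it usable as a discrepancy measure between candidate parameter values and θ_0.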
Kullback-Leibler information criterion
The information inequality holds, almost surely, for the case of conditional expectations. With E[· | I_{t-1}] = ∫ (·) D_t(ξ | I_{t-1}; θ_0) dξ,
E[ log( D_t(x_t | I_{t-1}; θ) / D_t(x_t | I_{t-1}; θ_0) ) | I_{t-1} ] ≤ log E[ D_t(x_t | I_{t-1}; θ) / D_t(x_t | I_{t-1}; θ_0) | I_{t-1} ] = 0 a.s.
Identification
E[L_T(θ)] is maximized at θ_0. Given this result, consistency of the ML estimator follows from the following theorem.
Theorem. If
1. Θ is compact;
2. T^{-1} L_T(θ) →p E[L_T(θ)] (a non-stochastic function of θ) uniformly in Θ;
3. θ_0 ∈ int(Θ) is the unique maximum of E[L_T(θ)];
then θ̂_T →p θ_0.
Condition 2 can also be stated in the form
sup_{θ∈Θ} | T^{-1} L_T(θ) - E[L_T(θ)] | →p 0   (5)
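Consistency can be visualized in the simplest possible case. This sketch (hypothetical model, not from the slides) takes x_t ~ N(θ_0, 1) i.i.d., for which the MLE of θ_0 is the sample mean, and shows the estimation error shrinking as T grows:

```python
import numpy as np

# Sketch: the MLE of the mean of a Gaussian sample (the sample mean,
# i.e. the maximizer of the Gaussian log-likelihood) converges to
# theta_0 as the sample size grows.
rng = np.random.default_rng(4)
theta0 = 2.0
errors = []
for T in (100, 10_000, 1_000_000):
    x = theta0 + rng.standard_normal(T)
    theta_hat = x.mean()
    errors.append(abs(theta_hat - theta0))
print(errors)  # estimation error shrinks toward 0 as T increases
```

Here all three conditions of the theorem are easy to verify: the average log-likelihood converges uniformly to a quadratic in θ with a unique maximum at θ_0.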
Structures
θ_1 and θ_2 are said to be observationally equivalent if L_T(θ_1, X) = L_T(θ_2, X) for almost all X ∈ S^T and all T ≥ 1. A model is said to be globally (locally) identified if the true structure θ_0 is not observationally equivalent to any other point of Θ (of an open neighborhood of θ_0). The KLIC for the complete sample gives
E_0[L_T(θ)] ≤ E_0[L_T(θ_0)]
where E_0[·] denotes the expected value under the true distribution. Underidentification implies that T^{-1} L_T(θ) fails the uniqueness requirement of condition 3. Underidentification means that no consistent estimator exists, and the parameters are simply inaccessible to empirical investigation.
Asymptotic Normality
The results hinge on the properties of the gradient of L_T (the score vector) at θ_0. Define the operator
E_θ[· | I_{t-1}] = ∫ (·) D_t(ξ | I_{t-1}; θ) dξ   (6)
representing the conditional expectation of any function of x_t when θ is the true parameter. Assume l_t(·) is twice continuously differentiable with respect to θ everywhere on int(Θ) × S with probability 1, and that the derivatives are bounded uniformly in t.
Lemma.
E[ ∂l_t/∂θ | I_{t-1} ] |_{θ=θ_0} = 0 a.s.   (7)
Asymptotic Normality
Proof. Given that
∂l_t/∂θ = (1/D_t) ∂D_t/∂θ
we can write
E_θ[ ∂l_t/∂θ | I_{t-1} ] = ∫ (∂l_t/∂θ) D_t(ξ | I_{t-1}; θ) dξ = ∫ ∂D_t(ξ | I_{t-1}; θ)/∂θ dξ = ∂/∂θ ∫ D_t(ξ | I_{t-1}; θ) dξ = ∂(1)/∂θ = 0 a.s.   (8)
Asymptotic Normality
The last step interchanges the order of differentiation and integration; the equality holds in particular for θ = θ_0.
The adapted sequence { ∂l_t/∂θ |_{θ_0}, I_t } is a vector martingale difference. Applying the CLT, we have
T^{-1/2} ∂L_T/∂θ |_{θ_0} →d N(0, I_0)   (9)
where
I_0 = lim_{T→∞} T^{-1} I_{T0}   (10)
and
I_{T0} = E[ ∂L_T/∂θ ∂L_T/∂θ' |_{θ_0} ] = ∑_{t=1}^T E[ ∂l_t/∂θ ∂l_t/∂θ' |_{θ_0} ]   (11)
Asymptotic Normality
The matrix I_{T0} is called the information matrix, and can be thought of as measuring the amount of information about θ_0 in the sample. I_0 is the limiting information matrix.
Theorem (information matrix equality).
I_{T0} = -E[ ∂²L_T/∂θ∂θ' |_{θ_0} ]   (12)
Asymptotic Normality
Proof. For each t, differentiate the identity E_θ[∂l_t/∂θ | I_{t-1}] = 0 with respect to θ':
0 = ∂/∂θ' ∫ (∂l_t/∂θ) D_t(ξ; θ) dξ = ∫ ( ∂²l_t/∂θ∂θ' + (∂l_t/∂θ)(∂l_t/∂θ') ) D_t(ξ; θ) dξ
so that
E_θ[ (∂l_t/∂θ)(∂l_t/∂θ') | I_{t-1} ] = -E_θ[ ∂²l_t/∂θ∂θ' | I_{t-1} ]
Summing over t and evaluating at θ_0,
I_{T0} = -E[ ∂²L_T/∂θ∂θ' |_{θ_0} ]
Finally,
√T (θ̂ - θ_0) →d N(0, I_0^{-1})   (13)
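The information matrix equality can be checked numerically in a one-parameter case. This sketch (hypothetical model, not from the slides) uses x ~ N(μ, σ²) with unknown μ: the score is (x - μ)/σ², the Hessian of l_t is the constant -1/σ², and the sample average of the squared score should match -E[Hessian] = 1/σ².

```python
import numpy as np

# Sketch: information matrix equality E[score^2] = -E[Hessian]
# for the Gaussian mean model.
rng = np.random.default_rng(5)
mu0, s2 = 0.5, 2.0
x = mu0 + np.sqrt(s2) * rng.standard_normal(1_000_000)

score = (x - mu0) / s2      # d l_t / d mu evaluated at mu0
outer = np.mean(score**2)   # sample analogue of E[score^2]
neg_hess = 1.0 / s2         # -E[d^2 l_t / d mu^2], constant here
print(outer, neg_hess)  # both approx 0.5
```

The equality is what allows the asymptotic variance in (13) to be estimated either from the outer product of scores or from the negative Hessian.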