
CLEFIN A.A. 2010/2011
Financial Econometrics, Prof. Massimo Guidolin

A Quick Review of Basic Estimation Methods

1. Where the OLS World Ends...

Consider two time series $y_{1:T} = \{y_1, y_2, \ldots, y_T\}$ and $x_{1:T} = \{x_1, x_2, \ldots, x_T\}$. At this stage of your studies in statistics and econometrics you should be (at least) familiar with linear regression models of the kind

$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t.$$

In order to estimate this model by ordinary least squares (OLS), several assumptions need to be in place.

1.1. Weak OLS Hypotheses (for all $t, s \in \{1, 2, \ldots, T\}$, $s \neq t$):

1. $E(\epsilon_t) = 0$;
2. $E(x_t \epsilon_t) = 0$, which can also be written as $E(x_t \epsilon_t) = E(x_t)E(\epsilon_t)$ (deterministic regressors);
3. $E(\epsilon_t^2) = Var(\epsilon_t) = \sigma^2$ (homoskedasticity);
4. $E(\epsilon_t \epsilon_s) = 0$ (no autocorrelation in residuals).

Hypotheses 3 and 4 can be summarized by saying that $Var(\epsilon_{1:T}) = I_T \sigma^2$. The Strong OLS Hypotheses set consists of the four conditions above plus an assumption on the distribution of the residuals: $\epsilon_t \sim N(0, \sigma^2)$ for all $t \in \{1, 2, \ldots, T\}$.

However, data from many real-life problems fail to fulfill the OLS conditions, even in their weak form. In fact, Hypothesis 1 is trivially verified if you just add an intercept to your model, but Hypothesis 2 seems considerably more problematic. For instance, suppose that you observe your independent variable with a random, white-noise measurement error: $\tilde{x}_t = x_t + \eta_t$. In this case your regressor is no longer deterministic, i.e., fixed in repeated samples. Even if you know that $E(\eta_t) = 0$, what about $E(\tilde{x}_t \epsilon_t)$? There is a wide literature dealing with

stochastic regressors, which uses Two-Stage Least Squares and Instrumental Variables. However, we will not discuss these estimators in this set of notes, because these problems are not specific to finance applications; you should nevertheless be aware of the fact that Hypothesis 2 may often be problematic in applications.

The potential issues with Hypotheses 3 and 4 are, on the contrary, particularly relevant in finance. For many financial time series the conditional variance $Var(\epsilon_{t+1} \mid \mathcal{F}_t)$ (where $\mathcal{F}_t$ is an information set or an information structure determined by the financial application at hand) is not constant over time. In addition, when fitting OLS regression models to financial time series, the null of $E(\epsilon_t \epsilon_s) = 0$ is often rejected. We shall see below that if Hypotheses 3 and 4 are violated, the OLS estimator is no longer the best linear unbiased estimator (BLUE), even though it remains consistent. In fact, there is another unbiased estimator whose variance is smaller than that of OLS: the Generalized Least Squares (GLS) estimator described below.

In addition, there is a further point that deserves to be discussed in more depth. Consider a $T \times N$ matrix of time series $Y = [y_1 \; y_2 \; \cdots \; y_N]$ (also called a panel) and a set of regressors $x^{(1)}_{1:T}, x^{(2)}_{1:T}, x^{(3)}_{1:T}, \ldots$ How would you estimate the relationship between the dependent variables and the regressors in this case? A first way to do that (since you are familiar with OLS) is to run a separate OLS regression for each time series $y_i$, $i = 1, \ldots, N$. In fact, you know that the OLS estimator is BLUE among the estimators which are a linear function of $y_i$. However... now your problem involves more than just one dependent variable. Is it possible to build an unbiased estimator which is a function of both $y_1, y_2, \ldots$ and is more efficient than OLS? The answer is yes, and we will talk more extensively later about models with multiple equations.
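The consequences of a violation of Hypothesis 2 are easy to see in a simulation. The following is a minimal numpy sketch (sample size, coefficient values, and variable names are illustrative, not part of the notes): when the regressor is observed with white-noise error, the OLS slope is biased toward zero (attenuation bias), while OLS on the correctly measured regressor recovers the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
beta0, beta1 = 1.0, 2.0          # illustrative true coefficients

x = rng.normal(0.0, 1.0, T)      # true (latent) regressor
eps = rng.normal(0.0, 0.5, T)    # regression error, E(eps_t) = 0
y = beta0 + beta1 * x + eps

def ols(X, y):
    """OLS estimator: beta_hat = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Correctly measured regressor: OLS recovers (beta0, beta1).
b_clean = ols(np.column_stack([np.ones(T), x]), y)

# Error-ridden regressor x_tilde = x + eta violates Hypothesis 2:
# x_tilde is correlated with the composite regression error, and the
# slope estimate shrinks toward beta1 * var(x) / (var(x) + var(eta)).
eta = rng.normal(0.0, 1.0, T)
b_noisy = ols(np.column_stack([np.ones(T), x + eta]), y)

print(b_clean)   # close to (1.0, 2.0)
print(b_noisy)   # slope close to 2 * 1/(1+1) = 1.0
```

With equal signal and noise variances the probability limit of the slope is halved, which is exactly what the second regression shows.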
In the following sections we will quickly review some estimators used in the econometrics literature, which you might find helpful when dealing with problems where OLS is no longer a feasible (or at least an efficient) solution.

2. Generalized (and Feasible Generalized) Least Squares

As already noted, in the OLS framework we assume that $Var(\epsilon_{1:T}) = E(\epsilon\epsilon') = I_T\sigma^2$. Now, let's abandon this hypothesis and say that $Var(\epsilon) = \Sigma$, where $\Sigma$ is any valid (i.e., symmetric and positive definite) variance-covariance matrix. If the matrix $\Sigma$ is known (we will come back to this point later), it is possible to derive an estimator for the regression coefficients in a way similar to what we normally do under OLS. As you should remember, if a covariance matrix is positive definite (notice: this does not hold in case $\Sigma$ is only positive semi-definite), you can always invert it and write $\Sigma^{-1} = DD'$. You can then write your standard least squares problem in matrix form as

$$Y = X\beta + \epsilon,$$

where $Y$ is the $T \times 1$ vector collecting the observations of the dependent variable, $X$ is a $T \times k$ matrix of regressors, $\beta$ is a $k \times 1$ vector of coefficients to be estimated, and $\epsilon$ is a $T \times 1$ vector of residuals.¹

¹ If needed, $X$ might be expanded to also include a vector of ones, which will then represent the constant of the regression model, estimated as the corresponding element of the $k \times 1$ vector of coefficients.

If we pre-multiply the regression model by the matrix $D'$ we get

$$D'Y = D'X\beta + D'\epsilon \quad \text{or} \quad \tilde Y = \tilde X\beta + \tilde\epsilon,$$

where $\tilde Y \equiv D'Y$, $\tilde X \equiv D'X$, and $\tilde\epsilon \equiv D'\epsilon$. Note that

$$Var(\tilde\epsilon) = E(\tilde\epsilon\tilde\epsilon') = D'E(\epsilon\epsilon')D = D'(DD')^{-1}D = D'(D')^{-1}D^{-1}D = I_T,$$

because $\Sigma^{-1} = DD'$ implies that $E(\epsilon\epsilon') = \Sigma = (DD')^{-1}$.²

² $Var(\tilde\epsilon) = E(\tilde\epsilon\tilde\epsilon')$ because $E(\tilde\epsilon) = E(D'\epsilon) = D'E(\epsilon) = 0$.

Coefficients can now be estimated by OLS, since any heteroskedasticity has been removed. The estimator will be

$$\hat\beta_{GLS} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde Y = (X'DD'X)^{-1}X'DD'Y = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y.$$

This estimator is unbiased:

$$E(\hat\beta_{GLS}) = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}(X\beta + E(\epsilon)) = \beta + (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}E(\epsilon) = \beta.$$

In the same way we can derive the variance-covariance matrix of the GLS estimator:

$$Var(\hat\beta_{GLS}) = (\tilde X'\tilde X)^{-1}\tilde X'\,Var(\tilde\epsilon)\,\tilde X(\tilde X'\tilde X)^{-1} = (\tilde X'\tilde X)^{-1}\tilde X' I_T \tilde X(\tilde X'\tilde X)^{-1} = (\tilde X'\tilde X)^{-1} = (X'DD'X)^{-1} = (X'\Sigma^{-1}X)^{-1}.$$

Note that if we did not take the heteroskedasticity into consideration and used OLS, our estimator would be inefficient, but still unbiased. In fact, $\hat\beta_{OLS} = (X'X)^{-1}X'Y$ implies that $E(\hat\beta_{OLS}) = (X'X)^{-1}X'(X\beta + E(\epsilon)) = \beta$. However,

$$Var(\hat\beta_{OLS}) = (X'X)^{-1}X'E(\epsilon\epsilon')X(X'X)^{-1} = (X'X)^{-1}X'\Sigma X(X'X)^{-1},$$

which can be proven to always exceed $Var(\hat\beta_{GLS}) = (X'\Sigma^{-1}X)^{-1}$ in the positive semi-definite sense. If $\Sigma = I_T\sigma^2$ the two estimators are identical, and so are the two variances: $Var(\hat\beta_{OLS}) = Var(\hat\beta_{GLS}) = \sigma^2(X'X)^{-1}$. When $\Sigma \neq I_T\sigma^2$, define $A \equiv (X'X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$ and note that $AX = 0$; expanding $A\Sigma A'$, each of the two cross terms equals $(X'\Sigma^{-1}X)^{-1}$, so that

$$Var(\hat\beta_{OLS}) - Var(\hat\beta_{GLS}) = A\Sigma A',$$

and $A\Sigma A'$ is positive semi-definite because $\Sigma$ is.

This is why the OLS estimator is less efficient than the GLS estimator.

We now have to make an important remark. When estimating GLS in actual applications, the structure of $\Sigma$ will never be known. You will rather have at your disposal some estimator $\hat\Sigma$ of $\Sigma$ which (assuming you want to focus only on consistent estimators, as you should) will converge to the true covariance matrix as the sample becomes large (for those of you who know more, we can say that $\mathrm{plim}\ \hat\Sigma = \Sigma$). Our estimator will therefore be the Feasible Generalized Least Squares (FGLS) estimator:

$$\hat\beta_{FGLS} = (X'\hat\Sigma^{-1}X)^{-1}X'\hat\Sigma^{-1}Y.$$

The specification of a particular structure/model for the variance-covariance matrix therefore becomes a problem of key importance. In a finite sample $\hat\Sigma$ is different from $\Sigma$. This might cause a bias in the FGLS estimator, since we are not guaranteed ex ante (especially if the sample is very small) that $E[(X'\hat\Sigma^{-1}X)^{-1}X'\hat\Sigma^{-1}\epsilon] = 0$.

3. Completing the Picture: the (Generalized) Method of Moments

The estimators that we have reviewed so far (OLS and GLS), as well as the ones that we will consider in the following pages, can also be derived using Maximum Likelihood Estimation (MLE), which is based on a (so-called parametric) assumption on the density of the distribution of the residuals. In the case of OLS, this assumption is part of the Strong OLS Hypotheses set and can be written as (see also Section 1): $\epsilon_t \sim N(0, \sigma^2)$. Given this hypothesis, it is possible to estimate the parameters by maximizing the natural logarithm of the likelihood function. In the OLS case,

$$\ln L(Y;\ \beta, \sigma^2) = -\frac{T}{2}\ln 2\pi - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta).$$

However, the OLS and GLS estimators do not require any parametric assumption on the residuals; they can be derived using orthogonality conditions alone. Parametric assumptions might be used in a second step, in order to determine the probabilistic properties of the estimators. This is why they are called semiparametric estimators.
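Under the Strong Hypotheses, maximizing the Gaussian log-likelihood in $\beta$ is the same as minimizing the sum of squared residuals, so the MLE of $\beta$ coincides with OLS. The numpy sketch below (simulated data, illustrative values) checks this numerically: at the closed-form OLS solution the log-likelihood is higher than at any perturbed coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, -0.8]) + rng.normal(scale=0.7, size=T)

def gauss_loglik(beta, s2, X, y):
    """ln L(Y; beta, s2) = -T/2 ln(2 pi) - T/2 ln(s2) - (1/(2 s2)) (Y-Xb)'(Y-Xb)."""
    r = y - X @ beta
    n = len(y)
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(s2) - (r @ r) / (2 * s2)

# Closed-form OLS and the corresponding ML estimate of the variance
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
s2_ml = np.mean((y - X @ b_ols) ** 2)

# Any perturbation of the coefficients strictly lowers the likelihood,
# because for fixed s2 the log-likelihood is maximal where the SSR is minimal.
ll_star = gauss_loglik(b_ols, s2_ml, X, y)
for db in (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05])):
    assert gauss_loglik(b_ols + db, s2_ml, X, y) < ll_star

print("Gaussian MLE of beta coincides with OLS")
```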
The underlying idea is that the sample statistics of a given sample converge to the true moments of the population as the size of the sample gets larger and larger. Sample statistics can then be used to impose conditions in order to estimate the regression parameters. As the size of the sample becomes large, the parameter estimates will converge to their true values. OLS and GLS can therefore be included in a wider group of estimation methods, which is called the Generalized Method of Moments (GMM). In formulas, suppose that we want to identify $k$ parameters $\theta_i$ ($i \in \{1, 2, \ldots, k\}$). We have to define $N$ sample moments

$$\hat m_j = \frac{1}{T}\sum_{t=1}^{T} g_j(z_t;\ \hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k), \qquad j = 1, \ldots, N,$$

knowing (or assuming) that $\mathrm{plim}_{T\to\infty}\ \hat m_j = m_j$, where $m_j = E[g_j(z_t;\ \theta_1, \theta_2, \ldots, \theta_k)]$ is the true but unknown population moment. Then the parameter estimates may be obtained by imposing the set of conditions

$$f_1(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k) - \hat m_1 = 0$$
$$f_2(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k) - \hat m_2 = 0$$
$$\vdots$$
$$f_N(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k) - \hat m_N = 0 \qquad (1)$$

where $\hat m_j$ is the sample value of a statistic of interest, $j = 1, \ldots, N$. Basically, from a mathematical point of view, all that you are doing is looking for the set of parameter values $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ which jointly minimizes the distance between the conditions imposed on the parameters and the sample moments. Notice that the number of moment conditions $N$ may exceed the number of parameters to be estimated, $k$. When $N = k$, we speak of just-identified GMM, or simply of method-of-moments estimates: in this case you impose the minimal number of conditions for the system (1) to be solved (assuming a solution exists) in order to find $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$. Often in practice one elects to set $N > k$, and in this case one additional problem is how to weight the different moment conditions, because from a mathematical point of view it is clear that only by chance will values $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ exist such that all the conditions/equations in (1) can be satisfied simultaneously. Although this is a rather advanced topic to be developed in the appropriate context, one solution, which in fact also delivers a number of optimal properties, consists of estimating $\hat\beta_{GMM}$ by solving

$$\min_{\beta}\ [\hat{\mathbf m} - \mathbf f(\beta)]'\,\mathbf S^{-1}\,[\hat{\mathbf m} - \mathbf f(\beta)], \qquad (2)$$

where $\hat{\mathbf m} \equiv [\hat m_1\ \hat m_2\ \cdots\ \hat m_N]'$, $\mathbf f(\beta) \equiv [f_1(\beta)\ f_2(\beta)\ \cdots\ f_N(\beta)]'$, $\mathbf S = Var[\hat{\mathbf m} - \mathbf f(\bar\beta)]$, and $\bar\beta$ is a first-step estimator that simply minimizes the quadratic form $[\hat{\mathbf m} - \mathbf f(\beta)]'[\hat{\mathbf m} - \mathbf f(\beta)]$. Basically, (2) means that you pick $\hat\beta_{GMM}$ so as to minimize a set of moment conditions weighted by the inverse of their covariance matrix, so that the moment conditions that are estimated more precisely in the data sample receive a higher weight. Clearly, a just-identified GMM estimation problem corresponds to (2) with $\mathbf S = \mathbf I$.

How can we write an OLS or a GLS estimator as a GMM one? Recall that one of the Weak OLS Hypotheses stated that $E(x_t\epsilon_t) = 0$. Thus, the GMM condition for estimating $\beta$ is

$$E[\epsilon' X] = E[(Y - X\beta)'X] = \mathbf 0'.$$
The same happens for GLS, with the only difference that we have to take into account the structure of the variance-covariance matrix:

$$E[(\tilde Y - \tilde X\beta)'\tilde X] = E[(Y - X\beta)'\Sigma^{-1}X] = \mathbf 0'.$$
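For the linear model the OLS orthogonality condition is linear in $\beta$, so the just-identified method-of-moments problem can be solved exactly rather than minimized numerically. A small numpy sketch (simulated data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.3, 1.2]) + rng.normal(size=T)

# Just-identified case (N = k): one moment condition per parameter,
# E[x_t eps_t] = 0 for each regressor. The sample analogue
#   g(beta) = (1/T) X'(y - X beta) = 0
# is linear in beta, so it can be solved exactly: (X'X) beta = X'y.
b_mm = np.linalg.solve(X.T @ X, X.T @ y)

g = X.T @ (y - X @ b_mm) / T      # sample moment conditions at the solution
print(np.max(np.abs(g)))          # ~0: the conditions hold exactly
```

Solving the sample orthogonality conditions exactly is, of course, just OLS again; with more moment conditions than parameters one would instead minimize the weighted quadratic form (2).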

4. Dealing with More than One Equation: Seemingly Unrelated Regressions

Let's now consider a situation where there are two or more dependent variables, say $N \geq 2$. The regression model now is $Y = X\beta + \epsilon$, that is,

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & X_N \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{bmatrix},$$

where $\beta_j$ is the vector of coefficients for the $j$-th equation, $j = 1, \ldots, N$. The covariance matrix of the stacked residuals is

$$Var(\epsilon) = E\begin{bmatrix} \epsilon_1\epsilon_1' & \epsilon_1\epsilon_2' & \cdots & \epsilon_1\epsilon_N' \\ \epsilon_2\epsilon_1' & \epsilon_2\epsilon_2' & \cdots & \epsilon_2\epsilon_N' \\ \vdots & & \ddots & \vdots \\ \epsilon_N\epsilon_1' & \epsilon_N\epsilon_2' & \cdots & \epsilon_N\epsilon_N' \end{bmatrix} = \begin{bmatrix} I_T\sigma_{11} & I_T\sigma_{12} & \cdots & I_T\sigma_{1N} \\ I_T\sigma_{21} & I_T\sigma_{22} & \cdots & I_T\sigma_{2N} \\ \vdots & & \ddots & \vdots \\ I_T\sigma_{N1} & I_T\sigma_{N2} & \cdots & I_T\sigma_{NN} \end{bmatrix} \equiv \Omega,$$

where $\sigma_{ij}$ is the covariance between the residuals of regressions $i$ and $j$. As we said early on, a first approach to the problem would be to estimate a separate OLS model for each regression equation. However, as we have seen when introducing GLS methods, this might not be the most efficient solution. If we look at the model above, the equations for the different dependent variables are apparently completely unrelated as far as the conditional mean is concerned. However, relationships exist through the covariance matrix $\Omega$. Working through algebra very similar to the one shown in Section 2, it is possible to derive the Seemingly Unrelated Regressions (SUR) estimator:

$$\hat\beta_{SUR} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.$$

This estimator is BLUE, with variance $Var(\hat\beta_{SUR}) = (X'\Omega^{-1}X)^{-1}$. Also for the SUR estimator we have to bear in mind that usually we do not know the matrix $\Omega$; this may pose problems in small samples. Interestingly, the equation-by-equation OLS estimator and the SUR estimator are equivalent in only two very special situations:

- $\Omega$ is a diagonal matrix or, in other words, $\sigma_{ij} = 0$ if $i \neq j$;
- the set of regressors is the same for all the equations: $X_j = X^*$ for all $j \in \{1, 2, \ldots, N\}$.
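The second equivalence result is easy to verify numerically. In the sketch below (simulated data; the variable names and the assumption of a known $\Omega$ are illustrative), both equations share the same regressor matrix, and the SUR estimator with $\Omega = S \otimes I_T$ reproduces equation-by-equation OLS exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 150, 2
Xi = np.column_stack([np.ones(T), rng.normal(size=T)])  # same regressors in both equations
S = np.array([[1.0, 0.6],
              [0.6, 1.5]])                              # cross-equation residual covariance
E = rng.multivariate_normal(np.zeros(N), S, size=T)
Y = np.column_stack([Xi @ np.array([1.0, 0.5]) + E[:, 0],
                     Xi @ np.array([-0.2, 2.0]) + E[:, 1]])

# Stacked system: X is block-diagonal (here I_N kron Xi) and, stacking
# the residuals equation by equation, Omega = S kron I_T.
Xs = np.kron(np.eye(N), Xi)
ys = Y.T.reshape(-1)
Oi = np.kron(np.linalg.inv(S), np.eye(T))               # Omega^{-1}
b_sur = np.linalg.solve(Xs.T @ Oi @ Xs, Xs.T @ Oi @ ys)

# Equation-by-equation OLS
b_ols = np.concatenate([np.linalg.solve(Xi.T @ Xi, Xi.T @ Y[:, j]) for j in range(N)])

print(np.allclose(b_sur, b_ols))   # True: identical when regressors coincide
```

With different regressor matrices in the two equations, and $\sigma_{12} \neq 0$, the two estimators would no longer coincide and SUR would be the more efficient one.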

When you are dealing with a single regression model ($N = 1$), the key issue is obviously only the choice of the right set of regressors. However, as you can understand after our discussion in this section, in the case of a system of regression models ($N \geq 2$) there is another choice you have to make: you have to decide whether to model a time series on its own or along with other time series. If you choose the right set of time series and regressors, you will be able to exploit cross-sectional information and improve the efficiency of your estimates by obtaining a GLS-type estimator.

This is the appropriate point at which we can introduce some ideas on the estimation of models with fixed and random effects. These models are useful when you are analyzing a population (e.g., the set of EU countries) and you want to check whether there are any structural differences among the units in the population (e.g., countries) conditionally on a common set of explanatory variables (GDP, current account, inflation, etc.). Consider the regression of $N$ different vectors of dependent variables, $Y_1, Y_2, \ldots, Y_N$, on a matrix of independent variables $X$. We can write

$$y_{it} = \beta_{0i} + \sum_{k=1}^{K} \beta_k x_{kt} + \epsilon_{it}, \qquad i = 1, \ldots, N.$$

Assume that in this model $E(\epsilon_{it}) = 0$, $E(\epsilon_i\epsilon_i') = I_T\sigma^2$, and $E(\epsilon_i\epsilon_j') = 0$ for $i \neq j$. Summing up, in this model the cross-sectional covariances between errors are null, while the set of regressors and the regression slopes are the same for each dependent variable. The only coefficient measuring cross-sectional differences is the intercept $\beta_{0i}$ ($i = 1, \ldots, N$). In practice, we can think of the intercept as a constant common to all models plus a fixed component that changes according to the dependent variable; in formulas, $\beta_{0i} = \beta_0 + \alpha_i$. This model is known as a model with fixed effects. The name comes from the fact that the cross-sectional effects captured by the intercept (the $\alpha_i$'s) are fixed for each dependent variable. This model (as usual) has its own BLUE estimator for the regression coefficients. In order to use a model with fixed effects, you need to assume that your model encloses the whole cross-sectional dimension of the population.
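A common way to compute the fixed-effects estimator is the within (demeaning) transformation, which removes the $\alpha_i$'s before running OLS. The numpy sketch below (a simulated panel with illustrative values; the regressor is deliberately correlated with the effects) shows pooled OLS being pulled away from the true slope while the within estimator recovers it.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 10, 80                                  # N units, T periods
alpha = rng.normal(0.0, 2.0, N)                # fixed effects: beta_0i = beta_0 + alpha_i
beta = 1.3                                     # common slope

x = rng.normal(size=(N, T)) + alpha[:, None]   # regressor correlated with the effects
y = alpha[:, None] + beta * x + 0.5 * rng.normal(size=(N, T))

# Pooled OLS ignores the unit-specific intercepts and is biased here,
# precisely because x is correlated with alpha_i.
xd, yd = x.ravel() - x.mean(), y.ravel() - y.mean()
b_pooled = (xd @ yd) / (xd @ xd)

# Within estimator: demean each unit's own data, which wipes out alpha_i.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
b_fe = np.sum(xw * yw) / np.sum(xw * xw)

print(b_pooled, b_fe)   # b_fe is close to the true 1.3; b_pooled is not
```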
If you think about it, when you are introducing fixed effects you completely neglect any conditional (cross-sectional) volatility for the intercept. This can be done only if you observe all the individuals and you can perfectly identify the intercept effect for each one of them. If your sample is instead only a subset of a larger population, you need to take this conditional volatility into account and use random effects. A model with random effects is very similar to a model with fixed effects, but there is a key difference: in a random effects model the intercept for the $i$-th dependent variable is $\beta_{0i} = \beta_0 + u_i$, where $u_i$ is a random error such that $E(u_i) = 0$, $E(u_i^2) = \sigma_u^2$, $E(u_iu_j) = 0$ for $i \neq j$, and $E(u_i\epsilon_{jt}) = 0$. So the model can be re-written as

$$y_{it} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kt} + u_i + \epsilon_{it}, \qquad i = 1, \ldots, N.$$

Random effects, too, have their own BLUE estimator.
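When the variance components $\sigma_u^2$ and $\sigma_\epsilon^2$ are known, the GLS estimator of the random-effects model can be computed by pooled OLS on quasi-demeaned data, with $\theta = 1 - \sigma_\epsilon/\sqrt{\sigma_\epsilon^2 + T\sigma_u^2}$. The sketch below assumes known variances purely for illustration (in practice they would themselves be estimated, which is the FGLS step):

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 50, 20
s_u, s_e = 1.0, 0.5                   # variance components, assumed known here
u = rng.normal(0.0, s_u, N)           # random effects, independent of x
x = rng.normal(size=(N, T))
y = 0.4 + 1.1 * x + u[:, None] + rng.normal(0.0, s_e, size=(N, T))

# Random-effects GLS = pooled OLS on quasi-demeaned data, with
# theta = 1 - s_e / sqrt(s_e^2 + T * s_u^2).
theta = 1.0 - s_e / np.sqrt(s_e**2 + T * s_u**2)
yq = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
xq = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
Xq = np.column_stack([np.full(N * T, 1.0 - theta), xq])   # the constant is transformed too
b_re = np.linalg.solve(Xq.T @ Xq, Xq.T @ yq)

print(b_re)   # close to the true (0.4, 1.1)
```

Note that $\theta = 0$ gives back pooled OLS, while $\theta = 1$ gives the within (fixed-effects) transformation, so the random-effects estimator sits between the two.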

relationships between variables. Simultaneous Equations Models are very popular in macroeconomics, where Structural Models are used to bring theoretical frameworks with a large number of equilibrium equations to the data. A consistent modeling of causality becomes of key importance. One of the most basic (and most popular) frameworks in this area is the Vector Autoregressive Model (VAR). VARs are based on the assumption that, given a vector of time series, there is a feedback relationship between the current value of each time series and the lagged values of the same or other series. In practice, this model can be seen as an autoregressive model for multiple time series. In formulas, a VAR model with $p$ lags, also called a VAR($p$), can be written as

$$Z_t = \Phi_0 + \Phi_1 Z_{t-1} + \Phi_2 Z_{t-2} + \cdots + \Phi_p Z_{t-p} + \epsilon_t,$$

where $Z_t$ and $\epsilon_t$ are both $n \times 1$ vectors. Note that this model can always be written as a VAR(1) by re-arranging the equations in the following companion (state-space) form:

$$\begin{bmatrix} Z_t \\ Z_{t-1} \\ \vdots \\ Z_{t-p+1} \end{bmatrix} = \begin{bmatrix} \Phi_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_p \\ I & O & \cdots & O \\ \vdots & \ddots & & \vdots \\ O & \cdots & I & O \end{bmatrix} \begin{bmatrix} Z_{t-1} \\ Z_{t-2} \\ \vdots \\ Z_{t-p} \end{bmatrix} + \begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Because of this result, let's focus now on the simple framework $Z_t = \Phi_0 + \Phi_1 Z_{t-1} + \epsilon_t$. First, this model is estimated by OLS, because the set of regressors is the same in each equation, and the SUR estimator coincides with the OLS one. Second, the estimation of this model involves a large number of parameters. Consider an example in which $Z_t$ has four elements. Even if you neglect the intercepts, you need to fill the full matrix $\Phi_1$ (and the variance-covariance matrix of the residuals):

$$\Phi_1 = \begin{bmatrix} \phi_{11} & \phi_{12} & \phi_{13} & \phi_{14} \\ \phi_{21} & \phi_{22} & \phi_{23} & \phi_{24} \\ \phi_{31} & \phi_{32} & \phi_{33} & \phi_{34} \\ \phi_{41} & \phi_{42} & \phi_{43} & \phi_{44} \end{bmatrix}.$$

The presence of a large number of parameters may pose tricky risks of over-fitting the data. By this we mean that some of the parameters in the matrix $\Phi_1$ might turn out to be significantly different from zero even if they do not capture any meaningful relationship in the data. This is why, in estimating VAR matrices, econometricians usually impose constraints and set some entries to zero ex ante. The stationarity condition for a VAR(1) model is somehow similar to the one that we impose for an AR(1) model.
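Since every equation of the VAR(1) shares the same regressors (a constant and $Z_{t-1}$), equation-by-equation OLS is all that is needed to estimate it. A numpy sketch on simulated data (illustrative, stationary parameter values):

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 2, 4000
Phi0 = np.array([0.1, 0.2])
Phi1 = np.array([[0.6, -0.1],
                 [0.3,  0.4]])      # illustrative, stationary coefficients

# Simulate Z_t = Phi0 + Phi1 Z_{t-1} + eps_t
Z = np.zeros((T, n))
for t in range(1, T):
    Z[t] = Phi0 + Phi1 @ Z[t - 1] + rng.normal(size=n)

# Every equation has the same regressors (a constant and Z_{t-1}),
# so SUR collapses to equation-by-equation OLS on the stacked system.
X = np.column_stack([np.ones(T - 1), Z[:-1]])   # (T-1) x (1+n)
B = np.linalg.solve(X.T @ X, X.T @ Z[1:])       # column j holds equation j's coefficients

Phi0_hat, Phi1_hat = B[0], B[1:].T
print(Phi1_hat)   # close to Phi1 for large T
```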
The unconditional moments are

$$E(Z) = (I - \Phi_1)^{-1}\Phi_0, \qquad \mathrm{vec}(Var(Z)) = (I - \Phi_1 \otimes \Phi_1)^{-1}\mathrm{vec}(\Sigma),$$

where $\otimes$ denotes the Kronecker product of two matrices. The conditional $k$-step-ahead moments can be derived as

$$E(Z_{t+k} \mid Z_t) = \Phi_0 + \Phi_1\Phi_0 + \Phi_1^2\Phi_0 + \cdots + \Phi_1^{k-1}\Phi_0 + \Phi_1^k Z_t,$$

$$Var(Z_{t+k} \mid Z_t) = \Sigma + \Phi_1\Sigma\Phi_1' + \Phi_1^2\Sigma(\Phi_1^2)' + \cdots + \Phi_1^{k-1}\Sigma(\Phi_1^{k-1})'.$$
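These recursions are easy to implement. The sketch below (illustrative, stable parameter values) iterates the conditional moment recursions and checks that for large $k$ they converge to the unconditional moments given by the two closed-form expressions:

```python
import numpy as np

n = 2
Phi0 = np.array([0.5, -0.2])
Phi1 = np.array([[0.5, 0.1],
                 [0.2, 0.3]])        # illustrative, stationary
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

# Unconditional moments in closed form
mu = np.linalg.solve(np.eye(n) - Phi1, Phi0)
V = np.linalg.solve(np.eye(n * n) - np.kron(Phi1, Phi1),
                    Sigma.reshape(-1)).reshape(n, n)

# k-step-ahead conditional moments, iterated from a current state Z_t
Zt = np.array([3.0, -1.0])
def forecast(k):
    m, v = Zt.copy(), np.zeros((n, n))
    for _ in range(k):
        m = Phi0 + Phi1 @ m              # E(Z_{t+k} | Z_t)
        v = Sigma + Phi1 @ v @ Phi1.T    # Var(Z_{t+k} | Z_t)
    return m, v

m50, v50 = forecast(50)
print(np.allclose(m50, mu), np.allclose(v50, V))   # both converge to the unconditional moments
```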

The unconditional moments can then be obtained as the limit of the conditional moments as $k \to \infty$. Remember that (when $\Phi_1$ is diagonalizable) we can always write the eigendecomposition $\Phi_1 = C\Lambda C^{-1}$; then $\Phi_1^k = C\Lambda^k C^{-1}$. To be sure that $\Phi_1^k$ remains well behaved as $k \to \infty$, we need $\Lambda^k$ to converge, because

$$\Lambda^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$

This is equivalent to requiring that all the eigenvalues of $\Phi_1$ fall inside the unit circle.
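In code, the stationarity check therefore reduces to computing the eigenvalues of $\Phi_1$. The numpy sketch below (illustrative coefficient matrix) verifies both that the eigenvalues lie inside the unit circle and that $\Phi_1^k$, computed through the eigendecomposition, dies out for large $k$:

```python
import numpy as np

Phi1 = np.array([[0.6, -0.1],
                 [0.3,  0.4]])   # illustrative VAR(1) coefficient matrix

# Eigendecomposition Phi1 = C Lam C^{-1}, hence Phi1^k = C Lam^k C^{-1}
lam, C = np.linalg.eig(Phi1)
print(np.abs(lam))                        # moduli of the eigenvalues

# Stationarity: every eigenvalue lies strictly inside the unit circle,
# so Lam^k (and therefore Phi1^k) converges to zero as k grows.
assert np.max(np.abs(lam)) < 1

k = 100
Phi1_k = C @ np.diag(lam**k) @ np.linalg.inv(C)
print(np.max(np.abs(Phi1_k)))             # essentially zero
```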