Notes on Time Series Models 1

Size: px
Start display at page:

Download "Notes on Time Series Models 1"

Transcription

1 Notes on ime Series Models Antonis Demos Athens University of Economics and Business First version January 007 his version January 06 hese notes include material taught to MSc students at Athens University of Economics and Business since 999. Please report all typos to: A. Demos, Athens University of Economics and Business, Dept. of International and European Economic Studies, 76 Patision Str., Athens 0434, Greece el: , Fax:

2 Before moving to the pure time series models we shall need the notion of orthogonal projections. We need this to evaluate the partial correlation coeffi cients, especially for Moving Average models. Projections Orthogonal Assume the usual linear regression setup, i.e. y = Xβ + u, u X D 0, σ I n where y is the n vector of endogenous variables, X is the n k matrix of weakly exogenous explanatory variables, β is the k vector of mean parameters and u is the n vector of errors. When we estimate a linear regression model, we simply map the regressand y into a vector of fitted values X β and a vector of residuals û = y X β. Geometrically, these mappings are examples of orthogonal projections. A projection is a mapping that takes each point of E n into a point in a subset of E n, while leaving all the points of the subset unchanged, where E n is the usual Euclidean vector space, i.e. the set of all vectors in R n where the addition, the scalar multiplication and the inner product hence the norm are defined. Because of this invariance the subspace is called invariant subspace of the projection. An orthogonal projection maps any point into the point of the subspace that is closest to it. If a point is already in the invariant subspace, it is mapped into itself. Algebraically, an orthogonal projection on to a given subspace can be performed by premultiplying the vector to be projected by a suitable projection matrix. In the case of OLS, the two projection matrices that yield the vector of fitted values and the vector of residuals, respectively, are P X = X X / X X /

3 and M X = I n P X = I n X X / X X /. o see this notice that the fitted values ŷ = X β = X X / X X / y = P X y. Hence the P X projection matrix project on to S X, i.e. the subspace of E n spanned by the columns of X. Notice that for any vector α R k the vector Xα belongs to S X. As now Xα S X then it should be the case, due to the invariance of P X, that P X Xα = Xα. But notice that P X X = X X / X X / X = XI k = X. It is clear that when P X is applied to y it yields the vector of fitted values. On the other hand the M X projection matrix yields the vector of residuals as M X y = [ I n X X / X ] X / y = y P X y = y X β = û. he image of M X is S X, the orthogonal complement of the image of P X. o see this, consider any vector w S X. It must satisfy the condition X / w = 0, which implies that P X w = 0, by the definition of P X. Consequently, I n P X w = M X w = w and S X must be contained in the image of M X, i.e. S X Im M X. Now consider any image of M X. It must take the form M X z. But then M X z / X = z / M X X = 0 as M X symmetric. Hence M X z belongs to S X, for any z. Consequently, Im M X S X and hence the image of M X coincides with S X.

4 For any matrix to represent a projection, it must be idempotent. his is because the vector image of a projection matrix is say S X, and then project it again, the second projection should have no effect, i.e. P X P X z = P X z for any z. It is easy to prove that this is the case with P X and M X, as P X P X = P X and M X M X = M X. By the definition of M X it is obvious that M X = I n X X / X X / = I n P X M X + P X = I n, and consequently for any vector z E n we have M X z + P X z = z. he pair of projections M X and P X are called complementary projections, since the sum M X z and P X z restores the original vector z. Assume that we have the following linear regression model: y = Xβ + ε where y and ε are N, β is k,and X is N k. For k = and if the first variable is a constant we have that : y i = β 0 + x i β + ε i for i =,,..., N. Now β = = = β0 = X β / X X / y = x y x x xy x x x y x x xy y x x xy x x xy x y x x. 3

5 Notice however that [ ] x [ x x = x = x x ] [ = x x ] [ = x xx + xx + x x ] [ = x x + x x x] = x x, and xy x y = [ x x y y ]. Hence β0 β = = y x x xy x x x xy y x x y [ x xy y] x x x xy y x x = x = y x x + x x{ [ x xy y]+ x y} y β x x xy y x x x x x xy y x x. Furthermore, V ar β β0 = V ar β = = σ X / X = σ σ x x x x x x x x. Hence and V ar β0 V ar β x = σ x x [ ] x x = + x σ x x = σ + x x x, σ = x x. Notice that to estimate these quantities we need an estimator of σ. We can employ its unbiased estimator, i.e. σ = e t where e t = y t β 0 β x t. 4

6 Part I Linear Dynamic Stationary Processes Autoregressive Models hese classes of models are of the form: yt = µ + α yt + α yt α p yt p + u t i.e. the current yt depends, directly, on p lags. Examples for various values of p are the AR, yt = µ + α yt + u t the AR3, yt = µ + α yt + α yt + α 3 yt 3 + u t etc.. Autoregressive of Order his class of models is given by yt = µ + αyt + u t where yt is the observed process and u t are iid 0, σ. Properties Notice that y t = µ + αy t + u t = µ + αµ + α y t + u t + αu t =... = µ + α α t + α t y 0 + u t + αu t α t u = µ αt α + αt y 0 + u t + αu t α t u. 5

7 Hence E y t = µ αt α + αt E y 0. It is clear that for stationarity we need α < so as t which is independent of t. o find the variance of y t and that E y t µ α notice that V ar y t = E { [y t E y t ] } = E y t µ α = µ µ α + αy t + u t = α { [ yt µ ] } α yt µ + u t. α Hence, if we substitute y t = yt µ, it follows that α { [ y t = αy t + u t and E yt µ ] } = E yt. α Now squaring and taking expectations we get E yt = α E yt + E u t + αe yt u t E yt = α E yt + σ as E u t = σ, by assumption, and E y t u t = 0. expectation is equal to zero, substitute backwards y t to get o see that the last E y t u t = E [αy t + u t u t ] = E [ ] α y t 3 + αu t + u t ut =... = = E [ u t + αu t + α u t ] u t = E u t u t + αe u t u t + α E u t 3 u t +... and all expectations are zero as the u t s are independent as before we need α < so α k y t k 0 as k. 6

8 Hence E yt = α E yt + σ = α [ α E yt ] + σ + σ = = α 4 E yt + α σ + σ =... = α k E yt k + α k α + σ = + α α k + α k +... σ = σ α as for k, α k 0, and the sum in the parenthesis is an infinite sum of a declining geometric series for α <. o find the autocorrelation function notice that ρ k = γ k = Cov y t, yt k = E { [yt E yt ], [ yt k E ]} yt k = γ 0 V ar yt E yt = E {[ ] [ ]} yt µ α, y t k µ α = E y t, y t k = E y t, y t k. E yt E yt σ α o find the numerator multiply by y t k and take expectations to get E y t y t k = αe y t y t k + E u t y t k = αe y t y t k γ k = αγ k. his is a difference equation with stable solution if α < which is assumed due to stationarity. Hence in general. ρ = γ γ 0 = αγ 0 γ 0 = α and ρ k = α k Notice that for the existence of E y t, V ar y t, and γ k the only requirement is that α <. Furthermore, for this condition the three quantities are independent of t. Hence the only condition for second order stationarity is α <. he partial autocorrelation of order k, ρ k, is the autocorrelation of y t with y t k after we have taken in to account the autocorrelations of y t with y t, y t,.., y t k+, i.e. is the theoretical coeffi cient of y t k in a regression of y t a constant, yt, yt,.., yt k+, y t k, i.e. of the regression y t = c + β y t + β y t β k y t k+ + ρ ky t k + ε t. 7 on

9 Of course for k = we have that ρ = ρ = α. In the case of the AR model is obvious that equation in can be written as so that ρ = 0, Also y t = µ + αy t + 0y t + u t y t = µ + αy t + 0y t + 0y t 3 + u t so that ρ 3 = 0. Hence, we can write that, for any k, ρ k = 0 as y t = µ + αy t + 0y t + 0y t y t k+ + 0y t k + u t. Figure.: ACF of the AR model y t = y t + u t, and u t are iid N 0,.4. Notice that any ARMAp, q model has a unique pair of autocorrelation and partial autocorrelation functions. In other words the two functions define uniquely the order of an ARMAp, q model. his procedure is what Box and Jenkins call Identification Procedure of an ARM Ap, q model. For the AR model ρ k decreases with a power law and ρ k is non-zero for k = and drops to zero for k. 8

10 he last property of ARM A models is invertibility, i.e. the property of these models so that they can be written as pure AR or MA models. Specifically for the AR model we can investigate under what conditions is the model invertible. Let us define he lag operator L as follows: L k x t = x t k for any time series x t, i.e. the lag operators lags any time series observation by so many periods as its exponent. he usual properties of powers apply to the lag operators, as well, i.e. L k L m = L k+m, L k m = L km, L k L m = Lk m. With this definition we can write the model in equation as y t = µ + αy t + u t = µ + αly t + u t yt αlyt = µ + u t αl yt = µ + u t yt = µ α + αl u t. Notice that as µ is nonstochastic L is dropped. Now if α < then αl can be considered as the sum of an infinite declining geometric series with ratio αl. Consequently, yt µ = α + αl u t = µ α + + αl + α L + α 3 L u t yt = µ α + u t + αlu t + α L u t + α 3 L 3 u t +... yt = µ α + u t + αu t + α u t + α 3 u t which is an MA. Hence the invertibility condition is the same, in this case, as the stationarity one, i.e. α <. 9

11 Prediction with known parameters If we had no information on the model and we wanted to predict the value of y t+, or if we believed that the process y t is white noise, or if we wanted to unconditionally predict y t+, then we would employ the unconditional distribution of y t+, which is the same as the unconditional distribution of y t recall that we can substitute backwards to get Let us assume that u t y t = µ + α + α u t + αu t + α u t... o find this iid N 0, σ. Consequently, the distribution of y t is Normal, as it is a linear combination of independent Normal random variables, with E yt = µ α, V ar y t = σ α. Hence µ yt N α, σ. α In this set up the best prediction for y t+ is its unconditional mean, i.e. with error variance V ar yt+ ŷt+ ŷ t+ = µ α, = V ar yt+ µ = σ α α. On the other hand, if we wanted to utilise the information that the true model is an AR, then we would employ the conditional distribution of y t+, i.e. the distribution of y t+ given the previous observations and the model. Again under the Normality of u ts, the conditional distribution of y t+, conditional on the previous observation y t, is Normal with mean µ + αy t and variance σ, i.e. E yt+ y t = E [µ + αy t + u t+ yt ] = µ + αyt + E [u t+ yt ] = µ + αyt and V ar yt+ y t = [ E y t+ E ] yt+ y t = E y t+ µ αyt = E ut = σ. 0

12 and it follows that y t+ y t N µ + αy t, σ. Hence the conditional predictor of y t+ is ŷ t+ = µ + αy t with an error variance of σ the variance of u t+ V ar yt+ ŷt+ = V ar u t+ = σ. Notice that the distribution is not the same for all t as the y t s have different values for different t s with probability. Consequently, the conditional predictor is better than the unconditional one as it has smaller error variance. Prediction with estimated parameters In practice, predictions are made with estimated parameters. Let us assume, for easy of calculations that µ = 0, and that from a sample of observations we have estimated α, say by regression. hen our prediction for y + is y + = αy where µ and α are the estimated parameters. We can decompose the prediction error of the above prediction as y + y + = y + y + + ŷ t+ y +. he first term on the right hand side represents the prediction error when µ and α are known, and the second is coming from the estimation of the parameters. Substituting in the second parenthesis the values of ŷt+ and y + we get: y + y + = y + y + + α α y.

13 One can prove that MSE y + = s + y y σ + y α t t= where MSE is the Mean Square Error. Notice that MSE is bigger to V ar by y α. yt+ ŷt+ Estimation o estimate the AR model in we can employ either the simple regression or the maximum likelihood. Employing the formulae for the linear regression we get: µ = y α and α = t= y t y y t y t= y t y. Notice that the summation runs from t =. Hence in a sample of observations we employ for the estimation of α. Furthermore, the unbiased estimator of σ is σ = 3 e t where e t = yt µ αy t. t= o find the Maximum Likelihood Estimators, let us denote by Lµ, α, σ = L y, y,..., y ; µ, α, σ the likelihood function for the random variables then it can be written as Lµ, α, σ = L y, y,..., y ; µ, α, σ = = L y y,..., y, y ; µ, α, σ L y,..., y, y ; µ, α, σ i.e. the joint Likelihood equals the conditional of the last observation y on rest of the sample y,..., y, y times the marginal of the rest of the sample.

14 Repeating the above procedure we have dropping the parameters to conserve space: Lµ, α, σ = L y y,..., y, y L y y..., y, y...l y y L y. Now the conditional distribution of y t is see Prediction section: y t y t N µ + αy t, σ. Of course the same is true for the conditional, on all previous observations, distribution of y t, i.e. as y t y t y t, y t,..., y, y N µ + αy t, σ, depends only on the previous observation y t. Hence the Likelihood can be written as: Lµ, α, σ y = exp t µ αyt L y πσ. t= Now the distribution of y is different as there is no information in the sample for the previous observation. Consequently the distribution of y is the unconditional distribution of the AR model, i.e. see Prediction section µ y N α, σ. α herefore, the Likelihood function is Lµ, α, σ y = exp t µ αyt πσ t= σ σ π σ α and the log-likelihood is lµ, α, σ = ln π y ln σ t µ αy t σ t= ln π ln σ ln α y µ α. σ α 3 exp y µ α σ α,

15 he first order condition are given by: { } l y µ = t µ αyt + α y µ α + = 0, σ σ l α = t= t= y t µ αyt y t + σ α α + [ y µ µ+α + α y α α µ α and l σ = y σ + t µ αy { t + σ 4 σ + α y µ α σ 4 t= where in curly brackets are the derivatives of the log-likelihood of the first observation y. Notice that the estimator that solve the above first order conditions are different from the estimators of the regression. hese estimators are called Full Information Maximum Likelihood Estimators. However, if we drop the contribution of the first observation to the log-likelihood function, by assuming that y is constant, then solving the first order conditions we have: µ = y t= α, α = y t y yt y t= y t y, which are identical to the regression estimators. σ } hese estimators are called Conditional on the first observation Maximum Likelihood Estimators. = 0, ] = 0 3 Moving Average Models hese classes of models are of the form: y t = µ + u t θ u t θ u t... θ q u t q i.e. the current y t depends, directly, on q lags of u t. Examples for various values of q are the MA, y t = µ + u t θ u t θ u t 4

16 the MA4, y t = µ + u t θ u t θ u t θ 3 u t 3 θ 4 u t 4 etc. 3. Moving Average of Order hese models are given by y t = µ + u t θu t 3 where y t is the observed process and u t are iid 0, σ. Properties It easy to prove that, for any value of θ, E y t = E µ + u t θu t = µ + E u t θe u t = µ, and V ar y t = E [y t E y t ] = E [u t θu t ] = σ + θ as E u t u t = E u t E u t = 0 due to independence of the u t s. Furthermore, it easy to prove that the autocorrelation is given by θ for k = +θ ρ k =. 0 for k > as γ k = Cov [ yt, yt k = E y t µ yt k µ ] = E [u t θu t u t k θu t k ] = E u t u t k θe u t u t k θe u t u t k + θ E u t u t k = θe u t u t k + θ E u t u t k 5

17 for any positive k and due to independence of the u ts. Hence for k = we have that γ = θe u t + θ E u t u t 3 = θσ whereas, for k > γ k = 0. Notice that E yt, V ar yt, and γ k are independent of t for any value of θ, i.e. the MA process is nd order stationary for any value of θ. In terms of the partial correlation we know that ρ = ρ. Now to find ρ we have to find the theoretical coeffi cient of y t of the regression projection of y t on y t and y t, i.e. y t = αy t + ρ y t, where y t = yt µ for all t. Hence y t = y t y t α. ρ Now from the normal equations we have E y t y t y t y t α ρ = E y t y t y t and it follows E y t y t y t α = E y t y t y t y t y t ρ y t y t γ 0 γ γ γ 0 α ρ = γ γ 6

18 or dividing by γ 0 ρ ρ α ρ = ρ ρ. Hence ρ = ρ ρ ρ ρ ρ = ρ ρ ρ = ρ, α = ρ ρ ρ ρ ρ ρ = ρ ρ, ρ as ρ = 0. Substituting for ρ = θ +θ we get ρ = θ +θ θ +θ = θ + θ = θ θ θ With the same logic, the normal equations for ρ 3 are ρ ρ a ρ ρ ρ b = ρ, ρ ρ ρ 3 θ 6 α = θ + θ + θ. θ ρ 3 and ρ 3 = = ρ ρ ρ 0 0 ρ 0 = ρ 0 ρ ρ 0 ρ 3 θ +θ θ ρ ρ 0 ρ ρ ρ ρ ρ ρ 0 θ 3 = ρ3 ρ = [ ] = θ 3 θ + θ + θ θ θ 8. +θ 7

19 In general we have that consequently we have that ρ k ρ k ρ k = θ k θ θ k+, declines with a power law if θ <. Hence for the MA model ρ is non zero and ρ k is zero for k, whereas decreases with a power law compare with the AR model. Notice that for any value of θ the MA process is stationary. Figure 3.: ACF and Partial ACF for the MA process with θ = 0.8, σ =.4, µ =.0. In terms of invertibility, notice that equation 3 can be written as and if θ < we have that which is an AR. y t = µ + θl u t θl y t = µ θ + u t + θl + θ L + θ 3 L y t = µ θ + u t y t = µ θ θy t θ y t θ 3 y t u t 8

20 Prediction with known parameters he unconditional distribution for an MA model is Normal, as it is a linear combination of independent Normal random variables, with E yt = µ, V ar yt = σ + θ. Hence y t N µ, σ + θ. In this set up the best prediction for y t+ is its unconditional mean, i.e. ŷ t+ = µ, and the error variance is given by V ar yt+ ŷt+ = V ar yt+ µ = V ar u t+ θu t = σ + θ. For the conditional distribution we need to assume a value for u 0, say u 0 = 0. hen u = y µ + θu 0 which is known, provided that the parameters µ and θ are known. Consequently u = y µ + θu is known and so forth. Hence u t = y t µ + θu t is known. Hence for t = t + we have that y t+, conditional on the values of y t, y t, y t,..., y, y, and u 0, is distributed as Normal, because the only stochastic element is u t+ which is Normally distributed, with mean µ θu t and 9

21 variance σ, i.e. E y t+ y t, y t,..., y, u 0 = µ θut and V ar y t+ y t, y t,..., y, u 0 = E ut+ = σ. and it follows that y t+ y t, y t,..., y, u 0 N µ θu t, σ. Notice that the distribution is not the same for all t as the y t s have different values for different t s with probability. Hence the conditional predictor of y t+ is now µ θu t with an error variance of σ the variance of u t+. Consequently, the conditional predictor is better than the unconditional one as it has smaller error variance. Furthermore, notice that we need to condition not only on y t, as in the case of the AR, but on the whole series of the previous y t s, i.e. on y t, y t,..., y as well as on u 0. o understand this difference recall that the M A model, if invertible, is an AR, and consequently the last observation depends on all previous ones. Estimation o estimate the MA model in 3 we can employ either the Method of Moments or the maximum likelihood. o apply the Method of Moments we equate the theoretical mean and first order autocorrelation with their sample counterparts we could estimate µ and θ, i.e. µ = y t = y, θ solve s θ + θ = t= y t y yt y y t y. t= y t y t= 0

22 Notice that the equation in θ is a quadratic with two roots such that θ = θ and consequently if they are real one one is less than, in absolute value, the other will be greater than. o raise this indeterminacy we adopt the convention that θ is less than in absolute value, and consequently we choose the solution which has absolute value less than. his also is a condition for the invertibility of the M A process. For the Maximum Likelihood, let Lµ, α, σ = L y, y,..., y ; µ, α, σ be the likelihood function for the random variables then, as in the AR case, it can be written as Lµ, α, σ = L y y,..., y, y L y y..., y, y...l y y L y. Now the conditional distribution of y t is see Prediction section: y t y t, y t,...y, u 0 N µ θu t, σ. Hence the Likelihood can be written as: Lµ, α, σ = exp y t µ + θu t πσ σ and the log-likelihood is lµ, α, σ = ln π ln σ y t µ + θu t σ. Recall that u t = y t µ + θu t for t =,...,. Hence, the u t s depend on both µ and θ. his is important for the first order conditions which are given by: l µ = yt µ + θu t θ u t = 0, µ

23 where where u t µ = + θ u t µ for t =,,..., and u 0 µ = 0 l θ = yt µ + θu t θ u t + u t = 0 θ u t θ = u t + θ u t θ for t =,,..., and u 0 θ = 0, and l σ = σ + t= y t µ + θu t σ 4 = 0, Notice that the equations do not have an explicit solution. he dependence of the u t s on the parameters is the main reason for not being able to estimate the parameters by a simple regression. 4 Mixed Models Mixed models have both parts, i.e. an autoregressive and a moving average part. he order of the ARMA models is defined from the order of the AR and MA part, i.e. ARMA 3, means a process which depends on 3 lags of its own value and lagged errors, i.e. yt = µ + α yt + α yt + α 3 yt 3 + u t θ u t θ u t. In general an ARMA p, q is written as y t = µ + α y t + α y t α p y t p + u t θ u t θ u t... θ q u t q. Notice that the first number in the parenthesis refer to the AR part and the second to the MA part.

24 4. ARMA of Order, hese models are given by y t = µ + αy t + u t θu t 4 where y t is the observed process and u t are iid 0, σ. Properties Notice that equation 4 is written as: and provided that α < we have that y t = µ α + θl αl u t y t = µ α + θl + αl + α L + α 3 L u t. Multiplying the Lag polynomials and collecting terms we get: y t = µ α + [ + α θ L + α θ αl + α θ α L ] u t and multiplying through we get y t = µ α + u t + α θ u t + α θ αu t + α θ α u t which is an MA process. Hence as E u t = 0 for all t s. Furthermore, E y t = µ α, V ar y t = σ [ + α θ + α θ α + α θ α ] = σ + σ α θ [ + α + α ] 3

25 and provided that α < we have that V ar yt = σ + σ α θ αθ + θ = σ. α α o find the autocorrelation function, notice that in deviation, from the mean µ, form the model is written: α y t = αy t + u t θu t. Multiply both sides by y t k and taking expectations we get: E y t y t k = αe y t y t k + E u t y t k θe u t y t k. Clearly, E u t y t k = 0 for any k, as y t k depends on u t k and the previous u ts, see equation 5. Now for k = we have where E u t y t Hence E y t y t = αv ar y t θe u t y t E u t y t = αe u t y t + E u t θe ut u t E u t y t = 0 + σ 0 = σ. γ = αγ 0 θσ = ασ + α σ α θ θσ α θ αθ = σ α α and For k, we have that ρ = α θ αθ αθ + θ E y t y t k = αe y t y t k+ γ k = αγ k and consequently, ρ k = αρ k. 4

26 Hence ρ k = α θ αθ αθ+θ for k = α k ρ for k i.e. for k the autocorrelation function of an ARMA, is the same as the equivalent of an AR model, i.e. declines as a power law for α <. Hence we can conclude that the ARMA, model is stationary if and only if α <. In terms of the partial correlation we know that ρ = ρ. Now to find ρ we have to find the theoretical coeffi cient of y t of the regression projection of y t on y t and y t, i.e. where y t = y t µ α y t = αy t + ρ y t, for all t. Hence y t = y t y t α. ρ Now from the normal equations we have E y t y t y t y t α ρ = E y t, y t and it follows E y t y t y t α = E y t y t y t y t y t ρ y t y t γ 0 γ γ γ 0 α ρ = γ γ y t or dividing by γ 0 ρ ρ α ρ 5 = ρ ρ.

27 Hence ρ = ρ ρ ρ ρ ρ α ρ = ρ, ρ as ρ = αρ. Substituting for ρ = α θ αθ αθ+θ we get ρ = α θ αθ θ [ θ α θ] θ It is easy but tedious to prove that ρ k will decline with a power law, depending on α θ. Hence we can conclude that ρ k behaves like in the MA case. Figure 4.: ACF and Partial ACF for the ARMA, process with α = 0.8, θ = 0.8, σ =.4, µ =.0. In terms of invertibility the process is invertible if α <, so that it can be written as an MA, and θ < so that it can be written as an AR. 6

28 Prediction with known parameters he unconditional distribution for an ARMA, model is Normal, as it is a linear combination of independent Normal random variables, with Hence E y t = µ α, V ar y t = y t N αθ + θ α. µ αθ + θ,. α α In this set up the best prediction for y t+ is its unconditional mean, i.e. and the error variance is given by V ar yt+ ŷt+ ŷ t+ = µ α, = V ar yt+ µ α = V ar u t + α θ u t + α θ αu t + α θ α u t = αθ + θ α. For the conditional distribution we need to assume a value for u, say u = 0, and the value of y is constant in repeating sampling. hen u = y µ αy + θu which is known, provided that the parameters µ, α,and θ are known. Consequently u 3 = y3 µ αy + θu is known and so forth. Hence u t = yt µ αy t + θu t is known. Hence for t = t + we have that yt+, conditional on the values of yt, yt, yt,..., y, y, and u, is distributed as Normal, because the only 7

29 stochastic element is u t+ which is Normally distributed, with mean µ+αy t θu t and variance σ, i.e. and it follows that E y t+ y t, y t,..., y, u = µ + αyt θu t and V ar y t+ y t, y t,..., y, u = E ut+ = σ. y t+ y t, y t,..., y, u 0 N µ + αy t θu t, σ. Notice that the distribution is not the same for all t as the y t s have different values for different t s with probability. Hence the conditional predictor of y t+ is now µ + αy t θu t with an error variance of σ the variance of u t+. Consequently, the conditional predictor is better than the unconditional one as it has smaller error variance. Furthermore, notice that we need to condition not only on y t, as in the case of the AR, but on the whole series of the previous y t s, i.e. on y t, y t,..., y as well as on u, as in the case of the MA model. o understand this difference recall that the ARM A, model, if invertible, is an AR, and consequently the last observation depends on all previous ones. Estimation o estimate the ARMA, model in we can employ the maximum likelihood. For the Maximum Likelihood, let Lµ, α, σ = L y, y,..., y ; µ, α, σ be the likelihood function for the random variables. Now assuming that y is constant, and consequently it does not contribute to the likelihood, and that u = 0 we can write: Lµ, α, σ y, u = L y y,.., y, y, u L y y,.., y, y, u..l y y, u. 8

30 Now the conditional distribution of y t is see Prediction section: y t y t, y t,...y, u 0 N µ + αy t θu t, σ. Hence the Likelihood can be written as: Lµ, α, σ = exp y t µ αy t + θu t πσ σ and the log-likelihood is t= lµ, α, σ = ln π ln σ t= y t µ αy t + θu t σ. Recall that u t = y t µ αy t + θu t for t =,...,. Hence, the u t s depend on both µ, α,and θ. his is important for the first order conditions which are given by: l µ = yt µ + θu t θ u t = 0, µ t= where where where u t µ = + θ u t µ l α = yt µ αy t + θu t t= u t α = y t + θ u t α l θ = t= u for t =,..., and µ = 0. y t θ u t α u for t =,..., and α = 0. yt µ αy t + θu t u t + θ u t θ u t θ = u t + θ u t θ for t =,..., and 9 u θ = 0.

31 Finally, l σ = σ + t= y t µ αy t + θu t σ 4. Notice that the equations do not have an explicit solution. Again, the dependence of the u t s on the parameters is the main reason for not being able to estimate the parameters by a simple regression. 5 esting for Autocorrelation We shall mainly employ the Portmantau Q test. 5. Autocorrelation Coefficients Given a stationary second order time series y t we have already defined the k th order autocovariance and autocorrelation coeficients, γ k and ρ k, as: γ k = Cov y t, y t k, ρ k = γ k γ 0. For a given sample {y t } autocovariance and autocorrelation coeffi cients can be estimated in the natural way by replacing population moments with the sample counterparts: γ k = ρ k = γ k, γ 0 t=k+ r t _ r r t k _ r for 0 k < and where _ r = Depending on the assumed process for y t we can derive the distributions of γ k and ρ k. If y t is white noise, has variance σ, its distribution is symmetric and r t 30

32 the sixth moment is proportional to σ 6, then: ρk E ρk Cov, ρ l = k + and k + if k = l 0 = otherwise. Notice that although, under the white noise assumption, the true ρ k = 0 for all k s, the sample autocorrelations are negatively biased. his negative bias comes from the fact that by estimating the sample mean and subtracting it from the data results in deviations that sum to zero. Hence on average positive deviations are followed by negative ones and vice versa and consequently result in negative sum of the product. In large samples, under the assumption that the true ρ k = 0, we have that A ρk N0,. 5. Portmanteau Statistics Since the white noise assumption implies that all autocorrelations are zero we can use the Box-Pierce Q statistic. Under the null H 0 : ρ = ρ =...ρ m = 0 it easy to see that: Q m = m k= ρk A χ m. he Ljung-Box small sample correction is: ρk m Q m = + k k= A χ m. Notice that for unnecessarily big m the tests has low power, whereas for too small m it does not pick up higher possible correlation. his is one of the reasons that most econometric packages print the Q m for various values of m see Figures., 3., and 4.. 3

33 Remark R. When the residuals of an ARMAp,q are employed for the Q m test then we have that, under the null of no extra autocorrelations, Q m A χ m p q. 3

34 Part II Conditional Heteroskedasticity 6 Conditional Heteroskedastic Models hese are models which have time varying conditional variance, i.e. the conditional variance is a function of all available information at time t. 6. ARCH he Autoregressive Conditional Heteroskedasticity models of order are given by Engle 98 y t = µ + u t where u t I t N 0, σ t and σ t = c + αu t 6 where yt is the observed process and I t = {y t, y t,...}. Notice that given I t σ t the conditional variance is known, i.e. non-stochastic. Of course unconditionally, i.e. when the conditioning set is the empty set, σ t is stochastic. Properties o capitalise on the properties of the ARMA models, notice that the conditional variance can be written as: u t = c + αu t + u t σ t u t = c + αu t + v t 7 where v t = u t σ t this parameterisation was suggested by Pantula. Now v t has the following properties: E v t = E [ ] [ ] u t σ t = E E u t σ t It = E E u t I t σ t = E [ ] σ t σ t = 0 33

35 where the second equality follows from the Law of Iterated Expectations, and the forth from the definition of σ t. Furthermore, E v t v t k = E [ [ u t σ t u t k σt k] = E E u t σ t u t k σ t k = E [ E ] u t I t u t k σt k σ t u t k σ t k = E [ ] σ t u t k σ t k σ t u t k σ t k = 0. Hence v t is a martingale sequence, i.e. a zero mean and uncorrelated stochastic sequence. Additionally, E u t v t = E [ u t u t σ t ] = E [ u t E u t I t u t σ t = E [ u t σ t u t σ t ] = 0. Consequently, equation 7 describes an AR model. his in turn means that the process u t is st order stationary iff α <, consequently y t is nd order stationary under the same condition. and Furthermore, E y t = E µ + u t = µ + E u t = µ + E [E u t I t ] = µ + E [0] = µ, V ar y t = V ar µ + u t = V ar u t = E u t = c α. he autocorrelation function of the process u t say ρ u k is given by the autocorrelation function of the AR model, i.e. Notice that as σ t corr u t, u t k = ρ u k = α k. is a conditional variance then it must be positive with probability. his is achieved by imposing the so-called positivity constraints, i.e. σ t is positive with probability P [σ t > 0] = if and only if c > 0 and α ] It ]

36 Estimation o estimate the parameters of the model in 6, we employ the maximum likelihood. From 6 we get that y t I t N µ, σ t and σ t = c + αu t. Assuming that u 0 = 0 we have that the likelihood is given by Lµ, c, α u 0 = L y t I t, u 0, and the log-likelihood is: lµ, c, α u 0 = ln π ln σ t y t µ. he first order conditions are: lµ, c, α u 0 y t µ σ t αu t y t µ = αu t + µ lµ, c, α u 0 α σ t lµ, c, α u 0 c = = u t + σ t σ t + σ 4 t y t µ. σ 4 t σ t where u 0 = 0. y t µ u t where u 0 = 0. σ 4 t It is obvious that the conditions do not have explicit solutions. 6. GARCH, he Generalised ARCH models of order, are given by y t = µ+u t where u t I t N 0, σ t and σ t = c+αu t +βσ t 8 where y t is the observed process and I t = {y t, y t,...}. Notice that given I t σ t is known, i.e. non-stochastic. Of course unconditionally, i.e. when the conditioning set is the empty set, σ t is stochastic. Notice that we can also write y t = µ+u t where u t σ t = z t iid N 0, and σ t = c+αu t +βσ t. 35

37 Properties Again, to capitalise on the properties of the ARMA models, we employ the Pantula parameterisation, i.e. u t = c+α + β u t +β σ t u t +u t σ t u t = c+α + β u t +v t βv t where v t = u t σ t, which is a martingale sequence as: E v t = E [ E u t σ t It ] = E [ E u t I t σ t ] = E [ σ t σ t ] = 0, and E v t v t k = E [ [ u t σ t u t k σt k] = E E u t σ t u t k σ t k = E [ E ] u t I t u t k σt k σ t u t k σ t k = E [ ] σ t u t k σ t k σ t u t k σ t k = 0. and Furthermore, E y t = E µ + u t = µ + E u t = µ + E [E u t I t ] = µ + E [0] = µ, V ar y t = V ar µ + u t = V ar u t = E u t given that α + β <, for stationarity reasons. = c α β, he autocorrelation function of the process u t is given by the autocorrelation function of the ARMA, model, i.e. corr u t, u t k Notice that as σ t u = ρ k = α[ α+ββ] αβ β for k = α + β k ρ for k. 9 is a conditional variance then it must be positive with probability. his is achieved by imposing the so-called positivity constraints, i.e. σ t is positive with probability if and only if c > 0, β 0 and α It ]

38 Estimation o estimate the parameters of the model in 8, we employ the maximum likelihood. From 8 we get that y t I t N µ, σ t and σ t = c + αu t + βσ t. Assuming that u 0 = 0 and σ 0 = that the likelihood is given by c, the unconditional variance, we have α β Lµ, c, α, β u 0, σ 0 = L y t I t, u 0, σ0, and the log-likelihood is: lµ, c, α, β u 0, σ 0 = ln π he first order conditions are: ln σ t y t µ. σ t lµ, c, α, β u 0, σ 0 µ = σ t σ t µ + y t µ σ t + σ t µ y t µ σ 4 t where σ t µ = αu t + β σ t µ, u 0 = 0, σ 0 µ = 0. lµ, c, α, β u 0, σ 0 c where = σ t σ t c + y t µ σ 4 t σ t c σ t µ = + β σ t µ, σ 0 c = α β lµ, c, α, β u 0, σ 0 α where = σ t σ t α + y t µ σ 4 t σ t α = u t + β σ t α, u 0 = 0, σ t α σ 0 α = c α β. 37

39 lµ, c, α, β u 0, σ 0 β where σ t = σ t σ t β + = σ t + β σ t β, σ 0 = β σ 0 α = c α β. y t µ σ 4 t It is obvious that the conditions do not have explicit solutions. σ t β c α β, u 0 = 0, 6.3 GQARCH, he Generalised Quadratic ARCH models of order, Sentana. 995 are given by y t = µ+u t where u t I t N 0, σ t and σ t = c+α u t γ +βσ t 0 where yt is the observed process and I t = {y t, y t,...}. Notice that given I t σ t is known, i.e. non-stochastic. Of course unconditionally, i.e. when the conditioning set is the empty set, σ t is stochastic. Notice that the conditional variance equation can be reparameterized as: σ t = ω + αu t δu t + βσ t where ω = c + γ and δ = αγ. o explore the properties of the model the last parameterisation is very useful. Furthermore, recall that the equation in 0 can be also written as: y t = µ+u t where u t σ t = z t iid N 0, and σ t = c+α u t γ +βσ t Properties Again, to capitalise on the properties of the ARMA models, we employ the Pantula parameterisation, i.e. u t = ω + α + β u t δu t + v t βv t 38

40 where v t = u t σ t, which is a martingale sequence as: E v t = E [ E ] [ ] [ ] u t σ t It = E E u t I t σ t = E σ t σ t = 0, and E v t v t k = E [ [ ] u t σ t u t k σt k] = E E u t σ t u t k σ t k It = E [ E ] u t I t u t k σt k σ t u t k σ t k = E [ ] σ t u t k σ t k σ t u t k σ t k = 0. Furthermore, E y t = E µ + u t = µ + E u t = µ + E [E u t I t ] = µ + E [0] = µ, and V ar y t = V ar µ + u t = V ar u t = E u t = E z t σ t = E zt E σ t = E σ ω t = α β, given that α + β <, for stationarity reasons. It is not very diffi cult to prove that the autocorrelation function of the process u t is given by [ ] Covu t, u t k = Covσ t, σ t k = α+β k ω α + 4α δ ω α β 3α β αβ α β, provided that 3α + β + αβ <, a stronger condition than α + β <, and Covu t, u t k = Eu t u t k = Eσ t u t k = α + β k δαω α β Notice that as σ t is a conditional variance then it must be positive with probability. his is achieved by imposing the so-called positivity constraints, i.e. σ t is positive with probability if and only if c > 0, β 0 and α 0. 39

41 Estimation o estimate the parameters of the model in 0, we employ the maximum likelihood. From 0 we get that y t I t N µ, σ t and σ t = ω + αu t δu t + βσ t. Assuming that u 0 = 0 and σ 0 = that the likelihood is given by c, the unconditional variance, we have α β Lµ, ω, α, β, δ u 0, σ 0 = L y t I t, u 0, σ0, and the log-likelihood is: lµ, ω, α, β, δ u 0, σ 0 = ln π ln σ t y t µ. σ t he first order conditions are similar to GARCH, process they are recursive and do not have explicit solutions. 6.4 EGARCH, he Exponential GARCH models of order, Nelson 99 are given by y t = µ + u t where u t I t N 0, σ t and ln σ t = c + αz t + γ z t E z t + β ln σ t where z t = u t σ t iid N 0, where y t is the observed process and I t = {y t, y t,...}. Notice that given I t σ t is known, i.e. non-stochastic. Of course unconditionally, i.e. when the conditioning set is the empty set, σ t assumed normality E z t = π for all t s. is stochastic. Furthermore under the 40

42 Compared to GARCH, models, the EGARCH, models do not require any positivity constraints on the parameters so that the conditional variance is positive. Furthermore, the nd order stationarity of EGARCH, is satisfied if β < whereas the nd order stationarity of GARCH, requires α + β < see equation 0, i.e. the sum of two coeffi cients to be less than. his constrain is diffi cult to impose. Properties he process y t is nd order Stationarity if and only if β <. he unconditional mean and variance of yt is: E y t = E µ + u t = µ + E u t = µ + E [E u t I t ] = µ + E [0] = µ, and V ar y t = V ar µ + u t = V ar u t = E u t = E z t σ t = E z t E σ t = E σ t = exp c γ [ π Φ β i γ β i γ β i δ exp + exp Φ β i δ ], β i=0 where γ = γ + α, δ = γ α and Φ k is the value of the cumulative standard Normal evaluated at k, i.e. Φ k = k π exp x dx. he autocovariance function of the process u t is rather complicated and given by Cov[u t, u t k] = Cov[σ t, σ t k] = exp ϖ k k i=0 [ exp β i γ c γ π β Φβ i γ + exp E σ t ] β i δ Φβ i δ, 4

43 where γ and δ as before and exp ϖ k = + exp i=0 [+β k β i γ ] [+β k β i δ] Φ [ + β k β i γ ] Φ [ + β k β i δ ] he dynamic asymmetry leverage is given by Cov[u t, u t k ] = E[u t u t k ] = E[σ t u t k ] = β k exp 3 c γ π ϖ k β [ k β i γ β i exp Φβ i γ δ ] + exp Φβ i δ i=0 [ γ Φβ k γ β k γ β k exp δφβ k δ ] δ exp. where ϖ k = exp [ i=0 + exp +βk β i γ ] [ +βk β i δ] Φ [ + βk β i γ ] Φ [ + βk β i δ ]. Estimation o estimate the parameters of the model in, we employ the maximum likelihood. Assuming that z 0 = 0 and ln σ 0 = we have that the likelihood is given by c, the unconditional variance, β Lµ, c, α, β, γ z 0, σ 0 = L y t I t, z 0, ln σ0, and the log-likelihood is: lµ, c, α, β, γ z 0, σ 0 = ln π ln σ t y t µ. σ t he first order conditions are again recursive and consequently do not have explicit solutions. 4

44 6.5 Stochastic Volatility of Order Let us consider the following SV model: y t = µ + u t where u t = z t σ t, lnσ t = α 0 + β lnσ t + η t and z t η t iid N 0 0, ρσ η ρσ η σ η. Notice that given the information set J t = { y t, y t,...η t, η t,... }. the conditional variance σ t is known, i.e. non-stochastic. However, our information set is I t = {y t, y t,...} J t, as the only observed process is the y t, and consequently σ t is stochastic this is the origin of the name of this model. his is the main difference between this model and various heteroskedastic models where the conditional variance is non-stochastic, given I t. Properties Under the normality assumption and for β < the process {y t is covariance nd order and strictly stationary, and we can also invert the AR representation of the conditional variance and write the MA representation as lnh t = α 0 + ψ β i η t i where ψ i = β i. he unconditional mean and variance of y t is: i=0 E y t = E µ + u t = µ + E u t = µ + E [E u t I t ] = µ + E [0] = µ, and V ar y t = V ar µ + u t = V ar u t = E u t = E z t σ t α0 = exp β + σ η β. = E z t E σ t = E σ t 43

45 he autocorrelation function of the process u t compared to EGARCH,, and given by + β k ρ σ η exp β k σ η ρu t, u β t k = σ 3 exp η β and Covσ t, σ t k = exp α [ 0 β + σ η β exp is less complicated, as β k σ η β he dynamic asymmetry leverage is given by E σ t u t k = ρση β k 3α 0 exp β + β k + 5 σ 4 η β. ]. Notice that for ρ = 0, as in Harvey et. al. 994, the correlation of the squared errors is the same as in Shephard 996 and aylor 984. Furthermore, in this case we have that E h t ε t k = 0, i.e. the leverage effect is 0. Moreover, it is easy to prove that ρu t, u t k 3 ρσ t, σ t k < 3. Estimation he main diffi culty of employing the SV model is that it is not obvious how to evaluate the likelihood, i.e. the distribution of y t given I t = {y t, y t,...}. One solution is to employ the Generalised Method of Moments. Another possibility is to apply the Kalman Filter and maximise the likelihood of one-step prediction error, as if this distribution was normal quasi-likelihood. Other possibilities include MCMC and Bayesian methods. 7 Conditional Heteroskedastic in Mean Models hese are models which have the time varying conditional variance in the mean specification as well, i.e. r t = δh t + ε t, ε t = z t ht 3 44

46 where h λ t = ω + αλf ν z t h λ t + βh λ t + φ η λη t, 4 and z t η t fz t = z t b γz t b, 5 iid N 0, ρσ η 0 ρσ η σ η. 6 A variety of conditionally heteroskedastic in mean models are nested in the above parametrization. However, here we consider only three possible values of the parameter λ and two possible values of ν, i.e. λ {0, /, }and ν {, }. Among others, the following models are nested in our parameterization: Standard GARCH GARCH,; Bollerslev 986 and GARCH,-M; Engle et. al. 987 λ =, ν =, φ η = b = γ = 0. Nonlinear Asymmetric GARCH NLGARCH,; Engle and Ng 993 λ =, ν =, φ η = γ = 0. Glosten-Jagannathan-Runkle GARCH GJR-GARCH, and GJR-GARCH,- M; Glosten et al. 993 λ =, ν =, φ η = b = 0. hreshold GARCH GARCH,; Zakoian 994 λ =, ν =, φ η = b = 0. Absolute Value GARCH AVGARCH,; aylor 986 and Schwert 989 λ =, ν =, φ η = b = γ = 0. Exponential GARCH EGARCH, and EGARCH,-M; Nelson 99 λ = 0, ν =, φ η = 0, b = 0. Stochastic Volatility SV; Harvey and Shephard 996 λ = 0, α = 0, φ η =. 45

47 Furthermore, we consider the Quadratic GARCH GQARCH, and GQARCH,- M; Sentana 995 which does not fall within the modelling of equations 4 and 5. In principle, we can impose restrictions of the parameters of the model described in equations 3-6, so that the stochastic process {r t } is second order and strictly stationary. We further assume that the process {h t } started from some finite value in the distant past and the conditional variance is positive with probability one. Now we have that under the appropriate conditions of second order and strict stationarity we have that γ k = Covr t, r t k is a quadratic function in δ given by Arvanitis and Demos 004 and 004b: γ k = Covr t, r t k = E r t r t k Er t Er t k = E [δh t + ε t δh t k + ε t k ] Eδh t + ε t Eδh t k + ε t k = E δh t δh t k + ε t δh t k + δh t ε t k + ε t ε t k δ Eh t Eh t k = δ [E h t h t k Eh t Eh t k ] + δe h t ε t k = δ Covh t, h t k + δe h t ε t k It is important to mention that the assumption of normality of the errors is by no means a necessary condition for the above formula. It can be weakened and substituted for higher moment conditions. From formula above it is immediately obvious that if E h t ε t k is zero, as in the GARCH-M model of Engle et al. 987, then the k-order autocovariance of the series has the sign of the autocovariance of the conditional variance irrespective of the value of δ; this could explain the poor empirical performance of GARCH-M models see Fiorentini and Sentana 998. Ideally, one would like to employ a model which can be compatible with either negative or positive mean autocorrelations, and potentially different from of the sign of the autocorrelation of the conditional variance. As an example consider the volatility clustering observed in financial data. his implies 46

48 that the autocorrelation of the returns conditional variance is positive. However, short horizon returns are positively autocorrelated, whereas long horizon ones are negatively autocorrelated see Poterba and Summers 988. It is a well documented fact that in all applied work with financial data, the estimated values of β are positive, indicating the observed volatility clustering, and the estimated values of ρ are negative due to the asymmetry effect see Harvey and Shephard 996. hese two facts imply that E h t ε t k is negative and Covh t, h t k is positive, for any k. Consequently, γ k is negative for δ 0, E h t ε t k /Covh t, h t k and is positive for δ E h t ε t k /Covh t, h t k,, i.e. there are positive values of δ, the price of risk, that can incorporate both negative and positive autocorrelations of the observed series. Furthermore, notice that for positive β the autocorrelation of the squared errors, of any order, is positive for any ρ. It turns out that not all first-order dynamic heteroskedasticity in mean models are compatible with negative autocorrelations of observed series. In fact, the GARCH,-M, AVGARCH,-M and SV-M with uncorrelated mean and variance errors models are not compatible with data sets that exhibit negative autocorrelations. his statement rests on the assumption of symmetric distributions of the errors, which are mostly employed in applied work. On the other hand, all the other models considered above are compatible with either positive or negative autocorrelations of the observed data. In fact, under the assumption of symmetric distribution of the errors, models which incorporate the leverage effect can also accommodate the observed series autocorrelation of either sign. 47

49 7. he Log Conditional Variance Models Let us consider the model in equations 3-6 and reparameterize equation 4 so that now it reads h λ t λ = ω +αf ν z t h λ t +β hλ t λ +φ η η t, where ω = ω β. λ aking the limit now as λ 0 we have a family of first-order processes where the natural log of the conditional variance is modeled in the variance equation Demos 00, i.e. Now for α = 0 and φ η ln h t = ω + αf ν z t + β ln h t + φ η η t = we have the first-order Stochastic Volatility of Harvey and Shephard 996, whereas for ν = and φ η = 0 we have the first-order Exponential GARCH of Nelson 99. generalization of the above equation, i.e. ln h t = α 0 + α θ i fz t i + φ η i=0 In fact we can consider a i=0 ψ i η t i 7 where θ 0 = ψ 0 =. Under the normality assumption, {δ t }, {ln h t } and {h t } are covariance and strictly stationary if ϕ <, and are finite. In θ i i=0 ψ i i=0 such a case, the observed process {r t } is covariance and strictly stationary as well. Furthermore, an advantage of using λ = 0, over λ = / or λ =, is that in this case we do not need to constrain further the parameter space to get the positivity of the conditional variance. Let us now state the following Lemmas which will be needed in the sequel see Demos 00 for a proof. Lemma 7. For the following Gaussian AR process y t = α 0 + βy t + η t, where β <, η t i.i.d.n0, σ η and τ any finite number we have that Cov expτy t, expτy t k = exp τµ y + τ σy [ ] τ β k σ η exp β 48

50 where µ y = α 0 β and σ y = σ η. β A special case of the above Lemma for τ = can be found in Granger and Newbold 976. Lemma 7. If z N 0, ρσ η η 0 ρσ η σ η real numbers we have that then for any κ, τ finite E {exp [τfz] expκη} = ΦA b exp Γ + exp Φb B E {z exp [τfz] expκη} = AΦA b exp Γ + BΦb B exp E { z exp [τfz] expκη } = τ κ σ η b κσ η ρ exp π + + A ΦA b exp Γ + + B Φb B exp where = [κσ η τ + γ] Γ = [κσ η + τ γ] + τ + γ[b + κσ η ρ], τ γ[b + κσ η ρ], B = κρσ η τ + γ and A = κρσ η + τ γ. heorem 7.3 For the models described in 3, 5, 6 and 7 above we have that: E h t ε t k = 3α0 exp ϖ / k k i=0 [ A 0 0,k ΦA0 0,k b exp Γ 0 0,k k E h t h t k = expα 0 ϖ k ΦA 0 0,i b exp i=0 [ ] exp Γ i ΦA 0 0,i b + exp i Φb B i 49 Γ 0 0,i ] + B 0 0,k Φb B0 0,k exp 0 0,k, +exp 0 0,i Φb B 0 0,i,

51 [ ] Eε t ε t k = Eh t h t k D ΦA 0 0,k b exp Γ 0 0,k + exp 0 0,k Φb B 0 0,k, and where ϖ j k = i=0 [ exp Γ j k,i E h t = expα 0 ϖ 0 0 ΦA j k,i b + exp j k,i ] Φb B j k,i, A j k,i = φ ηjψ i + ψ i+k ρσ η + αjθ i + θ i+k γ, B j k,i = φ ηjψ i + ψ i+k ρσ η αjθ i + θ i+k + γ, j k,i = [φ ηjψ i + ψ i+k σ η αjθ i + θ i+k + γ] +αjθ i + θ i+k + γ[b + φ η jψ i + ψ i+k σ η ρ], Γ j k,i = [φ ηjψ i + ψ i+k σ η + αjθ i + θ i+k γ] αjθ i + θ i+k γ[φ η jψ i + ψ i+k σ η ρ + b], and D = αθ k φη ψ k σ η b φ η ψ k σ η ρ exp π + + A 0 0,k ΦA 0 0,k b exp Γ 0 0,k + + B 0 k Φb B 0 0,k exp 0 0,k. Proof Demos 00 Let us now turn our attention to individual models where the log of the conditional variance is modeled. 50

52 Stochastic Volatility in Mean Consider the model in equations 3, 5-7 and impose the constraints φ η =, and α = 0 to get the SV M model, i.e. where r t = δh t + ε t, ε t = z t ht 8 lnh t = α 0 + and z t, η t / are as in equations 6. ψ i η t i 9 Notice that this model nests all invertible ARMA representations of the conditional variance process, i.e. lnh t = α 0 + p β i lnh t i + q ξ i η t i with [ i= ] i=0 the assumption that the roots of p β i x i lie inside the unit circle we can i= invert the ARMA representation of the conditional variance equation to get: lnh t = α 0 + ψ i η t i where α α 0 = 0, ψ β β... β p 0 =, and ψ i, for i=0 i =,,..., are the usual coeffi cients for the MA representation of an ARMA model see e.g. Anderson 994. In terms of stationarity, and under the normality assumption, the square summability of the ψ is and ϕ < guarantee the second order and strict stationarity of the {h t } process, and consequently the i=0 covariance and strict stationarity of the {r t } process. Now we can state the following Lemma: Lemma 7.4 For the SV-M above and under the assumptions of stationarity we have that: E h t ε t k = ρσ η ψ k exp Covh t, h t k = exp 3α 0 + σ η α 0 + σ η i=0 5 k ψ i i=0 [ ψ i + σ η exp σ η ψ k+i + ψ i, i=0 ] ψ i ψ i+k i=0

53 and ρε t, ε t k = + ψ k ρ σ η exp 3 exp σ η σ η i=0 ψ i i=0 ψ i+k ψ i Assume for the moment that δ is positive, i.e. the price of risk is positive. hen in light of the above lemma and theorem 7.3 there are positive values of δ associated with either negative or positive values of the autocovariance, γ k, autocorrelation of the process {r t }. If there is no correlation between the mean and variance innovations, i.e. ρ = 0 as in Harvey et al. 994, we have that E h t ε t k = 0 and consequently γ k is positive when Covh t, h t k is independent of the value of δ. If, on the other hand, the Covh t, h t k is negative, then there are positive values of δ that are compatible with negative values of γ k, i.e. for δ 0, ϕ Covh t, h t k /ϕ k σ ue h t h t k γ k is negative and for δ ϕ Covh t, h t k /ϕ k σ ue h t h t k, γ k is positive. Since in most applications a first order SV-M model is considered with time invariant coeffi cient for the conditional variance in the mean equation, let us now turn our analysis to this model. First-Order Stochastic Volatility in Mean with Constant Coefficients Let us consider the following SV-M model: and z t η t r t = δ t h t + ε t where ε t = z t ht, lnh t = α 0 + β lnh t + η t iidn 0, ρσ η 0 ρσ η σ η. Under the normality assumption and for β < the process {r t } is covariance and strictly stationary, and we can also invert the AR representation of the con- 5

54 ditional variance and write the MA representation as lnh t = α 0 + ψ i η t i where α 0 = α 0 / β and ψ i = β i. Now we can state the following Lemma i=0 Corollary For the above SV-M we have that: E h t ε t k = ρσ η β k 3α 0 exp β + β k + 5 σ 4 η β, and Covh t, h t k = exp α [ 0 β + σ η β exp ρε t, ε t k = β k σ η β + β k ρ σ η exp β k σ η β σ 3 exp η β ] o our knowledge, in all applied work with financial data, the estimated values of β are positive, indicating the observed volatility clustering, and the estimated values of ρ are negative due to the asymmetry effect see Harvey and Shephard 996. hese two facts imply that E h t ε t k is negative and Covh t, h t k is positive, for any k. Consequently, γ k is negative for δ 0, E h t ε t k /Covh t, h t k and is positive for δ E h t ε t k /Covh t, h t k,, i.e. there are positive values of δ, the price of risk, that can incorporate both negative and positive autocorrelations of the observed series. Furthermore, notice that for positive β the autocorrelation of the squared errors, of any order, is positive for any ρ. Notice that for ρ = 0, as in Harvey et al. 994, the correlation of the squared errors is same as in Shephard 996 and aylor 984. Furthermore, in this case we have that E h t ε t k = 0 and according to heorem γ k is positive for any value of δ. Moreover, it is easy to prove that ρε t, ε t k 3 ρh t, h t k <. 3 53

55 Exponential ARCH in Mean Let us consider again the model in equations 3, 5-7. o derive the ime Varying Mean Parameter EGARCH,-M model set ν =, φ η b = 0, i.e. lnh t = α 0 + r t = δh t + ε t, ε t = z t ht θ i g z t i, g z t = α z t i=0 = 0 and + α z t, π where z t iidn0,, u t iidn0, σ u, z t and u t independent, δ t as in?? and θ 0 =. hroughout this section we assume that α 0. In case that α = 0 the above model is the same as the model in section. with ρ =. Notice that as it is the case for the SV-M model, this model nests all invertible ARMA representations of the conditional variance. Under the assumption of normality of z t s the processes {ε t } and {h t } are covariance and strictly stationary if θ i i=0 is finite see Nelson 99. Consequently, under these conditions, the process {r t } is covariance and strictly stationary as well. We can state the following Lemma: Lemma 7.5 For the above EARCH-M model we have Cov[h t, h t k ] = exp[α 0 α π k θ exp ϖ k i=0 θ i ] i=0 i α Φθ i α + exp where θ 0 =, α = α +α, α = α α, ϖ = i=0 θ i α Φθ i α exp Φθ i α ϖ θ i α + exp, θ i α Φθ i α 54

56 and ϖ k = exp [θi +θ i+kα ] Φθ i + θ i+k α [θi +θ i=0 + exp i+k α ] Φθ i + θ i+k α Eh t ε t k = θ k exp 3 k [ θ exp α 0 α i α i=0 [ α Φθ k α exp θ i ϖ k π i=0 θ Φθ i α + exp θ k α i α, ] Φθ i α θ α Φθ k α exp where ϖ k = exp [ θ i+θ i+k α ] Φ θ i + θ i+k α [. i=0 + exp i+θ i+k α ] Φ θ i + θ i+k α Also Cov[ε t, ε t k] = Cov[h t, h t k ] + θ k E h t h t k exp θ + exp α π k α Φθ k α θ exp k α +θ k α Φθ k α exp θ k α k α θ k α k α ] Φθ k α Φθ k α Clearly the sign of Eh t ε t k depends on the sign of θ k and the relative values of α and α. Let us now turn our attention to the first-order EGARCH in Mean model with time invariant δ t. First-Order EGARCH in Mean with Constant Coefficients Consider now the following EGARCH-M model: r t = δh t + ε t ε t = z t ht where z t iid N0, and 55

57 lnh t = α 0 + β lnh t + g z t where g z t = α z t Provided that β < we can invert the above equation as: π + α z t. lnh t = α 0 + β i g z t i 0 i=0 where α 0 = α 0 β Assuming normality and β < we get the second order and strict stationarity of {r t } and we can state the following Corollary: Corollary For the EGARCH-M and under the assumptions that z t i.i.d.n0, and β < we have that ϖ k k exp i=0 Cov[h t, h t k ] = exp[ α 0 β α β π ] β i α Φβ i α ϖ, β i α Φβ i α + exp where α = α +α, α = α α, ϖ = Φβ i α exp i=0 and ϖ k = exp [+βkβiα ] Φ + β k β i α, [+β i=0 + exp k β i α ] Φ + β k β i α β i α + exp Eh t ε t k = β k 3 exp α 0 β α β π ϖ k k [ β i α β i ] exp Φβ i α α + exp Φβ i α i=0 [ β k α Φβ k α α exp α Φβ k α exp where ϖ k = exp [ i=0 + exp +βk β i α ] [ +βk β i α ] Φ + βk β i α Φ + βk β i α β k ] α. β i α Φβ i α 56

58 Also Cov[ε t, ε t k] = Cov[h t, h t k ] + β k E h t h t k exp βk α Φβ k α β + exp k α Φβ k α α + π βk α Φβ k α β exp k α +β k α Φβ k α exp β k α In Figure we present the autocorrelation function of the conditional variance for an EGARCH-M process when the parameter values are as in Nelson 99. Furthermore, we graph two approximations. he first one is employing the formula ρh t, h t k = β k ρh t, h t, whereas the second is given by k ρh t, h t k = ρht,ht ρh t,h t ρht, h t, for k. Notice that the exact ACF is always between these two approximations, closer to the second. Furthermore, for high positive values of β the first approximation is quite inaccurate, being more so as k increases. It is clear that the sign of E[h t ε t k ] depends on the relative values of β, α, and α. However, notice that under the assumptions of volatility clustering, leverage and asymmetry effects, i.e. β > 0, α > 0 and α < 0, we have that α > α and α > 0. Hence α β exp k α Φβ k α α β exp k α Φβ k α < 0 as the exponential and the cumulative distri- bution functions are non-decreasing. Consequently, E[h t ε t k ] is negative for any k. Now if the random sequence {g z t } had a normal distribution, then we would be able to apply Lemma and conclude that the autocovariance of the conditional variance is positive for positive β. However, although {g z t } is an independent Since Nelson 99 estimated an EGARCH,-M, we use the largest estimated root as our β parameter. 57

59 sequence of random variables, its distribution is clearly non-normal. Nevertheless, under the assumptions of volatility clustering, leverage and asymmetry effects it is possible to prove that at least the first-order autocovariance is positive up to forth-order approximation, i.e. expanding the autocovariance with respect to β around zero. Under these facts we can say that either positive or negative autocorrelations of the observed series are compatible with positive expected value of the price of risk see heorem. 7. Comparing the Ln Models Comparing the two models we can see that both models can incorporate positive and negative autocorrelations of the observed series with positive expected value of risk price. Furthermore, both models are compatible with cyclical effects in the ACF of the squared errors, for negative β. he SV-M models have a relative advantage over the EGARCH-M models, in that the autocorrelation functions of the conditional variance and the squared errors are easier to evaluate. Any other comparison between the two models is diffi cult due to the complexity of the formulae for the EGARCH model. Nevertheless, notice that the symmetric firstorder EGARCH model, i.e. when α = 0, has the same ACFs for the conditional variance and the squared errors as a SV model with ρ = and σ η = α see Demos 00 and He et al Harvey and Shephard 996 estimated a SV model for the data in Nelson 99. he estimates for β, ρ, and σ η were , 0.66, and 0.06, respectively. In Figure we graph the ACF of the conditional variance and the squared errors for the two models and the above mentioned parameter values. Notice that although the ACF of the two conditional variances are very close this is not the case for the ACF of the squared errors. In fact, we know that the ACF of the squared errors for first-order SV-M models is bound from above by /3 see 58

It seems that this bound is higher for the EGARCH-M models. However, the complexity of the formulae for these models makes the verification of this conjecture very difficult. Nevertheless it is possible to compare the symmetric first-order EGARCH-M, i.e. when α_1 = 0, with the SV-M one. For this we shall need the following Proposition.

Proposition 3  For the symmetric Gaussian first-order EGARCH-M model we have that

ρ(h_t, h_{t-k}) = { exp(β^k α_2²/(1-β²)) Π_{i=0}^∞ Φ((1+β^k) β^i α_2) / [2 Φ(β^i α_2) Φ(β^{k+i} α_2)] - 1 } / { exp(α_2²/(1-β²)) Π_{i=0}^∞ Φ(2β^i α_2) / [2 Φ(β^i α_2)²] - 1 }

and

ρ(ε_t², ε_{t-k}²) = { [1 + β^{2(k-1)} α_2² + β^{k-1} α_2 φ(β^{k-1} α_2)/Φ(β^{k-1} α_2)] exp(β^k α_2²/(1-β²)) Π_{i=0}^∞ Φ((1+β^k) β^i α_2) / [2 Φ(β^i α_2) Φ(β^{k+i} α_2)] - 1 } / { 3 exp(α_2²/(1-β²)) Π_{i=0}^∞ Φ(2β^i α_2) / [2 Φ(β^i α_2)²] - 1 }.

Specifically, for k = 1 and β close to 1 we have:

ρ(ε_t², ε_{t-1}²) ≈ (1/3) exp(-α_2²/2) [1 + α_2² + α_2 φ(α_2)/Φ(α_2)].

In light of the above Proposition, it is immediately obvious that the upper bound for the ACF of the squared errors is higher than 1/3. For high values of α_2 the first-order ACF is dominated by the exp(-α_2²/2)/3 factor. In fact, solving numerically the corresponding first-order condition locates the maximum at a moderate positive value of α_2.

Recall that the ACFs of the conditional variance and of the squared errors of a symmetric SV-M model are given, respectively, by

ρ(h_t, h_{t-k}) = [exp(β^k σ_η²/(1-β²)) - 1] / [exp(σ_η²/(1-β²)) - 1]

and

ρ(ε_t², ε_{t-k}²) = [exp(β^k σ_η²/(1-β²)) - 1] / [3 exp(σ_η²/(1-β²)) - 1].
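A quick numerical sketch, assuming only the SV-M formulas just recalled, makes the 1/3 bound concrete: whatever the values of β and σ_η², the first-order autocorrelation of the squared errors stays below 1/3.

import numpy as np

def sv_acfs(beta, sig_eta2, k):
    # ACF of h_t and of eps_t^2 for a symmetric first-order SV-M model
    s2 = sig_eta2 / (1 - beta ** 2)           # variance of ln h_t
    num = np.exp(beta ** k * s2) - 1
    return num / (np.exp(s2) - 1), num / (3 * np.exp(s2) - 1)

for beta in (0.90, 0.98):
    for sig_eta2 in (0.01, 0.05, 0.5):
        rho_h, rho_e2 = sv_acfs(beta, sig_eta2, 1)
        print(beta, sig_eta2, round(rho_h, 4), round(rho_e2, 4))   # rho_e2 < 1/3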

Let us assume that the estimated parameters of such a model and of an EGARCH-M process are such that they result in ρ_EGARCH(h_t, h_{t-k}) ≈ ρ_SV(h_t, h_{t-k}) for small k. Then, provided that α_2 is positive, we get that ρ_EGARCH(ε_t², ε_{t-k}²) > ρ_SV(ε_t², ε_{t-k}²). However, any other comparison of the two models is extremely difficult. Nevertheless, as the applicability of symmetric conditional heteroskedasticity models is rather limited (see e.g. Gallant et al. 1997), and a comparison of the properties of general order asymmetric SV-M with those of EGARCH-M is extremely difficult, we do not pursue this topic any further.

7.3 Modelling the Standard Deviation

All proofs of this section can be found in Arvanitis and Demos (2004a and 2004b). Let us now consider the following alternative set of constraints on the model in equations (3)-(5): (1) φ_η = 0, i.e. there is no stochastic error in the conditional variance equation, and (2) λ is set so that the standard deviation, h_t^{1/2}, is modeled. Let us also relax the normality assumption and instead assume that the z_t's are iid(0,1) random variables. This yields the following model:

r_t = δ_t h_t + ε_t,   ε_t = z_t √h_t,

where

h_t^{1/2} = ω + α f^ν(z_{t-1}) h_{t-1}^{1/2} + β h_{t-1}^{1/2},

f(z_t) = |z_t - b| - γ (z_t - b),

and z_t ~ iid(0,1).

When ν = 1 and φ_u = φ = b = 0 we get the TGARCH(1,1)-M of Zakoian (1994), whereas if γ = 0 as well we get the AVGARCH(1,1)-M of Taylor (1986) and Schwert (1989). Under the assumption E(αf^ν(z) + β)⁴ < 1 both the conditional variance process, {h_t}, and the observed process, {r_t}, are covariance and strictly stationary (see He and Terasvirta 1999). Furthermore, for the positivity of the conditional variance we need β > 0, α > 0 and |γ| ≤ 1. In this setup we can state the following Lemma.

Lemma 7.6  If the z_t's are i.i.d.(0,1) random variables with E(αf^ν(z_t) + β)⁴ < 1, then for k > 0 we have that

E(h_t ε_{t-k}) = B^{k-1} B̃ E(h^{3/2}) + 2ω Ã E(h) (B^k - A^k)/(B - A)

and

Cov(h_t, h_{t-k}) = B^k V(h) + 2ω A (B^k - A^k)/(B - A) [E(h^{3/2}) - ω E(h)/(1 - A)],

which is positive for any k, where

E(h) = ω² (1 + A)/[(1 - A)(1 - B)],   E(h^{3/2}) = ω³ (1 + 2A + 2B + AB)/[(1 - A)(1 - B)(1 - Γ)],

E(h_t²) = ω⁴ [1 + 3A + 5B + 3Γ + 3AB + 5AΓ + 3BΓ + ABΓ]/[(1 - A)(1 - B)(1 - Γ)(1 - Δ)],   V(h) = E(h²) - [E(h)]²,

A = E(αf^ν(z) + β),   Ã = E[z(αf^ν(z) + β)] = α E[z f^ν(z)],   B = E(αf^ν(z) + β)²,

B̃ = E[z(αf^ν(z) + β)²] = α² E[z f^{2ν}(z)] + 2αβ Ã,   Γ = E(αf^ν(z) + β)³   and   Δ = E(αf^ν(z) + β)⁴.

Unfortunately the expressions derived are too complicated to draw any conclusions from. In fact, in order to say anything about the signs of the various expectations we need, at least, to specify the distribution of the z_t's. Hence, let us assume that they are normally distributed and moreover that b = 0.

7.4 The First-Order TGARCH-M under Normality

The Gaussian TGARCH(1,1)-M model (see Zakoian 1994) is defined as follows:

r_t = δ h_t + ε_t,  where  ε_t = z_t √h_t,

h_t^{1/2} = ω + α f(z_{t-1}) h_{t-1}^{1/2} + β h_{t-1}^{1/2},   f(z_t) = |z_t| - γ z_t,   and   z_t ~ iid N(0,1).

For stationarity assume that E(αf(z) + β)⁴ < 1, whereas for positivity of the conditional variance assume that β > 0, α > 0 and -1 < γ < 1. We can now state the following Corollary.

Corollary 4  For the above first-order TGARCH-M model and for k > 0 we have that

E(h_t ε_{t-k}) = -2αγ [ B^{k-1} (2α√(2/π) + β) E(h^{3/2}) + ω E(h) (B^k - A^k)/(B - A) ]

and

Cov(h_t, h_{t-k}) = B^k V(h) + 2ω A (B^k - A^k)/(B - A) [E(h^{3/2}) - ω E(h)/(1 - A)],

where E(h) = ω²(1 + A)/[(1 - A)(1 - B)], E(h^{3/2}) = ω³(1 + 2A + 2B + AB)/[(1 - A)(1 - B)(1 - Γ)],

E(h²) = ω⁴[1 + 3A + 5B + 3Γ + 3AB + 5AΓ + 3BΓ + ABΓ]/[(1 - A)(1 - B)(1 - Γ)(1 - Δ)],   V(h) = E(h²) - [E(h)]²,

A = α√(2/π) + β,   B = α²(1 + γ²) + 2αβ√(2/π) + β²,

Γ = 2√(2/π)α³(1 + 3γ²) + 3α²β(1 + γ²) + 3αβ²√(2/π) + β³,   and

Δ = 3α⁴(1 + 6γ² + γ⁴) + 8√(2/π)α³β(1 + 3γ²) + 6α²β²(1 + γ²) + 4√(2/π)αβ³ + β⁴

(for a proof see Arvanitis and Demos 2004).
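As a cross-check of the Gaussian moments entering the Corollary, the following sketch (not from the notes; the parameter values are arbitrary) compares the closed forms of A and B with Monte Carlo moments of αf(z) + β.

import numpy as np

rng = np.random.default_rng(1)
alpha, beta, gamma = 0.15, 0.80, 0.30        # illustrative values
z = rng.standard_normal(2_000_000)
c = alpha * (np.abs(z) - gamma * z) + beta   # alpha*f(z) + beta with f(z) = |z| - gamma*z

A_formula = alpha * np.sqrt(2 / np.pi) + beta
B_formula = alpha ** 2 * (1 + gamma ** 2) + 2 * alpha * beta * np.sqrt(2 / np.pi) + beta ** 2

print(round(c.mean(), 5), round(A_formula, 5))         # Monte Carlo vs closed form for A
print(round((c ** 2).mean(), 5), round(B_formula, 5))  # Monte Carlo vs closed form for B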

It is now apparent that, since the covariance of any order between the conditional variance and the lagged error is negative, there are positive values of δ which are compatible with either negative or positive values of γ_k. If we impose the additional restriction that the leverage parameter, γ, is zero, we get the Absolute Value GARCH(1,1)-M of Taylor (1986) and Schwert (1989). Notice that for this model the covariance (correlation) of the conditional variance with the lagged errors is zero (see the above Corollary). Consequently, the autocovariance of the observed process can only have the sign of the autocovariance of the conditional variance, which is positive.

7.5 Modelling the Conditional Variance

Setting φ_η = 0 and choosing λ so that the conditional variance itself is modeled, we get the following GARCH-M model:

r_t = δ h_t + ε_t,   ε_t = z_t √h_t,

where

h_t = ω + α f^ν(z_{t-1}) h_{t-1} + β h_{t-1},

f(z_t) = |z_t - b| - γ (z_t - b),

and z_t ~ iid(0,1). When ν = 2 and φ_u = φ = γ = 0 we get the NLGARCH(1,1)-M of Engle and Ng (1993). Furthermore, if b = 0, but γ ≠ 0, we get the GJR-GARCH(1,1)-M of Glosten et al. (1993), and if in addition γ = 0 we get the GARCH(1,1) of Bollerslev (1986) and the GARCH(1,1)-M of Engle et al. (1987). Under the assumption that |φ| < 1 and E(αf^ν(z_t) + β)² < 1 the process {r_t} is second order and strictly stationary (see He and Terasvirta 1999), and if α > 0 and β > 0 the positivity constraint for the conditional variance is satisfied as well. Consequently, we can state the following Lemma:

Lemma 7.7  For the first-order GARCH-M model above we have that

E(h_t ε_{t-k}) = α E[z f^ν(z)] A^{k-1} E(h^{3/2}),   and   Cov(h_t, h_{t-k}) = ω² A^k (B - A²)/[(1 - A)²(1 - B)],

where A = E(αf^ν(z_t) + β) and B = E(αf^ν(z_t) + β)².

It is obvious that if E[z f^ν(z)] = 0 then the autocovariance of the observed series, {r_t}, is positive for any value of δ. In most models considered in applied work ν = 2. In such a case

E[z f²(z)] = (1 + γ²)(E(z³) - 2b) - 2γ E[|z - b|(z - b)z].

Consequently, for the GARCH(1,1)-M model we get that E[z f²(z)] = E(z³), for the GJR-GARCH(1,1)-M we get that E[z f²(z)] = (1 + γ²)E(z³) - 2γ E(z²|z|), and for the NLGARCH(1,1)-M model we get that E[z f²(z)] = E(z³) - 2b. Consequently, for the GARCH(1,1)-M model with a symmetric distribution of the z_t's the autocovariance of the observed series can only be positive, independent of the value of δ.
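The sign of E[z f²(z)] for the three special cases can be verified by simulation. The sketch below is not from the notes; it uses standard normal z and illustrative parameter values, and computes the Monte Carlo counterpart of E[z f²(z)] for the GARCH, GJR-GARCH and NLGARCH specifications.

import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(5_000_000)

def zf2(b, gamma):
    # Monte Carlo estimate of E[z f^2(z)] with f(z) = |z - b| - gamma*(z - b)
    f = np.abs(z - b) - gamma * (z - b)
    return (z * f ** 2).mean()

print("GARCH   (b=0,   gamma=0):   ", round(zf2(0.0, 0.0), 4))   # ~ E z^3 = 0
print("GJR     (b=0,   gamma=0.3): ", round(zf2(0.0, 0.3), 4))   # negative
print("NLGARCH (b=0.5, gamma=0):   ", round(zf2(0.5, 0.0), 4))   # ~ -2b = -1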

7.6 First-Order GQARCH in Mean

Consider the following GQARCH(1,1) in Mean model:

r_t = δ h_t + ε_t,   ε_t = z_t √h_t,

where z_t is an iid(0,1) sequence of random variables and h_t = ω + α(ε_{t-1} - b)² + β h_{t-1}. In terms of stationarity we need α²E(z⁴) + β² + 2αβ < 1, whereas we need both α and β to be positive for positivity of the conditional variance (see Sentana 1995).

The Sastry Pantula parameterization (see Bollerslev 1986) of the conditional variance of the GARCH model applies also to the GQARCH-M, i.e.

ε_t² = ω* + (α + β) ε_{t-1}² - 2αb ε_{t-1} + ν_t - β ν_{t-1},

where ω* = ω + αb² and ν_t = ε_t² - h_t is a martingale difference sequence, as

E(ν_t) = E(ε_t² - h_t) = E[E_{t-1}(ε_t² - h_t)] = 0,

and for k > 0

E(ν_t ν_{t-k}) = E[(ε_t² - h_t)(ε_{t-k}² - h_{t-k})] = E(ε_t² ε_{t-k}²) - E(ε_t² h_{t-k}) - E(h_t ε_{t-k}²) + E(h_t h_{t-k}) = 0.

We can state the following Lemma.

Lemma 7.8  For a GQARCH(1,1) in Mean model we have:

E(h_t ε_{t-k}) = E(ε_t² ε_{t-k}) = (α + β)^{k-1} [α E(z³) E(h^{3/2}) - 2αb E(h)]

and

Cov(h_t, h_{t-k}) = (α + β)^k [E(h²) - (E h)²] = (α + β)^k V(h),

where

E(h²) = [ω*² + 4α²b² E(h) + 2ω*(α + β) E(h) - 4α²b E(z³) E(h^{3/2})] / [1 - α²E(z⁴) - β² - 2αβ],

E(h) = ω*/(1 - α - β), and E(h^{3/2}) is a finite number.

From the above Lemma it is clear that if the z_t's are symmetrically distributed, i.e. E(z³) = 0, and the asymmetry parameter b is positive, then E(h_t ε_{t-k}) is less than zero for any value of k. Consequently, the autocovariance of the observed process is negative for δ in (0, 2αb E(h)/[(α + β) V(h)]) and positive for δ in (2αb E(h)/[(α + β) V(h)], ∞). Furthermore, notice that if in the GQARCH(1,1)-M we set b = 0 we get the GARCH(1,1)-M. Hence, if z_t has a symmetric distribution, the autocovariance of any order of the GARCH(1,1)-M can take only non-negative values, independent of the value of δ (see also Hong 1991 and Fiorentini and Sentana 1998).
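The following simulation sketch (illustrative parameter values, not taken from the notes) generates a Gaussian GQARCH(1,1)-M process with b > 0 and a small positive δ, for which the Lemma predicts a negative first-order autocovariance of the observed series.

import numpy as np

rng = np.random.default_rng(3)
T, omega, alpha, beta, b, delta = 500_000, 0.05, 0.10, 0.85, 0.5, 0.05

z = rng.standard_normal(T)
h = np.empty(T)
r = np.empty(T)
eps_prev = 0.0
h[0] = (omega + alpha * b ** 2) / (1 - alpha - beta)   # unconditional E(h) as starting value
for t in range(T):
    if t > 0:
        h[t] = omega + alpha * (eps_prev - b) ** 2 + beta * h[t - 1]
    eps = z[t] * np.sqrt(h[t])
    r[t] = delta * h[t] + eps
    eps_prev = eps

rc = r - r.mean()
print("first-order autocovariance of r_t:", np.dot(rc[:-1], rc[1:]) / (T - 1))   # negative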

7.7 Conclusions

It turns out that not all first-order dynamic heteroskedasticity in mean models, with a time invariant conditional variance coefficient in the mean, δ, are compatible with negative autocorrelations of the observed series. In fact, the GARCH(1,1)-M, the AVGARCH(1,1)-M and the SV-M with uncorrelated mean and variance errors are not compatible with data sets that exhibit negative autocorrelations. This statement rests on the assumption of symmetric distributions of the errors, which are mostly employed in applied work. On the other hand, all the other models considered here are compatible with either positive or negative autocorrelations of the observed data. In fact, under the assumption of a symmetric distribution of the errors, models which incorporate the leverage effect can accommodate observed-series autocorrelations of either sign.

Finally, the expressions for the autocovariances of the squared residuals for the SV-M and EGARCH-M models make it possible to see how well the theoretical properties of the models fit the observed data, something which is important for the identification of the order of the conditional variance process along the lines of Bollerslev (1984) and Nelson (1991). A relative advantage of the SV-M model over the EGARCH-M one is that for the former it is simpler to evaluate the autocovariance functions of the conditional variance and the squared errors. In fact, in terms of formulae simplicity the SV-M and the GARCH-M models of Section 7.5 have an advantage over the rest. Comparing the two models in terms of the autocorrelation function of the squared errors we have the following Proposition:

Proposition 5  If the decline from the first to the second order autocorrelation of the conditional variance (or of the squared errors) is the same for a SV-M and a GQARCH(1,1)-M model, then the autoregressive parameter in the log conditional variance equation of the SV model, β, is bigger than or equal to the persistence of the GQARCH model, i.e. the sum of the GARCH and ARCH coefficients.

In other words, if the β in the SV model is equal to the variance persistence parameter, as defined by the sum of the ARCH and GARCH parameters, the autocorrelation functions of the conditional variance and of the squared errors for the SV model decline at a rate which is at most equal to, and generally lower than, those of the GQARCH model. The β and persistence parameters play an important part in the stationarity of the SV-M and GQARCH(1,1)-M models. Consequently, if the objective function of a maximization procedure were the matching of the ACFs of the conditional variance or the squared errors, the GQARCH(1,1)-M models would produce estimated values that lie within the stationary region of the parameter space with higher probability. However, in the majority of applications of these models, a Quasi Maximum Likelihood method is employed for parameter estimation, leaving the structure of the squared errors as a diagnostic tool. Nevertheless, intuition suggests that considering an in-Mean model would make these results relevant to estimation as well. This is something that we investigate now.

Another interesting topic would be to investigate the autocorrelation function of the squared observed variables and the so called Taylor effect. Again, this is a complete project on its own that is under consideration. Of course our analysis is by no means exhaustive. Many members of the dynamic heteroskedasticity in mean family are not analyzed, e.g. models where λ and ν take any non-integer values in the range [0, 2] as in Ding et al. (1993), or models where a fourth-order power of the errors is added in the conditional variance equation, as in Yang and Bewley.

Part III
Non-stationarity

8 Non-stationary processes

For this chapter a stationary process means a second order (weakly) stationary one, i.e. the mean, variance and autocovariances of the process are independent of t. There are several reasons to pay attention to stationarity.

1. For a Random Walk (see below), a shock at time t will have the same effect on the process at times t, t+1, t+2, ... The same is not true for a stationary process, e.g. for the stationary AR(1) the effect on the process at t+j is α^j, where α is the AR coefficient. In other words, the Impulse-Response function of a Random Walk is 1 at all horizons.

2. The "spurious regression" problem, i.e. two non-stationary variables can seem to be connected when they are not related to one another (see below on this).

3. The forecast variance of a simple Random Walk grows linearly with the forecast horizon (points 1 and 3 are illustrated numerically right after this list).

4. Heuristically, the autocorrelation function of a Random Walk is 1 at any lag. In effect this means that the correlogram dies out very slowly.

5. Dealing with non-stationary variables in a regression framework would invalidate most of the asymptotic analysis, i.e. the usual t-ratios would not follow a t-distribution, the F-statistic would not follow an F-distribution, etc.
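A minimal numerical illustration of points 1 and 3, not part of the notes, follows; the AR coefficient 0.89 is the one used in the figures of Section 9.2 below, the other values are illustrative.

import numpy as np

a, sigma2, horizons = 0.89, 1.0, np.array([1, 5, 10, 50])

irf_ar = a ** horizons                        # impulse response of the AR(1) at horizon j
irf_rw = np.ones_like(horizons, dtype=float)  # impulse response of the random walk (always 1)
fv_ar = sigma2 * (1 - a ** (2 * horizons)) / (1 - a ** 2)   # AR(1) forecast variance (converges)
fv_rw = sigma2 * horizons                                   # RW forecast variance (grows linearly)

for j, hor in enumerate(horizons):
    print(hor, round(irf_ar[j], 3), irf_rw[j], round(fv_ar[j], 2), fv_rw[j])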

There are two models that are usually employed to describe non-stationarity. First, the Trend Stationary (TS) process:

y_t = a + bt + u_t,  where u_t is WN,     (29)

and WN stands for White Noise. The term a + bt is the deterministic term, as it is completely predictable. It can be any polynomial function in t, e.g. it can be of the following form: a, a + bt, a + bt + ct², etc. Second, the random walk (RW) with drift, i.e.

x_t = b + x_{t-1} + z_t,  where z_t is WN,     (30)

where b is called the drift. Assuming that x_0 = 0 and substituting backwards we get that

x_t = 2b + x_{t-2} + z_t + z_{t-1} = ... = bt + z_t + z_{t-1} + ... + z_1,

justifying in this way the name drift.

8.1 Trend Stationary Process

Let us consider the TS process in equation (29). Assuming normality, i.e. u_t ~ N(0, σ²), all classical assumptions apply and consequently the usual t-statistic and F-statistic have the standard Student's-t and F distributions. However, as soon as we drop the assumption of normality things change. Let us rewrite equation (29) as:

y_t = (1, t)(a, b)' + u_t = x_t'γ + u_t,  where x_t = (1, t)', γ = (a, b)',     (31)

and u_t iid with E(u_t) = 0, E(u_t²) = σ² and E(u_t⁴) < ∞. Let γ̂ = (â, b̂)' denote the OLS estimator of γ based on n observations, i.e.

γ̂ = (â, b̂)' = (Σ_{t=1}^n x_t x_t')^{-1} Σ_{t=1}^n x_t y_t.

Substituting out y_t, employing equation (31), we get

γ̂ = (Σ x_t x_t')^{-1} Σ x_t (x_t'γ + u_t) = γ + (Σ x_t x_t')^{-1} Σ x_t u_t,

and it follows that

γ̂ - γ = (Σ x_t x_t')^{-1} Σ x_t u_t,   i.e.   √n (γ̂ - γ) = (n^{-1} Σ x_t x_t')^{-1} n^{-1/2} Σ x_t u_t.

Under standard assumptions we have that n^{-1} Σ x_t x_t' →p M, where M is a nonsingular matrix and →p denotes convergence in probability. Further, n^{-1/2} Σ x_t u_t →d N(0, σ² M). It follows that √n (γ̂ - γ) →d N(0, σ² M^{-1}). However, for the TS case we have that:

γ̂ - γ = ( n     Σ t  )^{-1} ( Σ u_t   )
         ( Σ t   Σ t² )      ( Σ t u_t ).     (32)

Now notice that

Σ_{t=1}^n t = n(n+1)/2 = O(n²)   and   Σ_{t=1}^n t² = n(n+1)(2n+1)/6 = O(n³).

Notice also, for future reference, that

Σ_{t=1}^n t³ = [n(n+1)/2]² = O(n⁴).

Hence, scaling this matrix by 1/n would lead to non-convergent elements. On the other hand, scaling by 1/n³ would result in a singular limiting matrix. Consequently, â and b̂ require different rates for asymptotic convergence. In fact we have to scale â by √n and b̂ by n^{3/2}. Hence, employing equation (32), we get

( √n (â - a)      )  =  N_n (γ̂ - γ)  =  N_n (Σ x_t x_t')^{-1} Σ x_t u_t  =  (N_n^{-1} Σ x_t x_t' N_n^{-1})^{-1} N_n^{-1} Σ x_t u_t,
( n^{3/2} (b̂ - b) )

where N_n = diag(√n, n^{3/2}). Now

N_n^{-1} Σ x_t x_t' N_n^{-1}  =  ( 1            (n+1)/(2n)        )   →   ( 1     1/2 )   =  M.
                                 ( (n+1)/(2n)   (n+1)(2n+1)/(6n²) )       ( 1/2   1/3 )

For the next term, notice that

N_n^{-1} Σ x_t u_t = ( n^{-1/2} Σ u_t , n^{-3/2} Σ t u_t )',

and as u_t is iid with finite fourth moment we get that n^{-1/2} Σ u_t →d N(0, σ²). Further, we can prove that n^{-3/2} Σ t u_t →d N(0, σ²/3) (see Hamilton 1994), and since any linear combination of the two elements converges to normality we conclude that

( n^{-1/2} Σ u_t , n^{-3/2} Σ t u_t )' →d N(0, σ² M),

and it follows that

( √n (â - a)      )  →d  N(0, σ² M^{-1}),     (33)
( n^{3/2} (b̂ - b) )

establishing the superconsistency of b̂.
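The different rates in equation (33) are easy to see in a small Monte Carlo sketch (not part of the notes): the standard deviation of â shrinks like n^{-1/2}, that of b̂ like n^{-3/2}, so the scaled standard deviations below stabilise as n grows.

import numpy as np

rng = np.random.default_rng(4)
a, b, sigma = 1.0, 0.5, 1.0          # illustrative values

for n in (100, 400, 1600):
    a_hat = np.empty(2000)
    b_hat = np.empty(2000)
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t])
    for rep in range(2000):
        y = a + b * t + sigma * rng.standard_normal(n)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        a_hat[rep], b_hat[rep] = coef
    # sqrt(n)*sd(a_hat) and n^(3/2)*sd(b_hat) should settle down as n increases
    print(n, round(np.sqrt(n) * a_hat.std(), 3), round(n ** 1.5 * b_hat.std(), 3))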

Further, it is an easy exercise to prove the asymptotic normality of the t statistics for â and b̂. Here we consider only the t statistic for the hypothesis H_0 : b = b_0 (see Hamilton 1994 for details):

t_b = (b̂ - b_0)/(s √(m^{22})),   s² = (n - 2)^{-1} Σ_{t=1}^n (y_t - â - b̂t)²,

where m^{22} is the (2,2) element of (Σ x_t x_t')^{-1}. Multiplying and dividing by n^{3/2},

t_b = n^{3/2}(b̂ - b_0) / ( s √( (0, n^{3/2}) (Σ x_t x_t')^{-1} (0, n^{3/2})' ) ) = n^{3/2}(b̂ - b_0) / ( s √( (0, 1) N_n (Σ x_t x_t')^{-1} N_n (0, 1)' ) ).

Now from above we know that N_n (Σ x_t x_t')^{-1} N_n = (N_n^{-1} Σ x_t x_t' N_n^{-1})^{-1} → M^{-1}. Further, s² →p σ² and, under H_0, n^{3/2}(b̂ - b_0) →d N(0, σ² m^{22}_{M^{-1}}) by equation (33), where m^{22}_{M^{-1}} is the (2,2) element of M^{-1}. Hence we get:

t_b →d N(0, 1).

8.2 Over and Under Differencing

Let us now consider the case where the true model is a RW and we treat it as TS (under-differencing), as well as the reverse case, i.e. the true model is TS and we treat it as a RW (over-differencing).

Under-Differencing  Assume that we estimate the following equation

y_t = â + b̂t + ê_t,

where â and b̂ denote the LS coefficients and ê_t, t = 1, ..., n, denotes the LS residual. However, the true model is

y_t = µ + y_{t-1} + u_t,

where u_t is such that S_t = Σ_{j=1}^t u_j satisfies an appropriate functional Central Limit Theorem (see Durlauf and Phillips 1988 for details). In this setup it is possible to prove (see Theorem 3.1 in Durlauf and Phillips 1988) that, as n → ∞:

1. n^{-1/2} â →d N(0, 2σ²/15), where σ² = lim_{n→∞} n^{-1} E(S_n²);

2. n^{1/2}(b̂ - µ) →d N(0, 6σ²/5);

3. the t statistics, t_{a=0} and t_{b=0}, diverge;

4. the DW (Durbin-Watson) statistic goes in probability to 0;

5. the R² has a nondegenerate limiting distribution.

The results do not change if µ = 0 (see the corresponding Theorems in Durlauf and Phillips 1988); just set µ = 0 in point 2. Notice that the asymptotic behaviour of â is non-standard, whereas that of b̂ is standard. However, t_{b=0} diverges, invalidating any standard inference. Notice that the DW statistic is going to 0, indicating that, for large data sets, the probability of mistaking a RW process for a TS one is very low. However, a low DW statistic does not necessarily imply a RW process. The R² converges in distribution to a nondegenerate random variable with an expected value of approximately 0.44 (see Nelson and Kang 1981). Another effect of under-differencing is the spurious periodicity of the autocorrelations of the fitted values, ŷ_t = â + b̂t (see Nelson and Kang 1981 and Patterson 2010).

Over-Differencing  Consider the following process

y_t = a + bt + ε_t,

where ε_t is iid with E(ε_t⁴) < ∞. It follows that

Δy_t = b + ε_t - ε_{t-1},

i.e. over-differencing induces a noninvertible MA(1) process. It seems that over-differencing is less of a problem, as compared to under-differencing, provided that the autocorrelation of the errors is taken into account. In fact, estimating the model

Δy_t = b + ε_t - θε_{t-1}

by ML, provided that ε_t is iid N(0, σ²), it is possible to prove that θ̂ is n-consistent (see Shephard 1993 and Sargan and Bhargava 1983). However, although

θ̂ converges to 1 faster than the standard √n rate, its asymptotic distribution is not normal (for an indirect estimator with non-iid and/or non-normal ε_t see Arvanitis 2013). Finally, for the implications of over-differencing for the spectral density function see Patterson (2010).

8.3 Spurious Regressions

Assume two independent RW's, i.e.

y_t = y_{t-1} + u_t,   and   x_t = x_{t-1} + v_t,

where u_t and v_t are independent and satisfy some heterogeneity and weak dependence conditions (see Phillips 1986). Now, according to Durlauf and Phillips (1988, Theorem 5.1), the coefficients in the least squares regression

y_t = â + b̂t + ĉx_t + û_t,   t = 1, ..., n,

have the following asymptotic behaviour: (1) ĉ converges weakly to a nondegenerate random variable; (2) â diverges; (3) b̂ →p 0; (4) s², the estimate of the error variance, diverges; (5) t_{c=0}, the t statistic for c = 0, diverges; and (6) the Durbin-Watson statistic goes in probability to zero, i.e. DW →p 0.

Notice that the only consistent estimator is b̂. ĉ has a nondegenerate asymptotic distribution, which explains the simulation results in Granger and Newbold (1974), although their results concern a regression without a time trend. Even in this case, i.e. when a time trend is not included, ĉ has a nondegenerate asymptotic distribution, which differs by the presence of terms that express the interaction between the time trend and the nonstationary series (see Phillips 1986 and Durlauf and Phillips 1988).

The t_{c=0} test diverges as in the non-detrended case (Phillips 1988). Hence, regardless of the inclusion of the time trend, the t statistic test will diverge and, consequently, the nonstationarity of the underlying series is the critical issue, rather than inappropriate detrending (see Durlauf and Phillips 1988). Again the DW statistic converges in probability to zero. To quote Durlauf and Phillips (1988): "The inappropriateness of the conventional t statistic test should thus become apparent to the investigator from the inspection of residual DW diagnostics. Once again the results indicate that the employment of conventional significance tests must be suspect until the stationarity of the dependent variable is resolved."

Consider now the regression of the two independent RW's without the time trend, i.e.

y_t = â + ĉx_t + û_t,   t = 1, ..., n.

Further, assume that the two independent RW's have a drift, i.e.

y_t = γ_y + y_{t-1} + u_t,   and   x_t = γ_x + x_{t-1} + v_t,

where u_t is iid(0, σ_y²), v_t is iid(0, σ_x²), and u_t and v_t are independent for all t, s. Under regularity assumptions made explicit in Phillips (1986), Entorf (1997) was able to obtain the following: (1) ĉ →p γ_y/γ_x; (2) â diverges; and (3) DW →p 0.

It is evident that in case the RW's have a drift the regression coefficient, ĉ, converges to a constant, unlike in cases where there are no drifts (see above), where the coefficient converges to a random variable. In both cases, the constant of the regression, â, diverges. Finally, notice that in both cases the DW statistic goes to 0 in probability; hence the DW can be an important tool to detect spurious regressions, in large samples.
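The following simulation sketch (not from the notes) illustrates the spurious regression problem: y_t is regressed on a constant, a trend and an independent random walk x_t, and the t statistic of ĉ, the R² and the DW statistic are reported for increasing sample sizes.

import numpy as np

rng = np.random.default_rng(5)

for n in (100, 1000, 10000):
    y = np.cumsum(rng.standard_normal(n))          # random walk y_t
    x = np.cumsum(rng.standard_normal(n))          # independent random walk x_t
    X = np.column_stack([np.ones(n), np.arange(1, n + 1), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    s2 = u @ u / (n - 3)
    se_c = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
    r2 = 1 - (u @ u) / ((y - y.mean()) @ (y - y.mean()))
    dw = np.sum(np.diff(u) ** 2) / (u @ u)
    print(n, "t_c =", round(coef[2] / se_c, 1), " R2 =", round(r2, 2), " DW =", round(dw, 3))

The t statistic of ĉ is typically far outside any conventional critical value and grows with n, while the DW statistic shrinks towards zero, exactly the symptoms described above.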

9 Unit Roots

In order to derive unit root tests against the stationary alternatives, we first need to derive the distribution of the OLS estimator of the autoregressive parameter under the unit root null hypothesis. This distribution turns out to be a function of Wiener processes, and hence a brief introduction to Wiener processes is needed.

9.1 The Wiener Process

Let ΔW be the change in the Wiener process, or standard Brownian motion, say W, during a small time interval Δt. Then we have that:

ΔW = z √Δt,   where z ~ N(0, 1).

It follows that E(ΔW) = 0, V(ΔW) = Δt. Further, for any disjoint small time intervals, the values of ΔW are independent, as the z's are independent. Now consider the value of W over a long period, say T. Break the long period into n non-overlapping equal small intervals Δt, i.e. T = nΔt. Then we have that

W(T) - W(0) = Σ_{i=1}^n z_i √Δt,

and since z_i ~ iid N(0, 1), assuming that W(0) = 0 we get that

E[W(T)] = 0,   V[W(T)] = nΔt = T.

Now consider S_T = Σ_{t=1}^T z_t, where z_t ~ iid N(0, 1). Then S_t is a random walk, as S_t = S_{t-1} + z_t. Assuming that S_0 = 0, we have that, as in the case of W,

E(S_T) = 0,   V(S_T) = T.

Now divide the interval [0, 1] into T intervals of length 1/T, i.e. the points of the interval are 0, 1/T, 2/T, ..., (T-1)/T, 1. Further, define a new index r in [0, 1] corresponding to the time t in {0, 1, ..., T} by the relation (i-1)/T ≤ r < i/T for i = 1, ..., T. Let [rT] denote the integer part of rT, i.e. for T = 50 and r = 0.45, [rT] = [22.5] = 22. Finally, define the step function

X_T(r) = S_{[rT]}/√T.

According to the so-called Donsker's Functional Central Limit Theorem we get that X_T(r) →d W(r). Furthermore, from the Continuous Mapping Theorem we have that if g(.) is a continuous function on [0, 1] then g(X_T(r)) →d g(W(r)); see Billingsley (1968), section 5, for proofs.
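A small Monte Carlo sketch (not part of the notes) of Donsker's theorem: functionals of the rescaled partial-sum process X_T(r) behave like the corresponding functionals of W(r), e.g. X_T(1) has variance close to 1 and the integral of X_T over [0, 1] has variance close to 1/3, anticipating Lemma 9.2(i) below.

import numpy as np

rng = np.random.default_rng(6)
T, reps = 1000, 5000

z = rng.standard_normal((reps, T))
S = np.cumsum(z, axis=1)                  # S_t for t = 1,...,T in each replication
X1 = S[:, -1] / np.sqrt(T)                # X_T(1), approximates W(1)
intX = S.mean(axis=1) / np.sqrt(T)        # approximates the integral of X_T(r) over [0,1]

print("Var X_T(1)       ~", round(X1.var(), 3), "(theory 1)")
print("Var of integral  ~", round(intX.var(), 3), "(theory 1/3)")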

Results on the Wiener Process  Some basic results, employing Donsker's theorem, are provided in the sequel. Suppose that y_t is a random walk, i.e.

y_t = y_{t-1} + z_t,

where z_t ~ iid N(0, 1), for t = 1, 2, ..., T. Assume that y_0 = 0 for simplicity. Let ȳ = T^{-1} Σ_{t=1}^T y_t; then we have the following lemma.

Lemma 9.1  (i) T^{-3/2} Σ_{t=1}^T y_t →d ∫₀¹ W(r) dr,  (ii) T^{-2} Σ_{t=1}^T y_t² →d ∫₀¹ [W(r)]² dr,  (iii) T^{-5/2} Σ_{t=1}^T t y_t →d ∫₀¹ r W(r) dr,  (iv) T^{-3/2} Σ_{t=1}^T t z_t →d ∫₀¹ r dW(r),  and  (v) T^{-1} Σ_{t=1}^T y_{t-1} z_t →d ∫₀¹ W(r) dW(r).

Proof of Lemma 9.1. (i) Consider the step function X_T(r) = y_{[rT]}/√T, i.e. X_T(r) = y_{i-1}/√T for (i-1)/T ≤ r < i/T and X_T(1) = y_T/√T, so that X_T(r) is a step function with steps y_{i-1}/√T and is constant between them. Hence it follows that

∫₀¹ X_T(r) dr = Σ_{i=1}^T ∫_{(i-1)/T}^{i/T} X_T(r) dr = Σ_{i=1}^T (y_{i-1}/√T)(1/T) = T^{-3/2} Σ_{i=1}^T y_{i-1},

as ∫_{(i-1)/T}^{i/T} dr = 1/T. Now, at least asymptotically, T^{-3/2} Σ_{i=1}^T y_{i-1} ≈ T^{-3/2} Σ_{i=1}^T y_i, so we have that

∫₀¹ X_T(r) dr ≈ T^{-3/2} Σ_{t=1}^T y_t.

Now by Donsker's Theorem we have that X_T(r) →d W(r), and it follows that T^{-3/2} Σ_{t=1}^T y_t →d ∫₀¹ W(r) dr.

(ii) With the same logic we have that

∫₀¹ [X_T(r)]² dr = Σ_{i=1}^T ∫_{(i-1)/T}^{i/T} [X_T(r)]² dr = Σ_{i=1}^T (y_{i-1}²/T)(1/T) = T^{-2} Σ_{i=1}^T y_{i-1}².

Now, at least asymptotically, T^{-2} Σ y_{i-1}² ≈ T^{-2} Σ y_i², so we have that

∫₀¹ [X_T(r)]² dr ≈ T^{-2} Σ_{t=1}^T y_t².

Now by Donsker's Theorem and the Continuous Mapping Theorem we have that [X_T(r)]² →d [W(r)]², and it follows that T^{-2} Σ_{t=1}^T y_t² →d ∫₀¹ [W(r)]² dr.

(iii) Now

∫₀¹ r X_T(r) dr = Σ_{i=1}^T ∫_{(i-1)/T}^{i/T} r X_T(r) dr = Σ_{i=1}^T (y_{i-1}/√T) ∫_{(i-1)/T}^{i/T} r dr = T^{-5/2} Σ_{i=1}^T i y_{i-1} - (1/2) T^{-5/2} Σ_{i=1}^T y_{i-1},

as

∫_{(i-1)/T}^{i/T} r dr = [i² - (i-1)²]/(2T²) = i/T² - 1/(2T²).

Further, as (1/2) T^{-5/2} Σ y_{i-1} = (1/2T) T^{-3/2} Σ y_{i-1} →p 0 and T^{-5/2} Σ i y_{i-1} ≈ T^{-5/2} Σ t y_t, we get that

∫₀¹ r X_T(r) dr ≈ T^{-5/2} Σ_{t=1}^T t y_t.

Now by Donsker's Theorem and the Continuous Mapping Theorem we have that r X_T(r) →d r W(r),

and it follows that T^{-5/2} Σ_{t=1}^T t y_t →d ∫₀¹ r W(r) dr.

(iv) Notice that

T^{-3/2} Σ_{t=1}^T t z_t = Σ_{i=1}^T (i/T)(z_i/√T) = Σ_{i=1}^T ∫_{(i-1)/T}^{i/T} r dX_T(r) = ∫₀¹ r dX_T(r) →d ∫₀¹ r dW(r).

(v) Last,

T^{-1} Σ_{t=1}^T y_{t-1} z_t = Σ_{i=1}^T (y_{i-1}/√T)(z_i/√T) = Σ_{i=1}^T ∫_{(i-1)/T}^{i/T} X_T(r) dX_T(r) = ∫₀¹ X_T(r) dX_T(r) →d ∫₀¹ W(r) dW(r).

Notice the following mapping (see Maddala and Kim 1998): 1/T ↔ dr, t/T ↔ r, z_t/√T ↔ dW(r) and y_t/√T ↔ W(r). Now, as z_t ~ iid N(0, 1) and y_T = Σ_{t=1}^T z_t, it follows that y_T/√T ~ N(0, 1), i.e. it corresponds to W(1). The following lemma presents the relation between W(r) and the normal distribution.

Lemma 9.2  (i) ∫₀¹ W(r) dr ~ N(0, 1/3),  (ii) ∫₀¹ r dW(r) ~ N(0, 1/3),  (iii) ∫₀¹ (r - a) W(r) dr ~ N(0, (8 - 25a + 20a²)/60),  (iv) if W(r) and V(r) are independent Wiener processes then [∫₀¹ W²(r) dr]^{-1/2} ∫₀¹ W(r) dV(r) ~ N(0, 1),  (v) if z_t ~ N(0, σ_z²) but not independent and lim_{T→∞} T^{-1} E(y_T²) = σ_y² > 0, then T^{-1} Σ_{t=1}^T y_{t-1} z_t →d (1/2)[σ_y² χ²(1) - σ_z²].

Proof of Lemma 9.2. (i) Notice that

T^{-3/2} Σ_{t=1}^T y_t = T^{-3/2} Σ_{i=1}^T (T - i + 1) z_i = T^{-3/2} [T z_1 + (T-1) z_2 + ... + z_T].

It follows that

Var(T^{-3/2} Σ_{t=1}^T y_t) = T^{-3} Σ_{t=1}^T t² → 1/3   as T → ∞.

Further, the distribution of T^{-3/2} Σ y_t is normal, as it is a sum of independent normal variates. But from the previous lemma, 9.1(i), we have that T^{-3/2} Σ y_t →d ∫₀¹ W(r) dr, and it follows that

∫₀¹ W(r) dr ~ N(0, 1/3).

(ii) Again notice that Var(T^{-3/2} Σ t z_t) = T^{-3} Σ t² = (T+1)(2T+1)/(6T²) → 1/3, E(T^{-3/2} Σ t z_t) = 0, and as the z_t's are normally distributed we get that T^{-3/2} Σ t z_t →d N(0, 1/3). From lemma 9.1(iv) we have that T^{-3/2} Σ t z_t →d ∫₀¹ r dW(r), and it follows that

∫₀¹ r dW(r) ~ N(0, 1/3).

(iii) Notice first that

Σ_{t=1}^T y_t = T z_1 + (T-1) z_2 + ... + z_T = (T+1) Σ_{t=1}^T z_t - Σ_{t=1}^T t z_t,

and

Σ_{t=1}^T t y_t = Σ_{t=1}^T t Σ_{s=1}^t z_s = Σ_{s=1}^T z_s Σ_{t=s}^T t = [T(T+1)/2] Σ_{s=1}^T z_s - (1/2) Σ_{s=1}^T s² z_s + (1/2) Σ_{s=1}^T s z_s.

Hence

Var(T^{-5/2} Σ t y_t - a T^{-3/2} Σ y_t) = Var(T^{-5/2} Σ t y_t) - 2a Cov(T^{-5/2} Σ t y_t, T^{-3/2} Σ y_t) + a² Var(T^{-3/2} Σ y_t).     (34)

Now for the first term of the above equation, using the representation of Σ t y_t derived above, we have:

Var(T^{-5/2} Σ t y_t) = T^{-5} Σ_{s=1}^T [T(T+1)/2 - s(s-1)/2]² → ∫₀¹ [(1 - u²)/2]² du = 2/15,

as T^{-3} Σ t² → 1/3, T^{-4} Σ t³ → 1/4 and T^{-5} Σ t⁴ → 1/5 for T → ∞. For the second term in equation (34), using the representations of Σ t y_t and Σ y_t,

Cov(T^{-5/2} Σ t y_t, T^{-3/2} Σ y_t) = T^{-4} Σ_{s=1}^T [T(T+1)/2 - s(s-1)/2](T + 1 - s) → ∫₀¹ [(1 - u²)/2](1 - u) du = 5/24,

while for the third term, as in part (i),

Var(T^{-3/2} Σ y_t) = T^{-3} Σ_{s=1}^T (T + 1 - s)² → 1/3.

Consequently, substituting in equation (34), we get:

Var(T^{-5/2} Σ t y_t - a T^{-3/2} Σ y_t) → 2/15 - 5a/12 + a²/3 = (8 - 25a + 20a²)/60.

But from the previous lemma, 9.1(i) and (iii), we get that T^{-5/2} Σ t y_t - a T^{-3/2} Σ y_t →d ∫₀¹ (r - a) W(r) dr, and since the statistic is normal for every T it follows that

∫₀¹ (r - a) W(r) dr ~ N(0, (8 - 25a + 20a²)/60).

(iv) Let F_t^W = σ(W(r), r ≤ t), F_t^V = σ(V(r), r ≤ t) and F_t = σ(F_t^W ∪ F_t^V) be the natural filtrations generated by the two independent Wiener processes. First, by the conditional Ito Isometry (see e.g. Steele 2001) we have that

E[ (∫₀ᵗ W(r) dV(r))² | F_t^W ] = E[ ∫₀ᵗ W²(r) dr | F_t^W ],

and as E(∫₀ˢ W(r) dV(r))² < ∞ for s ≤ t, the integral is the L² limit of the sums Σ_n W(t_n)[V(t_{n+1}) - V(t_n)]; due to independence and normality, which is preserved in the limit, we get that

∫₀ᵗ W(r) dV(r) | F_t^W ~ N(0, ∫₀ᵗ W²(r) dr),

and the result follows on dividing by the conditional standard deviation.

(v) Notice that y_t² = y_{t-1}² + 2 y_{t-1} z_t + z_t². Hence y_{t-1} z_t = (y_t² - y_{t-1}² - z_t²)/2, and it follows that

T^{-1} Σ_{t=1}^T y_{t-1} z_t = (1/2)[ y_T²/T - T^{-1} Σ_{t=1}^T z_t² ].

Further, y_T = Σ_{t=1}^T z_t and from the normality of the z_t's we get that y_T/√T is asymptotically N(0, σ_y²), so that y_T²/T →d σ_y² χ²(1). From the Law of Large Numbers we have that T^{-1} Σ z_t² → Var(z_t) = σ_z². Hence

T^{-1} Σ_{t=1}^T y_{t-1} z_t →d (1/2)[σ_y² χ²(1) - σ_z²].

Now, if z_t ~ iid N(0, 1), then y_T/√T ~ N(0, 1), y_T²/T ~ χ²(1) and T^{-1} Σ z_t² → 1; combining this with lemma 9.1(v), where T^{-1} Σ y_{t-1} z_t →d ∫₀¹ W(r) dW(r), it follows that

∫₀¹ W(r) dW(r) ~ (1/2)[χ²(1) - 1].
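Two of the limits in Lemma 9.2 can be checked by Monte Carlo. The sketch below (not part of the notes) verifies the variance (8 - 25a + 20a²)/60 of part (iii) for a = 0.5, and the mean and variance of the limit in part (v) for iid N(0,1) errors, which are 0 and 1/2.

import numpy as np

rng = np.random.default_rng(7)
T, reps, a = 1000, 5000, 0.5

z = rng.standard_normal((reps, T))
y = np.cumsum(z, axis=1)                  # random walks, y_0 = 0

# (iii): T^{-5/2} sum t*y_t - a * T^{-3/2} sum y_t approximates int (r - a) W(r) dr
t = np.arange(1, T + 1)
stat3 = (t * y).sum(axis=1) / T ** 2.5 - a * y.sum(axis=1) / T ** 1.5
print("Var (iii):", round(stat3.var(), 4), " theory:", round((8 - 25 * a + 20 * a ** 2) / 60, 4))

# (v): T^{-1} sum y_{t-1} z_t, whose limit is (chi^2(1) - 1)/2
stat5 = (y[:, :-1] * z[:, 1:]).sum(axis=1) / T
print("Mean (v):", round(stat5.mean(), 4), " Var (v):", round(stat5.var(), 4), " theory: 0 and 0.5")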

9.2 Unit Root Tests without Deterministic Trend

To visualize the difference between a stationary and a random walk process, we depict in the following figures an AR(1) process with an autoregressive coefficient of 0.89 and a random walk, where the errors are drawn from a normal distribution. The same random errors are employed for the simulations of the two processes.
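A simulation sketch in the same spirit (standard normal shocks here, an illustrative choice) generates both processes from a common set of errors and prints their sample autocorrelations, which behave like the correlograms in the figures below.

import numpy as np

rng = np.random.default_rng(8)
T = 1000
u = rng.standard_normal(T)

y = np.zeros(T)                    # AR(1): y_t = 0.89 y_{t-1} + u_t
for t in range(1, T):
    y[t] = 0.89 * y[t - 1] + u[t]
x = np.cumsum(u)                   # random walk: x_t = x_{t-1} + u_t

def acf(s, k):
    s = s - s.mean()
    return np.dot(s[:-k], s[k:]) / np.dot(s, s)

for k in (1, 5, 10, 20):
    # AR(1) autocorrelations decay geometrically; the random walk's die out very slowly
    print(k, round(acf(y, k), 3), round(acf(x, k), 3))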

Fig. 9.1: Graph of the AR(1) process y_t = 0.89 y_{t-1} + u_t, u_t ~ N(0, .4).
Fig. 9.2: Graph of the random walk x_t = x_{t-1} + u_t, u_t ~ N(0, .4).

Furthermore, the following figures present the correlograms of the two processes:

Fig. 9.3: Correlogram of y_t = 0.89 y_{t-1} + u_t, u_t ~ N(0, .4).
Fig. 9.4: Correlogram of x_t = x_{t-1} + u_t, u_t ~ N(0, .4).

There are distinct differences between the graphs and the correlograms of the two processes. However, in practice things are not so obvious. Let us now return


More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

ARIMA Modelling and Forecasting

ARIMA Modelling and Forecasting ARIMA Modelling and Forecasting Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, the differences may appear stationary. Δx t x t x t 1 (first

More information

FE570 Financial Markets and Trading. Stevens Institute of Technology

FE570 Financial Markets and Trading. Stevens Institute of Technology FE570 Financial Markets and Trading Lecture 5. Linear Time Series Analysis and Its Applications (Ref. Joel Hasbrouck - Empirical Market Microstructure ) Steve Yang Stevens Institute of Technology 9/25/2012

More information

ECON/FIN 250: Forecasting in Finance and Economics: Section 6: Standard Univariate Models

ECON/FIN 250: Forecasting in Finance and Economics: Section 6: Standard Univariate Models ECON/FIN 250: Forecasting in Finance and Economics: Section 6: Standard Univariate Models Patrick Herb Brandeis University Spring 2016 Patrick Herb (Brandeis University) Standard Univariate Models ECON/FIN

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information

Università di Pavia. Forecasting. Eduardo Rossi

Università di Pavia. Forecasting. Eduardo Rossi Università di Pavia Forecasting Eduardo Rossi Mean Squared Error Forecast of Y t+1 based on a set of variables observed at date t, X t : Yt+1 t. The loss function MSE(Y t+1 t ) = E[Y t+1 Y t+1 t ]2 The

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

New Introduction to Multiple Time Series Analysis

New Introduction to Multiple Time Series Analysis Helmut Lütkepohl New Introduction to Multiple Time Series Analysis With 49 Figures and 36 Tables Springer Contents 1 Introduction 1 1.1 Objectives of Analyzing Multiple Time Series 1 1.2 Some Basics 2

More information