Time Series Methods. Sanjaya Desilva

1 Dynamic Models

In estimating time series models, we sometimes need to model the temporal relationships between variables explicitly: does X affect Y in the same period, or with a lag of one or more periods? Do past values of Y affect the current value of Y? Economic variables often influence each other with lags. See Gujarati, Ch. 17 for several examples and explanations of the reasons for lags and inertia. We consider two types of dynamic models: 1) distributed lag models, and 2) autoregressive models.

1.1 Distributed Lag Model

In the distributed lag model, the current value of Y is influenced by the current value of X as well as its lags.

$$Y_t = \beta + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \dots + \beta_k X_{t-k} + \epsilon_t \tag{1}$$

Estimating this model allows us to trace the dynamics of the effect of X on Y. The total, or long-run, effect of X on Y is the sum of the current and lagged effects,

$$\sum_{i=0}^{k} \beta_i = \beta_0 + \beta_1 + \dots + \beta_k \tag{2}$$

If we want to know whether all the lags have a collective effect on Y, we can perform an F-test with appropriate restrictions to capture the null hypothesis.

Question: Set up this F-test.

There are three problems associated with distributed lag models.

1. We need to decide how many lags to use. Because the multiplier effects generally diminish over time, it makes theoretical sense to use a finite number of lags.

2. The lags of the same X variable are likely to be highly correlated. This may lead to multicollinearity problems. Question: What are the consequences of multicollinearity?

3. If the sample size is small, as is generally the case with time series data sets, the use of multiple lags erodes degrees of freedom. Question: What are the consequences of low degrees of freedom?

1.2 Autoregressive Model

The other type of dynamic model we have learned is the autoregressive model. Here, the Y variable has persistent effects on itself. For example, current consumption depends on past consumption, and the current stock price depends on the past stock price. A first-order autoregressive model can be written as

$$Y_t = \beta_0 + \beta_1 X_t + \beta_2 Y_{t-1} + \epsilon_t \tag{3}$$

2 Koyck Transformation

Koyck demonstrated that, under certain conditions, the distributed lag model can be expressed as an autoregressive model. Koyck argued that this transformation avoids the three pitfalls associated with the distributed lag model (see above). To make the transformation, we need to assume an infinite lag model with the following structure for the lags,

$$\beta_k = \beta_0 \lambda^k \tag{4}$$

where $\lambda$ is known as the rate of decline or decay, and $0 < \lambda < 1$.

Question: Is this lag structure realistic? Why is it necessary to have $0 < \lambda < 1$?

With this lag structure, the infinite distributed lag model can be rewritten as

$$Y_t = \beta + \beta_0 \left[ X_t + \lambda X_{t-1} + \lambda^2 X_{t-2} + \dots \right] + \epsilon_t \tag{5}$$

Question: In a model with infinite Koyck lags, the combined, or long-run, effect of X on Y is $\frac{\beta_0}{1 - \lambda}$. Why? (Hint: Use the expression for the sum of an infinite series.)
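The geometric lag structure in (4) can be checked numerically. The following is only a minimal sketch: the parameter values `beta0 = 2.0` and `lam = 0.6` and the helper name `koyck_weights` are assumptions made up for illustration, not values from these notes. It compares the sum of the first 201 lag coefficients with the closed-form long-run effect.

```python
def koyck_weights(beta0, lam, n_lags):
    """Return the Koyck lag coefficients beta_k = beta0 * lam**k for k = 0..n_lags."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    return [beta0 * lam ** k for k in range(n_lags + 1)]


# Hypothetical parameter values, chosen only for illustration.
beta0, lam = 2.0, 0.6

weights = koyck_weights(beta0, lam, n_lags=200)
partial_sum = sum(weights)      # combined effect of the current X and its first 200 lags
long_run = beta0 / (1 - lam)    # closed-form limit of the geometric series

print(weights[:4])              # the individual lag effects shrink geometrically
print(partial_sum, long_run)    # the two numbers should agree closely
```

Trying a value of `lam` at or above 1 raises an error here, which mirrors the restriction $0 < \lambda < 1$: without it the lag effects would not die out and the sum would not converge.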

If we multiply both sides of equation (5) by $\lambda$ and lag both sides by one period, we get

$$\lambda Y_{t-1} = \lambda\beta + \beta_0 \left[ \lambda X_{t-1} + \lambda^2 X_{t-2} + \lambda^3 X_{t-3} + \dots \right] + \lambda\epsilon_{t-1} \tag{6}$$

By subtracting this expression from (5), we get

$$Y_t - \lambda Y_{t-1} = \beta(1 - \lambda) + \beta_0 X_t + \epsilon_t - \lambda\epsilon_{t-1} \tag{7}$$

Rearranging terms, we get a familiar autoregressive model,

$$Y_t = \beta(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + \epsilon_t - \lambda\epsilon_{t-1} \tag{8}$$

In the infinite distributed lag model, each lag effect was given by

$$\beta_k = \beta_0 \lambda^k \tag{9}$$

and the total long-run effect was given by

$$\frac{\beta_0}{1 - \lambda} \tag{10}$$

Therefore, if we can somehow obtain estimates for $\beta_0$ and $\lambda$ without having to explicitly estimate an infinite distributed lag model, we can figure out the individual as well as the combined effects of X on Y. The autoregressive model we just formulated allows us to do just that. The coefficient of $X_t$ is an estimate of $\beta_0$ and the coefficient of $Y_{t-1}$ is an estimate of $\lambda$.

Note that the Koyck autoregressive model does not suffer from the three pitfalls associated with distributed lag models, because the combined effects of the infinite lags are captured by a single autoregressive term. In that sense, the model is far more parsimonious.

Unfortunately, we cannot obtain efficient and unbiased estimates for $\beta_0$ and $\lambda$ by estimating the Koyck autoregressive model using OLS. There are two reasons for this.

1. The error term in the Koyck AR model, $\epsilon_t - \lambda\epsilon_{t-1}$, is serially correlated.

2. The autoregressive lag variable $Y_{t-1}$ is endogenous, i.e. correlated with the lagged error term $\epsilon_{t-1}$.

Question: The first problem leads to inefficient estimates, whereas the second problem leads to biased coefficient estimates. Why?

There are more sophisticated methods that can be used to estimate such a model. For example, the second problem can be addressed if we can find an appropriate instrumental variable for the lagged Y variable. Question: What properties should this instrumental variable have? These methods are beyond the scope of this course.

3 Causality

We can use the idea of lags to establish causal relationships in time series data.

3.1 Granger Test for Causality

Suppose you want to test whether X causes Y. Granger proposed the estimation of the following pair of equations,

$$Y_t = \alpha_1 X_{t-1} + \dots + \alpha_k X_{t-k} + \beta_1 Y_{t-1} + \dots + \beta_k Y_{t-k} + u_{1t} \tag{11}$$

$$X_t = \lambda_1 Y_{t-1} + \dots + \lambda_k Y_{t-k} + \delta_1 X_{t-1} + \dots + \delta_k X_{t-k} + u_{2t} \tag{12}$$

Consider $X_{t-3}$. This variable can influence $Y_t$ in two ways. The first is a direct causal effect from $X_{t-3}$ to $Y_t$. The second is indirect; for example, there could be a causal effect from $X_{t-3}$ to $Y_{t-2}$, and then another causal effect from $Y_{t-2}$ to $X_{t-1}$ that then gets transmitted to $Y_t$. There are many such indirect effects, but they all go through a lagged Y variable. Therefore, once we control for all lags of Y, if the lags of X still have significant effects on the current Y, we can conclude that there is indeed a direct causal effect from X to Y. In other words, when we control for all lags of Y, in effect controlling for all effects from lagged Xs to current Y via lagged Ys, we isolate the direct causal effects from the lags of X to current Y. Therefore, if the $\alpha_i$ are jointly significant, we can conclude that X (Granger) causes Y. Similarly, if the $\lambda_i$ are jointly significant, we can conclude that Y (Granger) causes X.

Question: Construct a formal test to establish that X causes Y. (Hint: Use the restrictions F-test.)

3.2 Sims Test for Causality

Sims proposed a simpler test for causality using leads and lags. He suggested estimating the following equation,

$$Y_t = \alpha + \beta_k X_{t-k} + \dots + \beta_1 X_{t-1} + \beta_0 X_t + \lambda_1 X_{t+1} + \dots + \lambda_k X_{t+k} \tag{13}$$

Here, we are regressing the Y variable in year t on X variables with k lags (preceding years) and k leads (succeeding years). Sims's insight was that if X causes Y, the lags of X should influence Y, but, controlling for the lags, the leads of X should not influence Y. Therefore, if the $\lambda$s are jointly insignificant and the $\beta$s are jointly significant, we can conclude that X causes Y.

Question: Construct the formal test for causality. Again, we need to use the F-test for restrictions.

Note that, unlike with the Granger model, we need data on lead years to carry out this test. For example, if the Y variable is for 2000, we can use X variables from 1990 to 1999 (lags) and 2001 to 2007 (leads) to test for causality.

4 Stationarity

We begin with some definitions.

1. A stochastic process is a collection of random variables ordered over time. For example, GDP is a random variable from which we observe realizations every year. The time series of such realizations is called a stochastic process.

2. A stationary stochastic process is a stochastic process whose mean and variance do not change over time. (There are additional conditions on the covariance that we can ignore for present purposes.) For example, if the mean and variance of the distribution from which GDP is drawn every year remain the same, we can call the GDP process stationary.

3. A nonstationary stochastic process is a stochastic process whose mean or variance changes over time.

Question: Do you think GDP is in fact stationary? How about the Dow Jones index? The GDP growth rate? The inflation rate?

4.1 Consequences of Nonstationarity

Nonstationary variables tend to show distinct patterns over time. Therefore, even if two nonstationary variables are not causally related, a regression of one on the other would show a strong correlation because both variables have these patterns. This scenario is called spurious correlation. For example, GDP fluctuates from year to year, but it has a positive trend. So does the Dow Jones index. Therefore, a regression of GDP on the Dow Jones index will yield a significant positive effect even though these two variables may not be causally related.

5 Random Walks

Random walks are among the best-known examples of nonstationary processes. The simplest random walk is a random walk without drift.

5.1 Random Walk without Drift

Consider the following stochastic process,

$$Y_t = Y_{t-1} + u_t \tag{14}$$

where $u_t$ is a random shock with mean 0 and variance $\sigma^2$, uncorrelated across periods. Economists who believe in the efficient market hypothesis believe that stock prices are random walks without drift.

Question: Why is this an AR(1) process? What specific restriction must be imposed on a standard AR(1) process to obtain a random walk without drift?

Note that we can rewrite this equation as

$$Y_t = Y_0 + u_1 + u_2 + \dots + u_t \tag{15}$$

Question: Show how this can be done.

Question: Using the fact that $u_t \sim (0, \sigma^2)$, show that

$$E(Y_t) = Y_0 \tag{16}$$

$$\mathrm{Var}(Y_t) = t\sigma^2 \tag{17}$$

Question: Using what you just found, can you confirm that the random walk without drift is a nonstationary process?

5.2 Random Walk with Drift

A slightly different stochastic process is as follows,

$$Y_t = \delta + Y_{t-1} + u_t \tag{18}$$

where $u_t$ is the same random shock as before, but $\delta$ is a constant drift parameter. Because of the drift, the Y variable shifts by a constant $\delta$ in every period, in addition to being shocked by $u_t$.

Question: Show that

$$E(Y_t) = Y_0 + t\delta \tag{19}$$

$$\mathrm{Var}(Y_t) = t\sigma^2 \tag{20}$$

Question: Confirm that the random walk with drift is in fact nonstationary.
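The moments in (16), (17), (19) and (20) can be illustrated by simulation. The sketch below rests on assumptions that are not in the notes: the shocks are drawn from a normal distribution (the notes only specify the mean and variance), and the function name `simulate_random_walk`, the seed, the horizon of 50 periods and the drift of 0.5 are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_random_walk(n_periods, y0=0.0, drift=0.0, sigma=1.0, rng=None):
    """Simulate Y_t = drift + Y_{t-1} + u_t and return the path [Y_0, ..., Y_T]."""
    rng = rng or random.Random()
    path = [y0]
    for _ in range(n_periods):
        shock = rng.gauss(0.0, sigma)   # assumption: normally distributed shocks
        path.append(drift + path[-1] + shock)
    return path


rng = random.Random(12345)      # fixed seed so the run is reproducible
n_paths, horizon = 4000, 50     # illustrative sizes
delta = 0.5                     # hypothetical drift parameter

# Keep only the final value Y_T of each simulated path.
no_drift_end = [simulate_random_walk(horizon, rng=rng)[-1] for _ in range(n_paths)]
drift_end = [simulate_random_walk(horizon, drift=delta, rng=rng)[-1] for _ in range(n_paths)]

# Without drift: mean near Y_0 = 0 and variance near t * sigma^2 = 50.
print(statistics.mean(no_drift_end), statistics.variance(no_drift_end))
# With drift: mean near Y_0 + t * delta = 25 and variance again near 50.
print(statistics.mean(drift_end), statistics.variance(drift_end))
```

Rerunning with a longer `horizon` should show the variance across paths growing roughly in proportion to t, which is the sense in which both processes are nonstationary.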

5.3 Unit Root

Consider a simple AR(1) process,

$$Y_t = \rho Y_{t-1} + u_t \tag{21}$$

When the AR(1) process has a unit root, i.e. $\rho = 1$, it becomes a random walk without drift. We know that the random walk is nonstationary. It can be shown (we do not need to know how to show this yet) that the AR(1) process is stationary when $|\rho| < 1$. Therefore, a test for a unit root, i.e. $\rho = 1$, has become a common test for nonstationarity.

The unit root is a useful concept because it provides us with a solution to the nonstationarity problem. Suppose $Y_t$ is a nonstationary variable that has a unit root. Then,

$$\Delta Y_t = Y_t - Y_{t-1} = u_t \tag{22}$$

When we take the differences in Y, rather than the level of Y, we get a stationary variable.

Question: Why is $\Delta Y_t$ stationary? What is the expected value of $\Delta Y_t$? What is the variance of $\Delta Y_t$? Do either of these change over time? (Hint: the answers are 0 and $\sigma^2$ respectively.)

To summarize the solution: suppose you have two nonstationary variables, GDP and the Dow Jones index, and both have a unit root. Then you can avoid spurious correlation by running a regression of the year-to-year changes in GDP ($\Delta GDP_t = GDP_t - GDP_{t-1}$) on the year-to-year changes in the Dow Jones index ($\Delta DJ_t = DJ_t - DJ_{t-1}$). The incorrect and correct models, respectively, are

$$GDP_t = \beta_0 + \beta_1 DJ_t + \epsilon_t \tag{23}$$

$$\Delta GDP_t = \beta_0 + \beta_1 \Delta DJ_t + \epsilon_t \tag{24}$$
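The differencing step in (22) can be sketched in a few lines. The helper name `first_difference`, the seed, the starting value of 10 and the normally distributed shocks are assumptions for illustration only. The sketch builds a random walk without drift from a known list of shocks and confirms that differencing recovers those shocks.

```python
import random


def first_difference(series):
    """Return [Y_1 - Y_0, Y_2 - Y_1, ...], one element shorter than the input."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


rng = random.Random(2024)                                # fixed seed for reproducibility
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]       # assumption: normal shocks

# Build a random walk without drift, Y_t = Y_{t-1} + u_t, starting from Y_0 = 10.
levels = [10.0]
for u in shocks:
    levels.append(levels[-1] + u)

changes = first_difference(levels)

# Up to floating-point rounding, the differences are the original shocks.
max_gap = max(abs(d - u) for d, u in zip(changes, shocks))
print(len(levels), len(changes), max_gap)
```

For a regression in differences such as (24), the same helper would be applied to both series before estimation; note that one observation is lost in the process.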

5.4 Testing for Unit Roots

Differencing solves the problem if our stochastic processes do in fact have a unit root. To test whether there is a unit root, Dickey and Fuller proposed estimating the following equation,

$$\Delta Y_t = \delta_0 + \delta_1 Y_{t-1} + \epsilon_t \tag{25}$$

In this equation, the null hypothesis for a unit root is $H_0: \delta_1 = 0$. (Question: Why does $\delta_1 = 0$ imply $\rho = 1$?) For a random walk without drift, we can test additionally whether $\delta_0 = 0$.

Note: In the Dickey-Fuller test, we cannot use the standard t-tests to ascertain significance. Under the null hypothesis, the test statistics follow a special D-F distribution with its own set of tables in the book. You do not need to know the specifics of this distribution, except to note that the standard t-test should be replaced by the D-F test.

Suppose we find that $\delta_0 = 0$ and $\delta_1 = 0$. Then,

$$\Delta Y_t = \epsilon_t \tag{26}$$

$$Y_t = Y_{t-1} + \epsilon_t \tag{27}$$

That is, $Y_t$ has a unit root and is nonstationary, but $\Delta Y_t$ is stationary. We can use the difference method.
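The regression in (25) can be sketched with plain OLS formulas. This is a rough illustration under stated assumptions, not a full Dickey-Fuller procedure: the function names `dickey_fuller_regression` and `simulate_ar1`, the seed, the sample size and the normally distributed shocks are all hypothetical choices. The code returns the conventional t-ratio for $\delta_1$, which must still be compared with the D-F tables rather than the standard t tables.

```python
import random


def dickey_fuller_regression(series):
    """OLS of dY_t on a constant and Y_{t-1}; returns (delta0, delta1, t-ratio of delta1)."""
    lagged = series[:-1]
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    delta1 = sxy / sxx
    delta0 = mean_y - delta1 * mean_x
    residuals = [y - delta0 - delta1 * x for x, y in zip(lagged, diffs)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)    # residual variance
    t_ratio = delta1 / (s2 / sxx) ** 0.5             # compare with D-F tables, not t tables
    return delta0, delta1, t_ratio


def simulate_ar1(rho, n_periods, rng):
    """Simulate Y_t = rho * Y_{t-1} + u_t with Y_0 = 0 (assumption: standard normal shocks)."""
    path = [0.0]
    for _ in range(n_periods):
        path.append(rho * path[-1] + rng.gauss(0.0, 1.0))
    return path


rng = random.Random(7)   # fixed seed for reproducibility
unit_root = dickey_fuller_regression(simulate_ar1(1.0, 2000, rng))
stationary = dickey_fuller_regression(simulate_ar1(0.5, 2000, rng))

print(unit_root)     # delta1 should be close to 0 for the random walk
print(stationary)    # delta1 should be clearly negative for the stationary series
```

In practice one would rely on a tested library routine and the published critical values; the point here is only to show which regression is being run and which coefficient carries the unit root hypothesis.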