Introduction to Time Series Modelling This version: 29 November 2018

Notes for Intermediate Econometrics / Time Series Analysis and Forecasting
Anthony Tay

    # This note uses the following libraries
    library(tidyverse)
    library(forecast)
    library(ggfortify)
    library(gridExtra)
    library(grid)
    library(dynlm)

    # Theme settings for figures:
    ts_thm <- theme(text = element_text(size=14),
                    axis.title = element_text(size=11),
                    axis.line = element_line(linetype = 'solid'),
                    panel.background = element_blank())

    # We use data from "timeseries_quarterly.csv" and "timeseries_monthly.csv"
    df_qtr <- read_csv("timeseries_quarterly.csv")
    glimpse(df_qtr)

    Observations: 128
    Variables: 3
    $ Period <chr> "1980Q1", "1980Q2", "1980Q3", "1980Q4", "1981Q1", "...
    $ Y      <dbl> ...
    $ Y1     <dbl> ...

    df_mth <- read_csv("timeseries_monthly.csv")
    glimpse(df_mth)

    Observations: 420
    Variables: 6
    $ DATE        <chr> "Jan-83", "Feb-83", "Mar-83", "Apr-83", "May-83",...
    $ ELEC_GEN_SG <dbl> 667.3, 586.9, 727.4, 719.0, 727.1, 728.4, 741.1, 7...
    $ TOUR_SG     <int> ...
    $ IP_SG       <dbl> 14.34, 11.37, 14.50, 12.86, 13.01, 12.62, 13.56, 1...
    $ CPI_US      <dbl> 97.8, 97.9, 97.9, 98.6, 99.2, 99.5, 99.9, 100.2, 1...
    $ DOMEX5_SG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

Characteristics of Economic Time Series

The main issue in working with economic time series, whether one is estimating causal or predictive relationships, is that they often display intertemporal correlations, dependence on time, or both. In other words, economic time series are seldom i.i.d. Intertemporal correlations and dependence on time manifest themselves in a number of ways, including cycles, seasonalities, and trends. Many economic series show one or more of these characteristics.

As examples, we display the Singapore electricity generation and tourist arrivals series. Both series show a clear trend over time. Apart from the trend, the electricity generation series also shows what appears to be a very regular seasonal pattern occurring over each year. This pattern is also present in the tourist arrivals data, though it is not as easy to see from the time series plot. It is not uncommon for economic time series to display the effects of unusual, possibly one-off events. For instance, the tourist arrivals data also show a massive, but temporary, fall in the second quarter of 2003, due to the outbreak of SARS.

    ElecGen.ts <- ts(df_mth$ELEC_GEN_SG, start=c(1983,1), end=c(2017,12), frequency=12)
    TourArr.ts <- ts(df_mth$TOUR_SG, start=c(1983,1), end=c(2017,12), frequency=12)
    p1 <- autoplot(ElecGen.ts) + xlab("SG Elec. Gen") + ts_thm + theme(aspect.ratio = 2/3)
    p2 <- autoplot(TourArr.ts) + xlab("SG Tourist Arr.") + ts_thm + theme(aspect.ratio = 2/3)
    grid.arrange(p1, p2, ncol=2)

[Figure: time series plots of SG Elec. Gen (left) and SG Tourist Arr. (right)]

Why intertemporal correlations and dependence on time are issues, and how they can be exploited for forecasting, will be discussed later. Our objective for the time being is to illustrate these concepts.

Describing Cycles

Consider the simulated series Y in df_qtr. We display below the time series plot and the scatterplot of $Y_t$ on its lag $Y_{t-1}$.

    Y.ts <- ts(df_qtr$Y, start=c(1980,1), end=c(2011,4), frequency=4)
    p1 <- autoplot(Y.ts) + xlab("t") + ylab("Y(t)") + ts_thm + theme(aspect.ratio = 2/3)
    # dplyr::lag shifts a plain vector by one position (first value becomes NA)
    df_plot <- data.frame(Y=df_qtr$Y, LY=dplyr::lag(df_qtr$Y))
    p2 <- ggplot(data=df_plot, aes(y=Y, x=LY)) + geom_point() +
      xlab('Y(t-1)') + ylab('Y(t)') + ts_thm + theme(aspect.ratio = 1)
    grid.arrange(p1, p2, ncol=2, widths=c(1.4,1))

[Figure: time series plot of Y(t) against t (left) and scatterplot of Y(t) against Y(t-1) (right)]

The series displays a boom-bust cyclical pattern, which is one possible manifestation of intertemporal correlation: when above the mean, the series tends to remain above the mean; when below the mean, it tends to remain below the mean. This positive correlation between observations one period apart is clearly seen in the scatterplot.

(A remark on t: because the data are quarterly, each t refers to a quarterly period. If t = 1997Q1, then t-1 = 1996Q4 and t+1 = 1997Q2. We often take t as an integer series t = 1, 2, ... to represent periods. If t = 1 is 1980Q1, then t = 2 is 1980Q2, and so on.)

The series $Y_t$, while simulated, is not atypical in economics. Below are similar plots for Singapore monthly domestic exports, top 5 electronic products, January 1997 to December 2017.

    # The monthly data begin in January 1983 (420 observations), as for the other monthly series
    DOMEX5_SG.ts <- ts(df_mth$DOMEX5_SG, start=c(1983,1), end=c(2017,12), frequency=12)
    p1 <- autoplot(window(DOMEX5_SG.ts, c(1997,1), c(2017,12))) +
      xlab("dom. exp. top 5 electronics") + ts_thm + theme(aspect.ratio = 2/3)
    df_plot <- data.frame(Y=df_mth$DOMEX5_SG, LY=dplyr::lag(df_mth$DOMEX5_SG))
    p2 <- ggplot(data=df_plot, aes(y=Y, x=LY)) + geom_point() +
      ylab("dom. exp. top 5 elec") + xlab("lagged dom. exp. top 5 elec") +
      ts_thm + theme(aspect.ratio = 1)
    grid.arrange(p1, p2, ncol=2, widths=c(1.4,1))

[Figure: time series plot of dom. exp. top 5 electronics (left) and scatterplot of dom. exp. top 5 elec against lagged dom. exp. top 5 elec (right)]

The correlation illustrated by the scatterplots is often referred to as autocorrelation, or serial correlation, as it is the correlation of a series $Y_t$ with itself, albeit at a lag. In particular, the correlation illustrated by the scatterplot of $Y_t$ against its first lag $Y_{t-1}$ is referred to as the autocorrelation at lag 1. The scatterplots of the simulated $Y_t$ and the top 5 electronic domestic exports series suggest that a model such as

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t$$

might describe the behavior of both series. Such a model is called an autoregression of order 1, or AR(1). In fact, the series $Y_t$ is a simulated AR(1). Such a model, or extensions of it, will also fit the domestic export series well. The AR(1) model plays a huge role in time series econometrics, and is a member of a much larger class of models called Autoregressive Integrated Moving Average (ARIMA) models, which we will study in some detail at a later stage.

While cycles may seem somewhat irregular, the AR(1) under certain conditions is in fact very regular in a certain sense. In particular, if $|\beta_1| < 1$, the AR(1) is covariance-stationary. A time series $Y_t$ is said to be covariance-stationary if

i. $E[Y_t]$ is finite and constant for all t
ii. $\mathrm{var}[Y_t]$ is finite and constant for all t
iii. $\mathrm{cov}[Y_t, Y_{t-k}]$ may depend on k, but not on t

Note that these are unconditional expectations. The first of these conditions says that the series fluctuates around some constant value (i.e., no trend). The second says that the size of the fluctuations in the series will stay roughly the same over time. The following graphs give an example each of a series where the second condition holds (left) and does not hold (right).
    Y.ts <- ts(df_qtr$Y, start=c(1980,1), end=c(2011,4), frequency=4) # defined earlier
    Y1.ts <- ts(df_qtr$Y1, start=c(1980,1), end=c(2011,4), frequency=4)
    p1 <- autoplot(Y.ts) + ylab("Y") + ts_thm
    p2 <- autoplot(Y1.ts) + ylab("Y1") + ts_thm

    grid.arrange(p1, p2, nrow=1)

[Figure: time series plots of Y (left) and Y1 (right)]

The third condition describes the autocovariance of a series, the covariance between a series and itself (at a lag):

$$\mathrm{cov}(Y_t, Y_{t-k}) = E[(Y_t - E[Y_t])(Y_{t-k} - E[Y_{t-k}])].$$

The earlier scatterplot of $Y_t$ on $Y_{t-1}$ illustrated the autocovariance of $Y_t$ at lag 1. One can generate scatterplots of $Y_t$ on $Y_{t-2}$, $Y_t$ on $Y_{t-3}$, ... corresponding to autocovariances of $Y_t$ at lags 2, 3, and so on. What condition (iii) says is that these intertemporal correlations do not change over time. If you were to plot any particular scatterplot (say of $Y_t$ on $Y_{t-1}$) over subperiods spanning the range of your sample, the scatterplots should look rather similar. The autocovariances at different lags can be (and usually are) very different, but for any fixed lag k, the autocovariance does not change with time.

The autocovariance at lag k is often denoted $\gamma_k$. Obviously, $\gamma_0$ refers to the variance of $Y_t$. A related concept is the autocorrelation at lag k, which is

$$\rho_k = \mathrm{corr}(Y_t, Y_{t-k}) = \frac{\mathrm{cov}(Y_t, Y_{t-k})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-k})}}.$$

For a stationary series, this simplifies to

$$\rho_k = \frac{\gamma_k}{\gamma_0}.$$

We often prefer working with the autocorrelation rather than the autocovariance.

We now show that the AR(1) with $|\beta_1| < 1$ is covariance-stationary. Suppose $Y_t$ follows, for all t, the specification

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t, \quad |\beta_1| < 1, \quad \epsilon_t \text{ i.i.d. with } E[\epsilon_t] = 0,\ \mathrm{var}[\epsilon_t] = \sigma_\epsilon^2.$$

To prove covariance stationarity, we first rewrite $Y_t$ in a different form, using backward substitution:

$$\begin{aligned}
Y_t &= \beta_0 + \beta_1 Y_{t-1} + \epsilon_t \\
&= \beta_0 + \beta_1(\beta_0 + \beta_1 Y_{t-2} + \epsilon_{t-1}) + \epsilon_t \\
&= \beta_0 + \beta_0\beta_1 + \beta_1^2(\beta_0 + \beta_1 Y_{t-3} + \epsilon_{t-2}) + \epsilon_t + \beta_1\epsilon_{t-1} \\
&= \dots \\
&= \frac{\beta_0}{1-\beta_1} + \epsilon_t + \beta_1\epsilon_{t-1} + \beta_1^2\epsilon_{t-2} + \dots
\end{aligned}$$

This form makes it easy to derive the properties of the AR(1):

$$E[Y_t] = \frac{\beta_0}{1-\beta_1} + E[\epsilon_t] + \beta_1 E[\epsilon_{t-1}] + \beta_1^2 E[\epsilon_{t-2}] + \dots = \frac{\beta_0}{1-\beta_1}$$

$$\mathrm{var}[Y_t] = \mathrm{var}[\epsilon_t] + \beta_1^2\mathrm{var}[\epsilon_{t-1}] + \beta_1^4\mathrm{var}[\epsilon_{t-2}] + \dots = \sigma_\epsilon^2 + \beta_1^2\sigma_\epsilon^2 + \beta_1^4\sigma_\epsilon^2 + \dots = \frac{\sigma_\epsilon^2}{1-\beta_1^2}$$

both finite and constant. Note that $E[Y_t] = 0$ when $\beta_0 = 0$. The autocovariance at lag k can be calculated as

$$\begin{aligned}
\mathrm{cov}[Y_t, Y_{t-k}] &= E[(Y_t - E[Y_t])(Y_{t-k} - E[Y_{t-k}])] \\
&= E[(\epsilon_t + \beta_1\epsilon_{t-1} + \beta_1^2\epsilon_{t-2} + \dots)(\epsilon_{t-k} + \beta_1\epsilon_{t-k-1} + \beta_1^2\epsilon_{t-k-2} + \dots)] \\
&= \beta_1^k E[\epsilon_{t-k}^2] + \beta_1^{k+2} E[\epsilon_{t-k-1}^2] + \beta_1^{k+4} E[\epsilon_{t-k-2}^2] + \dots \\
&= \frac{\beta_1^k \sigma_\epsilon^2}{1-\beta_1^2}
\end{aligned}$$

where the cross terms vanish because the $\epsilon_t$ are independent with mean zero. Therefore the autocorrelation is

$$\rho_k = \beta_1^k.$$

Because the unconditional mean and variance are constant (and finite), and the autocorrelations do not depend on time (they depend only on the lag), the series is covariance-stationary.

Covariance stationarity is an important concept in forecasting. In forecasting, the basic idea is to measure regularities in the history of a data series and project them forward. For a series to be forecastable, it has to be regular in some sense. Covariance stationarity is a formal expression of a kind of regularity that permits forecastability: if the autocorrelation, say, at lag 1, is constant over time, then we can measure this autocorrelation from past data, and then use this period's realization to forecast next period's outcome. Of course, we cannot expect all economic time series of interest to be covariance-stationary. If a series is not covariance-stationary, then hopefully it can be transformed to stationarity in some manner, or hopefully its deviation from

stationarity is, somehow, predictable.

There is in fact more than one level of stationarity. In these notes, we shall be concerned only with covariance stationarity, and will use the terms covariance-stationary and stationary interchangeably.

Note that the autocorrelation of the covariance-stationary AR(1) decays as k increases. Observations become less correlated as they get farther apart, becoming virtually uncorrelated in the limit. This is in fact the case with the simulated $Y_t$ series. We plot below the scatterplots of $Y_t$ against various lags:

    df <- tibble(Y=df_qtr$Y, L1=lag(Y), L2=lag(Y,2), L5=lag(Y,5), L10=lag(Y,10))
    pthm <- theme(aspect.ratio = 1, text = element_text(size=8),
                  axis.title = element_text(size=10),
                  panel.background = element_blank(),
                  strip.text.x = element_text(size=10))
    df %>%
      gather(lag, Val, -Y) %>%
      mutate(lag=factor(lag, levels=c("L1", "L2", "L5", "L10"),
                        labels=c("Y(t-1)","Y(t-2)","Y(t-5)","Y(t-10)"))) %>%
      ggplot() + geom_point(aes(y=Y, x=Val)) + facet_wrap(~lag, ncol=4) + pthm + xlab("")

[Figure: scatterplots of Y against Y(t-1), Y(t-2), Y(t-5) and Y(t-10)]

The decay to zero of the autocorrelations, as k increases, is a feature we refer to as weak dependence. We say the covariance-stationary AR(1) is a weakly dependent series.

We can present the autocorrelations more systematically, by calculating the sample autocorrelation at lag k as

$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{T} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}$$

for k = 1, 2, ..., K for some K reasonably smaller than T. These values make up the series' sample autocorrelation function.

    ci = qnorm(0.975)/sqrt(length(Y.ts))
    ggAcf(Y.ts) + scale_y_continuous(limits=c(-1,1)) +
      labs(title="Sample ACF of Y") + ylab("") +

      geom_hline(yintercept = c(ci, -ci), color = "black", linetype = "dashed") +
      theme(panel.background = element_blank(), aspect.ratio = 1/3)

[Figure: Sample ACF of Y, plotted against Lag]

The dashed bands mark the 95% confidence interval, centered at zero, using the fact that when the true autocorrelations are zero, the sample autocorrelations are asymptotically normal with mean zero and variance 1/T (proof omitted).

Estimation

The AR(1) is essentially a linear regression model, and can be estimated as such. There are a few estimation issues, both with the OLS estimator and with implementation, but we shall not concern ourselves with these issues here.

    # dynlm() is from the dynlm package; L() is the lag operator
    ar1_mdl <- dynlm(Y.ts ~ L(Y.ts), data=df)
    summary(ar1_mdl)

    Time series regression with "ts" data:
    Start = 1980(2), End = 2011(4)

    Call:
    dynlm(formula = Y.ts ~ L(Y.ts), data = df)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                              **
    L(Y.ts)                                  < 2e-16 ***
    ---

    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 125 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 471 on 1 and 125 DF, p-value: < 2.2e-16

    Yhat.ts <- fitted(ar1_mdl)
    autoplot(cbind(Y.ts, Yhat.ts)) + ylab("") + xlab("") +
      scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
      aes(size=series) +
      scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1)) +
      ts_thm + theme(aspect.ratio = 1/2)

[Figure: Actual and Fitted series]

    ehat <- residuals(ar1_mdl)
    autoplot(ehat) + geom_hline(yintercept=0) + ts_thm + theme(aspect.ratio = 1/2) +
      aes(size=plot_group) +
      scale_size_manual(labels = "Residuals", values = 1, name = "series")

[Figure: Residuals]
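The remark that the AR(1) can be estimated as an ordinary linear regression can be illustrated with base R alone. The sketch below is not the estimation used in these notes: the series is simulated, and the parameter values (intercept 1, slope 0.7), the seed, and the names ar1_df and ar1_ols are assumptions made purely for illustration.

```r
# Illustrative sketch: simulated AR(1); the parameter values are assumptions.
set.seed(42)
n  <- 2000
b0 <- 1
b1 <- 0.7
y  <- numeric(n)
y[1] <- b0 / (1 - b1)                 # start at the unconditional mean
for (t in 2:n) y[t] <- b0 + b1 * y[t - 1] + rnorm(1)

# Regress y(t) on y(t-1) by ordinary least squares
ar1_df  <- data.frame(y = y[-1], ly = y[-n])
ar1_ols <- lm(y ~ ly, data = ar1_df)
coef(ar1_ols)                         # estimates of (b0, b1)
```

With a sample this long, the estimated slope should land near the value used in the simulation, which is the sense in which the lagged regression recovers the AR(1) coefficient.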

Visually it appears as though the fitted series is just the actual series, with a shift. This is almost correct, since the fitted values are in fact $\hat{Y}_t = \hat{\beta}_0 + \hat{\beta}_1 Y_{t-1}$, using the estimated coefficients. The residual plot shows that much of the cyclical pattern has been captured by the model. Are there cycles left over? Different people might see different things in the residual plot. The sample autocorrelation function of the residuals is a more formal way of examining the residuals, and it shows little intertemporal correlation.

    ci = qnorm(0.975)/sqrt(length(ehat[!is.na(ehat)]))
    ggAcf(ehat) + scale_y_continuous(limits=c(-1,1)) +
      labs(title="Sample ACF of residuals") + ylab("") +
      geom_hline(yintercept = c(ci, -ci), color = "black", linetype = "dashed") +
      theme(panel.background = element_blank(), aspect.ratio=1/3)

[Figure: Sample ACF of residuals, plotted against Lag]

We will say more about the AR(1) later, when we discuss the larger class of ARIMA models. For now, we turn to the issue of trends and seasonalities.

Describing Trends

We return to the electricity generation and tourist arrivals data, reproduced here.

[Figure: time series plots of SG Elec. Gen (left) and SG Tourist Arr. (right)]

We will focus on modelling the trend. Before doing so, we note the increasing size of the fluctuations in these two series over time. This is a feature we often observe in trending economic time series. In such cases, taking the (natural) log transformation of the data often stabilizes the size of the fluctuations. Note that the first difference of a log-transformed variable approximates its percentage growth rate:

$$\ln(Y_t) - \ln(Y_{t-1}) \approx \frac{Y_t - Y_{t-1}}{Y_{t-1}}.$$

Many economic variables tend to grow at a roughly constant percentage rate. Such variables will have exponential trends. The natural log transformation of these variables will then display linear trends.

From the plots of the log-transformed data below, we see that the log transformation stabilizes the increasing fluctuations over time. It also appears that tourist arrivals have been growing at a constant rate over time (the trend is roughly linear, when viewed over the entire sample), whereas growth in electricity generation appears to have slowed down after the 1990s.

    p1 <- autoplot(log(ElecGen.ts)) + xlab("log SG Elec. Gen") + ts_thm + theme(aspect.ratio = 2/3)
    p2 <- autoplot(log(TourArr.ts)) + xlab("log SG Tourist Arr") + ts_thm + theme(aspect.ratio = 2/3)
    grid.arrange(p1, p2, ncol=2)

[Figure: time series plots of log SG Elec. Gen (left) and log SG Tourist Arr (right)]

There are, broadly speaking, two ways of describing trends. The first is to specify $Y_t$ as having a deterministic trend component, meaning a specification along the lines of

$$Y_t = f(t) + \epsilon_t, \quad t = 1, 2, \dots, \quad \epsilon_t \text{ zero mean i.i.d. noise}$$

where f(t) is some deterministic trending function (we will take $\epsilon_t$ as being an i.i.d. noise term for now). For instance, we might have

$$Y_t = \beta_0 + \beta_1 t + \epsilon_t, \quad t = 1, 2, 3, \dots$$

In this specification, $Y_t$ changes on average by $\beta_1$ units every period; if $\beta_1 > 0$, then $Y_t$ is trending upwards. The series follows a linear trend, and because the trend component is fully deterministic, the model is called a linear deterministic trend model.

You may, and may have to, specify more complicated deterministic trends than a linear one. The (log-transformed) electricity generation series does not seem to follow a single linear trend throughout. One possibility might be to specify a quadratic trend:

$$Y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \epsilon_t, \quad t = 1, 2, 3, \dots$$

We fit linear and quadratic trends to the log electricity generation series below. Some might argue that fitting such a quadratic trend to this series works reasonably well, though others may disagree; for sure, it fits the series better than the linear trend. See the figures below.

Rather than repeat the code for the residual diagnostic plots, we write it as a function instead.

    resid_diag <- function(ehat){
      # residual time series plot
      p1 <- autoplot(ehat) + geom_hline(yintercept=0) + ts_thm +
        theme(aspect.ratio = 1/2) + aes(size=plot_group) +
        scale_size_manual(labels = "Residuals", values = 1, name = "series") +
        theme(legend.position = "bottom")
      # sample ACF of the residuals with 95% bands around zero
      ci = qnorm(0.975)/sqrt(length(ehat[!is.na(ehat)]))
      p2 <- ggAcf(ehat) + scale_y_continuous(limits=c(-1,1)) +
        labs(title="Sample ACF of residuals") + ylab("") +
        geom_hline(yintercept = c(ci, -ci), color = "black", linetype = "dashed") +
        theme(panel.background = element_blank(), aspect.ratio=1/2)
      p3 <- grid.arrange(p1, p2, nrow=1)
    }

From the dynlm documentation, on trends: trend(y) specifies a linear time trend where (1:n)/freq is used (default); trend(y, scale = FALSE) employs 1:n; time(y) employs the original time index.

    Y <- log(ElecGen.ts)
    lin_trend_mdl <- dynlm(Y ~ trend(Y, scale=F))
    summary(lin_trend_mdl)

    Time series regression with "ts" data:
    Start = 1983(1), End = 2017(12)

    Call:

    dynlm(formula = Y ~ trend(Y, scale = F))

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    (Intercept)         6.755e e <2e-16 ***
    trend(Y, scale = F) 4.475e e <2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 418 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 6814 on 1 and 418 DF, p-value: < 2.2e-16

    Yhat1 <- fitted(lin_trend_mdl)
    autoplot(cbind(Y, Yhat1)) + ylab("") +
      xlab("log ELEC_GEN_SG, and fitted linear trend") +
      scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
      aes(size=series) +
      scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +
      ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG, and fitted linear trend (series: Actual, Fitted)]

    ehat1 <- residuals(lin_trend_mdl)
    resid_diag(ehat1)

[Figure: Residuals (left) and Sample ACF of residuals against Lag (right)]

The misspecification shows up clearly in the residual plot and the residual sample ACF.

Quadratic Trend Model

    Y <- log(ElecGen.ts)
    quad_trend_mdl <- dynlm(Y ~ trend(Y, scale=F) + I(trend(Y, scale=F)^2))
    summary(quad_trend_mdl)

    Time series regression with "ts" data:
    Start = 1983(1), End = 2017(12)

    Call:
    dynlm(formula = Y ~ trend(Y, scale = F) + I(trend(Y, scale = F)^2))

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
    (Intercept)              6.476e e <2e-16 ***
    trend(Y, scale = F)      8.441e e <2e-16 ***
    I(trend(Y, scale = F)^2) e e <2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 417 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 2.378e+04 on 2 and 417 DF, p-value: < 2.2e-16

    Yhat2 <- fitted(quad_trend_mdl)
    autoplot(cbind(Y, Yhat2)) + ylab("") +
      xlab("log ELEC_GEN_SG, and fitted quadratic trend") +
      scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
      aes(size=series) +
      scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +
      ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG, and fitted quadratic trend (series: Actual, Fitted)]

    ehat2 <- residuals(quad_trend_mdl)
    resid_diag(ehat2)

[Figure: Residuals (left) and Sample ACF of residuals against Lag (right)]

The cycles, and especially the seasonalities, show up clearly in the residual plots. Seasonality also shows up clearly in the residual autocorrelation plot; a 12-month seasonality is reflected in a significant autocorrelation at lag 12. We will discuss seasonality in more detail later.

It is possible to design much more complex deterministic trends, including ways where we don't have to specify the form in advance. We leave that to later discussions, and restrict ourselves to simple deterministic trends for now.
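The comparison between a linear and a quadratic deterministic trend can also be tried out on simulated data, using only base R's lm(). This is a sketch under stated assumptions rather than a re-run of the fits above: the series, its coefficients, the seed, and the names tt, lin_fit and quad_fit are all made up for illustration.

```r
# Illustrative sketch: simulated series with a quadratic trend (assumed values).
set.seed(7)
n  <- 200
tt <- 1:n                                   # time index t = 1, 2, ..., n
y  <- 2 + 0.05 * tt - 0.0001 * tt^2 + rnorm(n, sd = 0.1)

lin_fit  <- lm(y ~ tt)                      # linear deterministic trend
quad_fit <- lm(y ~ tt + I(tt^2))            # quadratic deterministic trend

# Residual standard errors of the two fits
c(linear = sigma(lin_fit), quadratic = sigma(quad_fit))
```

Because the simulated series really does contain a quadratic term, the quadratic fit leaves a smaller residual standard error; on real data the comparison is a judgment call that the residual diagnostics should inform.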

Stochastic Trends

Another way to describe a trending series is to say that the change in $Y_t$ every period is $\beta_0$ plus some zero mean noise term:

$$Y_t - Y_{t-1} = \beta_0 + \epsilon_t, \quad \epsilon_t \text{ zero mean i.i.d. noise}$$

or

$$Y_t = \beta_0 + Y_{t-1} + \epsilon_t, \quad \epsilon_t \text{ zero mean i.i.d. noise}.$$

If $\beta_0$ is positive, for instance, then in every period $Y_t$ changes by $\beta_0$ on average, and so will be trending upwards. This simple mechanism for describing a trend is worth exploring in some detail, and is important enough to have a special name: the random walk.

You can get a good sense of the behavior of such a $Y_t$ by starting from a fixed point, say $Y_0 = 0$ (it doesn't have to be zero), and manually computing $Y_1$, $Y_2$, etc. We have

$$\begin{aligned}
Y_1 &= \beta_0 + Y_0 + \epsilon_1 \\
Y_2 &= \beta_0 + Y_1 + \epsilon_2 = 2\beta_0 + Y_0 + \epsilon_2 + \epsilon_1 \\
Y_3 &= \beta_0 + Y_2 + \epsilon_3 = 3\beta_0 + Y_0 + \epsilon_3 + \epsilon_2 + \epsilon_1 \\
&\ \ \vdots \\
Y_t &= \beta_0 + Y_{t-1} + \epsilon_t = Y_0 + \beta_0 t + \epsilon_t + \dots + \epsilon_2 + \epsilon_1.
\end{aligned}$$

We see that $Y_t$ also has a deterministic trend term $Y_0 + \beta_0 t$. The main difference between this and the deterministic trend + noise model is that in the random walk case, the noise terms stack up. If we take the simple case where $\mathrm{var}[\epsilon_t] = \sigma_\epsilon^2$, then the variance of $Y_t$ (conditional on the starting point) is $t\sigma_\epsilon^2$, which increases without bound.

If $\beta_0 = 0$, then there is no deterministic trend, and conditional on any given starting point $Y_0$, the series has mean $Y_0$, and an increasing variance (without bound) around this value. The $\beta_0 = 0$ case is called a pure random walk, and the $\beta_0 \neq 0$ case is called a random walk with drift. We simulate one hundred pure random walks and one hundred random walks with drift, each of length 80 observations, all starting from $Y_0 = 0$, as an illustration.

    # Simulate
    set.seed(1306)
    nrep <- 100
    nobs <- 80
    b0 <- 0.5
    RW <- matrix(rep(0,nrep*nobs), nrow=nobs)
    RWd <- RW
    e <- matrix(rnorm(nrep*nobs), nrow=nobs)

    for (i in 2:nobs){
      RW[i,] <- RW[i-1,] + e[i,]
      RWd[i,] <- b0 + RWd[i-1,] + e[i,]
    }
    RW.df <- data.frame(t=1:nobs, RW)
    RW.df <- gather(RW.df, Series, vals, -t)
    # plot
    pthm <- theme(aspect.ratio=1/1.5, legend.position="none",
                  axis.text = element_text(size=10),
                  axis.title.y = element_text(size=10),
                  axis.line = element_line(linetype="solid"),
                  panel.background=element_blank())
    p1 <- ggplot(data=RW.df, aes(x=t, y=vals, group=Series, color=Series)) +
      geom_line() + ggtitle("Random Walks") + ylab("") + xlab("") + pthm
    RWd.df <- data.frame(t=1:nobs, RWd)
    RWd.df <- gather(RWd.df, Series, vals, -t)
    p2 <- ggplot(data=RWd.df, aes(x=t, y=vals, group=Series, color=Series)) +
      geom_line() + ggtitle("Random Walks with Drift") + ylab("") + xlab("") + pthm
    grid.arrange(p1, p2, nrow=1)

[Figure: simulated Random Walks (left) and Random Walks with Drift (right)]

While it is obvious that the random walk with drift will show a trend, what is not so obvious is that we can also observe trends in the random walk without drift. We sample a few of the simulated pure random walks and plot them below. We see a wide range of patterns, including what appear to be upward trends, downward trends, cycles, mean shifts, and so on. This is possible because of the increasing variance over time. We sometimes refer to the random walk without drift as a stochastic trend, which is why we use the alternative word drift in the case of

the random walk with drift.

    RW.df %>%
      filter(Series %in% c("X10", "X20", "X30", "X40", "X50", "X60")) %>%
      ggplot(aes(x=t, y=vals)) + geom_line() +
      theme(panel.background = element_rect(fill="white"),
            panel.border = element_rect(linetype="solid", fill=NA),
            strip.background.x = element_blank()) +
      facet_wrap(~Series, scales = "free")

[Figure: six simulated pure random walks (panels X10, X20, X30, X40, X50, X60), vals against t]

A quick summary:

A Deterministic (Linear) Trend Model: $Y_t = \alpha_0 + \alpha_1 t + \epsilon_t$
A Stochastic Trend Model (Pure Random Walk): $Y_t = Y_{t-1} + \epsilon_t$
A Stochastic Trend Model with Drift (Random Walk with Drift): $Y_t = \beta_0 + Y_{t-1} + \epsilon_t$

In deterministic trend models, more complicated deterministic trend lines f(t) than a linear one can be specified.

We can write the pure random walk as

$$Y_t = Y_0 + \epsilon_t + \epsilon_{t-1} + \dots + \epsilon_1.$$

We can write the random walk with drift as

$$Y_t = Y_0 + \beta_0 t + \epsilon_t + \dots + \epsilon_2 + \epsilon_1.$$

We see that the random walk with drift model is equivalent to a deterministic linear time trend + stochastic trend; the drift term generates the linear deterministic trend (other trend lines can be specified too).

All of these trend models can be embedded into larger models with cycles and seasonalities. For the moment, we focus on these pure trend models.

We have already seen an example each of OLS estimation of linear and quadratic deterministic trend models. We won't be concerned at this stage with the properties of OLS estimators of such models; suffice it to say that OLS works well.

The pure random walk model has nothing to estimate, apart from the variance of the noise term; to do so, simply take the sample variance of $\Delta Y_t = Y_t - Y_{t-1} = \epsilon_t$, the first difference of $Y_t$. Likewise, to estimate the drift term in the random walk with drift, find the sample mean of $\Delta Y_t = Y_t - Y_{t-1} = \beta_0 + \epsilon_t$.

Notice that first differencing of the random walk with drift removes both the stochastic trend and the deterministic linear trend: $\Delta Y_t = Y_t - Y_{t-1}$ is simply $\beta_0 + \epsilon_t$. This is a theme that will recur throughout our discussions.

What happens if we try to first difference a (pure) linear deterministic time trend model? Since $Y_t = \alpha_0 + \alpha_1 t + \epsilon_t$, we have

$$Y_t - Y_{t-1} = (\alpha_0 + \alpha_1 t + \epsilon_t) - (\alpha_0 + \alpha_1 (t-1) + \epsilon_{t-1}) = \alpha_1 + \epsilon_t - \epsilon_{t-1}.$$

The linear deterministic trend is also removed, but the noise term becomes $\epsilon_t - \epsilon_{t-1}$; we will discuss the consequences of this at a later stage.

What happens if we try to first difference a (pure) quadratic deterministic time trend model $Y_t = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \epsilon_t$? Some simple algebra leads to

$$Y_t - Y_{t-1} = (\alpha_1 - \alpha_2) + 2\alpha_2 t + \epsilon_t - \epsilon_{t-1},$$

which is still a trending series.

First differencing removes stochastic and deterministic linear time trends.
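The first-difference algebra for the linear and quadratic trends can be checked numerically. The sketch below drops the noise terms so that the comparison is exact; the coefficient values and the names a0, a1, a2, lin, quad, d_lin and d_quad are assumptions chosen only for illustration.

```r
# Illustrative check with noise-free trends; coefficient values are assumptions.
tt <- 1:50
a0 <- 1
a1 <- 0.5
a2 <- 0.02

lin  <- a0 + a1 * tt                 # linear trend
quad <- a0 + a1 * tt + a2 * tt^2     # quadratic trend

d_lin  <- diff(lin)                  # first differences, t = 2, ..., 50
d_quad <- diff(quad)

# Linear trend: the difference is the constant a1
all.equal(d_lin, rep(a1, length(tt) - 1))
# Quadratic trend: the difference is (a1 - a2) + 2 * a2 * t, still trending in t
all.equal(d_quad, (a1 - a2) + 2 * a2 * tt[-1])
```

Differencing the quadratic trend a second time, for example with diff(quad, differences = 2), would leave a constant, which is one way to see why a single difference is not enough in that case.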
If there is no stochastic trend in the first place, then the term $\epsilon_t - \epsilon_{t-1}$ is introduced by the first differencing operation.

It will not have escaped your notice that the covariance-stationary AR(1) and the random walks are very similar: they are both AR(1) models, the difference being that $|\beta_1| < 1$ in the stationary case, whereas $\beta_1 = 1$ in the random walk cases, with or without drift. In the stationary case, the unconditional mean and variance are constant, and for any k, the unconditional autocorrelation is constant. In the non-stationary case, the variance (conditional on $Y_0$) increases without bound; the unconditional variance is not finite, and therefore the random walks are not stationary.

We also saw that the stationary AR(1) is weakly dependent: $\rho_k \to 0$ as $k \to \infty$. The random walk is not weakly

dependent: the autocovariance (conditional on a fixed $Y_0$) is

$$\mathrm{cov}(Y_t, Y_{t+k}) = E[(\epsilon_{t+k} + \dots + \epsilon_2 + \epsilon_1)(\epsilon_t + \dots + \epsilon_2 + \epsilon_1)] = t\sigma_\epsilon^2.$$

Dividing by the standard deviations of $Y_t$ and $Y_{t+k}$ (with $Y_0$ fixed), which are $\sqrt{t}\,\sigma_\epsilon$ and $\sqrt{t+k}\,\sigma_\epsilon$, we have $\rho_k = \sqrt{t/(t+k)}$. As t increases to infinity, this autocorrelation at lag k goes to one (even if k is very large).

The reason for the differences in behaviour is that in the covariance-stationary AR(1) case, the effect of any given error term diminishes over time. Dropping the constant term from the AR(1) for convenience, we have

$$\begin{aligned}
Y_{t+k} &= \beta_1 Y_{t+k-1} + \epsilon_{t+k}, \quad |\beta_1| < 1, \\
&= \epsilon_{t+k} + \beta_1\epsilon_{t+k-1} + \beta_1^2\epsilon_{t+k-2} + \dots + \beta_1^k\epsilon_t + \beta_1^{k+1} Y_{t-1},
\end{aligned}$$

so the effect of the error term $\epsilon_t$ on $Y_{t+k}$ diminishes over time. In the $\beta_1 = 1$ case, the errors persist forever.

If $Y_t$ follows a pure deterministic trend model, is it covariance-stationary? No, it is not, since if $Y_t = f(t) + \epsilon_t$ where f(t) is not a constant function, and $\epsilon_t$ is i.i.d. zero-mean, then $E[Y_t] = f(t)$, which is not constant over time (it depends on t). However, $Y_t - f(t)$ is simply the i.i.d. zero-mean noise term, which is stationary. We call such series trend-stationary.

Notice that the random walk with drift, which is non-stationary, remains non-stationary even after the deterministic trend component is removed. It is therefore not trend-stationary. The random walk (with or without drift) is instead referred to as a difference-stationary series, by which we mean that taking first differences (i) leads to a stationary process, (ii) without introducing the $\epsilon_t - \epsilon_{t-1}$ term into the equation.

Seasonalities

The seasonality in the electricity generation series was noticeable from the time series plot, although clearly the dominant feature is the trend. Detrending, by fitting a quadratic trend (which for the moment we assume is appropriate) and subtracting the fitted trend from the series, results in a series where the seasonality is very prominent.
We reproduce the plots of the log-transformed ElecGen series below, as well as the results of the quadratic trend fit, and detrending.

    # Quadratic Trend Model, repeated from earlier
    Y <- log(ElecGen.ts)
    quad_trend_mdl <- dynlm(Y ~ trend(Y, scale=F) + I(trend(Y, scale=F)^2))
    Yhat2 <- fitted(quad_trend_mdl)
    autoplot(cbind(Y, Yhat2)) + ylab("") +
      xlab("log ELEC_GEN_SG, and fitted quadratic trend") +
      scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
      aes(size=series) +

      scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +
      ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG, and fitted quadratic trend (series: Actual, Fitted)]

    ElecGen_l_dt.ts <- residuals(quad_trend_mdl) # ElecGen, log transformed, detrended
    resid_diag(ElecGen_l_dt.ts)

[Figure: Residuals (left) and Sample ACF of residuals against Lag (right)]

Another way to visualize seasonalities is to use seasonal subseries plots. We do so here for both the original log-transformed series and the detrended version.

    seas_thm <- theme(aspect.ratio = 1/2,
                      axis.title.x = element_blank(),
                      panel.background = element_blank(),
                      axis.text.x = element_text(angle=90, vjust=0.3),
                      axis.title = element_blank(),
                      title = element_text(size=10))
    p1 <- ggsubseriesplot(log(ElecGen.ts)) + seas_thm + ggtitle("log(ElecGen.ts)")
    p2 <- ggsubseriesplot(ElecGen_l_dt.ts) + seas_thm + ggtitle("log(ElecGen.ts), detrended")
    grid.arrange(p1, p2, nrow=1)

[Figure: seasonal subseries plots by month, Jan to Dec; panels: log(ElecGen.ts) and log(ElecGen.ts), detrended]

In both plots, we see that the monthly amount of electricity generated varies systematically over the months of the year. Less electricity is generated in months with fewer days. This is what distinguishes seasonalities from cycles: seasonals are very regular periodic patterns in the data that occur for systemic reasons. Often, the amplitude of the seasonals is also quite steady over time, but not necessarily so.

We can model seasonalities in a number of ways. We consider here a method that treats the seasonal factors as deterministic and constant over time. This method is called the seasonal dummies model. A seasonal dummy is an indicator variable, taking value 1 during a particular season (in this case, a particular month), and zero everywhere else. Therefore a February seasonal dummy is zero everywhere, except for February observations where it takes value 1. The March seasonal dummy is a series that is zero everywhere except for March observations where it takes value 1, and so on. The seasonal dummy model can take many equivalent forms, a few of which are shown below.

$$\begin{aligned} Y_t &= \beta_1 d_{jan,t} + \beta_2 d_{feb,t} + \dots + \beta_{12} d_{dec,t} + \epsilon_t \\ &= \alpha_0 + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \epsilon_t \\ &= \delta_0 + \delta_2 \left(d_{feb,t} - \tfrac{1}{12}\right) + \dots + \delta_{12} \left(d_{dec,t} - \tfrac{1}{12}\right) + \epsilon_t \end{aligned}$$

where $d_{month,t}$ are the monthly seasonal dummies. We show the interpretation of the first equation. You can easily show that all three equations are equivalent, with only the interpretation of the parameters changing. Under the first form,

$Y_t = \beta_1 + \epsilon_t$ when $t$ is a Jan observation
$Y_t = \beta_2 + \epsilon_t$ when $t$ is a Feb observation
...
$Y_t = \beta_{12} + \epsilon_t$ when $t$ is a Dec observation

The parameters $\beta_1, \beta_2, \dots, \beta_{12}$ therefore represent the mean value of $Y_t$ for the months January, February, ..., December respectively, and the OLS estimates are simply the sample average values of $Y_t$ in each of those months. In the following, we fit the second form.

Y <- ElecGen_l_dt.ts
fit_seas <- dynlm(Y ~ season(Y))  # automatically generated seasons
summary(fit_seas)

Time series regression with "ts" data:
Start = 1983(1), End = 2017(12)

Call:
dynlm(formula = Y ~ season(Y))

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) ***
season(Y)Feb < 2e-16 ***
season(Y)Mar e-08 ***
season(Y)Apr **
season(Y)May e-16 ***
season(Y)Jun e-06 ***
season(Y)Jul e-13 ***
season(Y)Aug e-10 ***
season(Y)Sep **
season(Y)Oct e-11 ***
season(Y)Nov
season(Y)Dec

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 408 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 11 and 408 DF, p-value: < 2.2e-16
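The statement that the OLS estimates in the first form are the monthly sample averages, and that the second form is a re-parameterization with January as the base month, can be checked numerically. The sketch below uses base R `lm()` rather than `dynlm`, and a simulated series; the vector `month_means` and the noise level are hypothetical values chosen only for illustration.

```r
# Hypothetical data: 10 years of monthly observations around month-specific means
set.seed(1)
month_means <- c(0.05, -0.10, 0.04, 0.02, 0.06, 0.03,
                 0.06, 0.05, 0.02, 0.05, 0.00, 0.01)
n_years <- 10
y     <- rep(month_means, times = n_years) + rnorm(12 * n_years, sd = 0.02)
month <- factor(rep(month.abb, times = n_years), levels = month.abb)

# First form: twelve dummies and no intercept
fit_full <- lm(y ~ 0 + month)
# Second form: intercept plus eleven dummies (January is the omitted month)
fit_base <- lm(y ~ month)

monthly_avg <- as.vector(tapply(y, month, mean))   # sample mean for each month
beta_hat    <- unname(coef(fit_full))
alpha_hat   <- unname(coef(fit_base))

# First form: coefficients equal the monthly sample means
print(all.equal(beta_hat, monthly_avg))
# Second form: intercept is the January mean, other terms are differences from it
print(all.equal(alpha_hat[1] + c(0, alpha_hat[-1]), monthly_avg))
```

Both comparisons should print TRUE up to numerical tolerance, since the two regressions span the same set of regressors.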

Yhat <- fitted(fit_seas)
autoplot(cbind(Y, Yhat)) + ylab("") +
  xlab("log ELEC_GEN_SG detrended, and fitted seasonals") +
  scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
  aes(size=series) +
  scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +
  ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG detrended, and fitted seasonals; series: Actual, Fitted]

ehat <- residuals(fit_seas)  # ElecGen, log transformed, detrended, seasonal means removed
resid_diag(ehat)

[Figure: residuals series, and sample ACF of residuals against lag]

We do not have to fit the deterministic quadratic trend and the seasonal dummies separately. We can model them simultaneously.

Y <- log(ElecGen.ts)
fit_seas2 <- dynlm(Y ~ trend(Y, scale=F) + I(trend(Y, scale=F)^2) + season(Y))  # automatically generated seasons
summary(fit_seas2)

Time series regression with "ts" data:
Start = 1983(1), End = 2017(12)

Call:
dynlm(formula = Y ~ trend(Y, scale = F) + I(trend(Y, scale = F)^2) + season(Y))

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.456e e < 2e-16 ***
trend(Y, scale = F) 8.437e e < 2e-16 ***
I(trend(Y, scale = F)^2) e e < 2e-16 ***
season(Y)Feb e e < 2e-16 ***
season(Y)Mar 4.306e e e-08 ***
season(Y)Apr 2.392e e **
season(Y)May 6.373e e e-15 ***
season(Y)Jun 3.508e e e-06 ***
season(Y)Jul 5.827e e e-13 ***
season(Y)Aug 4.976e e e-10 ***
season(Y)Sep 2.192e e **
season(Y)Oct 5.114e e e-11 ***
season(Y)Nov e e
season(Y)Dec 3.808e e

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 406 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 9878 on 13 and 406 DF, p-value: < 2.2e-16

Yhat2 <- fitted(fit_seas2)
autoplot(cbind(Y, Yhat2)) + ylab("") +
  xlab("log ELEC_GEN_SG, and fitted trend and seasonals") +
  scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
  aes(size=series) +
  scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +
  ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG, and fitted trend and seasonals; series: Actual, Fitted]

ehat2 <- residuals(fit_seas2)
resid_diag(ehat2)

[Figure: residuals series, and sample ACF of residuals against lag]

After applying the quadratic trend and seasonal dummies, there appears to be some cyclical component left over. We consider modelling this shortly.

There are many ways of modelling and measuring seasonality in time series. Some, like

the seasonal dummy model, treat the seasonals as deterministic, whereas others allow stochastic treatments. We shall come to other methods later. For now, we will note that many statistical agencies release both seasonally adjusted and non-seasonally adjusted data. The adjustment methods are usually much more sophisticated than the seasonal dummies model (though in many instances the seasonal dummy models do a very good job at modelling seasonalities).

Putting Cycles, Deterministic Trends, Seasonal Dummies together

We have put together a model combining seasonal dummies and a quadratic trend and applied it to the log(ElecGen) series. We saw that cyclical patterns were left over in the residuals. We can easily put together a model with deterministic trends, seasonal dummies, and a stationary AR(1) cyclical term. We might write such a model as

$$Y_t = \alpha_0 + \alpha_1 t + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \upsilon_t, \quad \upsilon_t = \rho \upsilon_{t-1} + \epsilon_t, \quad |\rho| < 1, \quad \epsilon_t \text{ zero-mean, iid.}$$

Looking at the error term $\upsilon_t$, we see that it has a covariance-stationary zero-mean AR(1) structure, meaning that it fluctuates around the zero line. If $\rho > 0$, then we would expect boom-bust cyclical patterns. The entire model itself can be expressed as

$$Y_t = \text{deterministic trend} + \text{seasonals} + \text{stationary cycles}.$$

Such a model can be estimated directly, but to use OLS, we have to re-write the equation. Taking $Y_t - \rho Y_{t-1}$ we get

$$\begin{aligned} Y_t - \rho Y_{t-1} &= \alpha_0 + \alpha_1 t + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \upsilon_t \\ &\quad - \rho\alpha_0 - \rho\alpha_1 (t-1) - \rho\alpha_2 d_{feb,t-1} - \dots - \rho\alpha_{12} d_{dec,t-1} - \rho\upsilon_{t-1} \\ &= \delta_0 + \delta_1 t + \delta_2 d_{feb,t} + \dots + \delta_{12} d_{dec,t} + \upsilon_t - \rho\upsilon_{t-1}, \\ Y_t &= \delta_0 + \delta_1 t + \delta_2 d_{feb,t} + \dots + \delta_{12} d_{dec,t} + \rho Y_{t-1} + \epsilon_t. \end{aligned}$$

The $\delta$s are slightly messy functions of the original parameters; you are asked to fill in the details in the exercises. We have made use of the fact that $\upsilon_t - \rho\upsilon_{t-1} = \epsilon_t$, and that $d_{feb,t-1} = d_{mar,t}$ and so on.
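The quasi-differencing argument above can be checked by simulation. The sketch below is an illustration only: it uses base R `lm()` on a simulated series with a linear trend, a fixed seasonal pattern and AR(1) errors, and all parameter values (`rho`, the trend slope, the vector `seas`) are hypothetical choices rather than estimates from the electricity data.

```r
# Simulate Y_t = alpha0 + alpha1 * t + seasonals + upsilon_t, with AR(1) upsilon_t
set.seed(42)
n     <- 600                                  # number of simulated months
rho   <- 0.7                                  # AR(1) coefficient of the cycle
tt    <- 1:n
month <- factor((tt - 1) %% 12 + 1, labels = month.abb)
seas  <- c(0, -0.10, 0.04, 0.02, 0.06, 0.03,
           0.06, 0.05, 0.02, 0.05, 0.00, 0.01)  # January is the base month

# upsilon_t = rho * upsilon_{t-1} + eps_t, built with a recursive filter
ups <- as.numeric(stats::filter(rnorm(n, sd = 0.02), rho, method = "recursive"))
y   <- 6.5 + 0.004 * tt + seas[as.integer(month)] + ups

# Re-parameterized regression: trend, seasonal dummies and one lag of Y
d   <- data.frame(y = y[-1], ylag = y[-n], tt = tt[-1], month = month[-1])
fit <- lm(y ~ tt + month + ylag, data = d)

rho_hat <- unname(coef(fit)["ylag"])
print(rho_hat)                                # should be close to rho
```

The coefficient on the lagged dependent variable estimates $\rho$ directly, whereas the remaining coefficients estimate the $\delta$s rather than the original $\alpha$s.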
The re-parameterized equation can be easily estimated by OLS, and will be sufficient unless estimates of the original parameters are required.

Y <- log(ElecGen.ts)
fit_seas3 <- dynlm(Y ~ trend(Y, scale=F) + I(trend(Y, scale=F)^2) + season(Y) + L(Y))  # automatically generated seasons
summary(fit_seas3)

Time series regression with "ts" data:

Start = 1983(2), End = 2017(12)

Call:
dynlm(formula = Y ~ trend(Y, scale = F) + I(trend(Y, scale = F)^2) + season(Y) + L(Y))

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.695e e e-14 ***
trend(Y, scale = F) 2.202e e e-13 ***
I(trend(Y, scale = F)^2) e e e-13 ***
season(Y)Feb e e < 2e-16 ***
season(Y)Mar 1.177e e < 2e-16 ***
season(Y)Apr e e
season(Y)May 4.913e e < 2e-16 ***
season(Y)Jun e e
season(Y)Jul 3.544e e e-11 ***
season(Y)Aug 9.812e e
season(Y)Sep e e *
season(Y)Oct 3.800e e e-12 ***
season(Y)Nov e e e-10 ***
season(Y)Dec 7.084e e
L(Y) 7.378e e < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 404 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1.991e+04 on 14 and 404 DF, p-value: < 2.2e-16

Yhat3 <- fitted(fit_seas3)
autoplot(cbind(Y, Yhat3)) + ylab("") +
  xlab("log ELEC_GEN_SG, and fitted trend and seasonals") +
  scale_color_manual(labels = c("Actual", "Fitted"), values=c("black", "black")) +
  aes(size=series) +
  scale_size_manual(labels = c("Actual", "Fitted"), values = c(0.5, 1.2)) +

  ts_thm + theme(aspect.ratio = 1/2)

[Figure: log ELEC_GEN_SG, and fitted trend and seasonals; series: Actual, Fitted]

ehat3 <- residuals(fit_seas3)
resid_diag(ehat3)

[Figure: residuals series, and sample ACF of residuals against lag]

We see that the model does a very good job at fitting the data. There is little discernible pattern remaining in the residuals. There are some significant autocorrelations in the residuals, but apart from the autocorrelation at lag 1, the rest are probably not worth worrying about: even if the true autocorrelations are zero, there is a five percent chance of observing a significant sample autocorrelation at any given lag, since the bands represent a 95% confidence interval.

Converting the equation

$$Y_t = \alpha_0 + \alpha_1 t + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \upsilon_t, \quad \upsilon_t = \rho\upsilon_{t-1} + \epsilon_t, \quad |\rho| < 1,$$

to

$$Y_t = \delta_0 + \delta_1 t + \delta_2 d_{feb,t} + \dots + \delta_{12} d_{dec,t} + \rho Y_{t-1} + \epsilon_t$$

makes the model easy to estimate by OLS. However, it is often easier to express the model in the previous form, with the cycles expressed in the error term, as the parameters are easier to interpret. Nonetheless, understanding the relationship between the two forms is useful.

Can we put a stochastic trend (with or without drift) together with cycles? In a sense, yes. Suppose we write

$$Y_t = \beta_0 + Y_{t-1} + \upsilon_t, \quad \upsilon_t = \rho\upsilon_{t-1} + \epsilon_t, \quad |\rho| < 1.$$

This model has exactly the form of a random walk with drift, except that the error term is not i.i.d. but a zero-mean AR(1) term. It will nonetheless display the characteristics of a random walk with drift. However, taking first differences leads to

$$Y_t - Y_{t-1} = \beta_0 + \upsilon_t, \quad \upsilon_t = \rho\upsilon_{t-1} + \epsilon_t, \quad |\rho| < 1,$$

a constant plus a zero-mean covariance-stationary AR(1), which fluctuates cyclically around the constant value. The original series thus has a deterministic linear trend (from the drift term), a stochastic trend, and a covariance-stationary cycle all intertwined.

Notice that had we quasi-differenced the original model, we would have gotten

$$\begin{aligned} Y_t - \rho Y_{t-1} &= \beta_0 + Y_{t-1} + \upsilon_t - \rho\beta_0 - \rho Y_{t-2} - \rho\upsilon_{t-1} \\ &= \beta_0 (1 - \rho) + Y_{t-1} - \rho Y_{t-2} + \upsilon_t - \rho\upsilon_{t-1}, \\ Y_t &= \beta_0 (1 - \rho) + (1 + \rho) Y_{t-1} - \rho Y_{t-2} + \epsilon_t. \end{aligned}$$

The model is still autoregressive but now with two lags: we have an AR(2) model. This particular AR(2) turns out to be non-stationary (it has a stochastic trend), but there are also covariance-stationary AR(2) models, and ARs of higher orders. We will explore this class of models in more detail in a later set of notes.
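The AR(2) representation obtained by quasi-differencing can be illustrated by simulation. The sketch below, in base R, generates a random walk with drift whose error is a zero-mean AR(1), then regresses $Y_t$ on its first two lags; the values of `beta0`, `rho` and the sample size are assumptions chosen for illustration.

```r
# Simulate Y_t = beta0 + Y_{t-1} + upsilon_t, upsilon_t = rho * upsilon_{t-1} + eps_t
set.seed(7)
n     <- 2000
beta0 <- 0.1
rho   <- 0.5
ups <- as.numeric(stats::filter(rnorm(n), rho, method = "recursive"))
y   <- cumsum(beta0 + ups)                    # starts from Y_0 = 0

# OLS regression of Y_t on Y_{t-1} and Y_{t-2}
d   <- data.frame(y = y[3:n], y1 = y[2:(n - 1)], y2 = y[1:(n - 2)])
fit <- lm(y ~ y1 + y2, data = d)

b <- unname(coef(fit))
print(b)   # compare the slopes with (1 + rho) and -rho
```

The estimated coefficients on the two lags should be close to $1 + \rho$ and $-\rho$, so that they sum to approximately one, which is the signature of the stochastic trend in this AR(2).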

Exercises

1. Models such as the AR(1) may look odd in that there is no X variable on the right-hand side. Such models do not aim to relate two variables, but aim only to describe the behavior of a single variable. They are univariate models. Nonetheless, they can be viewed as reduced forms of larger structural models that do describe relationships between multiple variables. As an example, suppose

$$\begin{aligned} Q_t^s &= \alpha_0 + \alpha_1 P_t + \alpha_2 P_{t-1} + \epsilon_t^s && \text{(supply equation)} \\ Q_t^d &= \beta_0 + \beta_1 P_t + \epsilon_t^d && \text{(demand equation)} \\ Q_t^s &= Q_t^d && \text{(market clearing)} \end{aligned}$$

Apply the market clearing equation by equating $Q_t^s$ and $Q_t^d$, and solve for $P_t$ as a function of $P_{t-1}$ to show that when viewed as a univariate time series, $P_t$ behaves as an AR(1).

2. Show that the following three ways of writing the seasonal dummies model are equivalent:

$$\begin{aligned} Y_t &= \beta_1 d_{jan,t} + \beta_2 d_{feb,t} + \dots + \beta_{12} d_{dec,t} + \epsilon_t \\ &= \alpha_0 + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \epsilon_t \\ &= \delta_0 + \delta_2 \left(d_{feb,t} - \tfrac{1}{12}\right) + \dots + \delta_{12} \left(d_{dec,t} - \tfrac{1}{12}\right) + \epsilon_t \end{aligned}$$

Interpret the parameters of the second and third forms.

3. Show that the two models

(I) $Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t$, $\epsilon_t$ zero mean iid
(II) $Y_t = c + \upsilon_t$, $\upsilon_t = \rho\upsilon_{t-1} + \epsilon_t$, $\epsilon_t$ zero mean iid

are equivalent. Find the relationships between the two sets of parameters.

4. Show that the two models, for quarterly data, are equivalent:

(I) $Y_t = \alpha_0 t + \alpha_1 d_{1,t} + \alpha_2 d_{2,t} + \alpha_3 d_{3,t} + \alpha_4 d_{4,t} + \alpha_5 Y_{t-1} + \epsilon_t$, $\epsilon_t$ zero mean iid
(II) $Y_t = \beta_0 t + \beta_1 d_{1,t} + \beta_2 d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \upsilon_t$, $\upsilon_t = \rho\upsilon_{t-1} + \epsilon_t$, $\epsilon_t$ zero mean iid.

Find the relationships between the two sets of parameters. The variables $d_{i,t}$, $i = 1, 2, 3, 4$, are quarterly seasonal dummies.

5. Is the pure (monthly) seasonal dummies model

$$Y_t = \alpha_0 + \alpha_2 d_{feb,t} + \dots + \alpha_{12} d_{dec,t} + \epsilon_t, \quad \epsilon_t \text{ zero mean iid,}$$

covariance stationary? Is it trend-stationary? What is the correlation between $Y_t$ and $Y_{t-k}$?

6. Starting from a fixed point $Y_0$, show that if

$$Y_t = \beta_0 + \beta_1 t + Y_{t-1} + \epsilon_t, \quad \epsilon_t \text{ zero mean iid,}$$

then $Y_t$ behaves like a deterministic quadratic trend plus a stochastic trend.

Computer Exercises

7. (a) Convert the series IP_SG from the dataset timeseries_monthly.csv into an R time series object IP_SG.ts. Plot IP_SG.ts and log(IP_SG.ts). Plot the sample autocorrelation function for log(IP_SG.ts) for up to 36 lags. Explain in words why you think you observe this pattern.

(b) Plot diff(log(IP_SG.ts)), the first difference of log(IP_SG.ts). Plot the sample ACF of diff(log(IP_SG.ts)). Is there any evidence of seasonality in the sample ACF? Why do you think this evidence wasn't observed in the sample ACF in part (a)?

(c) Fit a seasonal dummies model to diff(log(IP_SG.ts)). Did it capture the seasonalities well?

8. (a) Take the first difference of the log of the Singapore tourist arrival series from the timeseries_monthly.csv file. Implement the seasonal dummies model for this series, using any special dummy variables needed for the outlier observations. Did the seasonal dummy model successfully capture the seasonality in this series? Is there any evidence of an AR(1) in the residuals?

(b) Regardless of your answers to part (a), fit a seasonal dummies + AR(1) model for this series, using any special dummy variables needed for the outlier observations. Analyze the fit of your model.


Forecasting with ARIMA models This version: 14 January 2018 Notes for Intermediate Econometrics / Time Series Analysis and Forecasting Anthony Tay Elsewhere we showed that the optimal forecast for a mean