STOR 356: Summary Course Notes Part III


STOR 356: Summary Course Notes Part III

Richard L. Smith
Department of Statistics and Operations Research
University of North Carolina, Chapel Hill, NC

April 23

1 ESTIMATION OF TIME SERIES MODELS

The aim of this chapter is to discuss different estimation methods for time series models. The starting point is that we have a specific model in mind, either AR(p) for given p or ARMA(p, q) for given p and q (or, more rarely, MA(q) for given q; usually we don't set out with the objective of fitting an MA model, but if model selection criteria for ARMA lead us to p = 0, that is what we are left with), and we would like to fit the model. Therefore, the main focus is on actual estimation of the model parameters. In Section 1.7, we list several commonly used criteria for choosing the model.

1.1 Yule-Walker

Used for AR(p) processes. We assume (as in most places in this course) that the series has mean 0; in most cases this is achieved by subtracting the sample mean $\bar{X}$ from every observation before beginning the analysis. The Yule-Walker method starts with the equation
$$X_t - \sum_{j=1}^{p} \phi_j X_{t-j} = Z_t, \qquad (1)$$
where $Z_t \sim WN[0, \sigma^2]$. Taking the covariance with $X_{t-k}$ for $k \geq 1$, we deduce
$$\gamma_X(k) - \sum_{j=1}^{p} \phi_j \gamma_X(k-j) = 0. \qquad (2)$$
If (2) is evaluated for $k = 1, 2, \ldots, p$, we get p linear equations in p unknowns, which we write in vector-matrix notation as
$$\Gamma_p \phi = \gamma_p, \qquad (3)$$
where

$\Gamma_p$ is the $p \times p$ matrix whose $(i, j)$ entry is $\gamma_X(i - j)$,

$\phi$ is the vector $(\phi_1 \ \phi_2 \ \ldots \ \phi_p)^T$,

$\gamma_p$ is the vector $(\gamma_X(1) \ \gamma_X(2) \ \ldots \ \gamma_X(p))^T$.

Also, by applying (2) to the case k = 0, we get
$$\gamma_X(0) - \sum_{j=1}^{p} \phi_j \gamma_X(j) = \sigma^2. \qquad (4)$$
(The right hand side in this case is $\mathrm{Cov}\{X_t, Z_t\}$. But by writing $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ and noting that $\psi_0 = 1$, this reduces to the variance of $Z_t$, which is $\sigma^2$.)

In practice we substitute sample estimates $\hat{\Gamma}_p$ for $\Gamma_p$ and $\hat{\gamma}_p$ for $\gamma_p$, to deduce
$$\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_p, \qquad (5)$$
$$\hat{\sigma}^2 = \hat{\gamma}_X(0) - \hat{\phi}^T \hat{\gamma}_p. \qquad (6)$$
We also have the approximate result
$$\hat{\phi} \sim N\left[\phi, \ \frac{\sigma^2}{n} \Gamma_p^{-1}\right], \qquad (7)$$
in other words, the vector of estimates $\hat{\phi}$ has an approximately normal distribution with mean $\phi$ and covariance matrix $\frac{\sigma^2}{n} \Gamma_p^{-1}$. This is used in determining standard errors for the parameters. See the parallel discussion of the information matrix in the maximum likelihood section (Section 1.5) below.
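To make the Yule-Walker recipe concrete, here is a small numerical sketch in Python with numpy (not the ITSM software used in the course; the simulated AR(2) parameters, the seed, and the helper name sample_acvf are purely illustrative). It forms the sample autocovariances, solves (5), evaluates (6), and reads off standard errors suggested by (7).

import numpy as np

# Simulate an AR(2) series with illustrative parameter values.
rng = np.random.default_rng(0)
n, phi_true, sigma2_true = 500, np.array([0.6, 0.2]), 1.0
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi_true[0]*x[t-1] + phi_true[1]*x[t-2] + rng.normal(scale=np.sqrt(sigma2_true))

x = x - x.mean()   # centre the series, as in the notes

def sample_acvf(x, lag):
    # Sample autocovariance gamma_hat(lag), dividing by n.
    n = len(x)
    return np.sum(x[lag:] * x[:n-lag]) / n

p = 2
gam = np.array([sample_acvf(x, k) for k in range(p + 1)])
Gamma_p = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)])
gamma_p = gam[1:]

phi_hat = np.linalg.solve(Gamma_p, gamma_p)        # equation (5)
sigma2_hat = gam[0] - phi_hat @ gamma_p            # equation (6)
std_err = np.sqrt(sigma2_hat / n * np.diag(np.linalg.inv(Gamma_p)))   # from (7), with estimates plugged in
print(phi_hat, sigma2_hat, std_err)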

1.2 Burg's Method

Burg's method is a method for AR processes that is similar to Yule-Walker, but based on the PACF. We don't try to describe the details here, but the basic idea is that it is an alternative to the Yule-Walker method that usually leads to a model with lower AICC (see Section 1.7), and closer to the maximum likelihood estimates. Therefore, although it is less intuitive than the Yule-Walker method and harder to calculate by hand, it is probably a better method overall.

1.3 Innovations Algorithm

This algorithm is primarily used for the estimation of MA models. It is offered as one of the choices for ARMA models by the ITSM package.

1.4 Hannan-Rissanen Method

This is a procedure that tries to extend the Yule-Walker equations to general ARMA processes. The estimators are not as good as general MLEs, but they are useful as a starting point for the MLE algorithm (see Section 1.5.3). The idea is to extend (1) to the equation
$$X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j Z_{t-j} + Z_t. \qquad (8)$$
In other words, we regress $X_t$ on $X_{t-1}, \ldots, X_{t-p}, Z_{t-1}, \ldots, Z_{t-q}$, treating $Z_t$ as a residual error term. The disadvantage of this is that we need initial estimates of $Z_{t-1}, \ldots, Z_{t-q}$ to get the procedure started. Usually this is done by first fitting an AR(p*) model for some $p^* \geq p + q$, using that model to estimate the residuals, and treating these as starting values for the Z's in (8). Then, once an initial model is fitted, the Z's are recalculated and the analysis repeated. This can be done several times until the procedure converges. There is still the issue that the analysis does not deal well with the earlier observations in the series (we can only start (8) at t = max(p, q) + 1), and in that respect, it is less efficient than methods that use all the data.

1.5 Method of Maximum Likelihood

The Method of Maximum Likelihood is the most commonly used method for estimating parameters in all of statistics. The method is covered in one of the STOR courses (555), but since that isn't a prerequisite for this course, I won't assume students have previously heard about it. In view of that, I first describe how maximum likelihood is applied to two common distributions, the binomial and the normal, and then go on to consider the time series case.

1.5.1 Example 1: Binomial Distribution

Suppose X has a binomial distribution with parameters n (known) and p (unknown). This means that
$$\Pr\{X = k\} = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n. \qquad (9)$$
The method of maximum likelihood states that to estimate p given the observed value X = k, we choose the value $\hat{p}$ that maximizes (9). It's a little simpler if we work with the log likelihood:
$$\ell(p) = \log \Pr\{X = k; p\} = \log \frac{n!}{k!(n-k)!} + k \log p + (n-k) \log(1-p). \qquad (10)$$
Differentiate (10) twice:
$$\ell'(p) = \frac{k}{p} - \frac{n-k}{1-p}, \qquad (11)$$
$$\ell''(p) = -\frac{k}{p^2} - \frac{n-k}{(1-p)^2}. \qquad (12)$$
From (11), we see that $\ell'(p) = 0$ when $\frac{k}{p} = \frac{n-k}{1-p}$, which reduces to
$$\hat{p} = \frac{k}{n}, \qquad (13)$$
which is the answer we expected; e.g., a course like STOR 155 teaches that we estimate a population proportion by the sample proportion, which is exactly what (13) says. Moreover, from (12), we see at once that $\ell''(p) < 0$, so it really is a local maximum. [Strictly speaking, this argument fails

when k = 0 or n, but then we prove it is a local maximum directly. For example, if k = 0 then $\ell(p) = n \log(1-p)$, which is decreasing over $0 \leq p \leq 1$, and has a maximum at p = 0.]

However, we can do more with the second derivative. Suppose we evaluate $\ell''(p)$ at $p = \hat{p}$:
$$\ell''(\hat{p}) = -\frac{k n^2}{k^2} - \frac{(n-k) n^2}{(n-k)^2} = -\frac{n^3}{k(n-k)} = -\frac{n}{\hat{p}(1-\hat{p})}. \qquad (14)$$
However, let's also note that from the standard (e.g. STOR 155) formula for the variance of a binomial proportion, $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$, which in practice (p being unknown) we approximate by
$$\mathrm{Var}(\hat{p}) \approx \frac{\hat{p}(1-\hat{p})}{n}. \qquad (15)$$
Comparing (14) with (15), we observe that:

The variance of $\hat{p}$ is approximately $\{-\ell''(\hat{p})\}^{-1}$.

This turns out to be a general property of maximum likelihood estimators.

1.5.2 Example 2: Normal Distribution

Suppose $X_1, \ldots, X_n$ are independent random variables, each normal with unknown mean $\mu$ and variance $\tau$. Note that normally we would write $\sigma^2$ instead of $\tau$, but in this case, to avoid possible confusion over whether we are differentiating with respect to $\sigma$ or $\sigma^2$, I write $\tau$. The density of a normal $(\mu, \tau)$ random variable is
$$\frac{1}{\sqrt{2\pi\tau}} e^{-(x-\mu)^2/(2\tau)}.$$
The joint density of $X_1, \ldots, X_n$ is the product of this for $X_1, \ldots, X_n$:
$$f(X_1, \ldots, X_n; \mu, \tau) = (2\pi\tau)^{-n/2} \exp\left\{-\frac{1}{2\tau} \sum (X_i - \mu)^2\right\}. \qquad (16)$$
[This is a consequence of independence: any statement about the values of $X_1, \ldots, X_n$ can be expressed as a product of probabilities for $X_1, \ldots, X_n$ individually. When the answer is written as a probability density, we get (16).]

Again we take logarithms of (16), so
$$\ell_n(\mu, \tau) = \log f(X_1, \ldots, X_n; \mu, \tau) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \tau - \frac{1}{2\tau} \sum (X_i - \mu)^2. \qquad (17)$$

We then calculate
$$\frac{\partial \ell_n(\mu, \tau)}{\partial \mu} = \frac{1}{\tau} \sum (X_i - \mu), \qquad (18)$$
$$\frac{\partial \ell_n(\mu, \tau)}{\partial \tau} = -\frac{n}{2\tau} + \frac{1}{2\tau^2} \sum (X_i - \mu)^2. \qquad (19)$$
(18) is 0 when $\hat{\mu} = \frac{1}{n} \sum X_i = \bar{X}$, which again agrees with the STOR 155 solution. If we substitute $\mu = \bar{X}$ and set (19) equal to 0, we get
$$\hat{\tau} = \frac{1}{n} \sum (X_i - \bar{X})^2, \qquad (20)$$
which however is different from the STOR 155 solution: in that course you were taught to divide by n - 1 instead of n. In fact, what they told you in STOR 155 was right: it's better to divide by n - 1 because that leads to an unbiased estimator (the definition of an unbiased estimator is that if the experiment is repeated many times over, the long-run average of the estimator is equal to its true value). However the two estimators are almost the same when n is large, and that is what really counts with maximum likelihood estimators (or MLEs for short). MLEs lead to approximately the most efficient estimators in large samples, but they are not necessarily best in small samples. Nevertheless, in most practical estimation problems (including nearly all of the estimation problems that arise in time series analysis), we simply have no way to calculate an estimator that is exactly unbiased for every n, whereas maximum likelihood is a very general technique that works in very many cases.

Let's now extend (18) and (19) to the calculation of second-order derivatives. Again, the case of practical interest is when we set $\mu$ and $\tau$ to the MLEs, so we do that in the following calculation:
$$-\frac{\partial^2 \ell_n(\mu, \tau)}{\partial \mu^2} = \frac{n}{\tau}, \qquad (21)$$
$$-\frac{\partial^2 \ell_n(\mu, \tau)}{\partial \mu \, \partial \tau} = \frac{1}{\tau^2} \sum (X_i - \mu) = 0, \qquad (22)$$
$$-\frac{\partial^2 \ell_n(\mu, \tau)}{\partial \tau^2} = -\frac{n}{2\tau^2} + \frac{1}{\tau^3} \sum (X_i - \mu)^2 = \frac{n}{2\tau^2}. \qquad (23)$$
Let's assemble this as a matrix, which we'll call H:
$$H = \begin{pmatrix} -\frac{\partial^2 \ell_n}{\partial \mu^2} & -\frac{\partial^2 \ell_n}{\partial \mu \, \partial \tau} \\ -\frac{\partial^2 \ell_n}{\partial \mu \, \partial \tau} & -\frac{\partial^2 \ell_n}{\partial \tau^2} \end{pmatrix} = \begin{pmatrix} \frac{n}{\tau} & 0 \\ 0 & \frac{n}{2\tau^2} \end{pmatrix}. \qquad (24)$$
Since H is diagonal, it's easy to calculate its inverse:
$$H^{-1} = \begin{pmatrix} \frac{\tau}{n} & 0 \\ 0 & \frac{2\tau^2}{n} \end{pmatrix}. \qquad (25)$$
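As a quick numerical check of (20) through (25) (a sketch in Python with numpy, not part of the original notes; the values of mu, tau, n and the number of replications are arbitrary), we can draw many normal samples, compute the MLEs for each, and compare the observed variances of $\hat{\mu}$ and $\hat{\tau}$ with the diagonal entries $\tau/n$ and $2\tau^2/n$ of $H^{-1}$.

import numpy as np

rng = np.random.default_rng(1)
mu, tau, n, reps = 5.0, 4.0, 200, 20000   # arbitrary illustrative values

samples = rng.normal(mu, np.sqrt(tau), size=(reps, n))
mu_hat = samples.mean(axis=1)     # MLE of mu for each sample
tau_hat = samples.var(axis=1)     # MLE of tau (divides by n, not n-1)

print("var of mu_hat :", mu_hat.var(), " vs tau/n     =", tau/n)
print("var of tau_hat:", tau_hat.var(), " vs 2*tau^2/n =", 2*tau**2/n)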

At this point you can recognize that $\frac{\tau}{n}$ is exactly the variance of $\bar{X}$ according to standard (STOR 155) statistical theory. Moreover, although you may not instantly recognize it, it's also true that $\frac{2\tau^2}{n}$ is very nearly the variance of $\hat{\tau}$.

[If you feel confident about the chi-squared distribution, here's the argument. Suppose $\mu$ is known. Then $\sum (X_i - \mu)^2 / \tau$ has a chi-squared distribution with n degrees of freedom, denoted $\chi^2_n$. This distribution has mean n and variance 2n. Therefore, $\frac{1}{n} \sum (X_i - \mu)^2$ has a mean of
$$\frac{\tau}{n} \, E\left\{\frac{\sum (X_i - \mu)^2}{\tau}\right\} = \frac{\tau}{n} \cdot n = \tau$$
and a variance of
$$\frac{\tau^2}{n^2} \, \mathrm{Var}\left\{\frac{\sum (X_i - \mu)^2}{\tau}\right\} = \frac{\tau^2}{n^2} (2n) = \frac{2\tau^2}{n}.$$
This is more complicated when $\mu$ is unknown, when the correct statement is that $\sum (X_i - \bar{X})^2 / \tau$ has a $\chi^2_{n-1}$ distribution. I'll leave you to figure out exactly what this implies for the variance of $\hat{\tau}$. It's still correct that for large n, the variance is approximately $\frac{2\tau^2}{n}$.]

The general principle here is this: the matrix H is called the information matrix, and is calculated as the matrix of negative second-order partial derivatives of $\ell$, evaluated at the MLE. Its inverse is an approximation to the covariance matrix of the estimators. In particular, the diagonal entries of $H^{-1}$ are approximately the variances of the individual parameter estimates, in this case, $\hat{\mu}$ and $\hat{\tau}$. Our purpose in going through the calculations was to illustrate how the general principle works in some well-known examples.

1.5.3 Time Series Models

We state without proof the following fact (see also equation (5.2.1), page 158 of the course text). Suppose $X_1, \ldots, X_n$ are random variables that have a normal distribution with mean 0, but instead of being independent, they have a covariance matrix $\Gamma_n$. Then the joint density of $X_1, \ldots, X_n$ is
$$L_n = (2\pi)^{-n/2} |\Gamma_n|^{-1/2} \exp\left\{-\frac{1}{2} X^T \Gamma_n^{-1} X\right\}. \qquad (26)$$
[Here X denotes the vector $(X_1 \ \ldots \ X_n)^T$, and $|\cdot|$ indicates the determinant of a matrix.]

In this case, the principle of maximum likelihood says we choose the unknown parameters of the time series model to maximize $L_n$.
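To see concretely what "choose the parameters to maximize L_n" involves, here is a sketch in Python with numpy (an illustration only, not how ITSM works; the AR(1) setting, the series length and the grid of candidate values are all arbitrary choices). For a mean-zero AR(1) with parameter $\phi$ and $\sigma^2 = 1$, the autocovariances are $\gamma_X(h) = \phi^{|h|}/(1 - \phi^2)$, so $\Gamma_n$ can be written down directly and the logarithm of (26) evaluated for any candidate $\phi$.

import numpy as np

def ar1_loglik(x, phi, sigma2=1.0):
    # Gaussian log-likelihood, i.e. the log of (26), for a mean-zero AR(1) series x.
    n = len(x)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma_n = sigma2 * phi**lags / (1.0 - phi**2)
    sign, logdet = np.linalg.slogdet(Gamma_n)
    quad = x @ np.linalg.solve(Gamma_n, x)
    return -0.5*n*np.log(2*np.pi) - 0.5*logdet - 0.5*quad

# Simulate a short AR(1) series and maximize the log-likelihood over a grid of phi values.
rng = np.random.default_rng(2)
phi_true, n = 0.7, 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true*x[t-1] + rng.normal()

grid = np.linspace(-0.95, 0.95, 381)
loglik = np.array([ar1_loglik(x, phi) for phi in grid])
print("approximate MLE of phi:", grid[np.argmax(loglik)])

A grid search like this is crude; real software uses more efficient numerical optimizers and, as described in Section 1.5.4 below, evaluates the likelihood through the prediction error decomposition rather than by handling $\Gamma_n$ directly.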

In a little more detail, this is what that means. Suppose we have an ARMA(p, q) model of mean 0, i.e.
$$X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \ldots + \theta_q Z_{t-q},$$
where $Z_t \sim WN[0, \sigma^2]$. There are m = p + q + 1 unknown parameters, $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2$. For convenience we write this as a single vector $\beta = (\phi_1 \ \ldots \ \phi_p \ \theta_1 \ \ldots \ \theta_q \ \sigma^2)^T$. Theory developed earlier in the course has shown how to calculate all the autocovariances as functions of $\beta$. Therefore, we can calculate the matrix $\Gamma_n$ as a function of $\beta$. We substitute this in (26). The result is written $L_n(\beta)$. We choose $\beta$ to maximize this expression (or more often, its logarithm, $\ell_n(\beta)$). In practice, this cannot be done analytically and we have to employ a numerical search, and for that purpose, it's useful if we have at least reasonably good estimates as starting values for the numerical search. This is the reason why other estimation methods, such as the Yule-Walker method or Burg's algorithm (in the case of AR models), are often used as preliminary estimates prior to running MLE. However, MLE itself is not dependent on the use of any particular algorithm: there are many possible methods to calculate it numerically.

Properties of MLE. Let $H_n$ denote the matrix of negative second-order derivatives of $\ell_n(\beta) = \log L_n(\beta)$, evaluated at $\beta = \hat{\beta}$. This is known as the information matrix associated with this particular likelihood function. Also let $V_n = H_n^{-1}$ (i.e. the usual matrix inverse). Then $V_n$ is approximately the variance-covariance matrix of $\hat{\beta}$. In particular, if we write
$$V_n = \begin{pmatrix} v_{11} & v_{12} & \ldots & v_{1m} \\ v_{21} & v_{22} & \ldots & v_{2m} \\ \vdots & \vdots & & \vdots \\ v_{m1} & v_{m2} & \ldots & v_{mm} \end{pmatrix},$$
then $\sqrt{v_{11}}, \sqrt{v_{22}}, \ldots, \sqrt{v_{mm}}$ are approximately the standard deviations of $\hat{\beta}_1, \ldots, \hat{\beta}_m$. In popular terminology they are called the standard errors. In particular, these are the numbers shown as standard errors when ITSM fits an ARMA model by maximum likelihood.

There is an analogy here with concepts that are well familiar from STOR 355. In that course, a linear regression model of the form $Y = X\beta + \epsilon$, where the vector $\epsilon$ consists of independent errors with mean 0 and variance $\sigma^2$, was solved with an estimator $\hat{\beta} = (X^T X)^{-1} X^T Y$, and you were taught that the covariance matrix of $\hat{\beta}$ is $(X^T X)^{-1} \sigma^2$. Moreover, the square roots of the diagonal entries of that covariance matrix were used to give standard errors of the individual $\hat{\beta}_j$'s. The use of $V_n$ as a covariance matrix in the present setting is analogous, except that the answers are only approximate, not exact.

1.5.4 The Prediction Error Decomposition

There is one other trick to learn about the use of MLE for time series models: usually the formula (26) is not calculated exactly, but through an alternative approach known as the prediction error decomposition. The idea is as follows. Suppose $X_1$ has a normal distribution with mean $\hat{X}_1 = 0$ and variance $v_0$. Typically we would just take $v_0$ to be $\gamma_X(0)$, the variance of a typical $X_t$ under the stationary distribution. Now suppose:

The optimal prediction of $X_2$ given $X_1$ has mean predictor $\hat{X}_2$ and prediction variance (i.e. mean squared prediction error) $v_1$,

The optimal prediction of $X_3$ given $X_1, X_2$ has mean predictor $\hat{X}_3$ and prediction variance $v_2$,

and so on up to

The optimal prediction of $X_n$ given $X_1, \ldots, X_{n-1}$ has mean predictor $\hat{X}_n$ and prediction variance $v_{n-1}$.

Then an alternative way to write (26) is
$$L_n = \frac{1}{\sqrt{2\pi v_0}} e^{-X_1^2/(2v_0)} \cdot \frac{1}{\sqrt{2\pi v_1}} e^{-(X_2 - \hat{X}_2)^2/(2v_1)} \cdots \frac{1}{\sqrt{2\pi v_{n-1}}} e^{-(X_n - \hat{X}_n)^2/(2v_{n-1})}$$
$$= (2\pi)^{-n/2} (v_0 v_1 \cdots v_{n-1})^{-1/2} \exp\left\{-\frac{1}{2} \sum_{i=1}^{n} \frac{(X_i - \hat{X}_i)^2}{v_{i-1}}\right\}. \qquad (27)$$
In practice the calculation of the $\hat{X}_t$'s and the $v_{t-1}$'s follows the method for prediction from the finite past that was covered in Chapter 2 of the course text. The advantage of (27) over (26) is that when n is large, say of the order of thousands, calculating $\Gamma_n$ and $\Gamma_n^{-1}$ takes a lot of computer time and memory, whereas the recursive formulas used to calculate $\hat{X}_t$ and $v_{t-1}$ are extremely fast. (If you read the whole of Chapter 2 rather than just the part of it that we summarized for this course, you will find detailed discussion of specific algorithms for this, such as the Durbin-Levinson method.) After taking account of such recursive formulas, (27) is much faster to calculate than (26).
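To illustrate Section 1.5.4 numerically, the following sketch in Python with numpy (an illustration only; the AR(1) example, the parameter values and the simulated data are arbitrary, and this is not the ITSM implementation) checks that the prediction error decomposition (27) gives the same value as the direct formula (26). For a mean-zero AR(1), the one-step predictors from the finite past are $\hat{X}_1 = 0$ with $v_0 = \gamma_X(0) = \sigma^2/(1 - \phi^2)$, and $\hat{X}_t = \phi X_{t-1}$ with $v_{t-1} = \sigma^2$ for $t \geq 2$.

import numpy as np

def loglik_direct(x, phi, sigma2):
    # log of (26): multivariate normal log-density with the AR(1) covariance matrix Gamma_n
    n = len(x)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma_n = sigma2 * phi**lags / (1.0 - phi**2)
    sign, logdet = np.linalg.slogdet(Gamma_n)
    return -0.5*n*np.log(2*np.pi) - 0.5*logdet - 0.5*(x @ np.linalg.solve(Gamma_n, x))

def loglik_ped(x, phi, sigma2):
    # log of (27): prediction error decomposition for a mean-zero AR(1)
    n = len(x)
    xhat = np.concatenate(([0.0], phi * x[:-1]))                                # one-step predictors
    v = np.concatenate(([sigma2 / (1.0 - phi**2)], np.full(n - 1, sigma2)))     # v_0, ..., v_{n-1}
    return -0.5*n*np.log(2*np.pi) - 0.5*np.sum(np.log(v)) - 0.5*np.sum((x - xhat)**2 / v)

rng = np.random.default_rng(3)
phi, sigma2, n = 0.6, 2.0, 150
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi*x[t-1] + rng.normal(scale=np.sqrt(sigma2))

print(loglik_direct(x, phi, sigma2))
print(loglik_ped(x, phi, sigma2))   # the two values agree up to rounding error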

1.5.5 Homework Problem

For an AR(1) process, $X_t = \phi X_{t-1} + Z_t$, $Z_t \sim N[0, \sigma^2]$, find an expression for the exact MLE of $\phi$ and compare it with the Yule-Walker estimator. (For simplicity, assume that the mean of the process is known, $\mu = 0$, and that the variance $\sigma^2$ is known, though in practice we would also estimate these along with $\phi$.)

1.6 Diagnostics

Given estimates of the model parameters, one-step-ahead forecasts $\hat{X}_t$, $t = 1, \ldots, n$, and forecasting mean squared errors $\hat{v}_{t-1}$, $t = 1, \ldots, n$ (given a hat solely because they are based on estimated parameters), we can form residuals
$$\hat{W}_t = \frac{X_t - \hat{X}_t}{\sqrt{\hat{v}_{t-1}}}, \qquad t = 1, \ldots, n. \qquad (28)$$
If the model is correct, these should be approximately white noise. You can look at these residuals in much the same way you look at residuals from a regression model: try any way you can think of to look for deviations from the underlying white noise assumption, because any such deviation is indicative of the wrong model, either identifying the wrong p and q in the ARMA order selection, or failing in some other aspect of the initial analysis of the data (e.g. failing to transform the data in the case of an obviously non-constant variance, or failure to perform differencing or trend removal in the case of long-term trends). Specific techniques include

Plotting the residuals, looking for trends, outliers, evidence of heteroscedasticity, etc.

QQ plots to test normality (note that ITSM also gives you an option of plotting a QQ plot to test for a t distribution, useful when you suspect the distribution may be long-tailed, in which case a t distribution may fit better than a normal distribution).

Plotting the sample ACF and PACF: if the model is correct, most of these should be within the confidence bands.

Formal testing of randomness, e.g. the Ljung-Box test for autocorrelation, the turning points test for a trend, the Jarque-Bera test for normality (recall Section 1.6 of the text for a full discussion of these tests).

1.7 Automated Criteria for Model Selection

This section addresses the issue of how to select the order of the model: either the value of p for an AR(p) model, or p and q for ARMA(p, q).

1.7.1 FPE

In the case of AR models, a standard criterion is the final prediction error (usually abbreviated FPE). The idea is to come up with an approximation to the mean squared error of $\hat{X}_t$, the one-step forecast of $X_t$ given the entire past up to time t - 1. The general theory of Chapter 2 shows that if the model is correctly identified, the mean squared error should be $\sigma^2$, the variance of the white noise process. But that won't be achieved in reality, because we don't know the exact parameters $\phi_1, \ldots, \phi_p$. Instead, we only have estimates $\hat{\phi}_1, \ldots, \hat{\phi}_p$. Asymptotic arguments (see the text for further detail) lead to the approximation
$$E\{(X_t - \hat{X}_t)^2\} \approx \hat{\sigma}^2 \, \frac{n+p}{n-p}. \qquad (29)$$
The right hand side of (29) is called FPE, and the FPE criterion essentially says we choose p to minimize this. Note that (29) balances two factors: as we increase p, we should expect $\hat{\sigma}^2$ to decrease, for the same reason as in regression models (the more covariates we add to the model, the smaller we should expect the residual mean squared error to be). However this is compensated by the $\frac{n+p}{n-p}$ term, which increases as p increases. The two terms compensate and in most cases lead to a sensible choice of p.

1.7.2 AIC, AICC and BIC

For general ARMA(p, q) models, the simple correction (29) no longer works, but there are a number of alternatives that aim to do the same thing, i.e. balance the decrease in estimated mean squared error that usually happens when the model order increases, with some penalty term that stops the model order getting too large. Note that an ARMA(p, q) model actually has m = p + q + 1 unknown parameters, including $\sigma^2$ (but not including the mean $\mu$; as in most other places in this course, we assume the series is centered to mean 0 before beginning the formal analysis). All these criteria start by calculating $L_n(\hat{\beta})$, the value of the maximized likelihood when the MLE $\hat{\beta}$ is substituted for the true unknown $\beta$. For simplicity we abbreviate $L_n(\hat{\beta})$ to $\hat{L}_n$.

The criteria are that we choose p and q to minimize one of the following:

AIC: $-2 \log \hat{L}_n + 2m$,

AICC: $-2 \log \hat{L}_n + \frac{2mn}{n - m - 1}$,

BIC: $-2 \log \hat{L}_n + m \log n$.

Note that the formula for BIC differs from what is presented in the text. Roughly, the distinction is this:

AIC stands for Akaike Information Criterion, and was originally proposed by the Japanese statistician Akaike as an extension of the FPE criterion to ARMA models. However the objective is the same as FPE: to minimize the mean squared error of a one-step prediction after taking into account the effect of estimating model parameters.

AICC is the corrected AIC; the correction from $2m$ to $\frac{2mn}{n-m-1}$ is supposed to be a more accurate approximation, but it's still trying to do the same thing.

BIC stands for Bayesian Information Criterion and was originally proposed as the basis for an approximation to the probability that a particular model is correct from a Bayesian statistics viewpoint. However the main thing that BIC is recognized for these days is that in large samples, it leads to consistent model selection: if one of the models is actually correct, then as the sample size n tends to infinity, the probability that BIC chooses the correct model tends to 1 (under a number of qualifying conditions that we don't try to list here). However this isn't a perfect criterion either; we may not actually believe that any of our models is correct in an absolute sense, and all of these criteria are more guidelines than absolute rules. In practice, the important thing to be aware of is that BIC typically leads to a lower-order model being selected than either AIC or AICC.

All of the model selection criteria should be viewed primarily as giving a range of models that is reasonable. If things are working well, the actual quantities of interest (e.g. future predictions) should be not too sensitive to the exact determination of the model, but by using AICC and BIC to guide the choice of models, we have a means of checking up on that. ITSM makes it especially convenient to use AICC because the Autofit option uses it, but we should really consider AICC alongside other checks, including BIC and more informal checking procedures based on residuals.
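The three criteria are simple functions of $-2 \log \hat{L}_n$, n and m, as the following sketch in Python shows (the $-2 \log \hat{L}_n$ values in the dictionary are invented purely to illustrate the typical pattern of a small improvement in fit as the order grows; they are not output from any real fitted model).

import numpy as np

def aic(neg2loglik, m):
    return neg2loglik + 2*m

def aicc(neg2loglik, m, n):
    return neg2loglik + 2*m*n/(n - m - 1)

def bic(neg2loglik, m, n):
    return neg2loglik + m*np.log(n)

n = 100   # hypothetical series length
# hypothetical -2 log L_n values for a few ARMA(p, q) fits (illustrative numbers only)
fits = {(1, 0): 292.0, (2, 0): 288.5, (1, 1): 287.9, (2, 2): 286.8}

for (p, q), neg2ll in fits.items():
    m = p + q + 1
    print((p, q), round(aic(neg2ll, m), 1), round(aicc(neg2ll, m, n), 1), round(bic(neg2ll, m, n), 1))

With these made-up numbers, AIC and AICC prefer the ARMA(1,1) fit while BIC prefers the AR(1), illustrating the remark above that BIC typically selects a lower-order model.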

2 NONSTATIONARY AND SEASONAL TIME SERIES

2.1 ARIMA models

Definition: Suppose $X_t$ is a time series. If $Y_t = (1 - B)^d X_t$ is ARMA(p, q), then $X_t$ is said to be ARIMA(p, d, q). Another way of writing this is
$$\phi^*(B) X_t = \theta(B) Z_t, \qquad \phi^*(B) = (1 - B)^d \phi(B), \qquad Z_t \sim WN[0, \sigma^2], \qquad (30)$$
the point being that $\phi^*(B)$ would not be a causal operator if taken on its own (because of the factor $1 - B$), but in the form (30), it is perfectly reasonable to require that $\phi(B)$ be causal and $\theta(B)$ be invertible, which are the same conditions as we have seen in earlier chapters.

The point about ARIMA processes is that we can often model a process with a trend in ARIMA form, when a direct ARMA model would fail. This point is reinforced by the following:

Comment. If m(t) is a polynomial of degree d - 1, then $(1 - B)^d m(t) = 0$.

This is true because of the following. First, consider the case m(t) = c (a constant). This is a polynomial of degree 0, while $(1 - B)m(t) = c - c = 0$, so the comment is true when d = 1. Now consider a component of the form $t^p$ for some $p \geq 1$. We have
$$(1 - B)t^p = t^p - (t-1)^p = t^p - \left(t^p - p t^{p-1} + \frac{p(p-1)}{2} t^{p-2} - \ldots \pm 1\right) = p t^{p-1} - \frac{p(p-1)}{2} t^{p-2} + \ldots,$$
which is a polynomial of degree p - 1. Thus, if m(t) is a polynomial of degree p then $(1 - B)m(t)$ is a polynomial of degree p - 1. Iterating, we deduce that if m(t) has degree d - 1 then $(1 - B)^{d-1} m(t)$ is a polynomial of degree 0 (i.e. a constant), and then one further application of $1 - B$ yields the result.

The point about this example is that we could have a time series of the form $X_t = m(t) + Y_t$, with m(t) a polynomial and $Y_t$ stationary, so that $(1 - B)^d X_t = (1 - B)^d Y_t$ and can be modeled as a stationary time series.

2.1.1 An Example

This is a simulated example based on one in the course text, but it has been worked out separately for this discussion.

[Figure 5. Left: plot of an AR(1) series $Y_t$ with $\phi = 0.9$. Right: the integrated series $X_t = Y_1 + Y_2 + \ldots + Y_t$.]

Figure 5 (left) shows a simulated AR(1) series $Y_t$ with $\phi = 0.9$, and (right) the integrated form $X_t = Y_1 + \ldots + Y_t$ (so $(1 - B)X_t = Y_t$).
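A pair of series with the qualitative behaviour in Figure 5 can be simulated in a few lines. Here is a sketch in Python with numpy (not the software used to produce the figure; the seed and the series length are arbitrary).

import numpy as np

rng = np.random.default_rng(4)
n, phi = 200, 0.9

y = np.zeros(n)
for t in range(1, n):
    y[t] = phi*y[t-1] + rng.normal()   # AR(1) series Y_t with phi = 0.9

x = np.cumsum(y)   # integrated series X_t = Y_1 + ... + Y_t, so (1 - B)X_t = Y_t

# Plotting y and x side by side reproduces the contrast seen in Figure 5:
# y fluctuates around 0, while x wanders with long, smooth excursions.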

The two series are available on the course webpage.

The first comment is that the two series do not look at all alike: even though $Y_t$ is itself a highly autocorrelated series, $X_t$ has much smoother sample paths over long time periods (in other words, it would be reasonable to conclude just from visual inspection that $Y_t$ is an autocorrelated but still stationary time series, whereas $X_t$ is not stationary at all). This is further reinforced by the ACFs of the two series (Figure 6). Nevertheless, it's difficult in practice to decide whether a series is stationary or not (see the later section on Unit Root Processes for some formal tests), so it's worth doing some more comparisons. Indeed, the PACF of $X_t$ could well lead one to the conclusion that the process is either AR(2) or AR(3).

[Figure 6. ACF and PACF of the $Y_t$ and $X_t$ series.]

Here are some direct comparisons, run in ITSM. The Autofit option applied to $X_t$ indeed leads to the AR(2) model, fitted as
$$X_t = 1.888 X_{t-1} + \hat{\phi}_2 X_{t-2} + Z_t \qquad (31)$$

with an estimate of $\sigma^2$ (to be compared with the true value $\sigma^2 = 1$) and an accompanying AICC value. However if we check for stationarity, in other words solve the equation $\phi(z) = 1 - \hat{\phi}_1 z - \hat{\phi}_2 z^2 = 0$, we find a pair of complex conjugate roots with $|z| = 1.056$, quite close to the region of noncausality (recall that $|z| > 1$ is a necessary condition for the process to be causal). We also find that the parameter estimates are sensitive to the method of estimation. The Burg estimates in this case are
$$X_t = 1.895 X_{t-1} + \hat{\phi}_2 X_{t-2} + Z_t \qquad (32)$$
with $\hat{\sigma}^2 = 1.012$, AICC = 299.4, not very different from the MLEs, but the Yule-Walker estimates are
$$X_t = 1.259 X_{t-1} + \hat{\phi}_2 X_{t-2} + Z_t \qquad (33)$$
with $\hat{\sigma}^2 = 3.41$, AICC = 416.4, which looks very different.

On the other hand an ARIMA(1,1,0) model fit (in ITSM, first do Transform, then difference at lag 1, then Autofit) leads to
$$Y_t = \hat{\phi}_1 Y_{t-1} + Z_t, \qquad (34)$$
which is also equivalent to
$$X_t = (1 + \hat{\phi}_1) X_{t-1} - \hat{\phi}_1 X_{t-2} + Z_t, \qquad (35)$$
quite similar to either (31) or (32), but without the problems of instability, and incidentally with lower AICC (292.3). Thus the conclusion in this case is that although the AR(2) model applied to $X_t$ and the AR(1) model applied to $Y_t$ both seem to be plausible options, the latter model leads to a more stable representation of the time series and is identified by AICC as the better model (as does BIC, but not, surprisingly, FPE).

2.2 Identification Techniques

2.2.1 Transformations

A transformation of the data may be appropriate when the data are clearly not normally distributed (e.g. because the distribution is highly left or right skewed), or in cases of heteroscedasticity (think back to the Australian wine data at the beginning of the course: the variance was clearly increasing with time, but when we took logarithms, the variance was approximately constant, as it should be for a stationary time series model). Common transformations include

Logarithmic: self-explanatory.

Box-Cox: named after a famous paper by George Box and David Cox in 1964, this refers to the transformation
$$f_\lambda(X_t) = \begin{cases} \dfrac{X_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log X_t & \text{if } \lambda = 0. \end{cases} \qquad (36)$$
ITSM allows the range $0 \leq \lambda$. The point of the specific representation (36) is that the limit of $\frac{x^\lambda - 1}{\lambda}$ as $\lambda \to 0$ is precisely $\log x$, so (36) is a continuous family of transformations (in other words, continuous in $\lambda$). However when smoothness near $\lambda = 0$ is not an issue, (36) may be replaced by a simple $X_t^\lambda$ for $\lambda \neq 0$.
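A direct transcription of (36) into code looks as follows (a sketch in Python with numpy; the sample values and the choices of lambda are arbitrary, and the evaluation at a very small lambda simply illustrates the continuity at $\lambda = 0$ discussed above).

import numpy as np

def box_cox(x, lam):
    # Box-Cox transformation (36); requires x > 0.
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x**lam - 1.0) / lam

x = np.array([1.0, 2.0, 5.0, 10.0])
print(box_cox(x, 0.5))
print(box_cox(x, 1e-8))   # very close to log(x)
print(np.log(x))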

Other possible transformations are

Decomposition of $X_t$ into the sum of trend, seasonal and stationary terms,

Differencing,

Fitting a polynomial and/or harmonic regression as part of an initial model fit, then fitting a stationary time series model to the residuals.

2.2.2 Identification and Estimation

The broad strategy is that after making an initial transformation to make the series stationary, an ARMA(p, q) model is identified using one of several model identification techniques, such as FPE, AICC or BIC. In the technical language of time series analysis, identification means specifically how to choose p and q. The ITSM Autofit command makes this especially easy by automatically searching over a range of models to minimize AICC. Note, however, that you have to specify the maximum p and q, and this can be problematic in some cases.

An alternative strategy is the subset model approach, which starts with a high-order ARMA model and selectively sets certain coefficients to 0 (rather like backward variable selection in standard regression). The Constrain optimization command in ITSM allows such models to be fitted as part of the maximum likelihood procedure. The text contains a detailed example of these techniques applied to the Australian wine data.

2.3 Unit Roots

We have already seen in Section 2.1 that it can be problematic to fit an AR model when the $\phi(B)$ operator contains a term $1 - B$. This is called the unit root problem for the following reason: if we search for the roots of the autoregressive polynomial $\phi(z)$, we will find one at z = 1 (because $\phi$ contains a factor $1 - z$). But we have already seen in Chapter 3 that one of the conditions for a causal process is that all the roots satisfy $|z| > 1$. So this case is on the boundary between causal and non-causal. Unit root tests are a class of statistical tests designed to detect this situation. In fact there are two types of unit root problem: unit roots in the AR component, and unit roots in the MA component.

2.3.1 The Dickey-Fuller Test

One of the earliest successful solutions of this problem was the Dickey-Fuller test, published in 1979. Suppose we have an AR(1) process with unknown mean,
$$X_t - \mu = \phi_1 (X_{t-1} - \mu) + Z_t, \qquad Z_t \sim WN[0, \sigma^2]. \qquad (37)$$

For large sample size n, the MLE $\hat{\phi}_1$ has an approximate normal distribution with mean $\phi_1$ and variance $\frac{1 - \phi_1^2}{n}$. However, this is not applicable when $\phi_1 = 1$. Dickey and Fuller instead set it up as a hypothesis testing problem: test the null hypothesis $H_0: \phi_1 = 1$ against the alternative $H_1: \phi_1 < 1$. We may write
$$\nabla X_t = X_t - X_{t-1} = \phi_0^* + \phi_1^* X_{t-1} + Z_t, \qquad t \geq 2, \qquad (38)$$
where $\phi_0^* = \mu(1 - \phi_1)$ and $\phi_1^* = \phi_1 - 1$.

Suppose $\hat{\phi}_1^*$ is the ordinary least squares (OLS) estimator of $\phi_1^*$; in other words, we just treat (38) as a standard regression equation, using SAS or any other regression program to regress $\nabla X_t$ on $X_{t-1}$. The estimated standard error is
$$SE(\hat{\phi}_1^*) = \frac{S}{\sqrt{\sum_{t=2}^{n} (X_{t-1} - \bar{X})^2}},$$
where S is the residual standard deviation,
$$S^2 = \frac{1}{n-3} \sum_{t=2}^{n} \left(\nabla X_t - \hat{\phi}_0^* - \hat{\phi}_1^* X_{t-1}\right)^2.$$
Consider
$$\hat{\tau}_\mu = \frac{\hat{\phi}_1^*}{SE(\hat{\phi}_1^*)}. \qquad (39)$$
In normal regression terminology, $\hat{\tau}_\mu$ would be called the t statistic for $\hat{\phi}_1^*$ and would have a $t_{n-3}$ distribution (which, in turn, is approximately N[0,1] if n is large). However, the rather peculiar structure of the problem (38), where $\nabla X_t$ and $X_{t-1}$ are not independent of one another but have rather a complicated interdependence, means that $\hat{\tau}_\mu$ does not have a normal distribution, even asymptotically as $n \to \infty$. Instead, Dickey and Fuller figured out the correct asymptotic distribution, which is henceforth destined forever to be known as the Dickey-Fuller distribution. The most important thing to know about this distribution is that its .01, .05 and .10 quantiles are -3.43, -2.86 and -2.57. For example, the Dickey-Fuller test would reject $H_0$ at the 5% level of significance if $\hat{\tau}_\mu < -2.86$ (instead of $\hat{\tau}_\mu < -1.645$, which would be correct in the case of an asymptotic N[0,1] distribution). Note that it's a one-sided test: the case $\phi_1 > 1$ doesn't really come into consideration, because in that case the process would grow exponentially, and we don't need a formal test for that. The real case of interest is between $\phi_1 = 1$ and $\phi_1 < 1$, and the latter case corresponds to $\phi_1^* = \phi_1 - 1 < 0$.
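The statistic (39) is nothing more than the usual t statistic from an ordinary regression of $\nabla X_t$ on $X_{t-1}$ with an intercept. The following sketch in Python with numpy (an illustration only: the simulated random walk stands in for real data, and this is not the SAS route given in the next subsection) computes $\hat{\phi}_1^*$ and $\hat{\tau}_\mu$ and compares the latter with the Dickey-Fuller 5% point -2.86 rather than with a normal or t quantile.

import numpy as np

rng = np.random.default_rng(5)
n = 200
x = np.cumsum(rng.normal(size=n))       # a simulated random walk, so the unit root null is true

dx = np.diff(x)                         # X_t - X_{t-1} for t = 2, ..., n
xlag = x[:-1]                           # X_{t-1}
X = np.column_stack([np.ones(n - 1), xlag])   # regression (38): intercept + X_{t-1}

beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
resid = dx - X @ beta
S2 = np.sum(resid**2) / (n - 3)                          # residual variance, with n - 3 as in the notes
se_phi1 = np.sqrt(S2 / np.sum((xlag - xlag.mean())**2))  # SE(phi_1*) as defined above
tau_mu = beta[1] / se_phi1                               # the statistic (39)

print(tau_mu, "reject unit root" if tau_mu < -2.86 else "do not reject unit root (5% level)")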

2.3.2 Extension to the general AR(p) case

If we have an AR(p) model
$$X_t - \mu = \phi_1 (X_{t-1} - \mu) + \ldots + \phi_p (X_{t-p} - \mu) + Z_t,$$
we rewrite this as
$$\nabla X_t = \phi_0^* + \phi_1^* X_{t-1} + \phi_2^* \nabla X_{t-1} + \ldots + \phi_p^* \nabla X_{t-p+1} + Z_t, \qquad (40)$$
where $\phi_0^* = \mu(1 - \phi_1 - \ldots - \phi_p)$, $\phi_1^* = \sum_{i=1}^{p} \phi_i - 1$, and $\phi_j^* = -\sum_{i=j}^{p} \phi_i$ for $j = 2, \ldots, p$. If there exists a unit root, then $0 = \phi(1) = -\phi_1^*$, so this becomes the natural null hypothesis. As in the AR(1) case, we form the t statistic (39) using a standard regression package, but the asymptotic distribution is again the Dickey-Fuller distribution, not the normal distribution.

2.3.3 Example SAS Code

It should be possible to adapt the following straightforwardly to your own examples.

*** apply unit root tests to "sales" data;
*** this version for AR(1) model;
options ls=64 ps=45 nonumber label;
*** insert your own data name, file name and path directory;
data sales;
infile 'd:/my Documents/itsm2000/sales.tsm';
input y;
array vara(0:1) y0 y1;
vara(1)=vara(0);
retain y0 y1;
;
vara(0)=y;
ydif=y0-y1;
run;
;
proc reg data=sales;
model ydif=y1;
run;
;

*** this version for AR(2) model;
options ls=64 ps=45 nonumber label;
data sales;
infile 'd:/my Documents/itsm2000/sales.tsm';
input y;
array vara(0:2) y0 y1 y2;
do i=2 to 1 by -1;
VarA(i)=VarA(i-1);
end;
retain y0 y1 y2;
;
vara(0)=y;
ydif=y0-y1;
ydif1=y1-y2;
run;
;
proc reg data=sales;
model ydif=y1 ydif1;
run;
;

*** this version for AR(3) model;
options ls=64 ps=45 nonumber label;
data sales;
infile 'd:/my Documents/itsm2000/sales.tsm';
input y;
array vara(0:3) y0 y1 y2 y3;
do i=3 to 1 by -1;
VarA(i)=VarA(i-1);
end;
retain y0 y1 y2 y3;
;
vara(0)=y;
ydif=y0-y1;
ydif1=y1-y2;
ydif2=y2-y3;
run;
;
proc reg data=sales;
model ydif=y1 ydif1 ydif2;
run;
;

2.3.4 Analysis of Sales Dataset

An initial analysis of the SALES.TSM dataset in ITSM led to the following conclusions. First, the Burg algorithm was used in connection with AICC to select the best-fitting AR model. The resulting model was AR(5). However, it is easily checked that the sum of the fitted AR coefficients, $\sum_{i=1}^{5} \hat{\phi}_i$, is very nearly 1. If it were exactly 1, we would have a unit root process. So there is a strong suspicion that the process is unit root, though we have not yet conducted a formal test for this.

It's also possible to look for the best-fitting ARMA model using the Autofit command (without differencing). This led to an ARMA(4,4) model.

An incidental comment here is that this shows that Autofit does not always produce the best-fitting model (because the AR(5) model is still better according to AICC). This reflects the fact that the search algorithms used to calculate the maximum likelihood estimates do not work perfectly, a caution with any use of Autofit or similar procedures.

After differencing, the Burg algorithm yields an optimal AR model of AR(4) (AICC = 515.8), while Autofit determines that the optimal model is ARMA(1,1). Thus, both from AICC and from the ease of the fitting procedure (also from other indicators, not shown here, such as ACF/PACF plots and residual analysis), we conclude that the analysis works better with differencing than without. But this still leaves open the question of a formal unit root test: if such a test led to acceptance of the null hypothesis (the null hypothesis in this case being that there is a unit root), we would be fully justified in differencing.

We therefore now give the results of the SAS analyses for unit root tests with this dataset. The first program (AR(1)) produces a table of parameter estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|) with rows for the Intercept and y1. The second program (AR(2)) produces the corresponding table with rows for the Intercept, y1 and ydif1. The third program (AR(3)) produces the corresponding table with rows for the Intercept, y1, ydif1 and ydif2.

In all three cases, $\phi_1^*$ is the coefficient of y1 (i.e. the coefficient of $X_{t-1}$ in our time series notation) and the corresponding t Value is the value of $\hat{\tau}_\mu$ (respectively 0.17, 0.48, 0.66). You should ignore the Pr > |t| column for this variable, because it is based on the standard t distribution, which we have already seen is not valid in the unit root context; but none of the three values of $\hat{\tau}_\mu$ is significant according to the Dickey-Fuller test, so we accept the null hypothesis of a unit root. As for the other parameters, we note that in the AR(3) model, the coefficients of ydif1 and ydif2 are both significant (for these parameters, the standard t distribution is approximately valid), so presumably we should retain at least those terms.

Finally, we give the corresponding AR(5) analysis, which exactly corresponds to the optimal AR analysis for the undifferenced data, according to the earlier ITSM analyses using AICC as a model-selection criterion. The key part of the SAS code is now

array vara(0:5) y0 y1 y2 y3 y4 y5;
do i=5 to 1 by -1;
VarA(i)=VarA(i-1);
end;
retain y0 y1 y2 y3 y4 y5;
;
vara(0)=y;
ydif=y0-y1;
ydif1=y1-y2;
ydif2=y2-y3;
ydif3=y3-y4;
ydif4=y4-y5;
run;
;
proc reg data=sales;
model ydif=y1 ydif1 ydif2 ydif3 ydif4;
run;
;

and this leads to a table of parameter estimates with rows for the Intercept, y1, ydif1, ydif2, ydif3 and ydif4.

The value of $\hat{\tau}_\mu$ (in the table, the t Value associated with the y1 variable) is now 1.01, still clearly not significant according to the Dickey-Fuller test (in other words, we accept the null hypothesis that there is a unit root). The t Values associated with ydif1 through ydif4 range from 0.99 upwards. Unlike y1, the conventional t distributions for these variables are approximately valid, so the conclusion is that not all of these variables are statistically significant. On the other hand, if we drop ydif3 from the above analysis, the t Values associated with ydif1 and ydif2, for example, become 2.71 and 1.97 respectively. Thus, one interpretation of the result is that after differencing, the AR(4) model probably is the correct model, though the coefficient $\phi_3$ is not significant.

2.4 Forecasting ARIMA Models

As an example, consider the DOW JONES dataset within ITSM. We fitted that as an ARIMA(1,1,0) model (in other words, difference the data first, then use Autofit, which selects an AR(1) model for the differenced data). Then select Forecasting, ARMA, and the option Forecast the undifferenced data. The plot of the data and forecasts (with 95% prediction bounds) is in Figure 7.

[Figure 7. Plot of the Dow Jones index, with forecasts and 95% forecast intervals for the next 10 observations.]

The table of forecasts lists, for each of the 10 steps ahead, the prediction, sqrt(MSE), and the lower and upper approximate 95 percent prediction bounds.

In this section, we describe how these forecasts are computed, and especially the mean squared errors.

Let's explain exactly how these forecasts and their MSEs were calculated. Take the last six observations of the Dow Jones series (the last one being the most recent). Differencing once gives the last 5 observations of the differenced series, and we then subtract the overall mean of the differenced series (obtained from ITSM) from every observation. Moreover, ITSM gives the maximum likelihood estimates $\hat{\phi}_1$ and $\hat{\sigma}^2$ of the fitted AR(1) model.

Let's do the forecasts h steps ahead up to h = 4. Standard theory from earlier in the course says the forecasts of the mean-corrected differenced series $Y_{t+h}$ for h = 1, 2, 3, 4 are $(\phi_1 Y_t, \ \phi_1^2 Y_t, \ \phi_1^3 Y_t, \ \phi_1^4 Y_t)$; substituting the estimate $\hat{\phi}_1$ and the last mean-corrected value $Y_t$ gives numerical values.

Note: If the model was something more complicated than AR(1), then at this step of the calculation you'd have to do the optimal forecast for whatever model was being used; this is the main place in this calculation where the method would be different for a general ARMA(p, q) process.

Now add back in the sample mean to get forecast values of $Y_{t+h}$ with the mean back in. The forecast values of $X_{t+h}$ are then obtained by adding these successively to the last observed value: $\hat{X}_{t+1} = X_t + \hat{Y}_{t+1}$, $\hat{X}_{t+2} = \hat{X}_{t+1} + \hat{Y}_{t+2}$, and so on. These agree to two decimal places with the ITSM forecasts reproduced earlier.

Next, we state the formula for the mean squared prediction errors. If we write the ARIMA(p, d, q) model in the form $\phi^*(B) X_t = \theta(B) Z_t$ where $\phi^*(B) = (1 - B)^d \phi(B)$, with $\phi(B)$ and $\theta(B)$ being the usual AR and MA polynomials associated with the process $(1 - B)^d X_t$, then we formally define
$$\psi^*(B) = \frac{\theta(B)}{\phi^*(B)} = \psi_0^* + \psi_1^* B + \psi_2^* B^2 + \psi_3^* B^3 + \ldots \qquad (\psi_0^* = 1). \qquad (41)$$

The mean squared prediction error of the optimal predictor $P_t X_{t+h}$ (the predictor of $X_{t+h}$ given the infinite past $X_s$, $s \leq t$) is given by
$$MSPE = E\{(X_{t+h} - P_t X_{t+h})^2\} = \sigma^2 \sum_{j=0}^{h-1} (\psi_j^*)^2. \qquad (42)$$
Example. In the ARIMA(1,1,0) model used for the Dow Jones dataset, we have
$$\psi^*(B) = \frac{1}{(1 - B)(1 - \phi_1 B)} = (1 + B + B^2 + B^3 + \ldots)(1 + \phi_1 B + \phi_1^2 B^2 + \phi_1^3 B^3 + \ldots)$$
$$= 1 + (1 + \phi_1) B + (1 + \phi_1 + \phi_1^2) B^2 + (1 + \phi_1 + \phi_1^2 + \phi_1^3) B^3 + \ldots$$
Thus we have
$$\psi_0^* = 1, \quad \psi_1^* = 1 + \phi_1, \quad \psi_2^* = 1 + \phi_1 + \phi_1^2, \quad \psi_3^* = 1 + \phi_1 + \phi_1^2 + \phi_1^3, \qquad (43)$$
etc. When we substitute our earlier estimates of $\phi_1$ and $\sigma^2$ into (43) and then (42), the resulting values of MSPE for h = 1, 2, 3, 4 agree to the first five decimal places with the ITSM results given earlier.
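The coefficients (43) and the resulting values of (42) are easy to reproduce numerically, as in the following sketch in Python with numpy (the values of phi1 and sigma2 below are placeholders standing in for the ITSM estimates quoted above).

import numpy as np

def psi_star(phi1, hmax):
    # psi*_j = 1 + phi1 + ... + phi1^j for the ARIMA(1,1,0) model, j = 0, ..., hmax-1 (equation (43))
    return np.cumsum(phi1 ** np.arange(hmax))

def mspe(phi1, sigma2, h):
    # mean squared prediction error (42) for an h-step forecast
    psi = psi_star(phi1, h)
    return sigma2 * np.sum(psi**2)

phi1_hat, sigma2_hat = 0.45, 0.12    # placeholder values; in practice use the ITSM estimates
for h in range(1, 5):
    print(h, mspe(phi1_hat, sigma2_hat, h))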

Why does any of this work? Because the $\phi^*$ operator is not causal, the expansion (41) should not be valid: in most cases it won't even be true that $\psi_j^* \to 0$ as $j \to \infty$. (So this isn't one of those cases where the infinite series really is convergent but we just didn't take the trouble to check it. It's not convergent, period.) However it works because the expansion (41) is valid if we just look at it term by term and don't worry about convergence. We'll just do the d = 1 case here; higher values of d are similar.

Assume $Y_t = (1 - B) X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ where Z is white noise. Then
$$X_{t+h} = X_t + Y_{t+1} + Y_{t+2} + \ldots + Y_{t+h} = X_t + \sum_{j \geq h-1} \psi_{j-h+1} Z_{t+h-j} + \sum_{j \geq h-2} \psi_{j-h+2} Z_{t+h-j} + \ldots + \sum_{j \geq 0} \psi_j Z_{t+h-j}.$$
For $0 \leq j \leq h-1$, the coefficient of $Z_{t+h-j}$ is
$$\psi_j^* = \sum_{k=h-j}^{h} \psi_{j-h+k} = \sum_{k=0}^{j} \psi_k. \qquad (44)$$
However, in the standard ARMA notation, we define $\psi(B) = \sum_{j=0}^{\infty} \psi_j B^j = \frac{\theta(B)}{\phi(B)}$, where $\theta(\cdot)$ and $\phi(\cdot)$ are the MA and AR operators of Y. The relationship between $\psi^*(B)$ and $\psi(B)$ is
$$\psi^*(B) = \frac{\psi(B)}{1 - B} = (1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \ldots)(1 + B + B^2 + B^3 + \ldots) = 1 + (1 + \psi_1) B + (1 + \psi_1 + \psi_2) B^2 + (1 + \psi_1 + \psi_2 + \psi_3) B^3 + \ldots \qquad (45)$$
Comparing (44) and (45), we see that $\psi_j^*$ is precisely the coefficient of $B^j$ in (45), agreeing with (41).

The final step is to explain why (44) implies (42). By the definition of $\psi_j^*$ in (44), we have
$$X_{t+h} = X_t + \sum_{j=0}^{h-1} \psi_j^* Z_{t+h-j} + \text{terms that depend on } Z_s, \ s \leq t. \qquad (46)$$
Since, as noted many times through the course, the predictor of $X_{t+h}$ from time t is essentially defined by setting all the values of $Z_{t+h-j}$, $j = 0, \ldots, h-1$, equal to 0, we will have
$$X_{t+h} - P_t X_{t+h} = \sum_{j=0}^{h-1} \psi_j^* Z_{t+h-j},$$
and the result (42) follows.

2.5 Seasonal ARIMA Models

General motivation: If a time series is seasonal with period s, then we may have correlations that operate at multiples of s as well as, or instead of, the usual autocorrelations at small lags. This suggests time series models where the autoregressive and moving average operators include terms in $B^s$ as well as B.

Comment 1. $B^s X_t = X_{t-s}$. Thus, terms of this form reflect effects that take place at time intervals corresponding to complete cycles of a cyclic/seasonal process, but without assuming the process is exactly cyclic (a deficiency of many simple trend + seasonal factor + noise models). Example: Seasonal economic variables such as house prices are often compared with the corresponding month in the previous year, rather than with the most recent month. Thus, statistically we are interested in $X_t - X_{t-12}$ rather than $X_t - X_{t-1}$. That, in turn, suggests trying to model the series in terms of $(1 - B^{12}) X_t$ rather than $(1 - B) X_t$.

Comment 2. Throughout our discussion of seasonal time series, we shall assume that the period s is known. Thus for monthly data with an annual cycle, s = 12. If s is unknown then we really need a different theory (although we didn't cover Chapter 4 in this course, that chapter, on spectral analysis, covers the techniques that are required for a systematic treatment).

Now let's describe the models we are using. The general model is called SARIMA, for seasonal autoregressive integrated moving average.

The order of the model is written
$$(p, d, q) \times (P, D, Q)_s$$
to indicate that the regular ARIMA components are of orders p, d, q, and the seasonal ARIMA components (those that depend on $B^s$) are of orders P, D, Q. The form of the model is
$$\phi(B) \Phi(B^s) (1 - B)^d (1 - B^s)^D X_t = \theta(B) \Theta(B^s) Z_t, \qquad (47)$$
where
$$\phi(B) = 1 - \phi_1 B - \ldots - \phi_p B^p, \qquad \Phi(B^s) = 1 - \Phi_1 B^s - \ldots - \Phi_P B^{Ps},$$
$$\theta(B) = 1 + \theta_1 B + \ldots + \theta_q B^q, \qquad \Theta(B^s) = 1 + \Theta_1 B^s + \ldots + \Theta_Q B^{Qs}. \qquad (48)$$

2.5.1 Fitting SARIMA Models

The algorithms in ITSM rely on the fact that any SARIMA model can be rewritten as a regular ARIMA model but with constraints on the parameters. As an example, consider the ARIMA$(0, 0, 1) \times (0, 0, 1)_{12}$ model
$$Y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12}) Z_t = (1 + \theta_1 B + \Theta_1 B^{12} + \theta_1 \Theta_1 B^{13}) Z_t. \qquad (49)$$
This is an MA(13) model where we fix $\theta_2 = \theta_3 = \ldots = \theta_{11} = 0$, identify $\theta_{12}$ with $\Theta_1$, and fix $\theta_{13} = \theta_1 \theta_{12}$. We can fit a model of this form in ITSM by maximum likelihood, using the Constrain optimization option.
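The rewriting of (49) as a constrained MA(13) can be checked by multiplying out the two polynomials, as in this sketch in Python with numpy (the numerical values of theta1 and Theta1 are arbitrary illustrations, not estimates from any dataset).

import numpy as np

theta1, Theta1 = 0.3, -0.5   # arbitrary illustrative values

regular = np.array([1.0, theta1])      # coefficients of 1 + theta_1 B
seasonal = np.zeros(13)
seasonal[0] = 1.0
seasonal[12] = Theta1                  # coefficients of 1 + Theta_1 B^12

# Convolving the coefficient sequences multiplies the polynomials.
product = np.convolve(regular, seasonal)
# product[j] is the coefficient of B^j: it is nonzero only at j = 0, 1, 12, 13,
# and product[13] equals theta1*Theta1, i.e. the constraint theta_13 = theta_1 * theta_12.
print(product)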

Example. Consider the DEATHS dataset from Problem 3.9. We first difference at lag 1 and at lag 12, forming the series $Y_t = (1 - B)(1 - B^{12}) X_t$. In Problem 3.9, we considered the model
$$Y_t = (1 + \theta_1 B + \theta_{12} B^{12}) Z_t. \qquad (50)$$
The question naturally arises: which is better, (49) or (50)?

As a start, let us fit the MA(12) model without any constraints. In ITSM, under the Innovations algorithm we get a table of the 12 estimated MA coefficients, together with the ratio of each MA coefficient to 1.96 times its standard error. This shows clearly that of all the MA coefficients, only $\theta_1$ and $\theta_{12}$ are statistically significant. We therefore refit the model using only those coefficients, i.e. model (50).

In ITSM, first select Specify to select the initial MA(12) model, with all coefficients 0 (you may need to reset the model to p = q = 0 and then use Specify again to achieve this). Next, select Estimation followed by Max likelihood followed by Constrain optimization. Highlight the coefficients Theta(2) through Theta(11) (in other words, these will be in blue, while Theta(1) and Theta(12) are white). Return to the main box and click OK. The model is fitted and leads to estimates $\hat{\theta}_1$ and $\hat{\theta}_{12}$ with their standard errors, together with the values of -2Log(Likelihood) and AICC. (Your answers may differ very slightly from these but should not differ by very much.)

Now let's fit the model (49). To do this, first return to Specify and set the model as MA(13), with all coefficients 0. Follow Estimation, Max likelihood and Constrain optimization, and highlight the coefficients Theta(2) through Theta(11) in blue, as before. Then, under Specify multiplicative relations, enter 1 in the box Number of relations. In the next line, enter 1 12 = 13. Click OK to complete the specification and then OK a second time to fit the model. You will now find a fitted model with estimates $\hat{\theta}_1$, $\hat{\theta}_{12}$ and $\hat{\theta}_{13}$ (note that these satisfy the constraint $\theta_{13} = \theta_1 \theta_{12}$), together with standard errors for $\theta_1$ and $\theta_{12}$. In this case -2Log(Likelihood) is 849.1, and AICC is 855.5, both slightly smaller (i.e. better) than the corresponding values from model (50). Our final conclusion is therefore that model (49) is best, with the estimates $\hat{\theta}_1$ and $\hat{\Theta}_1$ (and their standard errors) as reported by ITSM.

2.6 Regression with ARMA Errors

Model:
$$Y_t = \sum_{j=1}^{k} x_{tj} \beta_j + W_t, \qquad \phi(B) W_t = \theta(B) Z_t, \qquad Z_t \sim WN[0, \sigma^2]. \qquad (51)$$
In matrix notation,
$$Y = X\beta + W, \qquad (52)$$
where (in contrast to the usual situation with linear regression) the error vector W is not uncorrelated noise but has some non-trivial covariance matrix $\Gamma_n$. Since $W_t$ is ARMA, $\Gamma_n$ itself may be computed by methods seen earlier in this course. The question for the present section is: how does this affect the estimation of the regression component, i.e. the vector $\beta$?

Definition of ordinary least squares (OLS):
$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T Y, \qquad \mathrm{Cov}\left(\hat{\beta}_{OLS}\right) = (X^T X)^{-1} X^T \Gamma_n X (X^T X)^{-1}. \qquad (53)$$
In the special case $\Gamma_n = \sigma^2 I_n$, the covariance in (53) reduces to $(X^T X)^{-1} \sigma^2$, the usual formula in regression. The alternative is generalized least squares (GLS):
$$\hat{\beta}_{GLS} = (X^T \Gamma_n^{-1} X)^{-1} X^T \Gamma_n^{-1} Y, \qquad \mathrm{Cov}\left(\hat{\beta}_{GLS}\right) = (X^T \Gamma_n^{-1} X)^{-1}. \qquad (54)$$
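To connect the formulas (53) and (54) to computation, here is a sketch in Python with numpy (everything here is invented for illustration: the linear-trend design, the AR(1) error covariance, the parameter values, and the assumption that $\Gamma_n$ is known exactly).

import numpy as np

rng = np.random.default_rng(6)
n = 120
t = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), t])    # design matrix: intercept + linear trend
beta_true = np.array([10.0, 0.05])

# AR(1) errors W_t, so Gamma_n has entries sigma2 * rho^|i-j| / (1 - rho^2)
rho, sigma2 = 0.7, 1.0
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma_n = sigma2 * rho**lags / (1.0 - rho**2)
W = np.linalg.cholesky(Gamma_n) @ rng.normal(size=n)
Y = X @ beta_true + W

# OLS estimator, equation (53)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# GLS estimator and its covariance, equation (54), treating Gamma_n as known
GiX = np.linalg.solve(Gamma_n, X)
GiY = np.linalg.solve(Gamma_n, Y)
beta_gls = np.linalg.solve(X.T @ GiX, X.T @ GiY)
cov_gls = np.linalg.inv(X.T @ GiX)

print(beta_ols, beta_gls, np.sqrt(np.diag(cov_gls)))

In practice $\Gamma_n$ is not known and has to be estimated from the data, which is exactly what the maximum likelihood and two-stage procedures described below address.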

The GLS estimator (54) is BLUE (Best Linear Unbiased Estimator) in the sense that if $\tilde{\beta}$ is some other unbiased linear estimator then, for all vectors c of length k,
$$\mathrm{Var}(c^T \tilde{\beta}) \geq \mathrm{Var}(c^T \hat{\beta}_{GLS}). \qquad (55)$$
In particular, (55) applies when $\tilde{\beta} = \hat{\beta}_{OLS}$.

2.6.1 Maximum Likelihood Estimation

The general principle is a direct extension of the method of maximum likelihood given in Chapter 5. We can write the covariance matrix $\Gamma_n$ as $\Gamma_n(\phi, \theta, \sigma^2)$ to indicate explicitly the dependence on the ARMA parameters $\phi, \theta, \sigma^2$. Of course, the vector of regression coefficients, $\beta$, is also an unknown parameter. The joint likelihood is
$$L(\beta, \phi, \theta, \sigma^2) = (2\pi)^{-n/2} (\det \Gamma_n)^{-1/2} \exp\left\{-\frac{1}{2} (Y - X\beta)^T \Gamma_n^{-1} (Y - X\beta)\right\}. \qquad (56)$$
This is a direct extension of formula (5.2.1) of the course text, where instead of $X_n$ (which, in the context of formula (5.2.1), represented a time series of length n) we now write $Y - X\beta$ to denote the residuals of the observed series Y on the regression function $X\beta$. The maximum likelihood estimates are those which maximize (56). These are found numerically, through search algorithms similar to those used for ARMA processes without the regression component.

In practice, we often do a two-stage fit: first estimate $\beta$ by OLS, then maximize (56) with respect to $\phi, \theta, \sigma^2$ holding $\beta$ fixed (this part is operationally identical with the regular maximum likelihood procedure found in Chapter 5), then repeat the estimation of $\beta$ using GLS. This is more or less the way you have to do it in ITSM. If desired, the process can be repeated to improve the estimation of $\phi, \theta, \sigma^2$.

Example 1: Lake Huron Data

This is the LAKE dataset that we saw previously; refer to pages 6 and 7 of the first set of course notes, where we pointed out (a) that there is an apparently significant linear trend; (b) that the residuals are autocorrelated in a way apparently consistent with an AR(2) model; (c) that the regression analysis could lead to different results if the autocorrelation was taken into account. However, at that point of the course, we did not have a means of estimating a regression function with autocorrelated errors.

The first and obvious analysis is a linear trend of the form
$$Y_t = \beta_1 + \beta_2 t + W_t. \qquad (57)$$
In the context of (51), this is equivalent to taking k = 2, $x_{t1} = 1$, $x_{t2} = t$. Model (57) could be fitted as a linear regression using SAS, or directly in ITSM, by the following procedure: After loading the data from LAKE.TSM into ITSM, click on Regression, then Specify. A window comes up: after Polynomial Regression, in the box labelled Enter Order, change 0 to 1.


A SARIMAX coupled modelling applied to individual load curves intraday forecasting A SARIMAX coupled modelling applied to individual load curves intraday forecasting Frédéric Proïa Workshop EDF Institut Henri Poincaré - Paris 05 avril 2012 INRIA Bordeaux Sud-Ouest Institut de Mathématiques

More information

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn }

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn } Stochastic processes Time series are an example of a stochastic or random process Models for time series A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic

More information

Problem Set 2 Solution Sketches Time Series Analysis Spring 2010

Problem Set 2 Solution Sketches Time Series Analysis Spring 2010 Problem Set 2 Solution Sketches Time Series Analysis Spring 2010 Forecasting 1. Let X and Y be two random variables such that E(X 2 ) < and E(Y 2 )

More information

5 Transfer function modelling

5 Transfer function modelling MSc Further Time Series Analysis 5 Transfer function modelling 5.1 The model Consider the construction of a model for a time series (Y t ) whose values are influenced by the earlier values of a series

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting)

Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting) Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting) (overshort example) White noise H 0 : Let Z t be the stationary

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

ARIMA Modelling and Forecasting

ARIMA Modelling and Forecasting ARIMA Modelling and Forecasting Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, the differences may appear stationary. Δx t x t x t 1 (first

More information

Parameter estimation: ACVF of AR processes

Parameter estimation: ACVF of AR processes Parameter estimation: ACVF of AR processes Yule-Walker s for AR processes: a method of moments, i.e. µ = x and choose parameters so that γ(h) = ˆγ(h) (for h small ). 12 novembre 2013 1 / 8 Parameter estimation:

More information

3 Theory of stationary random processes

3 Theory of stationary random processes 3 Theory of stationary random processes 3.1 Linear filters and the General linear process A filter is a transformation of one random sequence {U t } into another, {Y t }. A linear filter is a transformation

More information

SOME BASICS OF TIME-SERIES ANALYSIS

SOME BASICS OF TIME-SERIES ANALYSIS SOME BASICS OF TIME-SERIES ANALYSIS John E. Floyd University of Toronto December 8, 26 An excellent place to learn about time series analysis is from Walter Enders textbook. For a basic understanding of

More information

Applied time-series analysis

Applied time-series analysis Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 18, 2011 Outline Introduction and overview Econometric Time-Series Analysis In principle,

More information

Classical Decomposition Model Revisited: I

Classical Decomposition Model Revisited: I Classical Decomposition Model Revisited: I recall classical decomposition model for time series Y t, namely, Y t = m t + s t + W t, where m t is trend; s t is periodic with known period s (i.e., s t s

More information

Time Series 2. Robert Almgren. Sept. 21, 2009

Time Series 2. Robert Almgren. Sept. 21, 2009 Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models

More information

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models

Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Lesson 13: Box-Jenkins Modeling Strategy for building ARMA models Facoltà di Economia Università dell Aquila umberto.triacca@gmail.com Introduction In this lesson we present a method to construct an ARMA(p,

More information

Estimation and application of best ARIMA model for forecasting the uranium price.

Estimation and application of best ARIMA model for forecasting the uranium price. Estimation and application of best ARIMA model for forecasting the uranium price. Medeu Amangeldi May 13, 2018 Capstone Project Superviser: Dongming Wei Second reader: Zhenisbek Assylbekov Abstract This

More information

The ARIMA Procedure: The ARIMA Procedure

The ARIMA Procedure: The ARIMA Procedure Page 1 of 120 Overview: ARIMA Procedure Getting Started: ARIMA Procedure The Three Stages of ARIMA Modeling Identification Stage Estimation and Diagnostic Checking Stage Forecasting Stage Using ARIMA Procedure

More information

STAT 443 (Winter ) Forecasting

STAT 443 (Winter ) Forecasting Winter 2014 TABLE OF CONTENTS STAT 443 (Winter 2014-1141) Forecasting Prof R Ramezan University of Waterloo L A TEXer: W KONG http://wwkonggithubio Last Revision: September 3, 2014 Table of Contents 1

More information

Chapter 9: Forecasting

Chapter 9: Forecasting Chapter 9: Forecasting One of the critical goals of time series analysis is to forecast (predict) the values of the time series at times in the future. When forecasting, we ideally should evaluate the

More information

Decision 411: Class 9. HW#3 issues

Decision 411: Class 9. HW#3 issues Decision 411: Class 9 Presentation/discussion of HW#3 Introduction to ARIMA models Rules for fitting nonseasonal models Differencing and stationarity Reading the tea leaves : : ACF and PACF plots Unit

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

Time Series 4. Robert Almgren. Oct. 5, 2009

Time Series 4. Robert Almgren. Oct. 5, 2009 Time Series 4 Robert Almgren Oct. 5, 2009 1 Nonstationarity How should you model a process that has drift? ARMA models are intrinsically stationary, that is, they are mean-reverting: when the value of

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information

A time series is called strictly stationary if the joint distribution of every collection (Y t

A time series is called strictly stationary if the joint distribution of every collection (Y t 5 Time series A time series is a set of observations recorded over time. You can think for example at the GDP of a country over the years (or quarters) or the hourly measurements of temperature over a

More information

Chapter 3: Regression Methods for Trends

Chapter 3: Regression Methods for Trends Chapter 3: Regression Methods for Trends Time series exhibiting trends over time have a mean function that is some simple function (not necessarily constant) of time. The example random walk graph from

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

ARIMA Models. Jamie Monogan. January 25, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 25, / 38

ARIMA Models. Jamie Monogan. January 25, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 25, / 38 ARIMA Models Jamie Monogan University of Georgia January 25, 2012 Jamie Monogan (UGA) ARIMA Models January 25, 2012 1 / 38 Objectives By the end of this meeting, participants should be able to: Describe

More information

7. Forecasting with ARIMA models

7. Forecasting with ARIMA models 7. Forecasting with ARIMA models 309 Outline: Introduction The prediction equation of an ARIMA model Interpreting the predictions Variance of the predictions Forecast updating Measuring predictability

More information

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive

More information

Some Time-Series Models

Some Time-Series Models Some Time-Series Models Outline 1. Stochastic processes and their properties 2. Stationary processes 3. Some properties of the autocorrelation function 4. Some useful models Purely random processes, random

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

Automatic seasonal auto regressive moving average models and unit root test detection

Automatic seasonal auto regressive moving average models and unit root test detection ISSN 1750-9653, England, UK International Journal of Management Science and Engineering Management Vol. 3 (2008) No. 4, pp. 266-274 Automatic seasonal auto regressive moving average models and unit root

More information

ARIMA Models. Jamie Monogan. January 16, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 16, / 27

ARIMA Models. Jamie Monogan. January 16, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 16, / 27 ARIMA Models Jamie Monogan University of Georgia January 16, 2018 Jamie Monogan (UGA) ARIMA Models January 16, 2018 1 / 27 Objectives By the end of this meeting, participants should be able to: Argue why

More information

AR, MA and ARMA models

AR, MA and ARMA models AR, MA and AR by Hedibert Lopes P Based on Tsay s Analysis of Financial Time Series (3rd edition) P 1 Stationarity 2 3 4 5 6 7 P 8 9 10 11 Outline P Linear Time Series Analysis and Its Applications For

More information

Dynamic Time Series Regression: A Panacea for Spurious Correlations

Dynamic Time Series Regression: A Panacea for Spurious Correlations International Journal of Scientific and Research Publications, Volume 6, Issue 10, October 2016 337 Dynamic Time Series Regression: A Panacea for Spurious Correlations Emmanuel Alphonsus Akpan *, Imoh

More information

Lecture 4a: ARMA Model

Lecture 4a: ARMA Model Lecture 4a: ARMA Model 1 2 Big Picture Most often our goal is to find a statistical model to describe real time series (estimation), and then predict the future (forecasting) One particularly popular model

More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 IX. Vector Time Series Models VARMA Models A. 1. Motivation: The vector

More information

Seasonal Autoregressive Integrated Moving Average Model for Precipitation Time Series

Seasonal Autoregressive Integrated Moving Average Model for Precipitation Time Series Journal of Mathematics and Statistics 8 (4): 500-505, 2012 ISSN 1549-3644 2012 doi:10.3844/jmssp.2012.500.505 Published Online 8 (4) 2012 (http://www.thescipub.com/jmss.toc) Seasonal Autoregressive Integrated

More information

5 Autoregressive-Moving-Average Modeling

5 Autoregressive-Moving-Average Modeling 5 Autoregressive-Moving-Average Modeling 5. Purpose. Autoregressive-moving-average (ARMA models are mathematical models of the persistence, or autocorrelation, in a time series. ARMA models are widely

More information

System Identification

System Identification System Identification Arun K. Tangirala Department of Chemical Engineering IIT Madras July 26, 2013 Module 6 Lecture 1 Arun K. Tangirala System Identification July 26, 2013 1 Objectives of this Module

More information

Time Series: Theory and Methods

Time Series: Theory and Methods Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary

More information

4. MA(2) +drift: y t = µ + ɛ t + θ 1 ɛ t 1 + θ 2 ɛ t 2. Mean: where θ(l) = 1 + θ 1 L + θ 2 L 2. Therefore,

4. MA(2) +drift: y t = µ + ɛ t + θ 1 ɛ t 1 + θ 2 ɛ t 2. Mean: where θ(l) = 1 + θ 1 L + θ 2 L 2. Therefore, 61 4. MA(2) +drift: y t = µ + ɛ t + θ 1 ɛ t 1 + θ 2 ɛ t 2 Mean: y t = µ + θ(l)ɛ t, where θ(l) = 1 + θ 1 L + θ 2 L 2. Therefore, E(y t ) = µ + θ(l)e(ɛ t ) = µ 62 Example: MA(q) Model: y t = ɛ t + θ 1 ɛ

More information

Chapter 8: Model Diagnostics

Chapter 8: Model Diagnostics Chapter 8: Model Diagnostics Model diagnostics involve checking how well the model fits. If the model fits poorly, we consider changing the specification of the model. A major tool of model diagnostics

More information

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each)

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) GROUND RULES: This exam contains two parts: Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) The maximum number of points on this exam is

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Moving Average (MA) representations

Moving Average (MA) representations Moving Average (MA) representations The moving average representation of order M has the following form v[k] = MX c n e[k n]+e[k] (16) n=1 whose transfer function operator form is MX v[k] =H(q 1 )e[k],

More information

Econometrics II Heij et al. Chapter 7.1

Econometrics II Heij et al. Chapter 7.1 Chapter 7.1 p. 1/2 Econometrics II Heij et al. Chapter 7.1 Linear Time Series Models for Stationary data Marius Ooms Tinbergen Institute Amsterdam Chapter 7.1 p. 2/2 Program Introduction Modelling philosophy

More information

Forecasting using R. Rob J Hyndman. 3.2 Dynamic regression. Forecasting using R 1

Forecasting using R. Rob J Hyndman. 3.2 Dynamic regression. Forecasting using R 1 Forecasting using R Rob J Hyndman 3.2 Dynamic regression Forecasting using R 1 Outline 1 Regression with ARIMA errors 2 Stochastic and deterministic trends 3 Periodic seasonality 4 Lab session 14 5 Dynamic

More information

Module 3. Descriptive Time Series Statistics and Introduction to Time Series Models

Module 3. Descriptive Time Series Statistics and Introduction to Time Series Models Module 3 Descriptive Time Series Statistics and Introduction to Time Series Models Class notes for Statistics 451: Applied Time Series Iowa State University Copyright 2015 W Q Meeker November 11, 2015

More information

Marcel Dettling. Applied Time Series Analysis SS 2013 Week 05. ETH Zürich, March 18, Institute for Data Analysis and Process Design

Marcel Dettling. Applied Time Series Analysis SS 2013 Week 05. ETH Zürich, March 18, Institute for Data Analysis and Process Design Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, March 18, 2013 1 Basics of Modeling

More information

Lecture 6a: Unit Root and ARIMA Models

Lecture 6a: Unit Root and ARIMA Models Lecture 6a: Unit Root and ARIMA Models 1 2 Big Picture A time series is non-stationary if it contains a unit root unit root nonstationary The reverse is not true. For example, y t = cos(t) + u t has no

More information

Ross Bettinger, Analytical Consultant, Seattle, WA

Ross Bettinger, Analytical Consultant, Seattle, WA ABSTRACT DYNAMIC REGRESSION IN ARIMA MODELING Ross Bettinger, Analytical Consultant, Seattle, WA Box-Jenkins time series models that contain exogenous predictor variables are called dynamic regression

More information

Econometric Forecasting

Econometric Forecasting Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 1, 2014 Outline Introduction Model-free extrapolation Univariate time-series models Trend

More information

Chapter 6: Model Specification for Time Series

Chapter 6: Model Specification for Time Series Chapter 6: Model Specification for Time Series The ARIMA(p, d, q) class of models as a broad class can describe many real time series. Model specification for ARIMA(p, d, q) models involves 1. Choosing

More information

Univariate Time Series Analysis; ARIMA Models

Univariate Time Series Analysis; ARIMA Models Econometrics 2 Fall 24 Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen of4 Outline of the Lecture () Introduction to univariate time series analysis. (2) Stationarity. (3) Characterizing

More information

Modeling and forecasting global mean temperature time series

Modeling and forecasting global mean temperature time series Modeling and forecasting global mean temperature time series April 22, 2018 Abstract: An ARIMA time series model was developed to analyze the yearly records of the change in global annual mean surface

More information

Covariances of ARMA Processes

Covariances of ARMA Processes Statistics 910, #10 1 Overview Covariances of ARMA Processes 1. Review ARMA models: causality and invertibility 2. AR covariance functions 3. MA and ARMA covariance functions 4. Partial autocorrelation

More information

MEI Exam Review. June 7, 2002

MEI Exam Review. June 7, 2002 MEI Exam Review June 7, 2002 1 Final Exam Revision Notes 1.1 Random Rules and Formulas Linear transformations of random variables. f y (Y ) = f x (X) dx. dg Inverse Proof. (AB)(AB) 1 = I. (B 1 A 1 )(AB)(AB)

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

VAR Models and Applications

VAR Models and Applications VAR Models and Applications Laurent Ferrara 1 1 University of Paris West M2 EIPMC Oct. 2016 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Homework 4. 1 Data analysis problems

Homework 4. 1 Data analysis problems Homework 4 1 Data analysis problems This week we will be analyzing a number of data sets. We are going to build ARIMA models using the steps outlined in class. It is also a good idea to read section 3.8

More information

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator Lab: Box-Jenkins Methodology - US Wholesale Price Indicator In this lab we explore the Box-Jenkins methodology by applying it to a time-series data set comprising quarterly observations of the US Wholesale

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Part 1. Multiple Choice (40 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 6 points each)

Part 1. Multiple Choice (40 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 6 points each) GROUND RULES: This exam contains two parts: Part 1. Multiple Choice (40 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 6 points each) The maximum number of points on this exam is

More information

Elements of Multivariate Time Series Analysis

Elements of Multivariate Time Series Analysis Gregory C. Reinsel Elements of Multivariate Time Series Analysis Second Edition With 14 Figures Springer Contents Preface to the Second Edition Preface to the First Edition vii ix 1. Vector Time Series

More information