ARIMA Models
Dan Saunders

I will discuss models with a dependent variable y_t, a potentially endogenous error term ε_t, and an exogenous error term η_t, each with a subscript t denoting time. With just these three objects, we may consider a rich class of models called:

    Autoregressive   Integrated   Moving Averages
        (AR)            (I)           (MA)

Autoregression

To start, consider an AR(1) model:

    y_t = φ y_{t-1} + ε_t

Right away you notice this is no different from any standard regression y_i = β x_i + ε_i. We have simply relabeled the coefficient, β → φ, and the right-hand-side variable is a lag of the dependent variable, x → y_{t-1}. Since it is no different from any other regression, an exogenous error term is enough for OLS to be consistent:

    If E(ε_t) = 0, E(ε_t y_{t-1}) = 0, and E(y_t²) < ∞, then

    φ̂_OLS = [Σ_t y_t y_{t-1}] / [Σ_t y_{t-1}²] = [Σ_t (φ y_{t-1} + ε_t) y_{t-1}] / [Σ_t y_{t-1}²] →_p φ + E(ε_t y_{t-1}) / E(y_{t-1}²) = φ

In words, as long as assumptions 1-3 hold, OLS is consistent.
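To make the consistency claim concrete, here is a minimal simulation sketch in Python with numpy (my substitution, since the handout itself works in Eviews; the sample size and φ = 0.6 are arbitrary choices of mine). With an exogenous error, the OLS slope from regressing y_t on y_{t-1} settles near the true φ:

```python
import numpy as np

# Simulate an AR(1): y_t = phi * y_{t-1} + eps_t, with an exogenous
# (white-noise) error term.
rng = np.random.default_rng(0)
phi, n = 0.6, 10_000
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# OLS slope of y_t on y_{t-1}: sum(y_t * y_{t-1}) / sum(y_{t-1}^2)
y_lag, y_cur = y[:-1], y[1:]
phi_hat = (y_cur @ y_lag) / (y_lag @ y_lag)
print(phi_hat)  # close to 0.6 for large n
```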

Originally, assumption 4, homoskedasticity, did not affect the unbiasedness or consistency of OLS. With lagged dependent variables, this is no longer true: serial correlation in the error term generates endogeneity bias, a violation of assumption 2, through omitted variables. This is an implicit violation of assumption 1, i.e., we have misspecified the model.

Let's start with some basic intuition. Suppose the serial correlation of the error term is itself AR(1):

    ε_t = ρ ε_{t-1} + η_t

Then it is clear that the error term is correlated with the right-hand-side variable:

    y_t = φ y_{t-1} + ε_t,   where both y_{t-1} and ε_t depend on ε_{t-1}

So, what should we do? If we had a way for Eviews to control for the serial correlation of the error, then the remaining error would be exogenous, and we could perform OLS. Well, that's exactly what the AR(1) function in Eviews does. To clarify, this is a function in Eviews that controls for AR(1) serial correlation in the error term, not in y, regardless of whether that autocorrelation generates bias. Thus, we could use it in any regression, using Cochrane-Orcutt:

    If y_i = β x_i + ε_i, and ε_i = ρ ε_{i-1} + η_i, then:   ls y x AR(1)

If we have a lagged dependent variable, then this solution works as well, using Iterated-Cochrane-Orcutt:

    If y_t = φ y_{t-1} + ε_t, and ε_t = ρ ε_{t-1} + η_t, then:   ls y y(-1) AR(1)

The coefficient on y(-1) will be φ̂ and on AR(1) will be ρ̂. (Or will they? More later...) However, as I said earlier, this is really a violation of assumption 1. To see this, first substitute the AR(1) error equation into the AR(1) main equation:

    y_t = φ y_{t-1} + ρ ε_{t-1} + η_t

Now, substitute the lagged main equation in for ε_{t-1}:

    y_t = φ y_{t-1} + ρ (y_{t-1} − φ y_{t-2}) + η_t

Collecting terms, we can see that the true model (the one we implicitly wrote down) is:

    y_t = (φ + ρ) y_{t-1} − (ρφ) y_{t-2} + η_t

where the error is now exogenous. We may choose to re-write the equation, using different letters to differentiate the true coefficients from the misspecified model:

    y_t = λ_1 y_{t-1} + λ_2 y_{t-2} + η_t

If we try to solve for φ and ρ as functions of λ_1 and λ_2 (they are the two roots of x² − λ_1 x − λ_2 = 0), we find:

    φ = [λ_1 + √(λ_1² + 4λ_2)] / 2,   ρ = [λ_1 − √(λ_1² + 4λ_2)] / 2
    OR
    φ = [λ_1 − √(λ_1² + 4λ_2)] / 2,   ρ = [λ_1 + √(λ_1² + 4λ_2)] / 2

We have no way of knowing which, and for the purposes of forecasting, it doesn't matter: if Eviews sets φ̂ = ρ and ρ̂ = φ, we will have the exact same forecast. Whether Eviews converges to the correct solution, or the reverse, depends upon the initial condition used for Iterated-Cochrane-Orcutt. However, why not simply run OLS on the correctly specified equation, and generate identical forecasts:

    ls y y(-1) y(-2)

The lesson embedded in this problem is important. In general, any AR(p) model for y with an AR(h) error term is, in fact, a misspecified AR(p+h) model of y. Thus, finding the correct specification for any autoregressive process will resolve the autocorrelation in the error term and, hence, remove the bias. This is why we should reject any AR model with serially correlated residuals and try higher-order AR models, rather than try to control for the serial correlation of the error directly.
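A quick way to see this equivalence is to simulate it. The sketch below (Python/numpy again; φ = 0.5 and ρ = 0.3 are my arbitrary picks) generates y with an AR(1) error and then runs OLS on two lags; the estimates land near λ_1 = φ + ρ = 0.8 and λ_2 = −φρ = −0.15:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, rho, n = 0.5, 0.3, 20_000
eta = rng.standard_normal(n)

# Build eps_t = rho * eps_{t-1} + eta_t, then y_t = phi * y_{t-1} + eps_t
eps = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + eta[t]
    y[t] = phi * y[t - 1] + eps[t]

# OLS of y_t on (y_{t-1}, y_{t-2}) -- the implied AR(2)
X = np.column_stack([y[1:-1], y[:-2]])
lam = np.linalg.lstsq(X, y[2:], rcond=None)[0]
print(lam)                    # approx [0.8, -0.15]
print(phi + rho, -phi * rho)  # the true lambda_1, lambda_2
```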

Okay. Lesson learned, let's go run a bunch of AR models. But wait, if the AR(1) command in Eviews refers to the error term, not the dependent variable, then what's the command we want? Well, if we believe y is AR(1) and the error is exogenous, then we run OLS:

    y_t = φ y_{t-1} + ε_t,   ε_t = η_t

    ls y y(-1)

Alternatively, we could regress y on nothing, but assume the error is serially correlated:

    y_t = ε_t,   ε_t = φ ε_{t-1} + η_t

Why is this an equivalent model? Repeat the steps from above. First, substitute the serial correlation equation into the main equation:

    y_t = φ ε_{t-1} + η_t

Second, use the main equation to replace ε_{t-1}:

    y_t = φ y_{t-1} + η_t

Therefore, in Eviews we run the command:

    ls y AR(1)

which literally says "regress y on nothing, but control for an AR(1) serially correlated error term." Yet the result is an estimation of the exact same model. This result is also important because it extends to all cases. Suppose we believe that y is an AR(p) process when the model is correctly specified:

    y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ⋯ + φ_p y_{t-p} + ε_t,   ε_t = η_t

Then we may estimate this equation in Eviews as:

    y_t = ε_t,   ε_t = φ_1 ε_{t-1} + φ_2 ε_{t-2} + ⋯ + φ_p ε_{t-p} + η_t

    ls y AR(1) AR(2) ... AR(p)

The main difference will now be that Eviews understands you are performing time series analysis and stores the autocorrelation functions for the model, so you should always do it this way.

Okay, so now you understand. Lagged dependent variables with serial correlation in the residuals mean you should try a different AR(p) specification using the AR(1) ... AR(p) commands. Likewise, for a regression without lagged dependent variables, but with serially correlated errors, you may add the AR(1) ... AR(p) commands to remove the serial correlation from the error. Both methods work with the same simple, flexible commands (or at least that's the idea).
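There is a parallel outside Eviews. In Python's statsmodels (my substitution; the parameterization below is statsmodels' convention, not Eviews'), treating y as an AR(1) process and treating y as "nothing plus an AR(1) error" collapse into essentially the same estimate:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
phi, n = 0.7, 5_000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

# "ls y y(-1)": plain OLS of y_t on y_{t-1}
phi_ols = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])

# "ls y AR(1)": y regressed on nothing with an AR(1) error, which
# statsmodels parameterizes as ARIMA(1, 0, 0) with no constant
phi_arima = ARIMA(y, order=(1, 0, 0), trend="n").fit().params[0]

print(phi_ols, phi_arima)  # both close to 0.7
```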

Moving Average

So what's a moving average? It is simplest to understand with real data and even weighting. Suppose we have any random data over time. We may ask: what's the three-day running average? Of course, we need the first three numbers in order to calculate the first term, so we will have two fewer averages than observations when we're done.

More generally, we may construct a moving average of order q for any data:

    x̄_t(q) = (1/q) Σ_{i=1}^{q} x_{t-i}

We don't even require equal weights:

    x̄_t(q) = Σ_{i=1}^{q} α_i x_{t-i},   where Σ_{i=1}^{q} α_i = 1
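As a concrete illustration, here is a minimal sketch of a three-period running average (Python/numpy; the data are made up):

```python
import numpy as np

x = np.array([4.0, 7.0, 1.0, 3.0, 8.0, 5.0, 2.0])

# Equal-weight moving average of order 3: each output averages three
# consecutive observations, so we end up with two fewer averages
# than observations.
ma3 = np.convolve(x, np.ones(3) / 3, mode="valid")
print(ma3)  # [4.0, 3.6667, 4.0, 5.3333, 5.0]
```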

From this perspective, moving averages seem quite simple. What makes our moving averages difficult is that they are defined for ε, the unobservable error term. Moreover, it is assumed that the error term is exogenous, i.e., the AR process is correctly specified so that ε_t = η_t. Finally, our weights don't sum to one; instead, they satisfy the unit root restriction. Again, in Eviews the MA(1) function is an assumption about the error, so we could estimate an MA(1) as follows:

    y_t = ε_t,   ε_t = θ η_{t-1} + η_t

    ls y MA(1)

Again, this is more familiar if we substitute the error equation into the main equation:

    y_t = θ η_{t-1} + η_t

Likewise, we may imagine any MA(q) model:

    y_t = θ_1 η_{t-1} + θ_2 η_{t-2} + ⋯ + θ_q η_{t-q} + η_t

And we could estimate any such model in Eviews as:

    ls y MA(1) MA(2) ... MA(q)

It is important to note that the moving average is with respect to the exogenous error term. Thus, in order to have any chance at accurately estimating the moving average coefficients, we must first believe that the residuals we observe are not serially correlated.

This takes us back to the principal question: how are we to select an ARMA model? The answer:

1. We must select an AR(p) process that is a plausibly correct specification. One necessary (but not sufficient) condition is that the residuals not be serially correlated. We should add as many terms as necessary but no more.

2. Once we can obtain unbiased residuals, we may use them to estimate a moving average on the exogenous error. We should add as many terms as necessary but no more.

We do all of this simultaneously by running many ARMA(p,q) models (see the sketch below). We must throw out any models with serially correlated residuals. Among the remaining models, we must balance our desire for correct specification with parsimony (simplicity). One method is to select the model with the minimum Akaike Information Criterion or (often preferred) the minimum Schwarz Criterion (minimum means most negative). However, these are by no means the only methods for selecting a model. We may also appeal to graphical arguments (correlograms), test statistics, or forecasting performance when selecting a model.
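Here is a hedged sketch of that search in Python/statsmodels (again my substitution for the Eviews workflow; the grid bounds, the simulated ARMA(1,1) target, and the use of BIC, which is statsmodels' name for the Schwarz Criterion, are all my choices):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
# Simulate an ARMA(1,1) so the search has a known answer.
phi, theta, n = 0.6, 0.4, 2_000
eta = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + theta * eta[t - 1] + eta[t]

# Fit every ARMA(p, q) on a small grid and keep the minimum-BIC model.
best = min(
    ((p, q) for p in range(3) for q in range(3)),
    key=lambda pq: ARIMA(y, order=(pq[0], 0, pq[1]), trend="n").fit().bic,
)
print(best)  # typically (1, 1)
```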

A Technical Note

I have omitted a constant for the usual reason: easier math. However, you may notice that:

    ls y c AR(1) ... AR(p) MA(1) ... MA(q)

AND

    ls y c y(-1) ... y(-p) MA(1) ... MA(q)

produce different estimates of the constant coefficient ĉ. The short answer is: who cares about the constant anyway? It has no economic significance. I don't mean to imply that you should drop the constant, as that could cause omitted variable bias (the bias we just worked so hard to resolve). Rather, subtract the mean of the dependent variable from each observation (y*_t = y_t − ȳ). Then you can drop the constant from the regression, since the process will be mean zero by construction (assuming stationarity).

(From here on out we shall assume that the ARMA model is well specified, so ε is purely exogenous: ε = η. This is called "white noise" in time series econometrics.)

Integrated ARMA

In order for ARMA estimation to work at all, we must believe that the dependent variable is stationary. There are two definitions:

1. Weakly Stationary: the covariance Cov(y_t, y_{t-j}) = σ_j does not change over time.

2. Strictly Stationary: the distribution of y_t does not change over time.

The weak definition is sufficient for ARMA models, although it's easier to imagine the strict definition. In order to transform non-stationary data into something stationary, we will consider taking first and second order differences, a process known as integration.

Consider time-series data with a time trend. One option is to de-trend the data:

    y_t = α + µt + ε_t

In this case, y_t is called trend-stationary, and adding @trend in Eviews is sufficient to restore stationarity. On the other hand, we may have a random walk with drift:

    y_t = µ + y_{t-1} + ε_t

In this case, the model is called difference-stationary, because de-trending solves the non-stationarity of the drift, not the random walk, while first-differencing solves both. We could easily run this model in Eviews using the d() function, which tells the software to calculate the first difference:

    ls d(y) c

Because Eviews understands d(y) to mean the dependent variable is the first difference of y, this syntax is carried through to the AR(1) commands. We may also want to calculate the second order difference, i.e., the difference of the difference:

    (y_t − y_{t-1}) − (y_{t-1} − y_{t-2}) = φ [(y_{t-1} − y_{t-2}) − (y_{t-2} − y_{t-3})] + ε_t

To run this in Eviews we would iterate the differences:

    ls d(d(y)) AR(1)

While it is mathematically straightforward to extend this concept indefinitely, we typically do not go beyond first or second differencing, as it is hard to imagine the applicability. More generally, an ARIMA(p,1,q), a first order integrated ARMA(p,q) model, looks like:

    (y_t − y_{t-1}) = φ_1 (y_{t-1} − y_{t-2}) + ⋯ + φ_p (y_{t-p} − y_{t-p-1}) + θ_1 ε_{t-1} + ⋯ + θ_q ε_{t-q} + ε_t

This could be run in Eviews as:

    ls d(y) AR(1) ... AR(p) MA(1) ... MA(q)
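To close the loop outside Eviews, the same ARIMA(p,1,q) idea in Python/statsmodels (the d=1 in the order tuple plays the role of d(y); the simulated random walk and the (1,1,0) order are my illustrative choices):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
# A random walk whose *differences* follow an AR(1) with phi = 0.5.
phi, n = 0.5, 3_000
dy = np.zeros(n)
for t in range(1, n):
    dy[t] = phi * dy[t - 1] + rng.standard_normal()
y = np.cumsum(dy)  # integrate the stationary differences

# order=(p, d, q); d=1 differences y once, like Eviews' d(y)
res = ARIMA(y, order=(1, 1, 0), trend="n").fit()
print(res.params[0])  # AR coefficient on the differenced series, near 0.5
```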