Time Series I: Time Domain Methods

Astrostatistics Summer School
Penn State University, University Park, PA 16802
May 21, 2007

Outline: Overview; Filtering and the Likelihood Function

Time series analysis is the study of data consisting of a sequence of DEPENDENT random variables (or vectors). This contrasts with a sequence of independent observations, and with regression, which studies the dependence of one variable on another. In this tutorial, we will discuss models that give rules for generating future observations from current and past observations.

Regression Model
A basic regression model is typically written as

y_t = β_0 + β_1 x_t + ε_t,  for t = 1, ..., n,

where β_0 and β_1 are fixed coefficients and x_t is a covariate. The errors ε_t are independent and identically distributed normal random variables with variance σ^2.
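
For comparison with the time series models that follow, this model is just ordinary least squares; a minimal R sketch with illustrative parameter values (not taken from the slides):

set.seed(1)
n <- 100
x <- rnorm(n)                          # covariate
y <- 2 + 0.5 * x + rnorm(n, sd = 1)    # assumed beta_0 = 2, beta_1 = 0.5, sigma = 1
fit <- lm(y ~ x)                       # ordinary least squares fit
summary(fit)$coefficients              # estimates and standard errors for beta_0, beta_1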

Autoregressive (AR) Model
A basic time series model related to regression is the autoregressive (AR) model

x_t = φ x_{t-1} + w_t,  for t = 1, ..., n,

where φ is a constant and w_t is a sequence of independent and identically distributed normal random variables (often called a white noise sequence in this context). Note that the result of this model is a single dependent series, i.e. a time series x_1, ..., x_n. This contrasts with the regression model, which relates two variables.
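
One way to build intuition for this model is to simulate it; the sketch below uses R's arima.sim with an assumed coefficient φ = 0.8 (purely illustrative):

set.seed(42)
x <- arima.sim(model = list(ar = 0.8), n = 200)   # AR(1) with phi = 0.8, unit-variance noise
plot(x, main = "Simulated AR(1), phi = 0.8")
acf(x)                                            # autocorrelation decays geometrically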

Vector Autoregressive Model
An obvious generalization of this model is a vector version

x_t = Φ x_{t-1} + w_t,  for t = 1, ..., n,

where x_t = (x_{t1}, ..., x_{tp})' and w_t is a sequence of independent p x 1 normal random vectors with covariance matrix Q. The matrix Φ is a p x p matrix.
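
A vector AR(1) can be simulated one time step at a time; the 2 x 2 coefficient matrix below is an assumption chosen only so that the process is stable:

set.seed(42)
n   <- 200
Phi <- matrix(c(0.5, 0.1,
                0.2, 0.3), nrow = 2, byrow = TRUE)    # assumed p x p coefficient matrix
Q   <- diag(2)                                        # noise covariance (identity here)
x   <- matrix(0, nrow = n, ncol = 2)
for (t in 2:n) {
  w      <- MASS::mvrnorm(1, mu = c(0, 0), Sigma = Q) # w_t ~ N(0, Q)
  x[t, ] <- Phi %*% x[t - 1, ] + w                    # x_t = Phi x_{t-1} + w_t
}
matplot(x, type = "l", lty = 1, main = "Simulated VAR(1)")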

The Linear State Space Model
Now, imagine that we cannot actually observe our system of interest x_t, which follows a vector autoregressive model. Instead we observe a linear transformation of x_t with additional observational error. In other words, the complete model is

y_t = A x_t + v_t,
x_t = Φ x_{t-1} + w_t,

where y_t is a q x 1 vector, A is a q x p matrix, and v_t is a sequence of independent normal random vectors with mean zero and covariance matrix R. The sequence v_t is independent of the sequence w_t. The first equation is generally called the observation equation, and the second is called the system equation.
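
A scalar version (p = q = 1, A = 1) is enough to see the structure; all parameter values below are assumptions made for illustration:

set.seed(1)
n <- 200
phi <- 0.9; sigma_w <- 1; sigma_v <- 0.5              # assumed system and observation noise scales
x <- numeric(n); y <- numeric(n)
x[1] <- rnorm(1, sd = sigma_w)
y[1] <- x[1] + rnorm(1, sd = sigma_v)
for (t in 2:n) {
  x[t] <- phi * x[t - 1] + rnorm(1, sd = sigma_w)     # system equation
  y[t] <- x[t] + rnorm(1, sd = sigma_v)               # observation equation (A = 1)
}
plot(y, type = "l"); lines(x, col = "red")            # observed series vs hidden state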

What can we do with the state space model?
- Maximum likelihood estimation of the parameters (including standard errors of our estimates).
- Bayesian estimation of parameters.
- Filtering: the conditional distribution of the system given our observations. We will, therefore, have a guess of our unseen system x_t given our observations y_t.
- Prediction: predict the next observation given the observations up to the current time.

Filtering
Suppose you observe y_1, ..., y_n, but you are really interested in x_1, ..., x_n and in estimating parameters such as φ. While you cannot know x_t, you can compute an optimal estimate of it. The goal is to calculate p(x_t | y_t, ..., y_1). For a Gaussian model this means that you'll know E(x_t | y_t, ..., y_1) (your guess) and also the conditional variance of x_t.

Steps for Filtering
Here's an outline of how that works; assume that you know all the parameters and that you have the guess at the previous time step, i.e. p(x_{t-1} | y_{t-1}, ..., y_1).

1. Predict the next value of the system based on what you have:
   p(x_t | y_{t-1}, ..., y_1) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{t-1}, ..., y_1) dx_{t-1}

2. Calculate the guess for the observation y_t based on this prediction:
   p(y_t | y_{t-1}, ..., y_1) = ∫ p(y_t | x_t, y_{t-1}, ..., y_1) p(x_t | y_{t-1}, ..., y_1) dx_t

3. Use Bayes' rule to update the prediction for x_t with the current observation y_t:
   p(x_t | y_t, ..., y_1) = p(x_t | y_{t-1}, ..., y_1) p(y_t | x_t, y_{t-1}, ..., y_1) / p(y_t | y_{t-1}, ..., y_1)
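
For the linear Gaussian model these three steps reduce to the Kalman filter recursions. Below is a bare-bones sketch for the scalar model simulated earlier (x_t = φ x_{t-1} + w_t, y_t = x_t + v_t); it is meant to show the predict/update structure, not to be a production implementation:

kalman_filter_scalar <- function(y, phi, sigma_w2, sigma_v2) {
  n <- length(y)
  x_filt <- numeric(n)                 # E(x_t | y_1, ..., y_t)
  P_filt <- numeric(n)                 # Var(x_t | y_1, ..., y_t)
  x_prev <- 0                          # prior mean for the initial state
  P_prev <- sigma_w2 / (1 - phi^2)     # stationary prior variance
  loglik <- 0
  for (t in 1:n) {
    x_pred <- phi * x_prev             # step 1: predict the state
    P_pred <- phi^2 * P_prev + sigma_w2
    S      <- P_pred + sigma_v2        # step 2: predictive variance of y_t
    loglik <- loglik + dnorm(y[t], mean = x_pred, sd = sqrt(S), log = TRUE)
    K         <- P_pred / S            # step 3: Bayes update (Kalman gain)
    x_filt[t] <- x_pred + K * (y[t] - x_pred)
    P_filt[t] <- (1 - K) * P_pred
    x_prev <- x_filt[t]; P_prev <- P_filt[t]
  }
  list(x_filt = x_filt, P_filt = P_filt, loglik = loglik)
}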

The Likelihood Function
Remember that the likelihood function is simply the density of the data evaluated at the observations:

p(y_T, ..., y_1) = Π_{t=1}^{T} p(y_t | y_{t-1}, ..., y_1)

Now we have a likelihood to maximize to obtain parameters such as φ. When the errors are Gaussian, finding p(x_t | y_t, ..., y_1) for each t is known as calculating the Kalman filter. In general, this density is called the filter density. These calculations reduce to matrix operations in the linear Gaussian case.
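
This prediction error decomposition is also how R's arima() computes the exact Gaussian likelihood: the ARMA model is put in state space form and the Kalman filter supplies the one-step predictive densities. A minimal check (the AR coefficient 0.6 is arbitrary):

set.seed(1)
x   <- arima.sim(model = list(ar = 0.6), n = 500)
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
logLik(fit)   # maximized log likelihood, obtained via the Kalman filter internally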

The ARMA model is the most basic time series model. It can be fit with every major statistical package. The advantage of the ARMA model is that it can be used to efficiently fit many, if not most, stationary time series. The disadvantage is that it is often not very descriptive of the underlying processes; the parameters are difficult to interpret.

Autoregressive (AR) Model of Order p
We have already seen a basic AR(1) model. We can extend the AR model to include more previous observations. The AR(p) model is as follows:

x_t = φ_1 x_{t-1} + φ_2 x_{t-2} + ... + φ_p x_{t-p} + w_t
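
A hypothetical AR(2) can be simulated and recovered with the same base R tools used later in these slides (the coefficients 1.4 and -0.7 are illustrative and roughly mimic the sunspot fits shown below):

set.seed(1)
x   <- arima.sim(model = list(ar = c(1.4, -0.7)), n = 300)   # simulate an AR(2)
fit <- arima(x, order = c(2, 0, 0))                          # fit an AR(2)
fit$coef                                                     # estimates of phi_1, phi_2 and the mean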

Autoregressive (AR) Model of Order p
Note that the AR(p) model can be expressed as a single dimension of a multivariate AR(1) process:

(x_t, x_{t-1}, ..., x_{t-p+1})' = Φ (x_{t-1}, x_{t-2}, ..., x_{t-p})' + w_t,

where Φ is the p x p matrix

    φ_1  φ_2  φ_3  ...  φ_{p-1}  φ_p
     1    0    0   ...    0       0
     0    1    0   ...    0       0
    ...  ...  ...  ...   ...     ...
     0    0    0   ...    1       0

and w_t has a p x p covariance matrix with σ^2 as the (1,1) entry and zeros elsewhere.
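
This companion matrix is easy to build numerically. As a hypothetical check, an AR(2) with coefficients (1.4, -0.7) is stationary because all eigenvalues of Φ lie inside the unit circle:

phi <- c(1.4, -0.7)                        # assumed AR coefficients
p   <- length(phi)
Phi <- rbind(phi, cbind(diag(p - 1), 0))   # p x p companion matrix
Mod(eigen(Phi)$values)                     # all moduli < 1  =>  stationary AR(p)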

Moving Average Model
Another basic time series model is the moving average model. The MA(q) model is as follows:

x_t = w_t + θ_1 w_{t-1} + ... + θ_q w_{t-q}

This model is a moving average of the noise across a window of size q + 1. Note that this is quite different from the autoregressive model: the autoregressive model is recursive, which gives it memory that decays gradually and extends beyond the order of the model, whereas the moving average model has no dependence between observations whose noise windows do not overlap.
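
Simulation makes this cut-off property easy to see; the θ values here are arbitrary:

set.seed(1)
x <- arima.sim(model = list(ma = c(0.7, 0.3)), n = 500)   # MA(2)
acf(x)   # sample autocorrelations are near zero beyond lag q = 2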

ARMA Model
These two models, autoregressive (AR) and moving average (MA), can be combined into an ARMA(p, q) model:

x_t = φ_1 x_{t-1} + φ_2 x_{t-2} + ... + φ_p x_{t-p} + w_t + θ_1 w_{t-1} + ... + θ_q w_{t-q}

How can we fit such a model? How do we select a good model? What are the steps if we are handed data that follows such a model?
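
As a preview of the fitting questions above, R's arima() estimates such a model by maximum likelihood; an illustrative ARMA(1,1) round trip:

set.seed(1)
x   <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500)   # assumed ARMA(1,1)
fit <- arima(x, order = c(1, 0, 1))
fit$coef                                                      # recovered phi_1, theta_1, mean
fit$aic                                                       # AIC, useful for model selection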

ARMA Model Using the Backshift Operator
There is an alternative way to write an ARMA model, using the backshift operator. We use the symbol B to denote moving back one time step: B x_t = x_{t-1}. This allows us to write the ARMA model as

φ(B) x_t = θ(B) w_t,

where

φ(B) = 1 - φ_1 B - φ_2 B^2 - ... - φ_p B^p
θ(B) = 1 + θ_1 B + θ_2 B^2 + ... + θ_q B^q

This notation will help us write the ARMA model in other forms, and to write extensions of the model such as SARIMA.
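
One practical payoff of this notation: stationarity requires the roots of φ(B) to lie outside the unit circle, and invertibility requires the same of θ(B). R's polyroot can check this for hypothetical coefficients:

phi   <- c(1.4, -0.7)              # assumed AR coefficients
theta <- c(0.4)                    # assumed MA coefficient
Mod(polyroot(c(1, -phi)))          # roots of phi(B); all moduli > 1  =>  stationary
Mod(polyroot(c(1, theta)))         # roots of theta(B); all moduli > 1  =>  invertible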

ARMA as a Special Case of the Linear State Space Model
We have already seen that an AR(p) can be written as the system equation of a linear state space model. How can we incorporate the MA(q) part of the model? This can be seen using the backshift operator formulation. Let y_t = θ(B) x_t. Since x_t is AR(p), it can be represented as φ(B) x_t = w_t. Putting these together, we obtain φ(B) y_t = θ(B) w_t, so y_t is ARMA(p, q).

ARMA as a Special Case of the Linear State Space Model
Putting these pieces together, we obtain the observation equation

y_t = (1, θ_1, ..., θ_{r-1}) x_t

and the system equation

(x_t, x_{t-1}, ..., x_{t-r+1})' = Φ (x_{t-1}, x_{t-2}, ..., x_{t-r})' + w_t,

where Φ is the r x r companion matrix built from φ_1, ..., φ_r as before, r is the maximum of p and q + 1, and any parameters not present in the original ARMA(p, q) model are set to zero.

ARMA as a Special Case of the Linear State Space Model
The important thing to note is that the ARMA model is simply a special case of the linear state space model and, therefore, requires no additional computational machinery. One implication is that an ARMA process observed with additional observational error is handled just as easily. Missing data can also be easily incorporated into this framework.
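
Both points can be demonstrated directly, since arima() in R evaluates the likelihood with a Kalman filter; here is a hypothetical example in which some observations are deleted before fitting:

set.seed(1)
x <- arima.sim(model = list(ar = 0.8, ma = 0.4), n = 300)   # assumed ARMA(1,1)
x[sample(300, 20)] <- NA                                    # pretend 20 values are missing
fit <- arima(x, order = c(1, 0, 1))                         # the Kalman filter handles the NAs
fit$coef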

[Figure: time series plot of the Wolfer sunspot numbers, 1770-1869 (series "sun").]

[Figure: sample ACF of the sunspot series, lags 0-20.]

What is the ACF?
The ACF is a plot of the sample autocorrelation function, ˆρ(h), which is defined as

ˆρ(h) = ˆγ(h) / ˆγ(0),

where

ˆγ(h) = (1/n) Σ_{t=1}^{n-h} (x_{t+h} - x̄)(x_t - x̄)

is the sample autocovariance function. This is an estimator of the correlation between x_t and x_{t-h}. For an MA(q) process, the ACF should cut off after q lags.
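
The estimator is easy to verify against R's acf(). The sketch below assumes sun is the annual sunspot series; a 1770-1869 window of the built-in sunspot.year data should closely resemble the series used in these slides:

sun  <- window(sunspot.year, start = 1770, end = 1869)   # stand-in for the slides' data
n    <- length(sun); xbar <- mean(sun); h <- 3
gamma_hat <- sum((sun[(1 + h):n] - xbar) * (sun[1:(n - h)] - xbar)) / n
gamma0    <- sum((sun - xbar)^2) / n
gamma_hat / gamma0                     # hand-computed rho-hat at lag 3
acf(sun, plot = FALSE)$acf[h + 1]      # acf() gives the same value (lag 0 is index 1)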

[Figure: sample PACF of the sunspot series, lags 1-20.]

What is the PACF?
The PACF is a plot of the sample partial autocorrelation function. This plot estimates the correlation between x_t and x_{t-h} with the linear effect of the intermediate observations, x_{t-1}, ..., x_{t-h+1}, removed. The algorithm to do this (Durbin-Levinson) is a little complicated, but can be computed quickly. For an AR(p) model, the PACF should cut off after the pth lag.
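
In R, the two plots shown above come straight from acf() and pacf(); assuming sun is defined as in the previous sketch:

sun <- window(sunspot.year, start = 1770, end = 1869)
acf(sun,  lag.max = 20)    # decays slowly and oscillates, so not a pure MA(q)
pacf(sun, lag.max = 20)    # large at the first couple of lags, suggesting a low-order AR part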

Fit 1
arima(x = sun, order = c(1, 0, 0))

         ar1  intercept
      0.8199    50.4814
s.e.  0.0568    11.4640

sigma^2 estimated as 459.9: log likelihood = -449, aic = 904

[Diagnostic plots: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.]
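
The output above, and in the fits that follow, is standard output from R's arima(); the residual panels are what tsdiag() produces. A sketch of how such a fit might be generated, again assuming sun holds the sunspot series:

sun  <- window(sunspot.year, start = 1770, end = 1869)
fit1 <- arima(sun, order = c(1, 0, 0))   # AR(1) with an estimated mean ("intercept")
fit1                                     # coefficients, sigma^2, log likelihood, AIC
tsdiag(fit1)                             # standardized residuals, residual ACF, Ljung-Box p-values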

Fit 2
arima(x = sun, order = c(2, 0, 0))

         ar1      ar2  intercept
      1.4067  -0.7117    48.3502
s.e.  0.0705   0.0701     4.9705

sigma^2 estimated as 228.7: log likelihood = -414.79, aic = 837.58

[Diagnostic plots: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.]

Fit 3
arima(x = sun, order = c(1, 0, 1))

         ar1     ma1  intercept
      0.7239  0.7523    50.2528
s.e.  0.0726  0.0681     9.9355

sigma^2 estimated as 257.8: log likelihood = -420.73, aic = 849.46

[Diagnostic plots: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.]

Fit 4
arima(x = sun, order = c(2, 0, 1))

         ar1      ar2     ma1  intercept
      1.2241  -0.5591  0.3844    48.6226
s.e.  0.1125   0.1078  0.1321     6.0308

sigma^2 estimated as 214.5: log likelihood = -411.67, aic = 833.35

[Diagnostic plots: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.]

Fit 5
arima(x = sun, order = c(3, 0, 0))

         ar1      ar2     ar3  intercept
      1.5528  -1.0018  0.2073    48.6030
s.e.  0.0981   0.1543  0.0989     6.0927

sigma^2 estimated as 218.9: log likelihood = -412.65, aic = 835.29

[Diagnostic plots: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.]
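
Comparing the five candidates by the AIC values reported above, the ARMA(2,1) model of Fit 4 has the smallest AIC (833.35). In R the comparison can be automated along these lines, assuming sun as before:

fits <- list(ar1    = arima(sun, order = c(1, 0, 0)),
             ar2    = arima(sun, order = c(2, 0, 0)),
             arma11 = arima(sun, order = c(1, 0, 1)),
             arma21 = arima(sun, order = c(2, 0, 1)),
             ar3    = arima(sun, order = c(3, 0, 0)))
sapply(fits, AIC)   # smallest value marks the preferred candidate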

The ARIMA Model
The first important extension of the model is the ARIMA model (the I stands for "integrated"). The assumption here is that the data follow an ARMA model after differencing. There is a tacit assumption that an ARMA model is stationary, i.e. that the dependency between x_t and x_{t-h} depends only on the lag h. (Restrictions must be put on the parameters of ARMA models to guarantee stationarity.) Differencing also eliminates unwanted trends.

An ARIMA(p, d, q) Model

φ(B) ∇^d x_t = θ(B) w_t,

where φ(B) = 1 - φ_1 B - φ_2 B^2 - ... - φ_p B^p, θ(B) = 1 + θ_1 B + θ_2 B^2 + ... + θ_q B^q, and ∇ = 1 - B is the differencing operator.
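
In R the differencing order d is simply the middle entry of the order argument; a hypothetical fit to an integrated series might look like:

set.seed(1)
z   <- cumsum(arima.sim(model = list(ar = 0.5), n = 200))   # integrated (d = 1) AR(1) series
fit <- arima(z, order = c(1, 1, 0))                         # difference once, then fit an AR(1)
fit$coef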

The SARIMA Model
The SARIMA model is the seasonal ARIMA model. It allows us to model dependency between nearby observations and also across seasons. For example, the temperature in January could depend as much on last January's temperature as it does on December's temperature. Note that these models are useful when a KNOWN and FIXED season is to be modeled.

An ARIMA(p, d, q) x (P, D, Q)_s Model

Φ(B^s) φ(B) ∇_s^D ∇^d x_t = Θ(B^s) θ(B) w_t,

where
φ(B) = 1 - φ_1 B - ... - φ_p B^p,   θ(B) = 1 + θ_1 B + ... + θ_q B^q,
Φ(B) = 1 - Φ_1 B - ... - Φ_P B^P,   Θ(B) = 1 + Θ_1 B + ... + Θ_Q B^Q.
Also, ∇_s = 1 - B^s and ∇ = 1 - B.
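
R's arima() supports these models through its seasonal argument. As an illustration on the built-in monthly AirPassengers data (the classic "airline" orders are used here, not a model from these slides):

fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit   # an ARIMA(0,1,1) x (0,1,1)_12 fit to the log of a monthly series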

Other Models
Extended Kalman filter: what if our model is not linear? Using Taylor expansions, we can approximate the non-linear model with a linear one.
Particle filters: we can use increased computational capability to compute filters and likelihood functions via simulation. This is especially useful for Bayesian analysis, but can also be used for likelihood-based inference.
There are a number of other tractable time series models, such as ARCH, which allows for heteroskedastic error terms.

References
Robert Shumway and David Stoffer. Time Series Analysis and Its Applications. Springer, New York, 2006.
Peter Brockwell and Richard Davis. Time Series: Theory and Methods, Second Edition. Springer, New York, 1991.
G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, Oakland, CA, 1970.