STAT 153: Introduction to Time Series

STAT 153: Introduction to Time Series
Instructor: Aditya Guntuboyina
Lectures: 12:30 pm - 2 pm (Tuesdays and Thursdays)
Office Hours: 10 am - 11 am (Tuesdays and Thursdays), 423 Evans Hall
GSI: Brianna Heggeseth
Section: 10 am - 12 pm or 12 pm - 2 pm (Fridays); Office Hours and Location: TBA
Announcements, lecture slides, assignments, etc. will be posted on the course site at bspace.

Course Outline A Time Series is a set of numerical observations, each one being recorded at a specific time. Examples of Time Series data are ubiquitous. The aim of this course is to teach you how to analyze such data.

Population of the United States [Figure: US population vs. year, 1800-2000, recorded once every ten years.]

Course Outline (continued) There are two approaches to time series analysis: Time Domain Approach Frequency Domain Approach (also known as the Spectral or Fourier analysis of time series) Very roughly, 60% of the course will be on Time Domain methods and 40% on Frequency Domain methods.

Time Domain Approach Seeks an answer to the following question: Given the observed time series, how does one guess future values? Forecasting or Prediction

Time Domain (continued) Forecasting is carried out through three steps: Find a MODEL that adequately describes the evolution of the observed time series over time. Fit this model to the data. In other words, estimate the parameters of the model. Forecast based on the fitted model.

Time Series Models Most of our focus in the Time Domain part of the course will be on the following two classes of models: Trend + Seasonality + Stationary ARMA, and Differencing + Stationary ARMA (these constitute the ARIMA and Seasonal ARIMA models). In the Time Domain part of the course, we study these models and learn how to execute each of the three steps outlined in the previous slide.

Time Series Models (continued) These provide a sturdy toolkit for analyzing many practical time series data sets. State-Space models are a modern and very powerful class of time series models. Forecasting in these models is carried out via an algorithm known as the Kalman Filter. We shall spend some time on these models although we will not have time to study them in depth.

Frequency Domain Approach [Figure: brightness of a variable star on 600 consecutive nights; brightness vs. day.]

Frequency Domain (continued) Based on the idea that the observed time series is made up of CYCLES having different frequencies. In the Frequency Domain Approach, the data is analyzed by discovering these hidden cycles along with their relative strengths. The key tool here is the Discrete Fourier Transform (DFT) or, more specifically, a function of the DFT known as the Periodogram.

Frequency Domain (continued) In the Frequency Domain part of the course, we shall study the periodogram and its performance in discovering periodicities when the data are indeed made up of many different cycles. It turns out that the raw periodogram is often too variable as an estimator of the true Spectrum and we shall study methods for improving it.
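As a rough preview of the mechanics (not course code; the two-cosine signal below, with assumed frequencies 0.1 and 0.25 cycles per sample, is our own example), the periodogram is the squared modulus of the DFT, and its largest peaks sit at the hidden cycles:

```python
import numpy as np

# Sketch: a series made up of two hidden cycles (frequencies are
# illustrative choices, aligned with exact Fourier frequencies).
n = 600
t = np.arange(n)
x = 2.0 * np.cos(2 * np.pi * 0.1 * t) + 1.0 * np.cos(2 * np.pi * 0.25 * t)

# Periodogram: squared modulus of the DFT, scaled by 1/n.
dft = np.fft.fft(x)
periodogram = np.abs(dft) ** 2 / n
freqs = np.arange(n) / n  # frequencies in cycles per sample

# Restrict to frequencies below 0.5 and pick the two largest peaks;
# they recover the true cycles 0.1 and 0.25.
half = freqs < 0.5
top = freqs[half][np.argsort(periodogram[half])[-2:]]
```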

Rest of this Lecture Some more Time Series Data Examples Simplest Time Series Model: Purely Random Process (Section 3.4.1) Sample Autocorrelation Coefficients and the Correlogram (Section 2.7 and Page 56)

Annual Measurements of the Level of Lake Huron, 1875-1972 [Figure: level in feet vs. year.]

Monthly Accidental Deaths in the US, 1973-1978 [Figure: number of deaths vs. time.]

The first step in the time domain analysis of a time series data set is to find a model that well describes the evolution of the data over time. Basic Modelling Strategy: Start Simple, Build Up. Simplest Model: Xt, t = 1, ..., n, independent N(0, σ²). This is the Purely Random Process or Gaussian White Noise.
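As a quick illustration (the course uses R; this is an assumed Python equivalent, not course code), the purely random process can be simulated in a few lines:

```python
import numpy as np

# Sketch: n independent draws from N(0, sigma^2) with sigma = 1,
# i.e. a purely random process (Gaussian white noise).
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 100
x = rng.normal(loc=0.0, scale=1.0, size=n)

# Independence means no observation helps predict another; the sample
# mean and variance should be near the model values 0 and 1.
print(round(x.mean(), 3), round(x.var(), 3))
```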

100 Observations from Gaussian White Noise with Unit Variance [Figure: purely random process vs. time.]

Is this data set from a purely random process? [Figure: data vs. time, 100 observations.]

How to check if a given time series is purely random? Answer: Think in terms of Forecasting. For a purely random series, the given data can NOT help in predicting Xn+1. The best estimate of Xn+1 is E(Xn+1) = 0. In particular, X1 cannot predict X2, X2 cannot predict X3, and so on. Therefore, the correlation coefficient between Y = (X1,..., Xn-1) and Z = (X2,..., Xn) must be close to zero.

The formula for the correlation between Y and Z is

r = Σ_{t=1}^{n−1} (Xt − X̄(1)) (Xt+1 − X̄(2)) / √[ Σ_{t=1}^{n−1} (Xt − X̄(1))² · Σ_{t=1}^{n−1} (Xt+1 − X̄(2))² ]

where X̄(1) = (1/(n−1)) Σ_{t=1}^{n−1} Xt is the mean of the first n−1 observations and X̄(2) = (1/(n−1)) Σ_{t=1}^{n−1} Xt+1 is the mean of the last n−1 observations. This formula is usually simplified to obtain

r1 = Σ_{t=1}^{n−1} (Xt − X̄)(Xt+1 − X̄) / Σ_{t=1}^{n} (Xt − X̄)², where X̄ = (1/n) Σ_{t=1}^{n} Xt.

Note the subscript on the left hand side above.

Sample Autocorrelation Coefficients The quantity r1 is called the Sample Autocorrelation Coefficient of X1,..., Xn at lag one. Lag one because this correlation is between Xt and Xt+1. When X1,..., Xn are obtained from a Purely Random Process, r1 is close to zero, particularly when n is large. One can similarly consider Sample Autocorrelations at other lags:

rk = Σ_{t=1}^{n−k} (Xt − X̄)(Xt+k − X̄) / Σ_{t=1}^{n} (Xt − X̄)², k = 1, 2, ...
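The lag-k formula can be computed directly; below is a minimal Python sketch (the course itself uses R's acf(); the function name here is our own):

```python
import numpy as np

def sample_autocorrelation(x, k):
    """Sample autocorrelation r_k at lag k, per the slide's formula:
    r_k = sum_{t=1}^{n-k} (X_t - Xbar)(X_{t+k} - Xbar) / sum_{t=1}^{n} (X_t - Xbar)^2
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    numerator = np.sum((x[: n - k] - xbar) * (x[k:] - xbar))
    denominator = np.sum((x - xbar) ** 2)
    return numerator / denominator

# r_0 always equals 1, since numerator and denominator coincide at lag 0.
print(sample_autocorrelation([1.0, 2.0, 3.0, 4.0], 0))  # -> 1.0
```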

Correlogram Mathematical Fact: When X1,..., Xn are obtained from a Purely Random process, r1, r2,... are approximately independently distributed according to N(0, 1/n), the approximation improving as n grows. So one way of testing if the series is purely random is to plot the sample autocorrelations. This plot is known as the Correlogram. Use the function acf() in R to get the Correlogram.

ts.obs = rnorm(100)
acf(ts.obs, lag.max = 20, type = "correlation", plot = TRUE, drop.lag.0 = FALSE)

Correlogram of a Purely Random Series of 100 Observations [Figure: ACF vs. lag, lags 0-20.] The correlogram plots rk against k. r0 always equals 1. The blue bands correspond to the levels ±1.96/√n.

Interpreting the Correlogram When X1,..., Xn are obtained from a Purely Random process, the probability that a fixed rk lies outside the blue bands equals 0.05. A value of rk outside the blue bands is significant, i.e., it gives evidence against pure randomness. However, the overall probability of getting at least one rk outside the bands increases with the number of coefficients plotted. If 20 rks are plotted, one expects about one significant value even under pure randomness.
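The arithmetic behind that expectation (20 lags × 0.05 = 1) can be checked by simulation; the following Python sketch uses assumed settings (n = 200 observations, 500 replications), not anything prescribed by the course:

```python
import numpy as np

# Sketch: under pure randomness each r_k falls outside the
# +/- 1.96/sqrt(n) bands with probability about 0.05, so a
# correlogram with 20 lags shows about 20 * 0.05 = 1 "significant"
# value on average even for white noise.
rng = np.random.default_rng(1)
n, n_lags, n_reps = 200, 20, 500
band = 1.96 / np.sqrt(n)

def r_k(x, k):
    # Sample autocorrelation at lag k (k >= 1), per the lecture's formula.
    xbar = x.mean()
    return np.sum((x[:-k] - xbar) * (x[k:] - xbar)) / np.sum((x - xbar) ** 2)

counts = []
for _ in range(n_reps):
    x = rng.normal(size=n)
    counts.append(sum(abs(r_k(x, k)) > band for k in range(1, n_lags + 1)))

# Average number of significant lags per correlogram: close to 1.
print(np.mean(counts))
```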

Rules of Thumb (for deciding if a correlogram indicates departure from randomness) Chatfield (page 56) A single rk just outside the bands may be ignored, but two or three values well outside indicate a departure from pure randomness. A single significant rk at a lag which has some physical interpretation such as lag one or a lag corresponding to seasonal variation also indicates evidence of non-randomness.

Is this data set from a purely random process? [Figure: data vs. time, 100 observations.]

Correlogram of the Data in the Previous Slide [Figure: ACF vs. lag, lags 0-20.] This data was generated from a moving average process.
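A moving average process is easy to simulate, and its correlogram spike at lag one matches theory. Here is a Python sketch of an MA(1) with an assumed θ = 0.8 (the parameters behind the slide's data are not given):

```python
import numpy as np

# Sketch: an MA(1) process X_t = Z_t + theta * Z_{t-1} with Z_t white noise.
# Its true lag-1 autocorrelation is theta / (1 + theta^2), and its
# autocorrelation is zero at all higher lags, so the correlogram
# shows a single significant spike at lag 1.
rng = np.random.default_rng(2)
theta = 0.8  # assumed coefficient for illustration
n = 5000
z = rng.normal(size=n + 1)
x = z[1:] + theta * z[:-1]

# Sample lag-1 autocorrelation, per the lecture's formula.
xbar = x.mean()
r1 = np.sum((x[:-1] - xbar) * (x[1:] - xbar)) / np.sum((x - xbar) ** 2)

# The theoretical value is theta / (1 + theta^2) = 0.8 / 1.64.
print(r1, theta / (1 + theta**2))
```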

Is this data set from a purely random process? [Figure: data vs. time, 100 observations.]

Correlogram of the Data in the Previous Slide [Figure: ACF vs. lag, lags 0-20.] Again, there is more structure in this dataset compared to pure randomness.

Is this data set from a purely random process? [Figure: data vs. time, 100 observations.]

Correlogram of the Data in the Previous Slide [Figure: ACF vs. lag, lags 0-20.] Lots of structure here.

Conclusions for Today Purely Random Process or Gaussian White Noise. Sample Autocorrelation Coefficient, rk. Correlogram. How do rks behave under pure randomness? How to tell from the Correlogram if there is evidence of departure from pure randomness?