Scenario 5: Internet Usage Solution

More information about the study would be useful in order to judge whether any findings can be generalized. For example: Does each data point consist of the total number of users that logged in during a one-minute period, or is it an instantaneous measurement at some point during the minute? Do users have to be active in order to count, or is being connected sufficient?

Recall that linear regression requires the observations to be independent. That assumption does not hold here: consecutive observations are correlated with each other, which is called autocorrelation. We will therefore use a model that is very common in time series analysis: the ARIMA(p,d,q) (autoregressive integrated moving average) model, where p is the number of autoregressive terms, d is the number of nonseasonal differences, and q is the number of moving average terms. The ARIMA(p,d,q) model is given by

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d X_t = \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) a_t$$

where B is the backshift operator, $B^i X_t = X_{t-i}$, the $\phi_i$ are the autoregressive parameters, the $\theta_j$ are the moving average parameters, and the $a_t$ are independent, normally distributed errors with mean zero. Figure 1 shows a time plot of the dataset.

[Figure 1: Time plot of the number of internet users (horizontal axis: minutes)]
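As a small worked case, the general equation can be written out for p = d = q = 1. These orders are chosen here purely for illustration, the Box-Jenkins sign convention (minus signs in front of both the autoregressive and the moving average parameters) is an assumption, and any constant term is omitted.

```latex
\begin{align*}
(1 - \phi_1 B)(1 - B) X_t &= (1 - \theta_1 B)\, a_t \\
\bigl(1 - (1 + \phi_1) B + \phi_1 B^2\bigr) X_t &= a_t - \theta_1 a_{t-1} \\
X_t &= (1 + \phi_1) X_{t-1} - \phi_1 X_{t-2} + a_t - \theta_1 a_{t-1}
\end{align*}
% Equivalently, in terms of the differenced series w_t = X_t - X_{t-1}:
\[
w_t = \phi_1 w_{t-1} + a_t - \theta_1 a_{t-1}
\]
```

Written this way, the model is an ordinary ARMA(1,1) model for the differenced series, which is why the orders p and q are identified only after differencing.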

One assumption necessary for the ARIMA model is that the variance is stationary. Since the variation over time appears constant based on the graph, this assumption seems to be met. In order to find the AR and MA components of the model, the mean must also be constant over time. Figure 1 shows that this is clearly not the case, so we first determine d, the number of differences required to obtain a constant mean. We difference the series until the mean appears stationary; that is, we work with the differenced values $w_t = X_t - X_{t-1}$ for $t \ge 2$. Figure 2 shows the series after taking one difference. The mean now appears stable, so we will use d = 1. Without this differencing, the fitted model would not be valid.

[Figure 2: Time plot after one difference (horizontal axis: minutes; transform: difference(1))]

Now we must determine the orders p and q. To determine q, the number of moving average terms, we consider the sample autocorrelation function (ACF) of the differenced observations and look for q such that the ACF drops to 0 after lag q. To determine p, the number of autoregressive terms, we consider the sample partial autocorrelation function (PACF) of the differenced observations and look for p such that the PACF drops to 0 after lag p. The ACF and PACF of the differenced series can be seen in Figure 3.
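The differencing step and the sample ACF and PACF described above can be sketched in plain Python. This is a minimal illustration, not the routine SPSS uses: the function names are hypothetical, the PACF is computed with the Durbin-Levinson recursion, the white noise bound uses the common approximation 1.96/sqrt(n), and the short series is made up rather than taken from the internet usage data.

```python
import math


def difference(series, d=1):
    """Apply d rounds of first differencing: w_t = x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag (r_0 = 1 is omitted)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def sample_pacf(acf):
    """Partial autocorrelations from r_1..r_K via Durbin-Levinson."""
    pacf, prev = [], []
    for k in range(1, len(acf) + 1):
        if k == 1:
            phi_kk = acf[0]
            cur = [phi_kk]
        else:
            num = acf[k - 1] - sum(prev[j] * acf[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(prev[j] * acf[j] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


def white_noise_bound(n):
    """Approximate 95% bound for the ACF/PACF of white noise."""
    return 1.96 / math.sqrt(n)


# Made-up trending series, used only to exercise the functions.
demo = [10, 12, 15, 14, 18, 21, 20, 24, 27, 26, 30, 33]
w = difference(demo, d=1)
r = sample_acf(w, max_lag=4)
p = sample_pacf(r)
bound = white_noise_bound(len(w))
significant_lags = [k + 1 for k, v in enumerate(r) if abs(v) > bound]
```

Lags whose ACF or PACF value exceeds the bound in absolute value are the ones that would fall outside the white noise lines in a plot.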

[Figure 3: Sample ACF and PACF of the differenced series]

Although the ACF is significant for the first six lags, it definitely starts to decay to 0 after the first lag. The first three lags are significant in the PACF, but there is a damped sine wave decay to 0 after the first lag. For this reason, I suspect that the best model will be an ARIMA(1,1,1). However, it is possible that more terms are necessary.

In order to assess the model fit, we will look for patterns in the residuals (ideally they show no pattern and are normally distributed). We will also consider the ACF and PACF of the residuals. If any lags lie outside the white noise boundaries, this suggests that we need to increase the order of the model. We will also look at the significance of each of the terms. Additionally, the AIC gives an indication of how well a model fits: smaller values of the AIC indicate a better fit than larger values. We will start with the smaller-order models.

ARIMA(1,1,0)

Both the ACF and the PACF of the residuals (Figure 4) show two significant lags. This suggests that more terms should be included in the model. With a p-value reported as .000 (Table 1), the autoregressive term is statistically significant. We also note the AIC of this model for comparison with the other candidates.

[Figure 4: ACF and PACF of the residuals ("Error for usage") of the ARIMA(1,1,0) model]

[Table 1: Parameter estimates of the ARIMA(1,1,0) model. Columns: Estimates, Std Error, t, Approx Sig. Rows (non-seasonal lags): AR, Constant. Melard's algorithm was used for estimation.]

ARIMA(0,1,1)

The ACF and PACF of the residuals, shown in Figure 5, are both significant through the fourth lag. The PACF shows decay after the first lag, suggesting that an autoregressive term is needed. Table 2 shows a p-value of .000, so the moving average term is statistically significant. Since the AIC is larger than before, this model is actually worse than the previous one.

[Figure 5: ACF and PACF of the residuals ("Error for usage") of the ARIMA(0,1,1) model]

[Table 2: Parameter estimates of the ARIMA(0,1,1) model. Columns: Estimates, Std Error, t, Approx Sig. Rows (non-seasonal lags): MA, Constant. Melard's algorithm was used for estimation.]

ARIMA(1,1,1)

The ACF and PACF of the residuals (Figure 6) stay within the white noise lines. There also seem to be no problems with the diagnostics: there is no pattern in the residuals. With p-values of .000 (Table 3), both the autoregressive and the moving average terms are statistically significant, and this model also has the lowest AIC of the three.
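The AIC comparison used to rank the candidate models can be sketched as follows, assuming the textbook definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. SPSS may report the criterion under a different normalization, and the log-likelihood values and model labels below are invented for illustration; they are not the values from this analysis.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the label with the smallest AIC.

    candidates maps a label to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fits: (log-likelihood, number of estimated parameters).
hypothetical_fits = {
    "model_a": (-262.0, 2),
    "model_b": (-271.5, 2),
    "model_c": (-255.0, 3),
}
chosen = best_by_aic(hypothetical_fits)
```

Because the penalty term 2k grows with the number of parameters, a larger model is preferred only when its likelihood improves enough to offset the extra term.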

[Figure 6: ACF and PACF of the residuals ("Error for usage") of the ARIMA(1,1,1) model]

[Table 3: Parameter estimates of the ARIMA(1,1,1) model. Columns: Estimates, Std Error, t, Approx Sig. Rows (non-seasonal lags): AR, MA, Constant. Melard's algorithm was used for estimation.]

Thus, the model that best fits this data is the ARIMA(1,1,1). Now that we have chosen an appropriate model, we want to predict the internet usage beyond the observed period. Figure 7 shows a graph of the internet usage together with the predictions based on the above model. The model fits very well over the period for which data were collected. The predictions appear to follow the pattern of the series, but the confidence limits associated with them widen quickly. This is due partly to the short time series we have available and partly to the fact that our model depends on the previous values. Consequently, any error made in a prediction is carried forward, which leads to wide confidence limits.
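The widening of the confidence limits can be made concrete with the psi-weights of an ARIMA(1,1,1) model. The sketch below assumes the Box-Jenkins sign convention, (1 - phi B)(1 - B) X_t = (1 - theta B) a_t, and treats the parameters as known, so it ignores estimation error. The parameter values are hypothetical and are not the estimates from Table 3.

```python
import math


def psi_weights(phi, theta, n_weights):
    """First n_weights psi-weights of (1 - phi B)(1 - B) X_t = (1 - theta B) a_t.

    psi_0 = 1, psi_1 = 1 + phi - theta,
    psi_j = (1 + phi) * psi_{j-1} - phi * psi_{j-2} for j >= 2.
    """
    psi = [1.0]
    if n_weights > 1:
        psi.append(1.0 + phi - theta)
    for _ in range(2, n_weights):
        psi.append((1.0 + phi) * psi[-1] - phi * psi[-2])
    return psi[:n_weights]


def forecast_std_errors(phi, theta, sigma, horizon):
    """Standard error of the h-step forecast for h = 1..horizon.

    Var(h) = sigma^2 * (psi_0^2 + ... + psi_{h-1}^2).
    """
    psi = psi_weights(phi, theta, horizon)
    errors, running = [], 0.0
    for weight in psi:
        running += weight * weight
        errors.append(sigma * math.sqrt(running))
    return errors


# Hypothetical parameter values, for illustration only.
se = forecast_std_errors(phi=0.5, theta=-0.5, sigma=1.0, horizon=10)
half_widths = [1.96 * s for s in se]  # approximate 95% limits
```

Because the differenced model contains a unit root, the psi-weights do not die out, so the sum of their squares, and with it the interval width, keeps growing with the forecast horizon.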

[Figure 7: Internet users and prediction using an ARIMA(1,1,1) model, with lower and upper confidence limits (horizontal axis: minutes)]

Getting the Results in SPSS

Time Plot of Series:
1. Open the data file www.sav in SPSS.
2. On the toolbar, click on Graphs > Sequence. In the Sequence Charts box, select the usage variable and move it into the Variables box.
3. Click OK.

Time Plot of Differenced Series:
1. On the toolbar, click on Graphs > Sequence. In the Sequence Charts box, select the usage variable and move it into the Variables box.
2. Check the box to the left of Difference, and type 1 in the box to the right of Difference.
3. Click OK.

Plots of ACF and PACF:
1. On the toolbar, click on Graphs > Time Series > Autocorrelations. In the Autocorrelations box, select the usage variable and move it into the Variables box.
2. Check the box to the left of Difference, and type 1 in the box to the right of Difference.
3. Click OK.

Fitting the ARIMA Model:
1. On the toolbar, click on Analyze > Time Series > ARIMA. In the ARIMA box, select the usage variable and move it into the Dependent box.
2. Enter the values of p, d, and q in the appropriate boxes.
3. Click OK.

Plots of ACF and PACF of the residuals:
1. On the toolbar, click on Graphs > Time Series > Autocorrelations. In the Autocorrelations box, select the Err_ variable and move it into the Variables box.
2. Make sure that the box to the left of Difference is NOT checked.
3. Click OK.

Predicting with the ARIMA Model:
1. On the toolbar, click on Analyze > Time Series > ARIMA. In the ARIMA box, select the usage variable and move it into the Dependent box.
2. Enter the values of p, d, and q in the appropriate boxes.
3. Click on Save and, under Predicted Cases, select Predict through.
4. In the box next to Observations, enter the number of the last observation to be predicted and click Continue.
5. Click OK.
6. On the toolbar, click on Graphs > Sequence. In the Sequence Charts box, select the usage variable together with FIT_, LCL_ and UCL_ and move them into the Variables box.
7. Click OK.