UPPSALA UNIVERSITY
Department of Mathematics
Jesper Rydén

Stationary Stochastic Processes 1MS025
Autumn 2010

COMPUTER SESSION: ARMA PROCESSES

1 Introduction

In this computer session, we work within the R environment. We will use this software to, e.g., estimate the covariance function of a given data set. Moreover, we will study ARMA processes: simulation and estimation. Hence, we will focus on stochastic processes in discrete time, or in other words, time series.

One of the strengths of R is that packages and libraries for specific statistical disciplines are available. Many functions for time series analysis are found in the standard library stats, which is ready to use whenever R is started.

You will find the help function useful. In R, simply type a question mark before the command you want to know more about, for instance ?mean.

2 Estimation

2.1 Time series object

One of the strengths of R is the frequent use of various objects. Often in time series analysis with R, the initial data is transformed into a special type of object, a so-called ts object. Crucial in time series analysis are not only the values themselves, but also information about the dates and frequencies at which the observations were recorded. With R, this can be handled conveniently.

Consider the data of yearly minimum temperatures from Uppsala, found in the data file ymin.dat, which can be downloaded from Studentportalen¹. In this file, only the temperature values are found, but we know that they belong to the years 1840-2001 (with no gaps). Hence, we read the data and create a new object (here called ty):

ym <- scan("ymin.dat")
ty <- ts(ym, start=1840, freq=1)

¹ After downloading, a path has to be set to the directory where the downloaded file is located. In Windows, use the menu system File > Change dir... and then choose the directory in question.
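As a quick sketch of what a ts object carries, the following can be tried even without the data file; the temperature values below are made up, standing in for ymin.dat:

```r
# Hypothetical values stand in for the contents of ymin.dat.
ym <- c(-20.1, -18.3, -25.0, -22.4)
ty <- ts(ym, start = 1840, freq = 1)
start(ty)   # first time point: 1840
end(ty)     # last time point: 1843
time(ty)    # the years 1840, 1841, 1842, 1843
```

The object thus remembers the time axis, which plot.ts and related functions use automatically.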
We may plot the time series (the command win.graph allows us to control the proportions of the plot window):

win.graph(width=10, height=5)
plot.ts(ty)

2.2 Estimation of mean and covariance function

Let x_1, ..., x_n be observations of a time series. From theory, we know that the sample mean of x_1, ..., x_n is

\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t,

the sample covariance function is

\hat{r}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} (x_t - \bar{x})(x_{t+\tau} - \bar{x}), \quad \tau \ge 0,

and the sample correlation function is

\hat{\rho}(\tau) = \frac{\hat{r}(\tau)}{\hat{r}(0)}, \quad \tau \ge 0.

The commands for estimation in R are mean and acf². Let us draw the covariance function and correlation function for the temperature data:

par(mfrow=c(2,1))
acf(ym, type="covariance")
acf(ym)

Note that the default option for acf is to estimate and draw the autocorrelation function. Most often, the estimated correlation function is of interest. Note in the plot the dashed lines, which can be used to check for independence (maybe doubtful here?).

3 ARMA models

3.1 Introduction

For the simplest AR and MA models, explicit expressions for the ACVF (and ACF) have been derived, e.g. for the MA(1) process

X_t = e_t + c_1 e_{t-1},

² The common abbreviation acf stands for autocorrelation function.
where {e_t} is a white-noise sequence (see the textbook),

r_X(\tau) = \begin{cases} \sigma^2 (1 + c_1^2), & \tau = 0, \\ \sigma^2 c_1, & \tau = \pm 1, \\ 0, & |\tau| > 1, \end{cases}

and for the AR(1) process

X_t + a_1 X_{t-1} = e_t,

where again {e_t} is a white-noise sequence,

r_X(\tau) = \frac{\sigma^2}{1 - a_1^2} \, (-a_1)^{|\tau|}.

For ARMA(p, q) processes in general, no simple expressions exist. Given the orders p, q and numerical values of the related parameters a_1, ..., a_p, c_1, ..., c_q, the function ARMAacf computes the ACF. The routine can also be used for pure AR and MA models. Some examples:

par(mfrow=c(2,1))  # Plot in two subplots
plot(ARMAacf(ar = 0.9, lag.max=20), type="b")   # AR(1) proc
plot(ARMAacf(ar = -0.8, lag.max=20), type="b")  # AR(1) proc
plot(ARMAacf(ma = 0.7, lag.max=10), type="b")   # MA(1) proc
plot(ARMAacf(ma = -0.7, lag.max=10), type="b")  # MA(1) proc

For AR(2) processes, the character of the ACF can be understood through the stationarity parameter region. Try for instance the following:

plot(ARMAacf(ar = c(0.5,0.25), lag.max=12), type="b")  # AR(2) proc
plot(ARMAacf(ar = c(1.0,-0.6), lag.max=12), type="b")  # AR(2) proc

Finally, try an ARMA model, e.g. the ARMA(2,2), as follows:

plot(ARMAacf(ar = c(-0.3,0.1), ma=c(0.5,0.8), lag.max=12), type="b")

3.2 Simulation

By the routine arima.sim, realizations of the processes may be created³. Let us start with examples of AR(1) processes. The function arima.sim creates objects of the class ts. We plot the simulated observations as well as the estimated sample covariance functions.

³ The name relates to integrated ARMA models, so-called ARIMA models, but the function can be used for the simpler models as well (ARIMA models are outside the scope of this course).
par(mfrow=c(2,1))
x1 <- arima.sim(n=50, list(ar=c(0.8)), sd=1); str(x1)
plot.ts(x1, type="b"); acf(x1)
x2 <- arima.sim(n=50, list(ar=c(-0.8)), sd=1); str(x2)
plot.ts(x2, type="b"); acf(x2)

Using the plot command ts.plot (for multiple time series), you may plot the series in the same plot with different colours:

ts.plot(x1, x2, col=c("blue","red"))

Try on your own to simulate an MA(1) process in the same manner as with the AR(1) process above, e.g.

x1 <- arima.sim(n=50, list(ma=c(0.3)), sd=1)

Further, simulate e.g. the following ARMA(3,2) process:

x <- arima.sim(n=50, list(ar=c(-0.8,0,0.5), ma=c(0.2,0.8)), sd=1)
plot.ts(x, type="b"); acf(x)

Repeat the simulation several times and notice the variability from one realization to another.
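For a long simulated series, the sample ACF should come close to the theoretical one. The following sketch (not part of the exercises above) checks this for an AR(1) process; note that arima.sim and ARMAacf use the sign convention X_t = ar·X_{t-1} + e_t, so ar corresponds to -a_1 in the notation of Section 3.1:

```r
# Compare sample and theoretical ACF for an AR(1) process (a sketch).
set.seed(1)                                    # for reproducibility
x <- arima.sim(n = 5000, list(ar = 0.8))
rho_hat <- acf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]
rho_theory <- ARMAacf(ar = 0.8, lag.max = 10)  # equals 0.8^(0:10)
round(cbind(rho_hat, rho_theory), 2)           # should agree closely
```

For small n, such as n = 50 above, the agreement is much looser, which is the variability noticed when repeating the simulations.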
3.3 Estimation

The routine arima is useful for estimation of AR, MA, and ARMA processes. One has to suggest, in one of the input arguments, the possible order of the process to estimate. When working in practice, a model selection procedure should be used. This is an important topic in applied statistics, and several techniques and criteria have been developed for various statistical models. For instance, Akaike's Information Criterion (AIC) is often used in time series analysis. This is a relative measure based on likelihood functions, computed for a number of candidate models. The model with the lowest AIC should be chosen.

The AR(1) process

First, simulate 100 observations of an AR(1) process with a_1 = 0.8. The estimated model is assigned to the object called estmod as follows, and estimated parameters can be extracted from the object:

estmod <- arima(x, order=c(1,0,0), method="ML")  # Data in object x
phi <- estmod$coef[1]

Alternatively, the parameter estimate can be accessed directly by

arima(x, order=c(1,0,0), method="ML")$coef[1]

By typing the name of the object, in this example estmod, the whole object is printed on the screen. The AIC value is accessed by estmod$aic.

Now, study the sampling distribution of the estimator of a_1 by simulating 1000 time series, each of 100 observations. The distribution is visualized in a histogram (by hist).

a <- numeric(1000)
for (k in 1:1000) {
  x <- arima.sim(n=100, list(ar=0.8))
  a[k] <- arima(x, order=c(1,0,0), method="ML")$coef[1]
}
hist(a); grid()

Would you claim that the distribution seems normal? Let us check unbiasedness by comparing the true value (a_1 = 0.8) to the value obtained by mean(a). It can be shown as a large-sample result that

\mathrm{V}[\hat{a}_1] = \frac{1 - a_1^2}{n}.

Use this result to compare with the estimate from data, sd(a).
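The large-sample formula can be evaluated directly, and the AIC can be tried out on candidate orders. The following is a sketch; the range of candidate orders 1 to 4 is an arbitrary choice for illustration:

```r
# Large-sample standard deviation of the AR(1) estimate for a1 = 0.8,
# n = 100, to compare with sd(a) from the simulation experiment above:
sqrt((1 - 0.8^2) / 100)                        # = 0.06

# Order selection by AIC (a sketch): fit AR(p) for a few candidate
# orders p and pick the order with the lowest AIC.
set.seed(2)
x <- arima.sim(n = 200, list(ar = 0.8))
aics <- sapply(1:4, function(p) arima(x, order = c(p, 0, 0), method = "ML")$aic)
which.min(aics)                                # order suggested by AIC
```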
The AR(2) process

Now, consider an AR(2) process. Experiment with parameters from various regions (cf. Figure 1). Estimation is here performed by the Yule-Walker method.

x <- arima.sim(n=100, list(ar=c(0.6,0.3)))
mest.yw <- ar(x, order.max=2, method="yw")

MA and ARMA processes

Use of arima for estimation in the more complex models is straightforward; here is an example:

x1 <- arima.sim(n=200, list(ar=0.7, ma=-0.4))  # An ARMA(1,1)
mest.ml <- arima(x1, order=c(1,0,1), method="ML")

3.4 Diagnostics

A crucial point in statistical modelling is to separate systematic and random variation (recall regression models). The residuals are of interest when evaluating the model: do they satisfy the model assumptions, e.g., for the basic time series models, do they form an innovation sequence? In other words, the residuals, after fitting a certain ARMA model, should resemble a white-noise sequence. A common distributional assumption is normality; this is needed if confidence intervals for the model parameters are to be calculated.

From the point of view of R, the residuals are easily obtained from the object resulting from a fit by arima. Suppose the model is summarized in the object mod; the residuals are then found as mod$res. In the simplest model checks for independence, one investigates the covariance function of the residuals (using again acf), at the basic level by visual inspection. Statistical tests are found in the literature, but are outside the scope of this course. To investigate the normality assumption, the residuals are plotted on normal probability paper (using qqnorm) or in a histogram (using hist). A common statistical test for normality is the Shapiro-Wilk test (use shapiro.test).

Check the assumptions in your fitted models! Check also a case where the fitted model is obviously wrong (e.g. you simulate an AR process but try to fit an MA process). When several candidate models are possible, which is the best? The answer is not straightforward.
The different aspects mentioned above should be investigated, and for time-series models, the AIC can be a helpful tool. It is important to keep in mind the principle of parsimony: unless a more complex model is necessary, keep the model as simple as possible.
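The residual checks described in Section 3.4 can be collected into a short script. This is a sketch; a simulated AR(1) series stands in for real data, and the fitted object is called mod as in the text:

```r
# Residual diagnostics after an arima fit (a sketch with simulated data).
set.seed(3)
x <- arima.sim(n = 200, list(ar = 0.7))
mod <- arima(x, order = c(1, 0, 0), method = "ML")
par(mfrow = c(2, 2))
plot.ts(mod$res)                   # should resemble white noise
acf(mod$res)                       # no clear correlation beyond lag 0
qqnorm(mod$res); qqline(mod$res)   # roughly a straight line if normal
hist(mod$res)
shapiro.test(mod$res)              # large p-value: normality not rejected
```

Running the same script after deliberately fitting the wrong model, e.g. order=c(0,0,1) on the AR data, shows how the residual ACF reveals the misspecification.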