System Identification


System Identification

Arun K. Tangirala
Department of Chemical Engineering, IIT Madras

Module 6, Lecture 1
July 26, 2013

Objectives of this Module

In this module, the objectives are to learn concepts pertaining to:

Estimation of time-series models
Methods for estimating response-based (non-parametric) descriptions
Prediction-error methods for estimating parametric models

Lectures in this module

This module contains four (4) lectures:

Lecture 1: Estimation of time-series models
Lecture 2: Estimation of impulse / step response models
Lecture 3: Estimation of frequency response functions
Lecture 4: Estimation of parametric input-output models

Contents of Lecture 1

In this lecture, we shall:

Learn the different techniques for estimating AR models
Briefly discuss methods for estimating MA models
Learn how to estimate ARMA and ARIMA models

Background

In Module 3 we studied different time-series models for linear stationary random processes; a general description is given by the ARIMA model. In this module, we shall learn how to estimate these models using the methods presented in Module 4. The estimated model is useful in identification for (i) developing the noise (disturbance) model and (ii) estimating power spectral densities.

Auto-regressive models result in linear predictors; therefore a linear OLS method suffices. The linear nature of the AR predictors also attracts a few other specialized methods. The history and applicability of this topic are such that numerous texts and survey/tutorial articles (references) dedicated to it have been written. We shall only discuss four popular estimators, namely

i. Yule-Walker method
ii. LS / covariance method
iii. Modified covariance method
iv. Burg's estimator

Estimation of auto-regressive models

The AR estimation problem is stated as follows. Given N observations of a stationary process {v[k]}, k = 0, ..., N-1, fit an AR(P) model

$$v[k] = \sum_{j=1}^{P} (-d_j)\, v[k-j] + e[k] \qquad (1)$$

One of the first methods used to estimate AR models was the Yule-Walker method, based on the Yule-Walker equations discussed in Lecture 3.6. This method belongs to the class of MoM estimators presented in Lecture 4.4 and is also one of the simplest to use. However, under some conditions the Y-W method is known to suffer from certain shortcomings, as we shall learn shortly. More powerful and sophisticated alternatives have gradually emerged.

Yule-Walker method

The Y-W method is an MoM approach as outlined in Lectures 3.6 and 4.4.

Idea: The second-order moments of the bivariate p.d.f. $f(v[k], v[k-l])$, i.e., the ACVFs of an AR(P) process, are related to the parameters of the model as

$$
\underbrace{\begin{bmatrix}
\sigma_{vv}[0] & \sigma_{vv}[1] & \cdots & \sigma_{vv}[P-1] \\
\sigma_{vv}[1] & \sigma_{vv}[0] & \cdots & \sigma_{vv}[P-2] \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{vv}[P-1] & \sigma_{vv}[P-2] & \cdots & \sigma_{vv}[0]
\end{bmatrix}}_{\Sigma_P}
\underbrace{\begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_P \end{bmatrix}}_{\theta_P}
= -\underbrace{\begin{bmatrix} \sigma_{vv}[1] \\ \sigma_{vv}[2] \\ \vdots \\ \sigma_{vv}[P] \end{bmatrix}}_{\sigma_P},
\qquad
\sigma_v^2 + \sigma_P^T \theta_P = \sigma_e^2
$$

Thus, the Y-W estimates of the AR(P) model and the innovations variance $\sigma_e^2$ are

$$\hat{\theta} = -\hat{\Sigma}_P^{-1}\hat{\sigma}_P \qquad (2a)$$
$$\hat{\sigma}_e^2 = \hat{\sigma}_v^2 + \hat{\sigma}_P^T\hat{\theta} = \hat{\sigma}_v^2 - \hat{\sigma}_P^T\hat{\Sigma}_P^{-1}\hat{\sigma}_P \qquad (2b)$$

provided $\hat{\Sigma}_P$ is invertible, which is guaranteed so long as $\hat{\sigma}[0] > 0$.

Y-W method

The matrix $\hat{\Sigma}_P$ is constructed using the biased estimator of the ACVF (recall L4.6)

$$\hat{\sigma}[l] = \frac{1}{N}\sum_{k=l}^{N-1} (v[k] - \bar{v})(v[k-l] - \bar{v}) \qquad (3)$$

The Y-W estimates can be shown to be the solution of the OLS minimization

$$\hat{\theta}_{YW} = \arg\min_{\theta} \sum_{k=0}^{N+P-1} \varepsilon^2[k] \qquad (4)$$

where $\varepsilon[k] = v[k] - \hat{v}[k|k-1] = v[k] - \sum_{i=1}^{P}(-d_i)\,v[k-i]$.

Remark: The summation in (4) starts from k = 0 and runs up to k = N + P - 1. In order to compute the prediction errors for k = 0, ..., P-1 and k = N, ..., N+P-1, the method pads P zeros at both ends of the series. This approach is frequently referred to as pre- and post-windowing of the data.
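A minimal MATLAB sketch of the Yule-Walker computation in (2a)-(2b), using the biased ACVF estimator (3); it assumes a data vector v is available in the workspace. The aryule function of the Signal Processing Toolbox returns equivalent estimates.

% Yule-Walker estimation of an AR(P) model, following eqs. (2a)-(2b) and (3)
P  = 2;                                     % model order (example value)
vc = v(:) - mean(v);                        % centre the series
N  = length(vc);
sig = zeros(P+1,1);
for l = 0:P                                 % biased ACVF estimates, eq. (3)
    sig(l+1) = sum(vc(l+1:N).*vc(1:N-l))/N;
end
SigmaP   = toeplitz(sig(1:P));              % P x P ACVF matrix
sigmaP   = sig(2:P+1);                      % [sigma_vv(1) ... sigma_vv(P)]'
theta_yw = -SigmaP\sigmaP;                  % [d_1 ... d_P]', eq. (2a)
sige2    = sig(1) + sigmaP'*theta_yw;       % innovations variance, eq. (2b)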

Properties of the Y-W estimator

The Y-W estimates, in general, enjoy good asymptotic properties:

1. For a model of order P, if the process {v[k]} is also AR(P), the parameter estimates asymptotically follow a multivariate Gaussian distribution,

$$\sqrt{N}(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0,\, \sigma_e^2\,\Sigma_P^{-1}\right) \qquad (5)$$

In practice, the theoretical variance and covariance matrix are replaced by their respective estimates.

2. The 95% C.I. for the individual parameter $\theta_{i0}$ is approximately constructed from the diagonals of $\hat{\Sigma}_{\hat{\theta}}$,

$$\hat{\theta}_i \pm \frac{1.96\,\hat{\sigma}_e}{\sqrt{N}}\left(\hat{\Sigma}_P^{-1}\right)_{ii}^{1/2} \qquad (6)$$

Properties of Y-W estimators ... contd.

3. Using the first property, if {v[k]} is an AR(P_0) process and an AR model of order P > P_0 is fit to the series, then the coefficients in excess of the true order are distributed as

$$\sqrt{N}\,\hat{\theta}_l \sim AN(0, 1), \qquad l > P_0 \qquad (7)$$

To verify this fact, consider fitting an AR(P) model to a white-noise process, i.e., when P_0 = 0. Then $\Sigma_P = \sigma_e^2 I$.

4. Recall that the last coefficient of an AR(P) model is the PACF coefficient $\phi_{PP}$ of the series. In the present notation,

$$\phi_{ll} = -d_l = -\theta_l \qquad (8)$$

It follows from the above property that if the true process is AR(P_0), the 95% significance levels for the PACF estimates at lags l > P_0 are

$$-\frac{1.96}{\sqrt{N}} \leq \hat{\phi}_{ll} \leq \frac{1.96}{\sqrt{N}} \qquad (9)$$

Properties of the Y-W estimator ... contd.

5. From (5) it follows that the Y-W estimates of an AR model are consistent.

6. The Y-W estimator suffers from a drawback: it may produce poor (high-variability) estimates when the generating auto-regressive process has poles close to the unit circle (reference). The cause is the poor conditioning of the auto-covariance matrix $\hat{\Sigma}_P$ for such processes, combined with the bias in the ACVF estimator. The effects of the latter (bias) always prevail, but are magnified when $\hat{\Sigma}_P$ is poorly conditioned.

7. The Durbin-Levinson algorithm can be used to compute the parameter estimates in a recursive manner without having to explicitly invert $\hat{\Sigma}_P$.

8. The Toeplitz structure of $\hat{\Sigma}_P$ and the biased ACVF estimator guarantee that the resulting model is stable and minimum phase.

Example: Y-W method

A series consisting of N = 500 observations of a random process is given. Fit an AR(2) model using the Y-W method.

Solution: The variance and ACF estimates at lags l = 1, 2 are computed to be $\hat{\sigma}[0] = 7.1113$, $\hat{\rho}[1] = 0.9155$ and $\hat{\rho}[2] = 0.7776$ respectively. Plugging these estimates into (2a) produces

$$\hat{\theta} = \begin{bmatrix} \hat{d}_1 \\ \hat{d}_2 \end{bmatrix} = -\begin{bmatrix} 1 & 0.9155 \\ 0.9155 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 0.9155 \\ 0.7776 \end{bmatrix} = \begin{bmatrix} -1.258 \\ 0.374 \end{bmatrix} \qquad (10)$$

The estimate of the innovations variance can be computed using (2b),

$$\hat{\sigma}_e^2 = 7.1113\left(1 + \begin{bmatrix} 0.9155 & 0.7776 \end{bmatrix}\begin{bmatrix} -1.258 \\ 0.374 \end{bmatrix}\right) = 0.9899 \qquad (11)$$

Example ... contd.

The errors in the estimates can be computed from (5) by replacing the theoretical quantities with their estimated counterparts:

$$\hat{\Sigma}_{\hat{\theta}} = \frac{\hat{\sigma}_e^2}{N}\hat{\Sigma}_P^{-1} = \frac{0.9899}{500}\left(7.1113\begin{bmatrix} 1 & 0.9155 \\ 0.9155 & 1 \end{bmatrix}\right)^{-1} = \begin{bmatrix} 0.0017 & -0.0016 \\ -0.0016 & 0.0017 \end{bmatrix} \qquad (12)$$

Consequently, the approximate 95% C.I.s for $d_1$ and $d_2$ are [-1.3393, -1.1767] and [0.2928, 0.4554] respectively. Comparing the estimates with the true values used for simulation,

$$d_{1,0} = -1.2, \qquad d_{2,0} = 0.32 \qquad (13)$$

we observe that the method has produced reasonably good estimates and that the C.I.s contain the true values.

Note: The Y-W estimator is generally used when the data length is large and it is known a priori that the generating process has poles well within the unit circle. In general, it is used to initialize other non-linear estimators.

Least squares / covariance method

The least squares method, as we learnt in L4.4, obtains the estimate as

$$\hat{\theta}_{LS} = \arg\min_{\theta} \sum_{k=P}^{N-1} \varepsilon^2[k] \qquad (14)$$

Comparing with the standard linear regression form, we have

$$\varphi[k] = -\begin{bmatrix} v[k-1] & \cdots & v[k-P] \end{bmatrix}^T; \qquad \theta = d = \begin{bmatrix} d_1 & \cdots & d_P \end{bmatrix}^T \qquad (15)$$

Using the LS solution, we have

$$\hat{\theta}_{LS} = \hat{d}_{LS} = (\Phi^T\Phi)^{-1}\Phi^T \mathbf{v} = \left(\frac{1}{N-P}\Phi^T\Phi\right)^{-1}\left(\frac{1}{N-P}\Phi^T \mathbf{v}\right) \qquad (16)$$

where

$$\Phi = \begin{bmatrix} \varphi[P] & \varphi[P+1] & \cdots & \varphi[N-1] \end{bmatrix}^T, \qquad \mathbf{v} = \begin{bmatrix} v[P] & v[P+1] & \cdots & v[N-1] \end{bmatrix}^T$$
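A minimal MATLAB sketch of the covariance (LS) estimate in (16), assuming a zero-mean data vector v and a chosen order P; it builds the regressor matrix Phi of (16) and solves the normal equations.

% Least squares / covariance estimate of an AR(P) model, eq. (16)
P   = 2;                                  % model order (example value)
v   = v(:);                               % assume zero-mean data vector
N   = length(v);
Phi = zeros(N-P, P);
for i = 1:P
    Phi(:,i) = -v(P+1-i:N-i);             % phi[k] = -[v[k-1] ... v[k-P]]'
end
y        = v(P+1:N);                      % stacked outputs v[P], ..., v[N-1]
theta_ls = (Phi'*Phi)\(Phi'*y);           % [d_1 ... d_P]'
sige2_ls = mean((y - Phi*theta_ls).^2);   % residual (innovations) variance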

LS / COV method

A careful examination of (16) suggests that it can be written as an MoM estimate,

$$\hat{\theta} = -\hat{\Sigma}_P^{-1}\hat{\sigma}_P \qquad (17)$$

by introducing

$$\hat{\Sigma}_P \triangleq \frac{1}{N-P}\Phi^T\Phi = \begin{bmatrix}
\hat{\sigma}_{vv}[1,1] & \hat{\sigma}_{vv}[1,2] & \cdots & \hat{\sigma}_{vv}[1,P] \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\sigma}_{vv}[P,1] & \hat{\sigma}_{vv}[P,2] & \cdots & \hat{\sigma}_{vv}[P,P]
\end{bmatrix} \qquad (18)$$

$$\hat{\sigma}_P \triangleq -\frac{1}{N-P}\Phi^T \mathbf{v} = \begin{bmatrix} \hat{\sigma}_{vv}[1,0] \\ \vdots \\ \hat{\sigma}_{vv}[P,0] \end{bmatrix} \qquad (19)$$

where the estimate of the ACVF is given by

$$\hat{\sigma}_{vv}[l_1, l_2] = \frac{1}{N-P}\sum_{n=P}^{N-1} v[n-l_1]\,v[n-l_2] \qquad (20)$$

Observe that $\hat{\Sigma}_P$ is a symmetric matrix by virtue of (20). Owing to the equivalence above, the method is also known as the covariance method.

Modified covariance method

The modified covariance (MCOV) method stems from a modification of the objective function in the LS approach. It minimizes the sum of squares of both the forward and backward prediction errors, $\varepsilon_F$ and $\varepsilon_B$ respectively:

$$\hat{\theta}_{MCOV} = \arg\min_{\theta} \left[\sum_{k=P}^{N-1} \varepsilon_F^2[k] + \sum_{k=0}^{N-P-1} \varepsilon_B^2[k]\right] \qquad (21)$$

By a change of summation index, the objective function can also be written as

$$\sum_{k=P}^{N-1} \varepsilon_F^2[k] + \sum_{k=0}^{N-P-1} \varepsilon_B^2[k] = \sum_{k=P}^{N-1}\left(\varepsilon_F^2[k] + \varepsilon_B^2[k-P]\right) \qquad (22)$$

The backward prediction error is defined in a similar way to the forward version:

$$\varepsilon_B[k] = v[k] - \hat{v}\big[k \,\big|\, \{v[k+1], \ldots, v[k+P]\}\big] = v[k] - \sum_{i=1}^{P}(-d_i)\,v[k+i] \qquad (23)$$

MCOV method

Thus, the objective in the MCOV method is to minimize

$$\sum_{k=P}^{N-1}\left[\left(v[k] + \sum_{i=1}^{P} d_i\, v[k-i]\right)^2 + \left(v[k-P] + \sum_{i=1}^{P} d_i\, v[k-P+i]\right)^2\right] \qquad (24)$$

The solution to this optimization problem has the same form as that of the LS/COV method, but with the auto-covariance estimate replaced by the one given below:

$$\hat{\theta}_{MCOV} = -\hat{\Sigma}_P^{-1}\hat{\sigma}_P \qquad (25a)$$
$$\hat{\sigma}_{vv}[l_1, l_2] = \frac{1}{2(N-P)}\sum_{k=P}^{N-1}\left(v[k-l_1]\,v[k-l_2] + v[k-P+l_1]\,v[k-P+l_2]\right) \qquad (25b)$$
$$\hat{\Sigma}_{P,ij} = \hat{\sigma}_{vv}[i,j]; \qquad \hat{\sigma}_{P,i} = \hat{\sigma}_{vv}[i,0], \qquad i = 1,\ldots,P;\; j = 1,\ldots,P \qquad (25c)$$

Note: The covariance matrix $\hat{\Sigma}_P$ is no longer Toeplitz, and therefore a recursion algorithm such as the D-L method cannot be applied.
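A minimal MATLAB sketch of the MCOV estimate in (25a)-(25c), assuming a zero-mean data vector v in the workspace; the armcov function of the Signal Processing Toolbox implements the same estimator (up to normalization).

% Modified covariance estimate of an AR(P) model, following eqs. (25a)-(25c)
P = 2;  v = v(:);  N = length(v);     % assume zero-mean data vector
sig = zeros(P+1, P+1);                % sig(l1+1, l2+1) = sigma_vv[l1, l2]
for l1 = 0:P
    for l2 = 0:P
        k = (P+1):N;                  % MATLAB indices for samples k = P, ..., N-1
        sig(l1+1, l2+1) = sum( v(k-l1).*v(k-l2) + v(k-P+l1).*v(k-P+l2) )/(2*(N-P));
    end
end
SigmaP     = sig(2:P+1, 2:P+1);       % entries sigma_vv[i, j], i, j = 1, ..., P
sigmaP     = sig(2:P+1, 1);           % entries sigma_vv[i, 0]
theta_mcov = -SigmaP\sigmaP;          % [d_1 ... d_P]', eq. (25a)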

Properties of covariance estimators

1. In both the LS and MCOV methods, the regressor $\varphi[k]$ and the prediction error are constructed from k = P to k = N-1, unlike in the Y-W method. Thus, the LS and MCOV methods do not pad the data.

2. The asymptotic properties of the covariance (LS) and MCOV estimators are, however, identical to those of the Y-W estimator (reference).

3. Application of these methods to the estimation of line spectra (sinusoids embedded in noise) produces better results than the Y-W method, especially for short data records. The modified covariance estimator fares better than the OLS in this respect.

4. On the other hand, stability of the resulting models is not guaranteed when using the covariance-based estimators. Moreover, the variance-covariance matrix does not possess a Toeplitz structure, which is disadvantageous from a computational viewpoint.

Example: Estimating AR(2) using LS and MCOV

For the series of the example illustrating the Y-W method, estimate the parameters using the LS and MCOV methods.

Solution: The LS method yields

$$\hat{d}_1 = -1.269, \qquad \hat{d}_2 = 0.3833$$

The MCOV method yields

$$\hat{d}_1 = -1.268, \qquad \hat{d}_2 = 0.3827$$

The two sets of estimates differ only slightly from each other and from the Y-W estimates. The standard errors in both cases are identical to those computed in the Y-W example, by virtue of the properties discussed above.

Burg's estimator

Burg's method (Burg's reference) minimizes the same objective as the MCOV method, except that it aims to incorporate two desirable features:

i. Stability of the estimated AR model
ii. A D-L-like recursion algorithm for parameter estimation

The key idea is to employ the reflection-coefficient (negative PACF coefficient) based AR representation. Therefore, the reflection coefficients $\kappa_p$, p = 1, ..., P, are estimated instead of the model parameters. Stability of the model is guaranteed by requiring the magnitude of each estimated reflection coefficient to be less than unity.

The optimization problem remains the same as in the MCOV method:

$$\hat{\theta}_{Burg} = \arg\min_{\kappa_p} \sum_{k=P}^{N-1}\left(\varepsilon_F^2[k] + \varepsilon_B^2[k-P]\right) \qquad (26)$$

Burg's method ... contd.

In order to arrive at a D-L-like recursive solution, the forward and backward prediction errors associated with a model of order p are re-written as follows:

$$\varepsilon_F^{(p)}[k] = v[k] + \sum_{i=1}^{p} d_i\, v[k-i] = \begin{bmatrix} v[k] & \cdots & v[k-p] \end{bmatrix}\begin{bmatrix} 1 \\ \theta^{(p)} \end{bmatrix} \qquad (27)$$

$$\varepsilon_B^{(p)}[k-p] = v[k-p] + \sum_{i=1}^{p} d_i\, v[k-p+i] = \begin{bmatrix} v[k] & \cdots & v[k-p] \end{bmatrix}\begin{bmatrix} \tilde{\theta}^{(p)} \\ 1 \end{bmatrix} \qquad (28)$$

where $\tilde{\theta}^{(p)}$ denotes $\theta^{(p)}$ with its elements in reversed order. Then, using

$$\theta^{(p)} = \begin{bmatrix} \theta^{(p-1)} \\ 0 \end{bmatrix} + \kappa_p\begin{bmatrix} \tilde{\theta}^{(p-1)} \\ 1 \end{bmatrix} \qquad (29)$$

the following recursive relations can be obtained:

$$\varepsilon_F^{(p)}[k] = \varepsilon_F^{(p-1)}[k] + \kappa_p\, \varepsilon_B^{(p-1)}[k-p] \qquad (30)$$
$$\varepsilon_B^{(p)}[k-p] = \varepsilon_B^{(p-1)}[k-p] + \kappa_p\, \varepsilon_F^{(p-1)}[k] \qquad (31)$$

Burg's method ... contd.

Inserting the recursive relations into the objective function and solving for $\kappa_p$,

$$\hat{\kappa}_p = \frac{-2\sum_{n=p}^{N-1} \varepsilon_F^{(p-1)}[n]\, \varepsilon_B^{(p-1)}[n-p]}{\sum_{n=p}^{N-1}\left[\left(\varepsilon_F^{(p-1)}[n]\right)^2 + \left(\varepsilon_B^{(p-1)}[n-p]\right)^2\right]} \qquad (32)$$

Stability of the estimated model can be verified by showing that the optimal reflection coefficient in (32) satisfies $|\hat{\kappa}_p| \leq 1$ for all p.

The estimate of the innovations variance is also recursively updated as

$$\hat{\sigma}_e^{2(p)} = \hat{\sigma}_e^{2(p-1)}\left(1 - \hat{\kappa}_p^2\right) \qquad (33)$$

Given that the reflection coefficients are always less than unity in magnitude, the innovations variance is guaranteed to decrease as the order increases.

Burg's estimation procedure

A basic procedure for Burg's algorithm thus follows (a MATLAB sketch is given after this list):

Burg's method
1. Set p = 0 and $\theta^{(0)} = 0$, so that the forward and backward prediction errors are initialized to $\varepsilon_F^{(0)}[k] = v[k] = \varepsilon_B^{(0)}[k]$.
2. Increment the order p by one and compute $\kappa_p$ using (32).
3. Update the parameter vector $\theta^{(p)}$ using (29).
4. Update the prediction errors for the incremented order using (27) and (28).
5. Repeat steps 2-4 until the desired order p = P is reached.

It is easy to see that, with the initialization above, the optimal estimate of $\kappa_1$ is $\hat{\kappa}_1 = -\hat{\rho}_{vv}[1]$, which is also the optimal LS estimate of an AR(1) model. A computationally efficient version of the above algorithm, known as Burg's recursion, updates the denominator of (32) recursively.
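A minimal MATLAB sketch of the recursion above (steps 1-5), assuming a zero-mean data vector v in the workspace; arburg from the Signal Processing Toolbox provides an equivalent, production-quality implementation.

% Burg's recursion for an AR(P) model, following eqs. (29)-(33)
P = 2;                              % desired order (example value)
v = v(:);  N = length(v);           % assume zero-mean data vector
F = v;  B = v;                      % order-0 forward/backward errors (step 1)
theta = zeros(0,1);                 % theta^(0) is empty
sige2 = sum(v.^2)/N;                % order-0 innovations variance
for p = 1:P
    Fp = F(2:end);                  % eps_F^(p-1)[k],   k = p, ..., N-1
    Bp = B(1:end-1);                % eps_B^(p-1)[k-p], k = p, ..., N-1
    kappa = -2*(Fp'*Bp)/(Fp'*Fp + Bp'*Bp);           % reflection coeff., eq. (32)
    theta = [theta; 0] + kappa*[flipud(theta); 1];   % parameter update, eq. (29)
    F = Fp + kappa*Bp;              % forward-error update, eq. (30)
    B = Bp + kappa*Fp;              % backward-error update, eq. (31)
    sige2 = sige2*(1 - kappa^2);    % innovations variance, eq. (33)
end
% theta holds [d_1 ... d_P]'; sige2 is the innovations variance estimate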

Properties of Burg's estimator

Asymptotic properties of the optimal estimates of $\kappa_p$ are not trivial to derive, particularly when the postulated model order is lower than the true order $P_0$. It is even more difficult to analyze the properties of the parameter estimates, since they are not explicitly optimized. The following is a summary of facts on Burg's estimator from extensive studies by several researchers:

1. The bias of Burg's estimates is as large as that of the LS estimates, but lower than that of the Yule-Walker estimates, especially when the underlying process is auto-regressive with roots near the unit circle.

2. The variance of $\hat{\kappa}_p$ for models of order $p \geq P_0$ is given by

$$\mathrm{var}(\hat{\kappa}_p) = \begin{cases} \dfrac{1-\kappa_p^2}{N}, & p = P_0 \\[2mm] \dfrac{1}{N}, & p > P_0 \end{cases} \qquad (34)$$

The case p > P_0 is consistent with the result for the variance of the PACF coefficient estimates at lags l > P_0 given by (9) and (7).

Properties of Burg's estimator ... contd.

3. The innovations variance estimate is asymptotically unbiased, again when the postulated order is at least equal to the true order:

$$E(\hat{\sigma}_e^2) = \sigma_e^2\left(1 - \frac{p}{N}\right), \quad p \geq P_0 \qquad \Longrightarrow \qquad \lim_{N\to\infty} E(\hat{\sigma}_e^2) = \sigma_e^2 \qquad (35)$$

4. All reflection coefficients for orders p > P_0 are independent of the lower-order estimates.

5. By the asymptotic equivalence of Burg's method with the Y-W estimator, the distribution and covariance of the resulting parameter estimates are identical to those given in (5). The difference lies in the point estimate of $\theta$ and in the estimate of the innovations variance.

6. Finally, a distinct property of Burg's estimator is that it guarantees stability of the estimated AR models.

Example: Simulated AR(2) series

For the simulated series considered in the previous examples, obtain Burg's estimates of the model parameters.

Solution:

$$\hat{d}_1 = -1.267, \qquad \hat{d}_2 = 0.3827$$

These are almost identical to the MCOV estimates. Once again, given the large sample size, the asymptotic properties can be expected to be identical to those of the previous methods.
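For reference, the four AR estimators discussed so far are available as one-line calls in the MATLAB Signal Processing Toolbox; a short sketch assuming a zero-mean data vector v. The returned polynomial is [1 d_1 ... d_P], so the signs follow the convention used in these slides.

% Toolbox calls for the four AR estimators (Signal Processing Toolbox)
P = 2;
[a_yw,  e_yw ] = aryule(v, P);    % Yule-Walker method
[a_cov, e_cov] = arcov(v, P);     % LS / covariance method
[a_mcv, e_mcv] = armcov(v, P);    % modified covariance method
[a_bg,  e_bg ] = arburg(v, P);    % Burg's method
% Each a_* = [1 d_1 ... d_P]; each e_* is the innovations-variance estimate.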

Estimation of MA models

The problem of estimating an MA model is more involved than that of estimating the AR parameters, primarily because the predictor is non-linear in the unknowns. With an MA(M) model the predictor is

$$\hat{v}[k|k-1] = c_1 e[k-1] + \cdots + c_M e[k-M], \qquad k \geq M \qquad (36)$$

wherein both the parameters and the past innovations are unknown.

Thus, the non-linear least squares (NLS) estimation method and the MLE are popularly used for estimating MA models. Both methods require proper initialization so as not to get trapped in local minima. A few popular methods for obtaining preliminary estimates are briefly discussed below. For details, read (reference).

Preliminary estimates of MA models

Four popular methods are used to seed the NLS and MLE algorithms.

1. Method of moments: Same as the Y-W method, but now the equations are non-linear. For instance, to estimate an MA(1) model, we have

$$\hat{\rho}_{vv}[1] = \frac{c_1}{1 + c_1^2} \qquad (37)$$

giving rise to two solutions. Only invertible solutions are accepted.

2. Durbin's estimator: The idea underlying Durbin's estimator is to first generate the innovation sequence through a high-order AR model. Subsequently, the MA(M) model is re-written as

$$v[k] - \hat{e}[k] = \sum_{i=1}^{M} c_i\, \hat{e}[k-i] \qquad (38)$$

where $\hat{e}[k] = \hat{D}(q^{-1})v[k]$ is the estimate obtained from the AR model. The order of the AR model used for this purpose can be selected in different ways, e.g., using the AIC or BIC; a simple guideline recommends P = 2M. For a more detailed reading, see Broersen (reference). (A MATLAB sketch of Durbin's estimator is given after this list of methods.)

Preliminary estimates of MA models

3. Innovations algorithm: This is similar to the D-L algorithm for AR models. The key idea is to use the innovations representation of the MA model, recalling that the white-noise terms are theoretically the one-step-ahead prediction errors. Defining $c_0 \triangleq 1$,

$$v[k] = \sum_{i=0}^{M} c_i\, e[k-i] = \sum_{i=0}^{M} c_i\left(v[k-i] - \hat{v}[k-i\,|\,k-i-1]\right) \qquad (39)$$

A recursive algorithm can now be constructed:

i. Set m = 0 and $\hat{\sigma}^2_{e,0} = \hat{\sigma}^2_v$.
ii. Compute

$$\hat{c}_{m,m-j} = \frac{1}{\hat{\sigma}^2_{e,j}}\left(\hat{\sigma}_{vv}[m-j] - \sum_{i=0}^{j-1} \hat{c}_{j,j-i}\, \hat{c}_{m,m-i}\, \hat{\sigma}^2_{e,i}\right), \qquad 0 \leq j < m \qquad (40)$$

iii. Update the innovations variance

$$\hat{\sigma}^2_{e,m} = \hat{\sigma}^2_v - \sum_{j=0}^{m-1} \hat{c}^2_{m,m-j}\, \hat{\sigma}^2_{e,j}$$

iv. Repeat steps (ii) and (iii) until the desired order m = M is reached.

Preliminary estimates of MA models ... contd.

4. Hannan-Rissanen's method: The approach is similar to that of Durbin's estimator in the sense that the innovations are replaced by their estimates from an AR model. The difference is that the parameters are estimated from a linear least-squares regression of v[k] on the estimated past innovations:

$$\hat{v}[k] = \sum_{i=1}^{M} c_i\, \hat{e}[k-i], \qquad k \geq M \qquad (41)$$

The past terms of $\hat{e}[k]$ are obtained as the residuals of a sufficiently high-order AR(p) model. The parameter estimates can be further refined in an additional step, but this step can usually be avoided. For additional details, refer to Brockwell [2002].
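A minimal MATLAB sketch of Durbin's estimator (method 2 above), assuming a zero-mean data vector v and the slide's guideline of a long AR order P = 2M; aryule is used here for the long AR fit, but any of the AR estimators discussed earlier would do.

% Durbin's preliminary estimator for an MA(M) model, following eq. (38)
M  = 1;                                  % MA order (example value)
v  = v(:);  K = length(v);               % assume zero-mean data vector
Pl = 2*M;                                % long AR order (guideline P = 2M)
a  = aryule(v, Pl);                      % a = [1 dhat_1 ... dhat_Pl]
ehat = filter(a, 1, v);                  % residuals ehat[k] = Dhat(q^-1) v[k]
Phi = zeros(K-M, M);                     % regressors: past residuals
for i = 1:M
    Phi(:,i) = ehat(M+1-i:K-i);
end
chat = Phi \ (v(M+1:K) - ehat(M+1:K));   % preliminary estimates [c_1 ... c_M]'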

Estimation of ARMA models

Given a set of N observations {v[0], v[1], ..., v[N-1]} of a process, estimate the P + M parameters $\theta = \begin{bmatrix} d_1 & \cdots & d_P & c_1 & \cdots & c_M \end{bmatrix}^T$ of the ARMA(P, M) model

$$v[k] + \sum_{j=1}^{P} d_j\, v[k-j] = \sum_{i=1}^{M} c_i\, e[k-i] + e[k] \qquad (42)$$

and the innovations variance $\sigma_e^2$. It is assumed, without loss of generality, that the generating process is zero-mean.

Due to the presence of the MA component, the predictor is once again non-linear in the unknowns, rendering the optimization problem complicated. Standard solvers are based on either NLS or ML methods.

Estimation of ARMA models ... contd.

With the non-linear LS method, typically a Gauss-Newton algorithm is used, and analytical expressions are employed to compute the gradients (of the predictor) at each iteration.

In the MLE approach, the likelihood function is set up using the prediction-error (innovations) approach and a non-linear optimization solver such as the G-N method is used.

Any one of the four methods discussed earlier for MA models can be used to initialize the algorithms; the Y-W method is the standard choice.

See Shumway and Stoffer (reference) for a theoretical discussion of the NLS and MLE algorithms, i.e., how to evaluate the gradients for the former and how to set up the likelihood function for the latter.

NLS and ML estimators of ARMA models

Deriving the asymptotic properties of the NLS and ML estimators is beyond the scope of this text; only the main result is stated (see Brockwell and Davis [1991], Brockwell [2002], Shumway and Stoffer [2006]).

The parameter estimates of an ARMA(P, M) model obtained from the unconditional or conditional least squares and the ML estimators, initialized with the method of moments, are asymptotically consistent. Further,

$$\sqrt{N}(\hat{\theta} - \theta_0) \sim AN\!\left(0,\, \sigma_e^2\, S(\theta_0)^{-1}\right) \qquad (43)$$

The $(P+M) \times (P+M)$ covariance matrix S is given by

$$S = \begin{bmatrix} E(x_P x_P^T) & E(x_P w_M^T) \\ E(w_M x_P^T) & E(w_M w_M^T) \end{bmatrix} \qquad (44)$$

where $x_P$ and $w_M$ are constructed from two auto-regressive processes,

$$x_P = \begin{bmatrix} x[k-1] & x[k-2] & \cdots & x[k-P] \end{bmatrix}^T; \qquad x[k] = \frac{1}{D(q^{-1})}\, e[k] \qquad (45)$$

$$w_M = \begin{bmatrix} w[k-1] & w[k-2] & \cdots & w[k-M] \end{bmatrix}^T; \qquad w[k] = \frac{1}{C(q^{-1})}\, e[k] \qquad (46)$$

Remarks

The block diagonals $S_{11}$ (P x P) and $S_{22}$ (M x M) are essentially the auto-covariance matrices of x[k] and w[k] respectively, while the off-diagonal blocks are the matrices of cross-covariance functions between x[k] and w[k]. A few special cases are discussed below.

1. AR(1): For this case, S is a scalar. Using (44),

$$S = E(x[k-1]\,x[k-1]) = \frac{\sigma_e^2}{1 - d_1^2} \quad \Longrightarrow \quad \mathrm{var}(\hat{d}_1) \approx \frac{1 - d_1^2}{N} \qquad (47)$$

As the pole of the AR(1) process draws closer to the unit circle, the variance of x[k] (and hence S) grows without bound and the standard asymptotics above become unreliable. This makes a case for building ARIMA models.

2. MA(1): Using (44),

$$S = E(w[k-1]\,w[k-1]) = \frac{\sigma_e^2}{1 - c_1^2} \quad \Longrightarrow \quad \mathrm{var}(\hat{c}_1) \approx \frac{1 - c_1^2}{N} \qquad (48)$$

Just as in the AR case, when the zero of the MA model approaches the unit circle (the borderline of invertibility), estimation becomes problematic.

For small samples, no closed-form expression for the variance or the distribution exists. In such cases, the bootstrap is an effective alternative.

Procedure to fit an ARMA model

Systematic procedure:

1. Carry out a visual examination of the series. Inspect the data for outliers, drifts, significantly differing variances, etc.

2. Perform the necessary pre-processing of the data (e.g., removal of trends, transformations) to obtain a stationary series.

3. If the intent is to develop a pure AR model, use the PACF to estimate the order; likewise, for a pure MA model use the ACF. For ARMA models, a good start is an ARMA(1,1) model.

4. For AR models, use the MCOV or Burg's method with the chosen order; if the purpose is spectral estimation, prefer the MCOV method. For MA and ARMA models, generate preliminary estimates (typically using the Y-W or the H-R method) with the chosen orders, and use these preliminary estimates to initialize an MLE or NLS algorithm to obtain the optimal estimates.

5. Subject the model to a quality (diagnostic) check. If the model passes all the checks, accept it. Else, work towards an appropriate model order until satisfactory results are obtained.

Example: Estimating an ARMA model

The objective of this exercise is to build an ARMA representation for the process whose ACF and PACF plots are shown below.

[Figure: Auto-correlation function and partial auto-correlation function of the series, plotted for lags 0 to 20.]

Beginning with an ARMA(1,1) choice, the estimated model is

$$\hat{H}(q^{-1}) = \frac{1 + \underset{(\pm 0.025)}{0.418}\, q^{-1}}{1 - \underset{(\pm 0.02)}{0.629}\, q^{-1}} \qquad (49)$$

Model Assessment

The standard errors reported below each coefficient estimate reveal that the model has good precision (low variability). Additionally, the model is both stationary and invertible. The ACF of the residuals from the estimated model is shown below. The model is thus satisfactory in both respects.

[Figure: ACF of the residuals from the ARMA(1,1) model, plotted for lags 0 to 20.]

ARMA estimation ... contd.

It is of interest to note that the true process also has an ARMA(1,1) representation:

$$H(q^{-1}) = \frac{1 + 0.4\, q^{-1}}{1 - 0.6\, q^{-1}}$$

It is thus a coincidence that the orders of the estimated model and of the generating process agree. When the residual whiteness test indicates the need for increasing the model order, there is no definitive way of determining whether the numerator order, the denominator order, or both should be increased; the solution has to be determined by trial and error. Fortunately, we can converge to a working model within a handful of iterations, since an ARMA(2,2) representation is capable of representing a large class of stationary processes (reference).

In general, when competing models are available, the decision on the final model is based on information-criteria measures (see Module 8).

MATLAB code for estimating the ARMA model

% Generate data
Hq = idpoly(1,[],[1 0.4],[1 0.6],[],'Noisevariance',1);
ek = randn(1000,1);
vk = sim(Hq,ek);

% Remove mean
vkd = detrend(vk,'constant');

% Plot ACF and PACF
acf(vkd.y,20,1)
pacf(vkd.y,20,1)

% Fit an ARMA(1,1) model
mod_arma = armax(vk,[1 1]);
present(mod_arma)

% ACF of residuals
err_arma = pe(mod_arma,vk);   % Notice err_arma is not an iddata object
acf(err_arma,20,1);

Estimation of ARIMA models

In Module 3 we learnt that non-stationarities are of two types: deterministic (e.g., trend-type non-stationarity) and stochastic (e.g., mean, variance and integrating-type non-stationarity).

Of particular interest are the difference-stationary processes, which are nicely represented by ARIMA models,

$$\nabla^d v[k] = \frac{C(q^{-1})}{D(q^{-1})}\, e[k], \qquad \nabla \triangleq 1 - q^{-1} \qquad (50)$$

which are capable of handling trend-type non-stationarities as well.

Estimating ARIMA models ... contd.

The additional steps in ARIMA modelling are determining the degree of differencing d, the orders P and M of the ARMA components, and the parameters of the C and D polynomials.

Given that ARIMA models are primarily meant for difference-stationary processes, and that unnecessary differencing can cause more harm than good, it is important to first examine the data for the presence of non-stationarities and to determine their type before deciding to fit an ARIMA model.

These are the preliminary steps in a general procedure for building ARIMA models, as outlined next.

Steps for building an ARIMA model

Procedure (a minimal MATLAB sketch follows this list):

1. Examine/test the series for integrating-type non-stationarity using visual inspection of the series and/or the ACF plots and the unit root tests (e.g., Dickey-Fuller, Phillips-Perron tests). If the series exhibits strong evidence of unit roots, then an ARIMA model can be fit after following steps 2 and 3 below. Conducting unit root tests can be challenging and involved; they have to be performed with care and should be corroborated with visual observations of the series as well as the ACF/PACF plots.

2. If there is (additionally) strong evidence of trend-type non-stationarities, remove them by fitting polynomial functions to the series (using the OLS method, for example) and work with the residuals of this fit. Denote these by w[k].

3. If the residuals (or the series in the absence of trends) are additionally known to contain growth effects, then a logarithmic transformation is recommended. Call the resulting series $\tilde{w}[k]$ or $\tilde{v}[k]$, as the case may be.

4. Determine the appropriate degree of differencing d (by visual or statistical testing of the differenced series).

5. Fit an ARMA model to $\nabla^d \tilde{w}[k]$ or $\nabla^d \tilde{v}[k]$ (or to the respective untransformed series if step 3 is skipped).
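A minimal MATLAB sketch of steps 4-5 for the common case d = 1, assuming a stored data vector v that has already been screened as in steps 1-3; the orders P and M are placeholders to be chosen as in the ARMA procedure.

% Fit an ARIMA(P,1,M) model: difference once, then fit an ARMA(P,M) model
P = 1;  M = 1;                       % ARMA orders (placeholders)
dv = diff(v(:));                     % degree of differencing d = 1
dv = dv - mean(dv);                  % work with the zero-mean differenced series
mod_arima = armax(dv, [P M]);        % ARMA(P,M) model for the differenced data
present(mod_arima)                   % estimates with standard errors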

Summary of Lecture 1

AR models are much easier to estimate than MA models because they give rise to linear predictors.

A variety of methods are available to estimate AR models, popular ones being the Yule-Walker, LS / COV, modified covariance and Burg's methods.

Among the four methods, Y-W and Burg's method guarantee stability, but the latter is better for processes with poles close to the unit circle. The MCOV method is preferred when AR models are used in spectral estimation.

ML methods are generally not used for estimating AR models because the improvement achieved is marginal.

ARMA (and MA) models give rise to non-linear optimization problems that require preliminary estimates.

The NLS and ML estimators yield asymptotically similar ARMA model estimates.

The best ARMA model is almost always determined iteratively, but in a systematic manner.