Chapter 7: State Space Models

7.1 Introduction

State space models, developed over the past 10-20 years, are alternative models for time series. They include both the ARIMA models of Chapters 3-6 and the Classical Decomposition Model of Chapter 2 as special cases, but go well beyond both. They are important because (i) they provide a rich family of naturally interpretable models for data, and (ii) they lead to highly efficient estimation and forecasting algorithms, through the Kalman recursions (see §7.3). They are widely used and, perhaps in consequence, are known under several different names: structural models (econometrics), dynamic linear models (statistics), Bayesian forecasting models (statistics), linear system models (engineering), Kalman filtering models (control engineering). The essential idea is that behind the observed time series X_t there is an underlying process S_t which itself is evolving through time in a way that reflects the structure of the system being observed.

7.2 The Model

Example 1: The Random Walk plus Noise Model

For a series X_t with trend, but no seasonal or cyclic variation, the Classical Decomposition of §2.3 is based on

    X_t = a_t + r_t,                                            (7.1)

where a_t represents deterministic trend (the underlying general level, or signal, at t) and r_t represents random variation or noise. To make this into a more precisely specified model we might suppose that r_t is white noise WN(0, σ²_r). Also, instead of supposing that a_t is deterministic we could take it to be random as well, but not changing much over time. A way of representing this would be to suppose

    a_t = a_{t-1} + η_t,                                        (7.2)

where η_t is white noise WN(0, σ²_η), uncorrelated with the r_t. Equations (7.1) and (7.2) together define the Random Walk plus Noise model, also known as the Local Level model. A realization from this model, with a_0 = 0 and σ²_r = 6, σ²_η = 3, is given below.

[Figure: Realization from Local Level Model; X_t plotted against Time, t = 0, ..., 100.]

The model might be suitable for an industrial process, with process level a_t intended to be within design limits, but not directly observable itself. The sizes of σ²_r and σ²_η will determine local and large-scale smoothness respectively.

Review Question: How will σ²_r and σ²_η affect local and large-scale smoothness? How would the graph of a process for which σ²_r/σ²_η is large differ from that of one for which it is small?
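A realization like the one above can be generated directly from (7.1) and (7.2). The short Python sketch below (the function and variable names are illustrative, not part of the notes) simulates 100 observations with the same variances as in the plot:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_local_level(n, sigma2_r=6.0, sigma2_eta=3.0, a0=0.0):
    """Simulate the Random Walk plus Noise model (7.1)-(7.2)."""
    a = np.empty(n)          # unobserved level a_t
    x = np.empty(n)          # observed series X_t
    level = a0
    for t in range(n):
        level = level + rng.normal(0.0, np.sqrt(sigma2_eta))  # a_t = a_{t-1} + eta_t
        a[t] = level
        x[t] = level + rng.normal(0.0, np.sqrt(sigma2_r))     # X_t = a_t + r_t
    return x, a

x, a = simulate_local_level(100)
```

Re-running this with a larger σ²_η makes the level wander more (less large-scale smoothness), while a larger σ²_r adds local jitter around the level, which is one empirical way into the review question.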

Example 2: A Seasonal Model with Noise

In the Classical Decomposition Method the seasonal/cyclic components s_t, with a period c say, were taken to be numbers which repeated themselves every c time units and summed to 0 over the c seasons: that is, for each t,

    s_{t+c} = s_t   and   Σ_{j=1}^{c} s_{t+j} = 0.

Given any c-1 values s_1, ..., s_{c-1}, we can generate such a sequence by setting

    s_c = -s_1 - ... - s_{c-1}                                  (7.3)

and

    s_{t+c} = s_t for all t = 1, 2, ....                        (7.4)

The result is a pattern exactly reproducing itself every c time units. Note that (7.3) and (7.4) together amount to saying that s_t can be found successively from

    s_t = -s_{t-c+1} - s_{t-c+2} - ... - s_{t-1},   t = c, c+1, ....   (7.5)

We can introduce some variability into the seasonal pattern by adding a white noise perturbation to (7.5):

    s_t = -s_{t-c+1} - s_{t-c+2} - ... - s_{t-1} + η_t,

where η_t is, say, WN(0, σ²_η). On average (in expectation) the s_t's will still sum to 0 over the seasons, but individual variability is now possible. If, as in the previous example, we suppose that actual observations on the process are subject to error, we get

    X_t = s_t + r_t                                             (7.6)
    s_t = -Σ_{j=1}^{c-1} s_{t-j} + η_t,                         (7.7)

where the observation error r_t might be taken to be WN(0, σ²_r). The two equations (7.6) and (7.7) define the model. A realization with c = 12 is shown below.

[Figure: Realization from Seasonal Model with c = 12; X_t plotted against Time in Cycles.]

If we write (s_t, ..., s_{t-c+2})' as a vector, S_t say, then (7.7) can be written in matrix form as

    [ s_t       ]   [ -1 -1 ... -1 -1 ] [ s_{t-1}   ]   [ η_t ]
    [ s_{t-1}   ]   [  1  0 ...  0  0 ] [ s_{t-2}   ]   [  0  ]
    [ s_{t-2}   ] = [  0  1 ...  0  0 ] [ s_{t-3}   ] + [  0  ]
    [   ...     ]   [       ...       ] [   ...     ]   [ ... ]
    [ s_{t-c+2} ]   [  0  0 ...  1  0 ] [ s_{t-c+1} ]   [  0  ]

that is,

    S_t = F S_{t-1} + V_t,                                      (7.8)

say, where F is the (c-1) x (c-1) matrix above and V_t is the random vector V_t = (η_t, 0, ..., 0)'.
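The transition matrix F and the noise vector V_t in (7.8) are easy to construct programmatically. A minimal numpy sketch (function names and the generator argument are my own choices) for a general period c:

```python
import numpy as np

def seasonal_transition(c):
    """Build the (c-1) x (c-1) matrix F of (7.8) for seasonal period c."""
    m = c - 1
    F = np.zeros((m, m))
    F[0, :] = -1.0                 # first row: s_t = -s_{t-1} - ... - s_{t-c+1} (+ eta_t)
    F[1:, :-1] = np.eye(m - 1)     # sub-diagonal: shift the remaining components down
    return F

def seasonal_noise(c, sigma2_eta, rng):
    """Draw V_t = (eta_t, 0, ..., 0)' with eta_t ~ N(0, sigma2_eta)."""
    V = np.zeros(c - 1)
    V[0] = rng.normal(0.0, np.sqrt(sigma2_eta))
    return V

F = seasonal_transition(12)        # c = 12, as in the realization above
```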

The other equation defining the model, (7.6), may be written in terms of S_t as

    X_t = (1, 0, ..., 0) S_t + r_t.                             (7.9)

General Form of State Space Model

In the examples each model consists of two parts: an underlying process (a_t or S_t above), called in general the state process of the system, whose evolution is governed by one equation ((7.2) and (7.8) respectively), and another process, the observed time series X_t itself, called in general the observation process, which is related to the state process by another equation ((7.1) and (7.9) respectively).

In general a state space model consists of a pair of random quantities X_t and S_t whose evolution and relationship are described by the equations

    X_t = G_t S_t + ε_t                                         (7.10)
    S_t = F_t S_{t-1} + V_t                                     (7.11)

where S_t denotes the state at time t, G_t and F_t are known matrices, possibly depending on time, ε_t is WN(0, σ²_ε), and V_t is a vector of white noise processes, each uncorrelated with the ε_t process. Equation (7.10) is called the observation equation, and equation (7.11) the state or system equation. The component white noise processes of V_t may be correlated with each other, though components of V_t and V_s for t ≠ s are taken to be uncorrelated. We use the notation V_t ~ WN(0, {Q_t}) to mean that the random vector V_t consists of univariate WN components (so that it has mean 0) and has variance-covariance matrix Q_t, that is: E(V_t V_t') = Q_t.
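Equations (7.10) and (7.11) can be read directly as a recipe for simulation: propagate the state with F_t and add state noise, then observe through G_t and add observation noise. A small Python sketch along these lines, assuming for simplicity that G, F and Q do not depend on time and that X_t is scalar (the function and argument names are mine):

```python
import numpy as np

def simulate_state_space(G, F, Q, sigma2_eps, S1, n, rng=None):
    """Simulate (X_t, S_t), t = 1..n, from X_t = G S_t + eps_t,
    S_t = F S_{t-1} + V_t, with V_t ~ WN(0, Q) and eps_t ~ WN(0, sigma2_eps)."""
    rng = rng or np.random.default_rng()
    d = len(S1)
    S = np.empty((n, d))
    X = np.empty(n)
    s = np.asarray(S1, dtype=float)
    for t in range(n):
        if t > 0:
            s = F @ s + rng.multivariate_normal(np.zeros(d), Q)  # state equation (7.11)
        S[t] = s
        X[t] = G @ s + rng.normal(0.0, np.sqrt(sigma2_eps))      # observation equation (7.10)
    return X, S

# Example 1 as a special case: scalar state, G = F = 1, Q = sigma2_eta.
X, S = simulate_state_space(G=np.array([1.0]), F=np.array([[1.0]]),
                            Q=np.array([[3.0]]), sigma2_eps=6.0,
                            S1=np.array([0.0]), n=100)
```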

In Example 1 S_t can be identified directly with a_t, and ε_t with r_t. G_t and F_t are both equal to the degenerate 1-dimensional unit matrix, and V_t is the 1-dimensional vector with component η_t. The covariance matrix Q_t is therefore simply σ²_η. In Example 2 the matrix G_t = (1, 0, ..., 0) as in (7.9), ε_t is r_t, the matrix F_t and the vector V_t are as in (7.8), and Q_t is the (c-1) x (c-1) matrix with all entries zero except the top left hand one, which is equal to σ²_η.

Example 3: AR(1) Model

The stationary AR(1) process given by

    X_t = αX_{t-1} + η_t                                        (7.12)

is another example of a state space model. Identify the state S_t with X_t itself, so that the state equation can be taken to be (7.12) if we set F = α and V_t = η_t; and the observation equation is just X_t = S_t, which has the form (7.10) with G = 1 and ε = 0.

Notes

(a) By iterating the state equation (7.11) we get

    S_t = F_t S_{t-1} + V_t
        = F_t (F_{t-1} S_{t-2} + V_{t-1}) + V_t
        = (F_t F_{t-1} ... F_2) S_1 + (F_t ... F_3) V_2 + ... + F_t V_{t-1} + V_t   (7.13)
        = f_t(S_1, V_2, ..., V_t)

for a function f_t. From the observation equation therefore

    X_t = G_t f_t(S_1, ...) + ε_t = g_t(S_1, V_2, ..., V_t, ε_t)

for a function g_t. Thus the process is driven (through the G_t and F_t) by the white noise terms and the initial state S_1.

(b) It turns out to be possible to put a large number of time series models, including, for example, all ARIMA models, into a state space form. An advantage of doing so is that the state equation gives a simple way of analysing the process S_t, and from that it is easy, via the observation equation, to find out about the observation process X_t. If S_1 and V_2, ..., V_t are independent (as opposed to just being uncorrelated) then S_t has the Markov property, that is, the distribution of S_t given S_{t-1}, S_{t-2}, ..., S_1 is the same as the distribution of S_t given S_{t-1} alone.

7.3 The Kalman Recursions

7.3.1 Filtering, Prediction and Smoothing

In state space models the state is generally the aspect of greatest interest, but it is not usually observed directly. What are observed are the X_t's. So we'd like to have methods for estimating S_t from the observations. Three scenarios are:

Prediction Problem: Estimate S_t from X_{t-1}, X_{t-2}, ....
Filtering Problem: Estimate S_t from X_t, X_{t-1}, ....
Smoothing Problem: Estimate S_t from X_n, X_{n-1}, ..., where n > t.

A further problem, which turns out to have an answer useful for other things too, is

X-Prediction Problem: Estimate X_t from X_{t-1}, X_{t-2}, ....

7.3.2 General Approach

Note that equation (7.13) above shows that S_t and X_t are linear combinations of the initial state S_1 and the white noise processes V_t and ε_t. If these are Gaussian, then both S_t and X_t will be Gaussian too for every t, and their distributions will be completely determined by their means and covariances. Thus the whole evolution of the model will be known if the means and covariances can be calculated. The Kalman recursions give a highly efficient way of computing these means and covariances by building them up successively from earlier values. The recursions lead to algorithms for the problems above and for fitting the models to data. They are an enormously powerful tool for handling a wide range of time series models. The basis of the Kalman recursions is the following simple result about multivariate Normal distributions.

7.3.3 Conditioning in a Multivariate Normal Distribution

Let Z and W denote random vectors with Normal distributions

    Z ~ N(µ_z, Σ_zz),    W ~ N(µ_w, Σ_ww),

and with covariance matrix

    E((Z - µ_z)(W - µ_w)') = Σ_zw,

so that the distribution of the vector obtained by stacking Z on W is

    ( Z )        ( ( µ_z )   ( Σ_zz  Σ_zw ) )
    ( W )  ~   N ( ( µ_w ) , ( Σ_wz  Σ_ww ) ).

Then

    Z | W ~ N( µ_z + Σ_zw Σ_ww^{-1} (W - µ_w),  Σ_zz - Σ_zw Σ_ww^{-1} Σ_wz ).   (7.14)

For a proof, write down the ratio of the probability densities of (Z, W) and of W, and complete the square in the exponent term.
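Formula (7.14) is also easy to check numerically. A small sketch of the computation (the function name and the bivariate example numbers are mine):

```python
import numpy as np

def conditional_normal(mu_z, mu_w, S_zz, S_zw, S_ww, w):
    """Mean and covariance of Z | W = w for jointly Gaussian (Z, W), as in (7.14)."""
    S_ww_inv = np.linalg.inv(S_ww)
    mean = mu_z + S_zw @ S_ww_inv @ (w - mu_w)
    cov = S_zz - S_zw @ S_ww_inv @ S_zw.T
    return mean, cov

# Bivariate example: Z and W scalar with unit variances and covariance 0.8.
mean, cov = conditional_normal(np.array([0.0]), np.array([0.0]),
                               np.array([[1.0]]), np.array([[0.8]]),
                               np.array([[1.0]]), w=np.array([2.0]))
# mean = 0.8 * 2 = 1.6, cov = 1 - 0.8**2 = 0.36
```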

7.3.4 The Recursions

Suppose after data D_{t-1} = {X_1, ..., X_{t-1}} have been observed we know, by some means, that the state S_{t-1} has mean m_{t-1} and covariance matrix P_{t-1}, so that

    S_{t-1} | D_{t-1} ~ N(m_{t-1}, P_{t-1}).

The recursions are built on relating to this the distributions of

(a) S_t | D_{t-1}, and
(b) S_t | {X_t, D_{t-1}}.

For (a), because S_t is related to S_{t-1} through the state equation (7.11) (S_t = F_t S_{t-1} + V_t), it follows that, given D_{t-1}, S_t is also Normally distributed,

    S_t | D_{t-1} ~ N(m_{t|t-1}, P_{t|t-1}),                    (7.15)

and its mean vector and covariance matrix are

    m_{t|t-1} = E(S_t | D_{t-1}) = E(F_t S_{t-1} | D_{t-1}) + E(V_t | D_{t-1}) = F_t m_{t-1}   (7.16)

and

    P_{t|t-1} = E((S_t - m_{t|t-1})(S_t - m_{t|t-1})' | D_{t-1}) = F_t P_{t-1} F_t' + Q_t.      (7.17)

For (b), from (7.15) and the observation equation (7.10) (X_t = G_t S_t + ε_t) we find

    E(X_t | D_{t-1}) = G_t m_{t|t-1},                           (7.18)

so that

    X_t - E(X_t | D_{t-1}) = G_t (S_t - m_{t|t-1}) + ε_t

and hence

    Var(X_t | D_{t-1}) = G_t P_{t|t-1} G_t' + σ²_ε.             (7.19)

Similarly

    Cov(X_t, S_t | D_{t-1}) = G_t P_{t|t-1},    Cov(S_t, X_t | D_{t-1}) = P_{t|t-1} G_t'.

Thus, given the data D_{t-1}, S_t and X_t have the joint distribution

    ( S_t )                  ( ( m_{t|t-1}     )   ( P_{t|t-1}       P_{t|t-1} G_t'            ) )
    ( X_t ) | D_{t-1}  ~   N ( ( G_t m_{t|t-1} ) , ( G_t P_{t|t-1}   G_t P_{t|t-1} G_t' + σ²_ε ) ).

It follows from the result in §7.3.3 that the conditional distribution of S_t given the new observation X_t in addition to D_{t-1} (that is, the distribution of S_t | D_t) is

    S_t | {X_t, D_{t-1}} ~ N(m_t, P_t),

where

    m_t = m_{t|t-1} + P_{t|t-1} G_t' Var(X_t | D_{t-1})^{-1} (X_t - G_t m_{t|t-1})    (7.20)
    P_t = P_{t|t-1} - P_{t|t-1} G_t' Var(X_t | D_{t-1})^{-1} G_t P_{t|t-1}.           (7.21)

The three equations (7.19), (7.20) and (7.21), called the updating equations, together with (7.16) and (7.17), called the prediction equations, are collectively referred to as the Kalman Filter equations. Given starting values m_0 and P_0 they can be applied successively to calculate the distribution of the state vector as each new observation becomes available. At any time they give values which contain all the information needed to make optimal predictions of future values of both the state and the observations, as follows.
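The prediction equations (7.16)-(7.17) and updating equations (7.19)-(7.21) translate almost line for line into code. Below is a minimal Python sketch of the recursions for a scalar observation series and time-invariant G, F and Q (the function name and these simplifications are mine, not part of the notes):

```python
import numpy as np

def kalman_filter(x, G, F, Q, sigma2_eps, m0, P0):
    """Run the Kalman recursions over observations x[0], ..., x[n-1].
    Returns filtered means m_t, one-step means m_{t|t-1}, innovations e_t
    and their variances phi_t."""
    n, d = len(x), len(m0)
    m, P = np.asarray(m0, float), np.asarray(P0, float)
    m_filt = np.empty((n, d))
    m_pred = np.empty((n, d))
    e = np.empty(n)
    phi = np.empty(n)
    for t in range(n):
        # Prediction step: (7.16) and (7.17).
        m_p = F @ m
        P_p = F @ P @ F.T + Q
        # Innovation and its variance: (7.18), (7.19), (7.22).
        e[t] = x[t] - G @ m_p
        phi[t] = G @ P_p @ G + sigma2_eps
        # Updating step: (7.20) and (7.21).
        K = P_p @ G / phi[t]          # P_{t|t-1} G' Var(X_t | D_{t-1})^{-1}
        m = m_p + K * e[t]
        P = P_p - np.outer(K, G @ P_p)
        m_pred[t], m_filt[t] = m_p, m
    return m_filt, m_pred, e, phi
```

For the Local Level model of Example 1 this reduces to the familiar scalar filter: take G = np.array([1.0]) and F, Q, P0 as 1 x 1 matrices.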

7.3.5 The Prediction and Filtering Problems

By the general result about minimum mean square error forecasts in §6.1, the conditional mean of S_t given D_{t-1}, m_{t|t-1}, is the minimum mean square error estimate of the state S_t given observations up to and including time t-1. The covariance matrix P_{t|t-1} gives the estimation error variances and covariances. Thus the prediction equations (7.16) and (7.17) give the means to solve the Prediction Problem of §7.3.1. In the same way the conditional mean of S_t given D_t, m_t, is the solution to the Filtering Problem of §7.3.1, and the variances and covariances of the error in estimating S_t by m_t are given by P_t.

7.3.6 The X-Prediction Problem

The minimum mean square error forecast of X_t given observations up to time t-1, that is, given D_{t-1}, is simply

    X̂_t = G_t m_{t|t-1},

by (7.18). The prediction error, which we will denote by e_t, is therefore

    e_t = X_t - X̂_t = X_t - G_t m_{t|t-1} = G_t (S_t - m_{t|t-1}) + ε_t.

e_t is also known as the innovation at time t, since it consists of the new information in the observation at t. From the updating equation (7.20) it can be seen that the innovations play a key part in the updating of the estimate of S_{t-1} to S_t. The further e_t is from the zero vector, the greater the correction in the estimator of S_{t-1}. The innovations have means E(e_t) = 0, and variances, which we will denote by φ_t, given by

    φ_t = Var(e_t) = E(X_t - G_t m_{t|t-1})² = G_t P_{t|t-1} G_t' + σ²_ε,    (7.22)

from (7.19) and (7.21). The φ_t can be calculated straightforwardly from the Kalman filter equations.

7.3.7 Likelihood

The likelihood function, L say, for any model is the probability density (or probability in the discrete case) of the observed data, taken as a function of the unknown parameters, θ say. For a state space model therefore, if data X_1 = x_1, ..., X_t = x_t have been observed, and if p is the joint probability density function of X_1, ..., X_t,

    L(θ; x) = p(x; θ) = p(x_1 | θ) Π_{s=2}^{t} p(x_s | D_{s-1}; θ),

where the density function p(x_s | D_{s-1}) is that of the Normal distribution (of X_s given D_{s-1}) with mean E(X_s | D_{s-1}) = X̂_s = G_s m_{s|s-1} and variance φ_s given by (7.22). Thus

    log L = const + log p(x_1 | θ) - (1/2) Σ_{s=2}^{t} log φ_s - (1/2) Σ_{s=2}^{t} (x_s - X̂_s)²/φ_s
          = const + log p(x_1 | θ) - (1/2) Σ_{s=2}^{t} log φ_s - (1/2) Σ_{s=2}^{t} e_s²/φ_s,

which is easily calculated from the innovations and their variances, and p(x_1 | θ) if necessary. Standard methods of numerical maximization may then be used to estimate the unknown parameters θ. This is the approach described in §5.3.1.
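Given the innovations e_s and variances φ_s returned by the filter sketch above, the Gaussian log-likelihood is a few lines of Python. As before the names are mine, and the first observation is handled crudely here by simply conditioning on x_1 (dropping the p(x_1 | θ) term):

```python
import numpy as np

def state_space_loglik(e, phi, skip_first=True):
    """Gaussian log-likelihood from innovations e_s and variances phi_s
    via the prediction error decomposition; p(x_1 | theta) is omitted,
    i.e. the likelihood is conditional on the first observation."""
    s0 = 1 if skip_first else 0
    e, phi = np.asarray(e)[s0:], np.asarray(phi)[s0:]
    return -0.5 * np.sum(np.log(2.0 * np.pi * phi) + e**2 / phi)
```

Maximizing this over the unknown parameters (for Example 1, over σ²_r and σ²_η), for instance by applying a numerical optimizer such as scipy.optimize.minimize to the negative log-likelihood, implements the fitting approach described above.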

Summary of Ideas in Chapter 7

State space models are specified by
- state variables and observation variables, and by
- state equations, describing the evolution of states, and observation equations, describing the relationship of observations to states.

Special cases include the ARIMA models studied in earlier chapters, and the models underlying the Decomposition Method of §2.3 of Chapter 2.

Various problems in relation to forecasting in state space models may be specified:
- prediction
- filtering
- smoothing
- X-prediction

Solutions are based on the fact that if variables are Gaussian, then to describe the whole evolution of the system all that's needed are the means and variances-covariances of the variables through time. These can be calculated recursively by the Kalman recursions.

The Kalman recursions yield immediately
- forecasts and their error variances
- efficient computation of the likelihood function, and therefore a powerful way of fitting models.