Linear Prediction Theory

Joseph A. O'Sullivan
ESE 524, Spring 2009
March 3, 2009

1 Overview

The problem of estimating a value of a random process given other values of the random process is pervasive. Many problems in forecasting fall into this category. The Kalman filter applies if there is a known underlying state space model for the system. If no such state space model exists, then some approximation must be used. If the covariance function for a zero mean random process is known, then the use of a linear predictor is often a good choice. In linear prediction, the next value is estimated as a linear combination of past values, the parameters being chosen to minimize the mean square error between the next value and its estimate.

If the true distribution is Gaussian, then the linear estimator of a given order minimizes the mean square error over all estimators of that order. This ability to use the algorithms independent of the true distribution makes them universal in the sense that they achieve this performance for any true distribution. When the true distribution is an autoregressive model of order $k$, then the $k$th order linear filter is optimal. In that case, the previous $k$ inputs can be considered the state, and the Kalman filter applied. The steady state version of the Kalman filter is equivalent to the $k$th order linear filter derived here.

The choice of the order of a linear estimator (that is, the number of past values used to predict the current value) is important. The mean square error monotonically decreases with increasing order of the estimator. Including too many coefficients increases complexity unnecessarily. In addition, if there is some uncertainty in the covariance function, then increasing the model order beyond some critical value may lead to overfitting the data, increasing the mean square error.

When the model order increases, the coefficients used may be computed recursively. These recursive computations are classical, being based on efficient inversion due to the Toeplitz structure of the data covariance matrix. For the theoretical analysis, we assume Gaussian statistics and derive the optimal estimators.

The linear predictors defined here form the basis for much of modern adaptive signal processing, including the least mean square (LMS) and recursive least squares (RLS) algorithms and their many variants. The algorithms are also instructive for many other estimation problems, including array signal processing. The filter structures that result include the transversal and lattice filters.

2 Summary of Recursive Estimation Equations

Let $r_n$ be a stationary, zero mean Gaussian random process with covariance function
$$c_l = E[r_n r_{n-l}]. \qquad (1)$$

The problem of interest is estimating (predicting) one value of the random process given the previous $k$ values, $E[r_n \mid r_{n-1}, r_{n-2}, \ldots, r_{n-k}]$. Linearity and Gaussian statistics yield
$$\hat{r}_n = w_1 r_{n-1} + w_2 r_{n-2} + \ldots + w_k r_{n-k} \qquad (2)$$
$$= w_k^T r_k(n-1). \qquad (3)$$
The coefficients do not depend on time due to stationarity. In this equation,
$$r_k(n-1) = [\, r_{n-1} \ r_{n-2} \ \ldots \ r_{n-k} \,]^T \qquad (4)$$
$$w_k = [\, w_1 \ w_2 \ \ldots \ w_k \,]^T. \qquad (5)$$
The orthogonality principle states that
$$E\big[(r_n - w_k^T r_k(n-1))\, r_k(n-1)^T\big] = 0, \qquad (6)$$
and thus that
$$\gamma_k^T - w_k^T \Gamma_k = 0, \qquad (7)$$
$$w_k = \Gamma_k^{-1} \gamma_k. \qquad (8)$$
Here, $\Gamma_k$ is a $k \times k$ array with $(i,j)$ element equal to $c_{i-j}$. The vector $\gamma_k$ is a $k \times 1$ vector,
$$\gamma_k = [\, c_1 \ c_2 \ \ldots \ c_{k-1} \ c_k \,]^T. \qquad (9)$$
Note the following recursive structures:
$$\gamma_{k+1} = \begin{bmatrix} \gamma_k \\ c_{k+1} \end{bmatrix} \qquad (10)$$
and
$$\Gamma_{k+1} = \begin{bmatrix} c_0 & \gamma_k^T \\ \gamma_k & \Gamma_k \end{bmatrix} \qquad (11)$$
$$= \begin{bmatrix} \Gamma_k & J\gamma_k \\ (J\gamma_k)^T & c_0 \end{bmatrix}, \qquad (12)$$
where $J$ is called an exchange matrix and has ones along its antidiagonal and zeros elsewhere; the matrix $J$ has the property that $J$ times a vector equals that vector with its entries reordered from bottom to top. These two decompositions of $\Gamma_{k+1}$ yield two different, but closely related, ways of recursively computing its inverse. These two ways rely on defining the $(k+1) \times 1$ vector
$$a_k = \begin{bmatrix} 1 \\ -w_k \end{bmatrix}. \qquad (13)$$
Using this vector, the forward estimation (prediction) error may be written
$$F_k(n) = r_n - \sum_{l=1}^{k} w_l r_{n-l} \qquad (14)$$
$$= a_k^T r_{k+1}(n). \qquad (15)$$
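As a quick illustration of equations (2)-(15), the optimal coefficients in (8) can be computed numerically by solving the Toeplitz system. The sketch below is illustrative only: the covariance sequence, the order, and the data window are assumed values, not quantities from these notes.

```python
# Minimal sketch of equations (4)-(15): solve the normal equations (8) for an
# assumed covariance sequence and form the forward prediction error (15).
import numpy as np

c = 4.0 * 0.75 ** np.arange(4)     # assumed covariances c_0, ..., c_3 (AR(1)-like)
k = 3

# Toeplitz covariance matrix Gamma_k and vector gamma_k, equations (9)-(11)
Gamma_k = np.array([[c[abs(i - j)] for j in range(k)] for i in range(k)])
gamma_k = c[1:k + 1]

w_k = np.linalg.solve(Gamma_k, gamma_k)     # optimal coefficients, eq. (8)
mse = c[0] - gamma_k @ w_k                  # resulting mean square prediction error

a_k = np.concatenate(([1.0], -w_k))         # eq. (13)
r_window = np.array([0.9, 1.1, 0.7, 0.4])   # assumed window [r_n, r_{n-1}, ..., r_{n-k}]
F_k = a_k @ r_window                        # forward prediction error, eq. (15)

print(w_k, mse, F_k)   # w_k is approximately [0.75, 0, 0] for this covariance
```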

The inverse of $\Gamma_{k+1}$ using (11) may be written as
$$\Gamma_{k+1}^{-1} = \begin{bmatrix} 0 & 0^T \\ 0 & \Gamma_k^{-1} \end{bmatrix} + \frac{1}{p_k} a_k a_k^T, \qquad (16)$$
where
$$p_k = c_0 - \gamma_k^T \Gamma_k^{-1} \gamma_k \qquad (17)$$
$$= E\big[(r_n - w_k^T r_k(n-1))^2\big] \qquad (18)$$
$$= E[F_k(n)^2] \qquad (19)$$
is the $k$th order forward prediction error variance. Note that
$$\Gamma_{k+1} a_k = \begin{bmatrix} c_0 & \gamma_k^T \\ \gamma_k & \Gamma_k \end{bmatrix} \begin{bmatrix} 1 \\ -w_k \end{bmatrix} \qquad (20)$$
$$= \begin{bmatrix} c_0 - \gamma_k^T w_k \\ \gamma_k - \Gamma_k w_k \end{bmatrix} \qquad (21)$$
$$= \begin{bmatrix} p_k \\ 0 \end{bmatrix}. \qquad (22)$$
This result actually verifies the inversion formula above because
$$\Gamma_{k+1} \Gamma_{k+1}^{-1} = \Gamma_{k+1} \left( \begin{bmatrix} 0 & 0^T \\ 0 & \Gamma_k^{-1} \end{bmatrix} + \frac{1}{p_k} a_k a_k^T \right) \qquad (23)$$
$$= \begin{bmatrix} 0 & \gamma_k^T \Gamma_k^{-1} \\ 0 & I \end{bmatrix} + \frac{1}{p_k} \begin{bmatrix} p_k \\ 0 \end{bmatrix} a_k^T \qquad (24)$$
$$= \begin{bmatrix} 1 & \gamma_k^T \Gamma_k^{-1} - w_k^T \\ 0 & I \end{bmatrix} \qquad (25)$$
$$= I. \qquad (26)$$
To write the inverse related to the second decomposition of $\Gamma_{k+1}$ in a similar form, we define a new vector $b_k$ to have the elements of $a_k$ in the opposite order. Then we have the following equations:
$$J \Gamma_{k+1} J = \Gamma_{k+1}, \qquad (27)$$
$$J a_k = b_k, \qquad (28)$$
$$J J = I. \qquad (29)$$
The vector $b_k$ determines the backward prediction error $G_k(n)$,
$$G_k(n) = r_{n-k} - E[r_{n-k} \mid r_{n-k+1}, \ldots, r_{n-1}, r_n] \qquad (30)$$
$$= r_{n-k} - (J w_k)^T r_k(n) \qquad (31)$$
$$= r_{k+1}(n)^T b_k. \qquad (32)$$
This vector satisfies
$$b_k = \begin{bmatrix} -J w_k \\ 1 \end{bmatrix} \qquad (33)$$
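The rank-one form of the inverse in (16) and the identity (20)-(22) are easy to confirm numerically. The sketch below uses an assumed covariance sequence; it is only a spot check, not part of the derivation.

```python
# Numerical spot check of equations (16)-(22) for an assumed covariance sequence.
import numpy as np

lags = np.arange(5)
c = 3.0 * 0.8 ** lags + (-0.5) ** lags    # assumed covariance: sum of two AR(1) covariances
k = 3

def Gam(m):                                # m x m Toeplitz matrix with (i, j) element c_{|i-j|}
    return np.array([[c[abs(i - j)] for j in range(m)] for i in range(m)])

Gamma_k, Gamma_k1 = Gam(k), Gam(k + 1)
gamma_k = c[1:k + 1]
w_k = np.linalg.solve(Gamma_k, gamma_k)    # eq. (8)
p_k = c[0] - gamma_k @ w_k                 # eq. (17)
a_k = np.concatenate(([1.0], -w_k))        # eq. (13)

print(Gamma_k1 @ a_k)                      # [p_k, 0, ..., 0]^T, eqs. (20)-(22)

block = np.zeros((k + 1, k + 1))
block[1:, 1:] = np.linalg.inv(Gamma_k)
print(np.allclose(block + np.outer(a_k, a_k) / p_k,
                  np.linalg.inv(Gamma_k1)))   # True: eq. (16)
```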

and
$$J \Gamma_{k+1} a_k = J \Gamma_{k+1} J J a_k \qquad (34)$$
$$= J \Gamma_{k+1} J b_k \qquad (35)$$
$$= \Gamma_{k+1} b_k. \qquad (36)$$
We also have
$$J \Gamma_{k+1} a_k = J \begin{bmatrix} p_k \\ 0 \end{bmatrix} \qquad (37)$$
$$= \begin{bmatrix} 0 \\ p_k \end{bmatrix}, \qquad (38)$$
so
$$\Gamma_{k+1} b_k = \begin{bmatrix} 0 \\ p_k \end{bmatrix}. \qquad (39)$$
The inverse of $\Gamma_{k+1}$ using (12) may be written as
$$\Gamma_{k+1}^{-1} = \begin{bmatrix} \Gamma_k^{-1} & 0 \\ 0^T & 0 \end{bmatrix} + \frac{1}{p_k} b_k b_k^T. \qquad (40)$$
The recursive structure is further clarified through
$$w_{k+1} = \Gamma_{k+1}^{-1} \gamma_{k+1} \qquad (41)$$
$$= \left( \begin{bmatrix} \Gamma_k^{-1} & 0 \\ 0^T & 0 \end{bmatrix} + \frac{1}{p_k} b_k b_k^T \right) \begin{bmatrix} \gamma_k \\ c_{k+1} \end{bmatrix} \qquad (42)$$
$$= \begin{bmatrix} w_k \\ 0 \end{bmatrix} + \frac{\Delta_k}{p_k} b_k, \qquad (43)$$
where
$$\Delta_k = b_k^T \gamma_{k+1} \qquad (44)$$
$$= a_k^T J \gamma_{k+1}. \qquad (45)$$
Plugging this last form for $w_{k+1}$ into the definition of $a_{k+1}$ yields
$$a_{k+1} = \begin{bmatrix} 1 \\ -w_{k+1} \end{bmatrix} \qquad (46)$$
$$= \begin{bmatrix} 1 \\ -w_k \\ 0 \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} 0 \\ b_k \end{bmatrix} \qquad (47)$$
$$= \begin{bmatrix} a_k \\ 0 \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} 0 \\ b_k \end{bmatrix}. \qquad (48)$$
Similarly, plugging into the definition of $b_{k+1}$ yields
$$b_{k+1} = \begin{bmatrix} -J w_{k+1} \\ 1 \end{bmatrix} \qquad (49)$$

$$= \begin{bmatrix} 0 \\ -J w_k \\ 1 \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} J b_k \\ 0 \end{bmatrix} \qquad (50)$$
$$= \begin{bmatrix} 0 \\ b_k \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} a_k \\ 0 \end{bmatrix}. \qquad (51)$$

2.1 Recursive Transversal Filter Coefficient Computation

Inputs: $c_0, c_1, c_2, \ldots$
Outputs: prediction error filters in transversal and lattice forms.

1. Initialization step: $k = 0$; $p_0 = c_0$; $a_0 = b_0 = 1$; $\gamma_1 = c_1$; $\Gamma_1 = c_0$.
2. Reflection coefficient and prediction variance computation:
$$\Delta_k = a_k^T J \gamma_{k+1} \qquad (52)$$
$$p_k = a_k^T \begin{bmatrix} c_0 \\ \gamma_k \end{bmatrix}. \qquad (53)$$
3. Update forward and backward prediction error filters:
$$a_{k+1} = \begin{bmatrix} a_k \\ 0 \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} 0 \\ b_k \end{bmatrix} \qquad (54)$$
$$b_{k+1} = \begin{bmatrix} 0 \\ b_k \end{bmatrix} - \frac{\Delta_k}{p_k} \begin{bmatrix} a_k \\ 0 \end{bmatrix}. \qquad (55)$$
4. Recursion step: $k \leftarrow k + 1$;
$$\gamma_{k+1} = \begin{bmatrix} \gamma_k \\ c_{k+1} \end{bmatrix}; \qquad (56)$$
return to the reflection coefficient computation.

The computational complexity of this algorithm is determined by the computations for the reflection coefficient and the filter update steps. In the reflection coefficient and prediction variance computation, there are 2k multiplies and 2k additions. In the two filter update equations, the multiplies are all the same and the resulting values are just reordered versions of each other (reordered using $J$). Thus, there is one division and there are k multiplies and k additions. The number of computations is thus 3k multiplications, 1 division, and 3k additions per stage. The total number of computations from stage 1 through stage k is 3k(k+1)/2 multiplies, k divisions, and 3k(k+1)/2 additions. Some divisions can be avoided by using as input the sequence of correlation coefficients $1, c_1/c_0, c_2/c_0, \ldots$ instead of $c_0, c_1, c_2, \ldots$, the sequence of correlations.
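The order recursion of Section 2.1 (the classical Levinson-Durbin recursion) translates almost line by line into code. The sketch below is one possible implementation; the test covariance sequence is an arbitrary assumed example, not data from the notes.

```python
# A sketch of the order recursion in Section 2.1 (equations (52)-(56)).
import numpy as np

def order_recursion(c, order):
    """Given covariances c[0..order], return the forward prediction error filters
    a_k, the prediction error variances p_k, and the reflection coefficients
    Delta_k / p_k, following equations (52)-(56)."""
    a = np.array([1.0])                      # a_0
    b = np.array([1.0])                      # b_0
    filters, variances, reflections = [a], [], []
    for k in range(order):
        p = a @ c[:k + 1]                    # p_k = a_k^T [c_0, gamma_k^T]^T, eq. (53)
        Delta = a @ c[1:k + 2][::-1]         # Delta_k = a_k^T J gamma_{k+1},   eq. (52)
        kappa = Delta / p                    # reflection coefficient
        a_next = np.concatenate((a, [0.0])) - kappa * np.concatenate(([0.0], b))  # eq. (54)
        b_next = np.concatenate(([0.0], b)) - kappa * np.concatenate((a, [0.0]))  # eq. (55)
        variances.append(p)
        reflections.append(kappa)
        filters.append(a_next)
        a, b = a_next, b_next                # recursion step, eq. (56)
    return filters, variances, reflections

# assumed test covariance: c_l = 4 (0.75)^l, the covariance of a first order
# autoregressive process, so reflection coefficients beyond the first are zero
c = 4.0 * 0.75 ** np.arange(6)
filters, variances, reflections = order_recursion(c, order=3)
print(filters[-1])       # approximately [1, -0.75, 0, 0]
print(variances)         # [4.0, 1.75, 1.75]: non-increasing
print(reflections)       # approximately [0.75, 0, 0]
```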

2.2 Lattice Filter Structure

Two equivalent representations of the filters described above are the transversal and lattice filter representations. The transversal filters are described in terms of the coefficients $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$. The lattice filters are described in terms of the coefficients $\Delta_0/p_0, \Delta_1/p_1, \ldots, \Delta_{k-1}/p_{k-1}$.

The update equations (54) and (55) may be used to describe the lattice structure in terms of the forward prediction errors $F_k(n)$ and the backward prediction errors $G_k(n)$:
$$F_{k+1}(n) = a_{k+1}^T r_{k+2}(n) \qquad (57)$$
$$= \begin{bmatrix} a_k^T & 0 \end{bmatrix} r_{k+2}(n) - \frac{\Delta_k}{p_k} \begin{bmatrix} 0 & b_k^T \end{bmatrix} r_{k+2}(n) \qquad (58)$$
$$= a_k^T r_{k+1}(n) - \frac{\Delta_k}{p_k} b_k^T r_{k+1}(n-1) \qquad (59)$$
$$= F_k(n) - \frac{\Delta_k}{p_k} G_k(n-1), \qquad (60)$$
where the key step is recognizing that the zeros in $[\, a_k^T \ 0 \,]$ and $[\, 0 \ b_k^T \,]$ correspond to reducing the length of $r_{k+2}(n)$ from $k+2$ to $k+1$, and shifting one time unit in the latter case. Similarly,
$$G_{k+1}(n) = b_{k+1}^T r_{k+2}(n) \qquad (61)$$
$$= \begin{bmatrix} 0 & b_k^T \end{bmatrix} r_{k+2}(n) - \frac{\Delta_k}{p_k} \begin{bmatrix} a_k^T & 0 \end{bmatrix} r_{k+2}(n) \qquad (62)$$
$$= b_k^T r_{k+1}(n-1) - \frac{\Delta_k}{p_k} a_k^T r_{k+1}(n) \qquad (63)$$
$$= G_k(n-1) - \frac{\Delta_k}{p_k} F_k(n). \qquad (64)$$
In matrix form, one stage of the lattice filter has the form
$$\begin{bmatrix} F_{k+1}(n) \\ G_{k+1}(n) \end{bmatrix} = \begin{bmatrix} 1 & -\frac{\Delta_k}{p_k} \\ -\frac{\Delta_k}{p_k} & 1 \end{bmatrix} \begin{bmatrix} F_k(n) \\ G_k(n-1) \end{bmatrix}. \qquad (65)$$
If the lattice filter structure is used in the implementation, then the multiplications needed to update $a_k$ and $b_k$ are not needed. The computational complexity of the algorithm may be reduced even further as described below.
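One lattice stage implements exactly the two-by-two update in (65). The sketch below cascades such stages to produce the forward and backward prediction errors for a given set of reflection coefficients; the reflection coefficients, the input signal, and the zero initial condition for the delayed backward error are assumptions made only for this example.

```python
# Sketch of a lattice filter built from the stage update in equation (65).
import numpy as np

def lattice_errors(x, reflections):
    """Run the sequence x through lattice stages with the given reflection
    coefficients Delta_k / p_k and return the final forward and backward
    prediction error sequences F_K(n) and G_K(n)."""
    F = np.asarray(x, dtype=float).copy()            # F_0(n) = x_n
    G = F.copy()                                     # G_0(n) = x_n
    for kappa in reflections:
        G_delayed = np.concatenate(([0.0], G[:-1]))  # G_k(n-1), zero initial condition assumed
        F_next = F - kappa * G_delayed               # eq. (60)
        G_next = G_delayed - kappa * F               # eq. (64)
        F, G = F_next, G_next
    return F, G

rng = np.random.default_rng(0)
x = rng.standard_normal(200)                          # assumed input sequence
F, G = lattice_errors(x, reflections=[0.75, -0.2])    # assumed reflection coefficients
print(F[:5])
print(G[:5])
```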

2.3 Faster Computations

In the recursive algorithm, the computations associated with the reflection coefficient and the prediction variance can essentially be eliminated. To see this, consider the forward and backward filters used with inputs equal to the correlations $c_n$. At time 0 (noting that $c_{-n} = c_n$), the output of the forward prediction error filter of order $k$ equals $c_0 - w_k^T \gamma_k$, which equals $p_k$. At time $k+1$, the output of the backward prediction error filter of order $k$ equals
$$c_{k+1} - (J w_k)^T \gamma_k = b_k^T \gamma_{k+1} \qquad (66)$$
$$= \Delta_k. \qquad (67)$$
In this interpretation of the computations, the filters up through order $k$ may be used to compute the quantities needed for the next update by simply running the correlation coefficients through the filters. This saves computations by using the lattice filter structure:
$$\begin{bmatrix} \Delta_k \\ p_k \end{bmatrix} = \begin{bmatrix} 0 & b_k^T \\ a_k^T & 0 \end{bmatrix} \begin{bmatrix} c_0 \\ \gamma_{k+1} \end{bmatrix}, \qquad (68)$$
where the zeros in the matrix on the right side are scalars; that is, the matrix is $2 \times (k+2)$ and has a zero in the upper left and bottom right corners. With the correlation sequence as the input to the lattice, the quantities needed at the next stage then follow from one lattice stage update,
$$\begin{bmatrix} 1 & -\frac{\Delta_k}{p_k} \\ -\frac{\Delta_k}{p_k} & 1 \end{bmatrix} \begin{bmatrix} F_k(n) \\ G_k(n-1) \end{bmatrix}. \qquad (69)$$

2.4 Key Properties

2.5 Autoregressive Gaussian Processes

A zero mean, stationary, Gaussian random process $r_1, r_2, \ldots$ is an $m$th order autoregressive process if
$$r_n = -a_1 r_{n-1} - a_2 r_{n-2} - \ldots - a_m r_{n-m} + w_n \qquad (70)$$
for all $n$, where the $w_n$ are independent and identically distributed Gaussian random variables with zero mean and variance $\sigma^2$. An $m$th order autoregressive process is $m$th order Markov in the sense that the probability density function of $r_n$ given $r_{n-1}, r_{n-2}, \ldots, r_1$ equals the probability density function of $r_n$ given $r_{n-1}, r_{n-2}, \ldots, r_{n-m}$. Defining the vector
$$a_m = [\, 1 \ a_1 \ a_2 \ \ldots \ a_m \,]^T,$$
(70) may be rewritten as
$$a_m^T r_{m+1}(n) = w_n. \qquad (71)$$
Let the covariance function for the random process be $C_k$, so
$$C_k = E\{ r_n r_{n-k} \}. \qquad (72)$$
Comment: In order for this equation to model a stationary random process and to be viewed as a generative model for the data, the corresponding discrete time system must be stable. That is, if one were to compute the transfer function in the Z-transform domain, then all of the poles of the transfer function must be inside of the unit disk in the complex plane. These poles are obviously the roots of the characteristic equation with coefficients $a_j$.

a. Using the autoregressive model in equation (70), show that the covariance function satisfies the equations
$$C_0 + a_1 C_1 + a_2 C_2 + \ldots + a_m C_m = \sigma^2 \qquad (73)$$
$$C_k + a_1 C_{k-1} + a_2 C_{k-2} + \ldots + a_m C_{k-m} = 0, \qquad (74)$$
where the second equation holds for all $k > 0$. Hint: Multiply both sides of (70) by a value of the random sequence and take expected values. Use the symmetry property of covariance functions for the first equality.
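Although part (a) asks for a derivation, the relations (73)-(74) are easy to see empirically: simulate a stable autoregressive process and plug sample covariances into the left sides. The order, coefficients, and noise variance below are assumed example values.

```python
# Empirical illustration of the Yule-Walker relations (73)-(74) for an assumed
# second order autoregressive model r_n + a1 r_{n-1} + a2 r_{n-2} = w_n.
import numpy as np

rng = np.random.default_rng(1)
a1, a2, sigma2 = -1.2, 0.5, 1.0        # assumed stable coefficients and noise variance
N = 200_000
w = rng.normal(scale=np.sqrt(sigma2), size=N)
r = np.zeros(N)
for n in range(2, N):
    r[n] = -a1 * r[n - 1] - a2 * r[n - 2] + w[n]   # the model (70) with m = 2

def C(k):                               # sample covariance at lag k
    return np.mean(r * r) if k == 0 else np.mean(r[k:] * r[:-k])

print(C(0) + a1 * C(1) + a2 * C(2))     # approximately sigma^2, eq. (73)
print(C(1) + a1 * C(0) + a2 * C(1))     # approximately 0, eq. (74) with k = 1
print(C(2) + a1 * C(1) + a2 * C(0))     # approximately 0, eq. (74) with k = 2
```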

2.6 Background and Understanding of Autoregressive Models

Suppose that $r_1, r_2, \ldots$ is a stationary sequence of Gaussian random variables with zero mean. The covariance function is determined by an autoregressive model which the random variables satisfy. The autoregressive model is an $m$th order Markov model, meaning that the probability density function of $r_n$ given all previous values depends only on the previous $m$ values. More specifically, suppose that the random variables satisfy the autoregressive model in (70).

b. Derive a recursive structure for computing the logarithm of the probability density function of $r_n, r_{n-1}, \ldots, r_1$. More specifically, let
$$v_n = \ln p(r_1, r_2, \ldots, r_n). \qquad (75)$$
Derive an expression for $v_n$ in terms of $v_{n-1}$ and an update. Focus on the case where $n > m$. Hint: This is a key part of the problem, so make sure you do it correctly. It obviously relates to the Markov property expressed through the autoregressive model in (70).

c. Consider the special case of $m = 1$. Suppose that $C_0 = 1$. Find a relationship between $a_1$ and $\sigma^2$ (essentially you must solve (74) in this general case). Comment: Note that the stability requirement implies that $|a_1| < 1$.

2.7 Recursive Detection for Autoregressive Models

Suppose that one has to decide whether data arise from an autoregressive model or from white noise. In this problem, the log-likelihood ratio is computed recursively. Under hypothesis $H_1$, the data arise from the autoregressive model (70). Under hypothesis $H_0$, the data $R_n$ are i.i.d. Gaussian with zero mean and variance $C_0$. That is, under either hypothesis the marginal distribution on any sample $R_n$ is the same. The only difference between the two models is in the covariance structure.

a. Find the log-likelihood ratio for $n$ samples. Call this log-likelihood ratio $l_n$. Derive a recursive expression for $l_n$ in terms of $l_{n-1}$ and an update. Focus on the case $n > m$.

b. Consider the special case of $m = 1$. Write down the recursive structure for this case.

c. The performance increases as $n$ grows. This can be quantified in various ways. One way is to compute the information rate functions for each $n$. In this problem, you will compute a special case. Consider again $m = 1$. Find the log-moment generating function for the difference between $l_n$ and $l_{n-1}$ conditioned on each hypothesis, and conditioned on previous measurements; call these two log-moment generating functions $m_0(s)$ and $m_1(s)$:
$$m_i(s) = \ln E\{ e^{s(l_n - l_{n-1})} \mid H_i, r_1, r_2, \ldots, r_{n-1} \}. \qquad (76)$$
Compute and plot the information rate functions $I_0(x)$ and $I_1(x)$ for these two log-moment generating functions. Comment: These two functions quantify the increase in information for detection provided by the new measurement.

2.8 Recursive Estimation for Autoregressive Models

In this problem, you will estimate the parameters in an autoregressive model given observations of the data $r_n, r_{n-1}, \ldots, r_1$.

a. First, assume that the maximum likelihood estimate for the parameters given data $r_n, r_{n-1}, \ldots, r_1$ satisfies
$$B_n \hat{a}_n = d_n, \qquad (77)$$
where the vector $\hat{a}_n$ is the maximum likelihood estimate of the parameter vector
$$a = [\, a_1 \ a_2 \ \ldots \ a_m \,]^T. \qquad (78)$$
Find the update equations for $B_n$ and $d_n$. These may be obtained by writing down the likelihood equation using the recursive update for the log-likelihood function, and taking the derivative with respect to the parameter vector.

b. The computation for $\hat{a}_n$ may also be written in recursive form. This is accomplished using the matrix inversion lemma. The matrix inversion lemma states that a rank one update to a matrix yields a rank one update to its inverse.

More specifically, if $A$ is an $m \times m$ symmetric, invertible matrix and $f$ is an $m \times 1$ vector, then
$$(A + f f^T)^{-1} = A^{-1} - \frac{A^{-1} f f^T A^{-1}}{1 + f^T A^{-1} f}. \qquad (79)$$
(A small numerical check of this identity is sketched at the end of these notes.) Use this equation to derive an equation for the estimate $\hat{a}_n$ in terms of $\hat{a}_{n-1}$. Hint: The final form should look like
$$\hat{a}_n = \hat{a}_{n-1} + g_n \left( r_n + \hat{a}_{n-1}^T \, [\, r_{n-1} \ r_{n-2} \ \ldots \ r_{n-m} \,]^T \right), \qquad (80)$$
where an auxiliary equation defines the vector $g_n$ in terms of $B_n$ and the appropriate definition of $f$.

2.9 Recursive Detection: Order 1 Versus Order 2 Autoregressive Model

A decision must be made between two models for a sequence of Gaussian distributed random variables. Each model is an autoregressive model. The first model is autoregressive of order one, while the second is autoregressive of order two. There are two goals here, as outlined below. First, the optimal test statistic for a Neyman-Pearson test must be computed for a fixed number $N$ of consecutive samples of a realization. Second, an efficient update of this test statistic to the case with $N+1$ samples must be derived.

Consider the following two hypotheses. Under $H_1$, the model for the measurements is
$$y_i = 0.75\, y_{i-1} + w_i, \qquad (81)$$
where the $w_i$ are independent and identically distributed Gaussian random variables with zero mean and variance equal to $7/4 = 1.75$; the $w_i$ are independent of $y_0$ for all $i$; and $y_0$ is Gaussian distributed with zero mean and variance 4.

Under $H_2$, the model for the measurements is
$$y_i = 0.75\, y_{i-1} + 0.2\, y_{i-2} + w_i, \qquad (82)$$
where the $w_i$ are independent and identically distributed Gaussian random variables with zero mean and variance equal to 0.75; the $w_i$ are independent of $y_0$ for all $i$; $y_0$ is Gaussian distributed with zero mean and variance 4; and $y_1 = 0.75\, y_0 + w_1$, where $w_1$ is a zero mean Gaussian random variable with variance 0.75.

a. Given $y_0, y_1, \ldots, y_N$, find the optimal test statistic for a Neyman-Pearson test. Simplify the expression as much as possible. Interpret your answer.

b. Denote the test statistic computed in part a by $l_N$. The optimal test statistic for $N+1$ measurements is $l_{N+1}$. Find an efficient update rule for computing $l_{N+1}$ from $l_N$.
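As noted above, the rank-one matrix inversion lemma in equation (79) can be checked with a short numerical sketch; the matrix $A$ and the vector $f$ below are arbitrary assumed values, not quantities from the problems.

```python
# Numerical spot check of the matrix inversion lemma, equation (79).
import numpy as np

rng = np.random.default_rng(2)
m = 4
M = rng.standard_normal((m, m))
A = M @ M.T + np.eye(m)                  # an assumed symmetric, invertible matrix
f = rng.standard_normal(m)               # an assumed m x 1 vector

Ainv = np.linalg.inv(A)
rhs = Ainv - np.outer(Ainv @ f, Ainv @ f) / (1.0 + f @ Ainv @ f)   # right side of (79)
print(np.allclose(rhs, np.linalg.inv(A + np.outer(f, f))))          # True
```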