
EE 278B: Random Vectors
Lecture Notes 4: Vector Detection and Estimation

- Vector Detection
- Reconstruction Problem
- Detection for Vector AGN Channel
- Vector Linear Estimation
- Linear Innovation Sequence
- Kalman Filter

Vector Detection

Let the signal $\Theta = \theta_0$ with probability $p_0$ and $\Theta = \theta_1$ with probability $p_1 = 1 - p_0$.

We observe the random vector $\mathbf{Y}$, where $\mathbf{Y} \mid \{\Theta = \theta_0\} \sim f_{\mathbf{Y}\mid\Theta}(\mathbf{y}\mid\theta_0)$ and $\mathbf{Y} \mid \{\Theta = \theta_1\} \sim f_{\mathbf{Y}\mid\Theta}(\mathbf{y}\mid\theta_1)$.

We wish to find the estimate $\hat{\Theta}(\mathbf{Y})$ that minimizes the probability of detection error $\mathrm{P}\{\hat{\Theta} \neq \Theta\}$.

The optimal estimate is obtained using the MAP decoder
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \dfrac{p_{\Theta\mid\mathbf{Y}}(\theta_0\mid\mathbf{y})}{p_{\Theta\mid\mathbf{Y}}(\theta_1\mid\mathbf{y})} > 1 \\ \theta_1 & \text{otherwise} \end{cases}$$

When $p_0 = p_1 = 1/2$, the MAP decoder reduces to the ML decoder
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \dfrac{f_{\mathbf{Y}\mid\Theta}(\mathbf{y}\mid\theta_0)}{f_{\mathbf{Y}\mid\Theta}(\mathbf{y}\mid\theta_1)} > 1 \\ \theta_1 & \text{otherwise} \end{cases}$$
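To make the decision rule concrete, here is a minimal sketch (not part of the original notes) of the MAP decoder for two hypotheses, assuming Gaussian likelihoods; the signal vectors, covariance, and prior below are illustrative values only.

```python
# Minimal sketch of the MAP decision rule for two hypotheses.
# The Gaussian likelihood and the numerical values are assumptions for illustration.
import numpy as np
from scipy.stats import multivariate_normal

def map_decode(y, theta0, theta1, cov, p0=0.5):
    """Return 0 or 1: the MAP estimate of which signal was sent."""
    f0 = multivariate_normal.pdf(y, mean=theta0, cov=cov)
    f1 = multivariate_normal.pdf(y, mean=theta1, cov=cov)
    # MAP rule: decide theta0 iff p0 f(y|theta0) > p1 f(y|theta1)
    return 0 if p0 * f0 > (1 - p0) * f1 else 1

# With p0 = 0.5 this is the ML rule
theta0, theta1 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
y = np.array([0.3, 0.9])
print(map_decode(y, theta0, theta1, cov=np.eye(2), p0=0.5))
```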

Reconstruction on the Tree

Consider a complete binary reconstruction tree of finite depth $k$.

[Figure: complete binary tree of depth $k=2$ with root $X_1$, internal nodes $X_2, X_3$, leaf nodes $X_4,\ldots,X_7$, and edge noise r.v.s $Z_1,\ldots,Z_6$]

The root node is assigned an r.v. $X_1 \sim \mathrm{Bern}(1/2)$ (the signal).

Denote the two children of each non-leaf node $i$ as $l_i$ and $r_i$ (e.g., for $i = 1$, $l_1 = 2$ and $r_1 = 3$).

A random variable is assigned to each non-root node as follows:
$$X_{l_i} = X_i \oplus Z_{l_i}, \qquad X_{r_i} = X_i \oplus Z_{r_i},$$
where $Z_1, Z_2, \ldots$ are i.i.d. $\mathrm{Bern}(\epsilon)$ r.v.s, $\epsilon < 1/2$, independent of $X_1$. That is, the r.v. assigned to a node is the output of a binary symmetric channel (BSC) whose input is the r.v. of its parent.

Denote the set of leaf r.v.s that are descendants of node $i$ as $\mathbf{X}_i$ (e.g., for $i = 1$, $\mathbf{X}_1 = (X_4, X_5, X_6, X_7)$, and for $i = 4$, $\mathbf{X}_4 = X_4$ in the figure).

We observe the leaf node r.v.s $\mathbf{X}_1$ and wish to find the estimate $\hat{X}_1(\mathbf{X}_1)$ that minimizes the probability of error $P_e = \mathrm{P}\{\hat{X}_1 \neq X_1\}$.

This problem is a simple example of the reconstruction on the tree problem, which arises in computational evolutionary biology (phylogenetic reconstruction), statistical physics, and theoretical computer science. A question of interest in these fields is under what condition on the channel noise $X_1$ can be reconstructed with $P_e < 1/2$ as the tree depth $k \to \infty$.

The reconstruction problem itself is an example of graphical models, in which the dependencies among random variables are specified by a graph (STAT 375, CS 228).

Since $X_1 \sim \mathrm{Bern}(1/2)$, the optimal estimate is obtained using the ML decoder
$$\hat{X}_1(\mathbf{X}_1) = \begin{cases} 0 & \text{if } \dfrac{p_{\mathbf{X}_1\mid X_1}(\mathbf{x}_1\mid 0)}{p_{\mathbf{X}_1\mid X_1}(\mathbf{x}_1\mid 1)} > 1 \\ 1 & \text{otherwise} \end{cases}$$

Because of the special structure of the observation r.v.s, the optimal estimate can be computed using a fast iterative message passing algorithm.

Define
$$L_{i,0} = p_{\mathbf{X}_i\mid X_i}(\mathbf{x}_i\mid 0), \qquad L_{i,1} = p_{\mathbf{X}_i\mid X_i}(\mathbf{x}_i\mid 1)$$
Then the ML estimate can be written as
$$\hat{X}_1(\mathbf{X}_1) = \begin{cases} 0 & \text{if } L_{1,0} > L_{1,1} \\ 1 & \text{otherwise} \end{cases}$$

We now show that $L_{1,0}, L_{1,1}$ can be computed (in time of order the number of nodes in the tree) by iteratively computing the intermediate likelihoods $L_{i,0}, L_{i,1}$, beginning with the leaf nodes, for which
$$L_{i,0} = 1 - x_i, \qquad L_{i,1} = x_i$$

By the law of total probability, for a non-leaf node $i$ we can write
$$
\begin{aligned}
L_{i,0} &= p_{\mathbf{X}_i\mid X_i}(\mathbf{x}_i\mid 0) \\
&= p_{X_{l_i},X_{r_i}\mid X_i}(0,0\mid 0)\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,0,0)
 + p_{X_{l_i},X_{r_i}\mid X_i}(0,1\mid 0)\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,0,1) \\
&\quad + p_{X_{l_i},X_{r_i}\mid X_i}(1,0\mid 0)\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,1,0)
 + p_{X_{l_i},X_{r_i}\mid X_i}(1,1\mid 0)\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,1,1) \\
&= \bar{\epsilon}^2\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,0,0)
 + \bar{\epsilon}\,\epsilon\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,0,1) \\
&\quad + \epsilon\,\bar{\epsilon}\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,1,0)
 + \epsilon^2\, p_{\mathbf{X}_i\mid X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i\mid 0,1,1),
\end{aligned}
$$
where $\bar{\epsilon} = 1 - \epsilon$ and the last step uses the conditional independence of $X_{l_i}$ and $X_{r_i}$ given $X_i$, e.g., $p_{X_{l_i},X_{r_i}\mid X_i}(0,0\mid 0) = p_{X_{l_i}\mid X_i}(0\mid 0)\,p_{X_{r_i}\mid X_i}(0\mid 0) = \bar{\epsilon}^2$.

$L_{i,1}$ can be expressed similarly.

Now, since X i = (X li,x ri ), by conditional independence, p Xi X i,x li,x ri (x i x i,x li,x ri ) = p Xli,X ri X i,x li,x ri (x li,x ri x i,x li,x ri ) Hence we obtain the following iteratively equations where, at the leaf nodes = p Xli X li (x li x li )p Xri X ri (x ri x ri ) L i,0 = ( ǫl li,0+ǫl li,1)( ǫl ri,0+ǫl ri,1), L i,1 = (ǫl li,0+ ǫl li,1)(ǫl ri,0+ ǫl ri,1), L i,0 = p Xi X i (x i 0) = 1 x i L i,1 = p Xi X i (x i 1) = x i Hence to compute L 1,0 and L 1,1, we start with the likelihoods at each leaf node, then compute the likelihoods for the nodes at level k 1, and so on until we arrive at node 1 EE 278B: Random Vectors 4 7

Detection for Vector Additive Gaussian Noise Channel

Consider the vector additive Gaussian noise (AGN) channel $\mathbf{Y} = \Theta + \mathbf{Z}$, where the signal $\Theta = \theta_0$, an $n$-dimensional real vector, with probability $1/2$ and $\Theta = \theta_1$ with probability $1/2$, and the noise $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\mathbf{Z}})$ is independent of $\Theta$.

We observe $\mathbf{y}$ and wish to find the estimate $\hat{\Theta}(\mathbf{Y})$ that minimizes the probability of decoding error $\mathrm{P}\{\hat{\Theta} \neq \Theta\}$.

First assume that $\Sigma_{\mathbf{Z}} = NI$, i.e., the additive white Gaussian noise channel.

The optimal decoding rule is the ML decoder. Define the log likelihood ratio
$$\Lambda(\mathbf{y}) = \ln \frac{f(\mathbf{y}\mid\theta_0)}{f(\mathbf{y}\mid\theta_1)}$$
Then the ML decoder is
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \Lambda(\mathbf{y}) > 0 \\ \theta_1 & \text{otherwise} \end{cases}$$

Now,
$$\Lambda(\mathbf{y}) = \frac{1}{2N}\left[(\mathbf{y}-\theta_1)^T(\mathbf{y}-\theta_1) - (\mathbf{y}-\theta_0)^T(\mathbf{y}-\theta_0)\right]$$
Hence the ML decoder reduces to the minimum distance decoder
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \|\mathbf{y}-\theta_0\| < \|\mathbf{y}-\theta_1\| \\ \theta_1 & \text{otherwise} \end{cases}$$
We can simplify this further to
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \mathbf{y}^T(\theta_1-\theta_0) < \tfrac{1}{2}(\theta_1^T\theta_1 - \theta_0^T\theta_0) \\ \theta_1 & \text{otherwise} \end{cases}$$
Hence the decision depends only on the value of the scalar r.v. $W = \mathbf{Y}^T(\theta_1-\theta_0)$. Such an r.v. is referred to as a sufficient statistic for the optimal decoder. Further,
$$W \mid \{\Theta=\theta_0\} \sim \mathcal{N}\big(\theta_0^T(\theta_1-\theta_0),\; N(\theta_1-\theta_0)^T(\theta_1-\theta_0)\big),$$
$$W \mid \{\Theta=\theta_1\} \sim \mathcal{N}\big(\theta_1^T(\theta_1-\theta_0),\; N(\theta_1-\theta_0)^T(\theta_1-\theta_0)\big)$$

Assuming that the signals have the same power, i.e., $\theta_0^T\theta_0 = \theta_1^T\theta_1 = P$, the optimal decoding rule reduces to the matched filter decoder (receiver)
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \mathbf{y}^T(\theta_1-\theta_0) < 0 \\ \theta_1 & \text{otherwise,} \end{cases}$$
that is,
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } w < 0 \\ \theta_1 & \text{if } w \ge 0 \end{cases}$$
This is the same as the optimal rule for the scalar case discussed in Lecture Notes 1! The minimum probability of error is
$$P_e = Q\!\left(\frac{P - \theta_0^T\theta_1}{\sqrt{2N(P - \theta_0^T\theta_1)}}\right) = Q\!\left(\sqrt{\frac{P - \theta_0^T\theta_1}{2N}}\right)$$

This is minimized by using antipodal signals $\theta_0 = -\theta_1$, which yields
$$P_e = Q\!\left(\sqrt{\frac{P}{N}}\right)$$
Exactly the same as for scalar antipodal signals.

Now suppose that the noise is not white, i.e., $\Sigma_{\mathbf{Z}} \neq NI$. Then the ML decoder reduces to
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } (\mathbf{y}-\theta_0)^T\Sigma_{\mathbf{Z}}^{-1}(\mathbf{y}-\theta_0) < (\mathbf{y}-\theta_1)^T\Sigma_{\mathbf{Z}}^{-1}(\mathbf{y}-\theta_1) \\ \theta_1 & \text{otherwise} \end{cases}$$
Now let $\tilde{\mathbf{y}} = \Sigma_{\mathbf{Z}}^{-1/2}\mathbf{y}$ and $\tilde{\theta}_i = \Sigma_{\mathbf{Z}}^{-1/2}\theta_i$ for $i = 0,1$; then the rule becomes the same as that for the white noise case
$$\hat{\Theta}(\mathbf{y}) = \begin{cases} \theta_0 & \text{if } \|\tilde{\mathbf{y}}-\tilde{\theta}_0\| < \|\tilde{\mathbf{y}}-\tilde{\theta}_1\| \\ \theta_1 & \text{otherwise} \end{cases}$$
and can be simplified to the scalar case as before.

Thus the optimal decoder first multiplies $\mathbf{Y}$ by $\Sigma_{\mathbf{Z}}^{-1/2}$ to obtain $\tilde{\mathbf{Y}}$ and then applies the optimal rule for the white noise case with the transformed signals $\tilde{\theta}_i = \Sigma_{\mathbf{Z}}^{-1/2}\theta_i$, $i = 0,1$.
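A sketch of this two-step decoder (whiten, then minimum distance), assuming a symmetric positive definite $\Sigma_{\mathbf{Z}}$; the signal and covariance values below are illustrative assumptions.

```python
# Sketch of the optimal decoder for colored Gaussian noise: whiten with
# Sigma_Z^{-1/2}, then apply the minimum distance rule to the transformed signals.
import numpy as np

def whitened_min_distance_decode(y, theta0, theta1, sigma_z):
    # Inverse square root of Sigma_Z via its eigendecomposition
    w, V = np.linalg.eigh(sigma_z)
    sigma_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    yt = sigma_inv_sqrt @ y                              # whitened observation
    t0, t1 = sigma_inv_sqrt @ theta0, sigma_inv_sqrt @ theta1
    return 0 if np.linalg.norm(yt - t0) < np.linalg.norm(yt - t1) else 1

# Assumed signals and noise covariance, for illustration only
theta0, theta1 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
sigma_z = np.array([[1.0, 0.5], [0.5, 2.0]])
y = theta0 + np.linalg.cholesky(sigma_z) @ np.random.randn(2)
print(whitened_min_distance_decode(y, theta0, theta1, sigma_z))
```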

Vector Linear Estimation

Let $X \sim f_X(x)$ be an r.v. representing the signal and let $\mathbf{Y}$ be an $n$-dimensional random vector representing the observations.

The minimum MSE estimate of $X$ given $\mathbf{Y}$ is the conditional expectation $\mathrm{E}(X\mid\mathbf{Y})$. This is often not practical to compute, either because the conditional pdf of $X$ given $\mathbf{Y}$ is not known or because of high computational cost.

The MMSE linear (or affine) estimate is easier to find since it depends only on the means, variances, and covariances of the r.v.s involved.

To find the MMSE linear estimate, first assume that $\mathrm{E}(X) = 0$ and $\mathrm{E}(\mathbf{Y}) = \mathbf{0}$. The problem reduces to finding a real $n$-vector $\mathbf{h}$ such that
$$\hat{X} = \mathbf{h}^T\mathbf{Y} = \sum_{i=1}^n h_i Y_i$$
minimizes the MSE $= \mathrm{E}\big[(X - \hat{X})^2\big]$.

MMSE Linear Estimate via Orthogonality Principle

To find $\hat{X}$ we use the orthogonality principle: we view the r.v.s $X, Y_1, Y_2, \ldots, Y_n$ as vectors in the inner product space consisting of all zero mean r.v.s defined over the underlying probability space.

The linear estimation problem reduces to a geometry problem: find the vector $\hat{X}$ that is closest to $X$ (in the norm of the error $X - \hat{X}$).

[Figure: the signal $X$, its projection $\hat{X}$, and the error vector $X - \hat{X}$, which is orthogonal to the subspace spanned by $Y_1, Y_2, \ldots, Y_n$]

To minimize MSE $= \|X - \hat{X}\|^2$, we choose $\hat{X}$ so that the error vector $X - \hat{X}$ is orthogonal to the subspace spanned by the observations $Y_1, Y_2, \ldots, Y_n$, i.e.,
$$\mathrm{E}\big[(X - \hat{X})Y_i\big] = 0, \quad i = 1,2,\ldots,n,$$
hence
$$\mathrm{E}(Y_i X) = \mathrm{E}(Y_i \hat{X}) = \sum_{j=1}^n h_j\, \mathrm{E}(Y_i Y_j), \quad i = 1,2,\ldots,n$$

Define the cross covariance of $\mathbf{Y}$ and $X$ as the $n$-vector
$$\Sigma_{\mathbf{Y}X} = \mathrm{E}\big[(\mathbf{Y} - \mathrm{E}(\mathbf{Y}))(X - \mathrm{E}(X))\big] = \begin{bmatrix} \sigma_{Y_1 X} \\ \sigma_{Y_2 X} \\ \vdots \\ \sigma_{Y_n X} \end{bmatrix}$$
For $n = 1$ this is simply the covariance.

The above equations can be written in vector form as $\Sigma_{\mathbf{Y}}\mathbf{h} = \Sigma_{\mathbf{Y}X}$. If $\Sigma_{\mathbf{Y}}$ is nonsingular, we can solve the equations to obtain $\mathbf{h} = \Sigma_{\mathbf{Y}}^{-1}\Sigma_{\mathbf{Y}X}$.

Thus, if $\Sigma_{\mathbf{Y}}$ is nonsingular, the best linear MSE estimate is
$$\hat{X} = \mathbf{h}^T\mathbf{Y} = \Sigma_{\mathbf{Y}X}^T\Sigma_{\mathbf{Y}}^{-1}\mathbf{Y}$$
Compare this to the scalar case, where $\hat{X} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_Y^2}\,Y$.

Now, to find the minimum MSE, consider
$$
\begin{aligned}
\mathrm{MSE} &= \mathrm{E}\big[(X-\hat{X})^2\big] = \mathrm{E}\big[(X-\hat{X})X\big] - \mathrm{E}\big[(X-\hat{X})\hat{X}\big] \\
&= \mathrm{E}\big[(X-\hat{X})X\big], \quad \text{since by orthogonality } (X-\hat{X}) \perp \hat{X} \\
&= \mathrm{E}(X^2) - \mathrm{E}(\hat{X}X) \\
&= \mathrm{Var}(X) - \mathrm{E}\big(\Sigma_{\mathbf{Y}X}^T\Sigma_{\mathbf{Y}}^{-1}\mathbf{Y}X\big) \\
&= \mathrm{Var}(X) - \Sigma_{\mathbf{Y}X}^T\Sigma_{\mathbf{Y}}^{-1}\Sigma_{\mathbf{Y}X}
\end{aligned}
$$
Compare this to the scalar case, where the minimum MSE is $\mathrm{Var}(X) - \dfrac{\mathrm{Cov}(X,Y)^2}{\sigma_Y^2}$.

If $X$ or $\mathbf{Y}$ have nonzero mean, the MMSE affine estimate $\hat{X} = h_0 + \mathbf{h}^T\mathbf{Y}$ is determined by first finding the MMSE linear estimate of $X - \mathrm{E}(X)$ given $\mathbf{Y} - \mathrm{E}(\mathbf{Y})$ (the minimum MSEs of the two problems are the same), which is
$$\hat{X}' = \Sigma_{\mathbf{Y}X}^T\Sigma_{\mathbf{Y}}^{-1}\big(\mathbf{Y} - \mathrm{E}(\mathbf{Y})\big),$$
and then setting $\hat{X} = \hat{X}' + \mathrm{E}(X)$ (since $\mathrm{E}(\hat{X}) = \mathrm{E}(X)$ is necessary).
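A minimal sketch of these formulas, assuming the second-order statistics ($\Sigma_{\mathbf{Y}}$, $\Sigma_{\mathbf{Y}X}$, the means, and $\mathrm{Var}(X)$) are given as inputs; the numerical values below use the covariance structure of the $Y_i = X + Z_i$ example that follows and are otherwise assumed.

```python
# Sketch of the MMSE affine estimate and its MSE from second-order statistics.
import numpy as np

def lmmse(sigma_y, sigma_yx, var_x, mean_y, mean_x, y):
    h = np.linalg.solve(sigma_y, sigma_yx)          # solves Sigma_Y h = Sigma_YX
    x_hat = mean_x + h @ (y - mean_y)               # affine estimate
    mse = var_x - sigma_yx @ h                      # Var(X) - Sigma_YX^T Sigma_Y^{-1} Sigma_YX
    return x_hat, mse

# Example: Y_i = X + Z_i with assumed P = 1, N = 0.5, n = 3, mu = 0
P, N, n, mu = 1.0, 0.5, 3, 0.0
sigma_y = P * np.ones((n, n)) + N * np.eye(n)
sigma_yx = P * np.ones(n)
print(lmmse(sigma_y, sigma_yx, P, mu * np.ones(n), mu, np.array([0.2, -0.1, 0.4])))
```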

Example: Let $X$ be the r.v. representing a signal with mean $\mu$ and variance $P$. The observations are $Y_i = X + Z_i$, for $i = 1,2,\ldots,n$, where the $Z_i$ are zero mean uncorrelated noise r.v.s with variance $N$, and $X$ and the $Z_i$ are also uncorrelated. Find the MMSE linear estimate of $X$ given $\mathbf{Y}$ and its MSE.

For $n = 1$ we already know that
$$\hat{X}_1 = \frac{P}{P+N}Y_1 + \frac{N}{P+N}\mu$$

To find the MMSE linear estimate for general $n$, first let $X' = X - \mu$ and $Y_i' = Y_i - \mu$. Thus $X'$ and $\mathbf{Y}'$ are zero mean.

The MMSE linear estimate of $X'$ given $\mathbf{Y}'$ is given by $\hat{X}_n' = \mathbf{h}^T\mathbf{Y}'$, where $\Sigma_{\mathbf{Y}'}\mathbf{h} = \Sigma_{\mathbf{Y}'X'}$, thus
$$\begin{bmatrix} P+N & P & \cdots & P \\ P & P+N & \cdots & P \\ \vdots & & \ddots & \vdots \\ P & P & \cdots & P+N \end{bmatrix}\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} P \\ P \\ \vdots \\ P \end{bmatrix}$$

By symmetry, $h_1 = h_2 = \cdots = h_n = \dfrac{P}{nP+N}$. Therefore
$$\hat{X}_n' = \frac{P}{nP+N}\sum_{i=1}^n (Y_i - \mu)$$
Thus
$$\hat{X}_n = \hat{X}_n' + \mu = \frac{P}{nP+N}\sum_{i=1}^n Y_i + \frac{N}{nP+N}\mu$$
The mean square error of the estimate is
$$\mathrm{MSE}_n = P - \mathrm{E}(\hat{X}_n'X') = \frac{PN}{nP+N}$$
Thus as $n \to \infty$, $\mathrm{MSE}_n \to 0$, i.e., the linear estimate becomes perfect (even though we don't know the complete statistics of $X$ and $\mathbf{Y}$).
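A quick Monte Carlo sanity check of the closed form $\mathrm{MSE}_n = PN/(nP+N)$; the Gaussian sampling below is only for convenience (the formula depends only on second-order statistics), and the parameter values are assumed.

```python
# Monte Carlo check of MSE_n = PN/(nP + N) for the example above.
# X and the Z_i are drawn Gaussian here purely for sampling convenience.
import numpy as np

rng = np.random.default_rng(0)
P, N, mu, n, trials = 1.0, 0.5, 2.0, 10, 200_000
X = mu + np.sqrt(P) * rng.standard_normal(trials)
Y = X[:, None] + np.sqrt(N) * rng.standard_normal((trials, n))
X_hat = (P / (n * P + N)) * Y.sum(axis=1) + (N / (n * P + N)) * mu
print(np.mean((X - X_hat) ** 2), P * N / (n * P + N))   # the two should be close
```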

Linear Innovation Sequence

Let $X$ be the signal and $\mathbf{Y}$ the observation vector (all zero mean).

Suppose the $Y_i$'s are orthogonal, i.e., $\mathrm{E}(Y_iY_j) = 0$ for all $i \neq j$. Let $\hat{X}(\mathbf{Y})$ be the best linear MSE estimate of $X$ given $\mathbf{Y}$ and $\hat{X}(Y_i)$ be the best linear MSE estimate of $X$ given only $Y_i$, for $i = 1,\ldots,n$. Then we can write
$$\hat{X}(\mathbf{Y}) = \sum_{i=1}^n \hat{X}(Y_i), \qquad \mathrm{MSE} = \mathrm{Var}(X) - \sum_{i=1}^n \frac{\mathrm{Cov}^2(X,Y_i)}{\mathrm{Var}(Y_i)}$$
Hence the computation of the best linear MSE estimate and its MSE is very simple.

In fact, we can compute the estimates and the MSE causally (recursively): writing $Y^i = (Y_1,\ldots,Y_i)$,
$$\hat{X}(Y^{i+1}) = \hat{X}(Y^i) + \hat{X}(Y_{i+1}), \qquad \mathrm{MSE}_{i+1} = \mathrm{MSE}_i - \frac{\mathrm{Cov}^2(X,Y_{i+1})}{\mathrm{Var}(Y_{i+1})}$$

This can be proved by direct evaluation of the MMSE linear estimate or by using orthogonality.

[Figure: $X$ projected onto the orthogonal directions $Y_1$ and $Y_2$, with component estimates $\hat{X}(Y_1)$ and $\hat{X}(Y_2)$]

Now suppose the $Y_i$'s are not orthogonal. We can still express the estimate and its MSE as sums.

We first whiten $\mathbf{Y}$ to obtain $\mathbf{Z}$. The best linear MSE estimate of $X$ given $\mathbf{Y}$ is exactly the same as that given $\mathbf{Z}$ (why?). The estimate and its MSE can then be computed as
$$\hat{X}(\mathbf{Y}) = \sum_{i=1}^n \hat{X}(Z_i), \qquad \mathrm{MSE} = \mathrm{Var}(X) - \sum_{i=1}^n \mathrm{Cov}^2(X,Z_i)$$
(the whitened $Z_i$ have unit variance).

We can compute an orthogonal observation sequence $\tilde{\mathbf{Y}}$ from $\mathbf{Y}$ causally: given $Y^i$, we compute the error of the best linear MSE estimate of $Y_{i+1}$,
$$\tilde{Y}_{i+1}(Y^i) = Y_{i+1} - \hat{Y}_{i+1}(Y^i)$$
Clearly $\tilde{Y}_{i+1} \perp (\tilde{Y}_1, \tilde{Y}_2, \ldots, \tilde{Y}_i)$, hence we can write
$$\hat{Y}_{i+1}(Y^i) = \sum_{j=1}^i \hat{Y}_{i+1}(\tilde{Y}_j)$$

Interpretation: $\hat{Y}_{i+1}$ is the part of $Y_{i+1}$ predictable from $Y^i$, hence it carries no useful new information for estimating $X$ beyond $Y^i$. $\tilde{Y}_{i+1}$, by comparison, is the unpredictable part, hence it carries the new information. As such, $\tilde{\mathbf{Y}}$ is called the linear innovation sequence of $\mathbf{Y}$.

Remark: If we normalize $\tilde{\mathbf{Y}}$ (by dividing each $\tilde{Y}_i$ by its standard deviation), we obtain the same sequence as using the Cholesky decomposition in Lecture Notes 3.

Example: Let the observation sequence be $Y_i = X + Z_i$ for $i = 1,2,\ldots,n$, where $X, Z_1, \ldots, Z_n$ are zero mean, uncorrelated r.v.s with $\mathrm{E}(X^2) = P$ and $\mathrm{E}(Z_i^2) = N$ for $i = 1,2,\ldots,n$. Find the linear innovation sequence of $\mathbf{Y}$.

Using the innovation sequence, the MMSE linear estimate of $X$ given $\tilde{Y}^{i+1}$ and its MSE can be computed causally:
$$\hat{X}(\tilde{Y}^{i+1}) = \hat{X}(\tilde{Y}^i) + \hat{X}(\tilde{Y}_{i+1}), \qquad \mathrm{MSE}_{i+1} = \mathrm{MSE}_i - \frac{\mathrm{Cov}^2(X,\tilde{Y}_{i+1})}{\mathrm{Var}(\tilde{Y}_{i+1})}$$
The innovation sequence will prove useful in deriving the Kalman filter.
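A sketch of how the innovations can be computed causally from the covariance matrix of $\mathbf{Y}$, by projecting each $Y_i$ onto the previous innovations; the realization and parameter values in the example call are assumed.

```python
# Sketch: compute the linear innovation sequence of a zero-mean Y from its
# covariance matrix by projecting each Y_i onto the previous innovations.
import numpy as np

def innovations(y, sigma_y):
    """y: one realization (length n); sigma_y: covariance matrix of Y."""
    n = len(y)
    y_tilde = np.zeros(n)
    cov_ty = np.zeros((n, n))     # cov_ty[j, k] = Cov(Ytilde_j, Y_k)
    var_t = np.zeros(n)           # var_t[j]     = Var(Ytilde_j)
    for i in range(n):
        pred = 0.0
        c = sigma_y[i].copy()                 # starts as Cov(Y_i, Y_k) for all k
        for j in range(i):
            gain = cov_ty[j, i] / var_t[j]    # coefficient of Ytilde_j in Yhat_i
            pred += gain * y_tilde[j]         # Yhat_i as a sum over past innovations
            c -= gain * cov_ty[j]             # update to Cov(Ytilde_i, Y_k)
        y_tilde[i] = y[i] - pred
        cov_ty[i] = c
        var_t[i] = c[i]                       # Var(Ytilde_i) = Cov(Ytilde_i, Y_i)
    return y_tilde

# Example from the notes: Y_i = X + Z_i with assumed P = 1, N = 1 and an assumed realization
n, P, N = 4, 1.0, 1.0
sigma_y = P * np.ones((n, n)) + N * np.eye(n)
print(innovations(np.array([1.2, 0.7, -0.3, 0.5]), sigma_y))
```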

Kalman Filter

The Kalman filter is an efficient, recursive algorithm for computing the MMSE linear estimate and its MSE when the signal $\mathbf{X}$ and observations $\mathbf{Y}$ evolve according to a state-space model.

Consider a linear dynamical system described by the state-space model
$$\mathbf{X}_{i+1} = A_i\mathbf{X}_i + \mathbf{U}_i, \quad i = 0,1,\ldots,n$$
with noisy observations (output)
$$\mathbf{Y}_i = \mathbf{X}_i + \mathbf{V}_i, \quad i = 0,1,\ldots,n,$$
where $\mathbf{X}_0$, $\mathbf{U}_0,\mathbf{U}_1,\ldots,\mathbf{U}_n$, $\mathbf{V}_0,\mathbf{V}_1,\ldots,\mathbf{V}_n$ are zero mean, uncorrelated random vectors with $\Sigma_{\mathbf{X}_0} = P_0$, $\Sigma_{\mathbf{U}_i} = Q_i$, $\Sigma_{\mathbf{V}_i} = N_i$, and $A_i$ is a known sequence of matrices.

[Figure: block diagram of the state-space model, with the state fed back through $A_i$ and a delay, process noise $\mathbf{U}_i$ added to the state, and observation noise $\mathbf{V}_i$ added to form $\mathbf{Y}_i$]

This state-space model is used in many applications:

- Navigation, e.g., of a car
  State: location, speed, heading, acceleration, tilt, steering wheel position of the vehicle
  Observations: inertial sensors (accelerometers, gyroscopes), electronic compass, GPS
- Phase locked loop
  State: phase and frequency offsets
  Observations: noisy observations of the phase
- Computer vision, e.g., face tracking
  State: pose, motion, shape (size, articulation), appearance (lighting, color)
  Observations: video frame sequence
- Economics, ...

The goal is to compute the MMSE linear estimate of the state from causal observations $\mathbf{Y}^i = (\mathbf{Y}_0,\ldots,\mathbf{Y}_i)$:

- Prediction: find the estimate $\hat{\mathbf{X}}_{i+1\mid i}$ of $\mathbf{X}_{i+1}$ from $\mathbf{Y}^i$ and its error covariance $\Sigma_{i+1\mid i}$
- Filtering: find the estimate $\hat{\mathbf{X}}_{i\mid i}$ of $\mathbf{X}_i$ from $\mathbf{Y}^i$ and its error covariance $\Sigma_{i\mid i}$

The Kalman filter provides clever recursive equations for computing these estimates and their error covariance matrices.

Scalar Kalman Filter

Consider the scalar state-space system
$$X_{i+1} = a_iX_i + U_i, \quad i = 0,1,\ldots,n$$
with noisy observations
$$Y_i = X_i + V_i, \quad i = 0,1,\ldots,n,$$
where $X_0$, $U_0,U_1,\ldots,U_n$, $V_0,V_1,\ldots,V_n$ are zero mean, uncorrelated r.v.s with $\mathrm{Var}(X_0) = P_0$, $\mathrm{Var}(U_i) = Q_i$, $\mathrm{Var}(V_i) = N_i$, and $a_i$ is a known sequence.

[Figure: block diagram of the scalar state-space model, with the state fed back through $a_i$ and a delay, process noise $U_i$, and observation noise $V_i$]

Kalman filter (prediction):

Initialization: $\hat{X}_{0\mid -1} = 0$, $\sigma^2_{0\mid -1} = P_0$

Update equations: for $i = 0,1,2,\ldots,n$, the estimate is
$$\hat{X}_{i+1\mid i} = a_i\hat{X}_{i\mid i-1} + k_i\big(Y_i - \hat{X}_{i\mid i-1}\big),$$
where the filter gain is
$$k_i = \frac{a_i\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1} + N_i}$$
The MSE of $\hat{X}_{i+1\mid i}$ is
$$\sigma^2_{i+1\mid i} = a_i(a_i - k_i)\sigma^2_{i\mid i-1} + Q_i$$
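A minimal sketch of this prediction recursion; the sequences a, Q, N and the example observations are assumed inputs.

```python
# Sketch of the scalar Kalman (prediction) recursion from the update equations above.
import numpy as np

def kalman_predict(y, a, Q, N, P0):
    """Return the one-step predictions Xhat_{i+1|i} and their MSEs sigma2_{i+1|i}."""
    x_hat, sigma2 = 0.0, P0                              # Xhat_{0|-1}, sigma2_{0|-1}
    x_pred, mse = [], []
    for i in range(len(y)):
        k = a[i] * sigma2 / (sigma2 + N[i])              # filter gain k_i
        x_hat = a[i] * x_hat + k * (y[i] - x_hat)        # Xhat_{i+1|i}
        sigma2 = a[i] * (a[i] - k) * sigma2 + Q[i]       # sigma2_{i+1|i}
        x_pred.append(x_hat)
        mse.append(sigma2)
    return np.array(x_pred), np.array(mse)

# Constant-state case (a_i = 1, Q_i = 0), cf. the example that follows; values assumed
y = np.array([1.2, 0.8, 1.1, 0.9])
print(kalman_predict(y, a=[1] * 4, Q=[0] * 4, N=[1] * 4, P0=1.0))
```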

Example: Let $a_i = 1$, $Q_i = 0$, $N_i = N$, and $P_0 = P$ (so $X_0 = X_1 = X_2 = \cdots = X$), and $Y_i = X + V_i$ (this is the same as the earlier estimation example).

Kalman filter: initialization $\hat{X}_{0\mid -1} = 0$ and $\sigma^2_{0\mid -1} = P$. The update in each step is
$$\hat{X}_{i+1\mid i} = (1-k_i)\hat{X}_{i\mid i-1} + k_iY_i \quad \text{with} \quad k_i = \frac{\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1} + N},$$
and the MSE is
$$\sigma^2_{i+1\mid i} = (1-k_i)\sigma^2_{i\mid i-1}$$

We can solve for $\sigma^2_{i+1\mid i}$ explicitly:
$$\sigma^2_{i+1\mid i} = \left(1 - \frac{\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1}+N}\right)\sigma^2_{i\mid i-1} = \frac{N\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1}+N},$$
hence
$$\frac{1}{\sigma^2_{i+1\mid i}} = \frac{1}{N} + \frac{1}{\sigma^2_{i\mid i-1}} \quad \Longrightarrow \quad \sigma^2_{i+1\mid i} = \frac{1}{(i+1)/N + 1/P} = \frac{NP}{(i+1)P+N}$$
The gain is
$$k_i = \frac{\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1}+N} = \frac{P}{(i+1)P+N}$$
The recursive estimate is
$$\hat{X}_{i+1\mid i} = \frac{iP+N}{(i+1)P+N}\,\hat{X}_{i\mid i-1} + \frac{P}{(i+1)P+N}\,Y_i$$
We thus obtain the previous result in recursive form.

Example: Let $n = 200$, $P_0 = 1$, $N_i = 1$.

- For $i = 1$ to $100$: $a_i = \alpha$, $Q_i = P_0(1-\alpha^2)$ with $\alpha = 0.95$ (memory factor)
- For $i = 100$ to $200$: $a_i = 1$, $Q_i = 0$ (i.e., the state remains constant)

[Figure: sample paths of the state $X_i$, the observations $Y_i$, and the predictions $\hat{X}_{i+1\mid i}$ versus $i$]

[Figure: the prediction error $X_{i+1} - \hat{X}_{i+1\mid i}$, the error $Y_i - X_{i+1}$, and the MSE $\sigma^2_{i+1\mid i}$ versus $i$]
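A self-contained sketch of this example's setup and the scalar prediction recursion (assuming $a_i = \alpha$ in the first segment); the random seed is arbitrary and the printed quantities are just a spot check in place of the plots.

```python
# Sketch of the two-regime example: AR(1) state with alpha = 0.95 for the first
# 100 steps, then a constant state. Plots omitted; final values printed instead.
import numpy as np

rng = np.random.default_rng(1)
n, P0, alpha = 200, 1.0, 0.95
a = np.where(np.arange(n) < 100, alpha, 1.0)
Q = np.where(np.arange(n) < 100, P0 * (1 - alpha ** 2), 0.0)
N = np.ones(n)

# Simulate the state-space model X_{i+1} = a_i X_i + U_i, Y_i = X_i + V_i
x = np.zeros(n + 1)
x[0] = np.sqrt(P0) * rng.standard_normal()
for i in range(n):
    x[i + 1] = a[i] * x[i] + np.sqrt(Q[i]) * rng.standard_normal()
y = x[:n] + np.sqrt(N) * rng.standard_normal(n)

# Kalman prediction recursion
x_hat, s2 = 0.0, P0
for i in range(n):
    k = a[i] * s2 / (s2 + N[i])
    x_hat = a[i] * x_hat + k * (y[i] - x_hat)
    s2 = a[i] * (a[i] - k) * s2 + Q[i]
print(x_hat, x[n], s2)   # final prediction, true state, predicted MSE
```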

Derivation of the Kalman Filter

We use innovations. Let $\tilde{Y}_i$ be the innovation r.v. for $Y_i$; then we can write
$$\hat{X}_{i+1\mid i} = \hat{X}_{i+1\mid i-1} + k_i\tilde{Y}_i, \qquad \sigma^2_{i+1\mid i} = \sigma^2_{i+1\mid i-1} - k_i\,\mathrm{Cov}(X_{i+1},\tilde{Y}_i),$$
where $\hat{X}_{i+1\mid i-1}$ and $\sigma^2_{i+1\mid i-1}$ are the MMSE linear estimate of $X_{i+1}$ given $Y^{i-1}$ and its MSE, and
$$k_i = \frac{\mathrm{Cov}(X_{i+1},\tilde{Y}_i)}{\mathrm{Var}(\tilde{Y}_i)}$$
Now, since $X_{i+1} = a_iX_i + U_i$, by linearity of the MMSE linear estimate we have
$$\hat{X}_{i+1\mid i-1} = a_i\hat{X}_{i\mid i-1} \quad \text{and} \quad \sigma^2_{i+1\mid i-1} = a_i^2\sigma^2_{i\mid i-1} + Q_i$$

Now, the innovation r.v. for $Y_i$ is
$$\tilde{Y}_i = Y_i - \hat{Y}_i(Y^{i-1})$$
Since $Y_i = X_i + V_i$ and $V_i$ is uncorrelated with $Y_j$ for $j < i$,
$$\hat{Y}_i(Y^{i-1}) = \hat{X}_{i\mid i-1}$$
Hence
$$\tilde{Y}_i = Y_i - \hat{X}_{i\mid i-1}$$
This yields
$$\hat{X}_{i+1\mid i} = a_i\hat{X}_{i\mid i-1} + k_i\tilde{Y}_i = a_i\hat{X}_{i\mid i-1} + k_i\big(Y_i - \hat{X}_{i\mid i-1}\big)$$
Now consider
$$\sigma^2_{i+1\mid i} = \sigma^2_{i+1\mid i-1} - k_i\,\mathrm{Cov}(X_{i+1},\tilde{Y}_i),$$
$$k_i = \frac{\mathrm{Cov}(X_{i+1},\tilde{Y}_i)}{\mathrm{Var}(\tilde{Y}_i)}
= \frac{\mathrm{Cov}(a_iX_i+U_i,\; X_i - \hat{X}_{i\mid i-1} + V_i)}{\mathrm{Var}(X_i - \hat{X}_{i\mid i-1} + V_i)}
= \frac{\mathrm{Cov}(a_iX_i,\; X_i - \hat{X}_{i\mid i-1})}{\mathrm{Var}(X_i - \hat{X}_{i\mid i-1} + V_i)}$$

Continuing,
$$
\begin{aligned}
k_i &= \frac{a_i\,\mathrm{Cov}(X_i,\; X_i - \hat{X}_{i\mid i-1})}{\mathrm{Var}(X_i - \hat{X}_{i\mid i-1} + V_i)} \\
&= \frac{a_i\,\mathrm{Cov}(X_i - \hat{X}_{i\mid i-1},\; X_i - \hat{X}_{i\mid i-1})}{\mathrm{Var}(X_i - \hat{X}_{i\mid i-1} + V_i)} \quad \text{since } (X_i - \hat{X}_{i\mid i-1}) \perp \hat{X}_{i\mid i-1} \\
&= \frac{a_i\,\mathrm{Var}(X_i - \hat{X}_{i\mid i-1})}{\mathrm{Var}(X_i - \hat{X}_{i\mid i-1}) + N_i} = \frac{a_i\sigma^2_{i\mid i-1}}{\sigma^2_{i\mid i-1} + N_i}
\end{aligned}
$$
The MSE is
$$
\begin{aligned}
\sigma^2_{i+1\mid i} &= \sigma^2_{i+1\mid i-1} - k_i\,\mathrm{Cov}(a_iX_i + U_i,\; X_i - \hat{X}_{i\mid i-1} + V_i) \\
&= \sigma^2_{i+1\mid i-1} - k_ia_i\sigma^2_{i\mid i-1} \\
&= a_i(a_i - k_i)\sigma^2_{i\mid i-1} + Q_i
\end{aligned}
$$
This completes the derivation of the scalar Kalman filter.

Vector Kalman Filter

The above scalar Kalman filter can be extended to the vector state-space model:

Initialization: $\hat{\mathbf{X}}_{0\mid -1} = \mathbf{0}$, $\Sigma_{0\mid -1} = P_0$

Update equations: for $i = 0,1,2,\ldots,n$, the estimate is
$$\hat{\mathbf{X}}_{i+1\mid i} = A_i\hat{\mathbf{X}}_{i\mid i-1} + K_i\big(\mathbf{Y}_i - \hat{\mathbf{X}}_{i\mid i-1}\big),$$
where the filter gain matrix is
$$K_i = A_i\Sigma_{i\mid i-1}\big(\Sigma_{i\mid i-1} + N_i\big)^{-1}$$
The covariance of the error is
$$\Sigma_{i+1\mid i} = A_i\Sigma_{i\mid i-1}A_i^T - K_i\Sigma_{i\mid i-1}A_i^T + Q_i$$

Remark: If $\mathbf{X}_0$, $\mathbf{U}_0,\mathbf{U}_1,\ldots,\mathbf{U}_n$ and $\mathbf{V}_0,\mathbf{V}_1,\ldots,\mathbf{V}_n$ are Gaussian (zero mean, uncorrelated), then the Kalman filter yields the best MSE estimate of $\mathbf{X}_i$, $i = 0,\ldots,n$.
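A sketch of the vector prediction recursion; the two-dimensional model matrices in the example call are assumed for illustration (recall that the observation model here is $\mathbf{Y}_i = \mathbf{X}_i + \mathbf{V}_i$, so $\mathbf{Y}_i$ has the same dimension as the state).

```python
# Sketch of the vector Kalman (prediction) recursion; A, Q, N are lists of the
# model matrices and P0 is the initial state covariance.
import numpy as np

def vector_kalman_predict(Y, A, Q, N, P0):
    """Return the final one-step prediction and its error covariance."""
    d = P0.shape[0]
    x_hat, Sigma = np.zeros(d), P0.copy()                   # Xhat_{0|-1}, Sigma_{0|-1}
    for i, y in enumerate(Y):
        K = A[i] @ Sigma @ np.linalg.inv(Sigma + N[i])      # gain matrix K_i
        x_hat = A[i] @ x_hat + K @ (y - x_hat)              # Xhat_{i+1|i}
        Sigma = A[i] @ Sigma @ A[i].T - K @ Sigma @ A[i].T + Q[i]
    return x_hat, Sigma

# Illustration with an assumed 2-D model and assumed observations
A = [np.array([[1.0, 1.0], [0.0, 1.0]])] * 3
Q = [0.01 * np.eye(2)] * 3
N = [np.eye(2)] * 3
Y = [np.array([0.1, 0.0]), np.array([0.3, 0.2]), np.array([0.6, 0.1])]
print(vector_kalman_predict(Y, A, Q, N, P0=np.eye(2)))
```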

Filtering

Now assume the goal is to compute the MMSE linear estimate of $X_i$ given $Y^i$, i.e., instead of predicting the next state, we are interested in estimating the current state. We denote this estimate by $\hat{X}_{i\mid i}$ and its MSE by $\sigma^2_{i\mid i}$.

The Kalman filter can be adapted to this case as follows (scalar case):

Initialization:
$$\hat{X}_{0\mid 0} = \frac{P_0}{P_0+N_0}Y_0, \qquad \sigma^2_{0\mid 0} = \frac{P_0N_0}{P_0+N_0}$$

Update equations: for $i = 1,2,\ldots,n$, the estimate is
$$\hat{X}_{i\mid i} = a_{i-1}(1-k_i)\hat{X}_{i-1\mid i-1} + k_iY_i$$

with filter gain
$$k_i = \frac{a_{i-1}^2\sigma^2_{i-1\mid i-1} + Q_{i-1}}{a_{i-1}^2\sigma^2_{i-1\mid i-1} + Q_{i-1} + N_i}$$
and MSE recursion
$$\sigma^2_{i\mid i} = (1-k_i)\big(a_{i-1}^2\sigma^2_{i-1\mid i-1} + Q_{i-1}\big)$$

Vector case:

Initialization:
$$\hat{\mathbf{X}}_{0\mid 0} = P_0(P_0+N_0)^{-1}\mathbf{Y}_0, \qquad \Sigma_{0\mid 0} = P_0\big(I - (P_0+N_0)^{-1}P_0\big)$$

Update equations: for $i = 1,2,\ldots,n$, the estimate is
$$\hat{\mathbf{X}}_{i\mid i} = (I-K_i)A_{i-1}\hat{\mathbf{X}}_{i-1\mid i-1} + K_i\mathbf{Y}_i$$
with filter gain
$$K_i = \big(A_{i-1}\Sigma_{i-1\mid i-1}A_{i-1}^T + Q_{i-1}\big)\big(A_{i-1}\Sigma_{i-1\mid i-1}A_{i-1}^T + Q_{i-1} + N_i\big)^{-1}$$
and MSE recursion
$$\Sigma_{i\mid i} = \big(A_{i-1}\Sigma_{i-1\mid i-1}A_{i-1}^T + Q_{i-1}\big)\big(I - K_i^T\big)$$
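A sketch of the scalar filtering recursion above; the observation values and parameters in the example call are assumed.

```python
# Sketch of the scalar filtering recursion (estimate of the current state);
# a and Q are indexed so that a[i-1], Q[i-1] drive the step from i-1 to i.
import numpy as np

def kalman_filtering(y, a, Q, N, P0):
    x_hat = P0 / (P0 + N[0]) * y[0]                  # Xhat_{0|0}
    s2 = P0 * N[0] / (P0 + N[0])                     # sigma2_{0|0}
    out = [(x_hat, s2)]
    for i in range(1, len(y)):
        pred_var = a[i - 1] ** 2 * s2 + Q[i - 1]     # error variance of the one-step prediction
        k = pred_var / (pred_var + N[i])             # filter gain k_i
        x_hat = a[i - 1] * (1 - k) * x_hat + k * y[i]
        s2 = (1 - k) * pred_var
        out.append((x_hat, s2))
    return out

# Constant-state case: with a_i = 1 and Q_i = 0 this reduces to recursive averaging
print(kalman_filtering([1.2, 0.8, 1.1], a=[1, 1, 1], Q=[0, 0, 0], N=[1, 1, 1], P0=1.0))
```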