Lecture Notes 4: Vector Detection and Estimation

- Vector Detection
- Reconstruction Problem
- Detection for the Vector AGN Channel
- Vector Linear Estimation
- Linear Innovation Sequence
- Kalman Filter

EE 278B: Random Vectors 4-1
Vector Detection

- Let the signal $\Theta = \theta_0$ with probability $p_0$ and $\Theta = \theta_1$ with probability $p_1 = 1 - p_0$
- We observe the random vector $Y$, where $Y \mid \{\Theta = \theta_0\} \sim f_{Y|\Theta}(y|\theta_0)$ and $Y \mid \{\Theta = \theta_1\} \sim f_{Y|\Theta}(y|\theta_1)$
- We wish to find the estimate $\hat\Theta(Y)$ that minimizes the probability of detection error $P\{\hat\Theta \ne \Theta\}$
- The optimal estimate is obtained using the MAP decoder
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } \dfrac{p_{\Theta|Y}(\theta_0|y)}{p_{\Theta|Y}(\theta_1|y)} > 1 \\ \theta_1 & \text{otherwise} \end{cases}$$
- When $p_0 = p_1 = 1/2$, the MAP decoder reduces to the ML decoder
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } \dfrac{f_{Y|\Theta}(y|\theta_0)}{f_{Y|\Theta}(y|\theta_1)} > 1 \\ \theta_1 & \text{otherwise} \end{cases}$$
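The MAP rule can be implemented directly from the priors and likelihoods. A minimal sketch (the Gaussian likelihoods and all function names below are our own choices for illustration, not part of the notes):

```python
import math

def map_decode(y, p0, f0, f1):
    """MAP decoder for a binary signal: decide theta_0 iff the posterior
    ratio p(theta_0|y)/p(theta_1|y) = p0*f0(y) / ((1-p0)*f1(y)) exceeds 1."""
    return 0 if p0 * f0(y) > (1 - p0) * f1(y) else 1

# Example with scalar Gaussian likelihoods N(-1, 1) and N(+1, 1);
# with p0 = 1/2 this is exactly the ML decoder
def gauss(mu):
    return lambda y: math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

decide = lambda y: map_decode(y, 0.5, gauss(-1.0), gauss(1.0))
```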
Reconstruction on the Tree

- Consider a complete binary reconstruction tree of finite depth $k$

  [Figure: depth-2 binary tree with root $X_1$, internal nodes $X_2, X_3$, leaves $X_4, \ldots, X_7$, and noise variables $Z_1, \ldots, Z_6$ on the edges]

- The root node is assigned a r.v. $X_1 \sim \mathrm{Bern}(1/2)$ (the signal)
- Denote the two children of each non-leaf node $i$ as $l_i$ and $r_i$ (e.g., for $i = 1$, $l_1 = 2$ and $r_1 = 3$)
- A random variable is assigned to each non-root node as follows:
$$X_{l_i} = X_i \oplus Z_{l_i}, \qquad X_{r_i} = X_i \oplus Z_{r_i},$$
  where $Z_1, Z_2, \ldots$ are i.i.d. $\mathrm{Bern}(\epsilon)$, $\epsilon < 1/2$, r.v.s independent of $X_1$
- That is, the r.v. assigned to a node is the output of a binary symmetric channel (BSC) whose input is the r.v. of its parent
- Denote the set of leaf r.v.s that are descendants of node $i$ as $\mathbf{X}_i$ (e.g., for $i = 1$, $\mathbf{X}_1 = (X_4, X_5, X_6, X_7)$, and for $i = 4$, $\mathbf{X}_4 = X_4$ in the figure)
- We observe the leaf node r.v.s $\mathbf{X}_1$ and wish to find the estimate $\hat X_1(\mathbf{X}_1)$ that minimizes the probability of error $P_e = P\{\hat X_1 \ne X_1\}$
- This problem is a simple example of the reconstruction on the tree problem, which arises in computational evolutionary biology (phylogenetic reconstruction), statistical physics, and theoretical computer science. A question of interest in these fields is under what condition on the channel noise $X_1$ can be reconstructed with $P_e < 1/2$ as the tree depth $k \to \infty$
- The reconstruction problem itself is an example of graphical models, in which dependencies among random variables are specified by a graph (STAT 375, CS 228)
- Since $X_1 \sim \mathrm{Bern}(1/2)$, the optimal estimate is obtained using the ML decoder
$$\hat X_1(\mathbf{X}_1) = \begin{cases} 0 & \text{if } \dfrac{p_{\mathbf{X}_1|X_1}(\mathbf{x}_1|0)}{p_{\mathbf{X}_1|X_1}(\mathbf{x}_1|1)} > 1 \\ 1 & \text{otherwise} \end{cases}$$
- Because of the special structure of the observation r.v.s, the optimal estimate can be computed using a fast iterative message passing algorithm
- Define
$$L_{i,0} = p_{\mathbf{X}_i|X_i}(\mathbf{x}_i|0), \qquad L_{i,1} = p_{\mathbf{X}_i|X_i}(\mathbf{x}_i|1)$$
  Then the ML estimate can be written as
$$\hat X_1(\mathbf{X}_1) = \begin{cases} 0 & \text{if } L_{1,0} > L_{1,1} \\ 1 & \text{otherwise} \end{cases}$$
- We now show that $L_{1,0}, L_{1,1}$ can be computed (in a number of operations on the order of the number of nodes in the tree) by iteratively computing the intermediate likelihoods $L_{i,0}, L_{i,1}$, beginning with the leaf nodes, for which
$$L_{i,0} = 1 - x_i, \qquad L_{i,1} = x_i$$
- By the law of total probability, for a non-leaf node $i$ we can write
$$\begin{aligned}
L_{i,0} = p_{\mathbf{X}_i|X_i}(\mathbf{x}_i|0)
&= \sum_{x_{l_i}, x_{r_i} \in \{0,1\}} p_{X_{l_i},X_{r_i}|X_i}(x_{l_i}, x_{r_i}|0)\; p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0, x_{l_i}, x_{r_i}) \\
&= \sum_{x_{l_i}, x_{r_i} \in \{0,1\}} p_{X_{l_i}|X_i}(x_{l_i}|0)\, p_{X_{r_i}|X_i}(x_{r_i}|0)\; p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0, x_{l_i}, x_{r_i}) \quad \text{(conditional independence)} \\
&= \bar\epsilon^{\,2}\, p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0,0,0) + \bar\epsilon\epsilon\, p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0,0,1) \\
&\quad + \epsilon\bar\epsilon\, p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0,1,0) + \epsilon^2\, p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|0,1,1),
\end{aligned}$$
  where $\bar\epsilon = 1 - \epsilon$
- $L_{i,1}$ can be expressed similarly
- Now, since $\mathbf{X}_i = (\mathbf{X}_{l_i}, \mathbf{X}_{r_i})$, by conditional independence,
$$p_{\mathbf{X}_i|X_i,X_{l_i},X_{r_i}}(\mathbf{x}_i|x_i, x_{l_i}, x_{r_i}) = p_{\mathbf{X}_{l_i}|X_{l_i}}(\mathbf{x}_{l_i}|x_{l_i})\, p_{\mathbf{X}_{r_i}|X_{r_i}}(\mathbf{x}_{r_i}|x_{r_i})$$
- Hence we obtain the iterative equations
$$L_{i,0} = (\bar\epsilon L_{l_i,0} + \epsilon L_{l_i,1})(\bar\epsilon L_{r_i,0} + \epsilon L_{r_i,1}),$$
$$L_{i,1} = (\epsilon L_{l_i,0} + \bar\epsilon L_{l_i,1})(\epsilon L_{r_i,0} + \bar\epsilon L_{r_i,1}),$$
  where, at the leaf nodes,
$$L_{i,0} = p_{\mathbf{X}_i|X_i}(\mathbf{x}_i|0) = 1 - x_i, \qquad L_{i,1} = p_{\mathbf{X}_i|X_i}(\mathbf{x}_i|1) = x_i$$
- Hence to compute $L_{1,0}$ and $L_{1,1}$, we start with the likelihoods at each leaf node, then compute the likelihoods for the nodes at level $k-1$, and so on until we arrive at node 1
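The leaf-to-root recursion above can be sketched in a few lines. This is a minimal implementation under the stated model (complete binary tree, leaves given level by level); the function and variable names are our own:

```python
def reconstruct_root(leaves, eps):
    """ML reconstruction of the root bit from the observed leaf bits of a
    complete binary tree, via the likelihood recursion in the notes.

    leaves: list of observed leaf bits x_i (length a power of 2).
    eps: BSC crossover probability (eps < 1/2).
    Returns the ML estimate of the root bit X_1."""
    ebar = 1.0 - eps
    # Likelihoods at the leaf level: (L_{i,0}, L_{i,1}) = (1 - x_i, x_i)
    level = [(1 - x, x) for x in leaves]
    # Combine pairs of children until only the root remains
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level), 2):
            (l0, l1), (r0, r1) = level[j], level[j + 1]
            L0 = (ebar * l0 + eps * l1) * (ebar * r0 + eps * r1)
            L1 = (eps * l0 + ebar * l1) * (eps * r0 + ebar * r1)
            nxt.append((L0, L1))
        level = nxt
    L10, L11 = level[0]
    return 0 if L10 > L11 else 1
```

The work is one constant-time combine per internal node, which is the linear-in-tree-size cost claimed above.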
Detection for the Vector Additive Gaussian Noise Channel

- Consider the vector additive Gaussian noise (AGN) channel $Y = \Theta + Z$, where the signal $\Theta = \theta_0$, an $n$-dimensional real vector, with probability 1/2 and $\Theta = \theta_1$ with probability 1/2, and the noise $Z \sim \mathcal{N}(0, \Sigma_Z)$ is independent of $\Theta$
- We observe $y$ and wish to find the estimate $\hat\Theta(Y)$ that minimizes the probability of decoding error $P\{\hat\Theta \ne \Theta\}$
- First assume that $\Sigma_Z = NI$, i.e., an additive white Gaussian noise channel
- The optimal decoding rule is the ML decoder. Define the log likelihood ratio
$$\Lambda(y) = \ln \frac{f(y|\theta_0)}{f(y|\theta_1)}$$
  Then the ML decoder is
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } \Lambda(y) > 0 \\ \theta_1 & \text{otherwise} \end{cases}$$
- Now,
$$\Lambda(y) = \frac{1}{2N}\left[(y - \theta_1)^T(y - \theta_1) - (y - \theta_0)^T(y - \theta_0)\right]$$
- Hence the ML decoder reduces to the minimum distance decoder
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } \|y - \theta_0\| < \|y - \theta_1\| \\ \theta_1 & \text{otherwise} \end{cases}$$
- We can simplify this further to
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } y^T(\theta_1 - \theta_0) < \tfrac{1}{2}(\theta_1^T\theta_1 - \theta_0^T\theta_0) \\ \theta_1 & \text{otherwise} \end{cases}$$
- Hence the decision depends only on the value of the scalar r.v. $W = Y^T(\theta_1 - \theta_0)$. Such an r.v. is referred to as a sufficient statistic for the optimal decoder. Further,
$$W \mid \{\Theta = \theta_0\} \sim \mathcal{N}\big(\theta_0^T(\theta_1 - \theta_0),\; N(\theta_1 - \theta_0)^T(\theta_1 - \theta_0)\big),$$
$$W \mid \{\Theta = \theta_1\} \sim \mathcal{N}\big(\theta_1^T(\theta_1 - \theta_0),\; N(\theta_1 - \theta_0)^T(\theta_1 - \theta_0)\big)$$
- Assuming that the signals have the same power, i.e., $\theta_0^T\theta_0 = \theta_1^T\theta_1 = P$, the optimal decoding rule reduces to the matched filter decoder (receiver)
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } y^T(\theta_1 - \theta_0) < 0 \\ \theta_1 & \text{otherwise} \end{cases}$$
  that is,
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } w < 0 \\ \theta_1 & \text{if } w \ge 0 \end{cases}$$
- This is the same as the optimal rule for the scalar case discussed in Lecture Notes 1!
- The minimum probability of error is
$$P_e = Q\!\left(\frac{P - \theta_0^T\theta_1}{\sqrt{2N(P - \theta_0^T\theta_1)}}\right) = Q\!\left(\sqrt{\frac{P - \theta_0^T\theta_1}{2N}}\right)$$
- This is minimized by using antipodal signals $\theta_0 = -\theta_1$, which yields
$$P_e = Q\!\left(\sqrt{\frac{P}{N}}\right),$$
  exactly the same as for scalar antipodal signals
- Now suppose the noise is not white, i.e., $\Sigma_Z \ne NI$. Then the ML decoder reduces to
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } (y - \theta_0)^T \Sigma_Z^{-1}(y - \theta_0) < (y - \theta_1)^T \Sigma_Z^{-1}(y - \theta_1) \\ \theta_1 & \text{otherwise} \end{cases}$$
- Now let $\tilde y = \Sigma_Z^{-1/2} y$ and $\tilde\theta_i = \Sigma_Z^{-1/2} \theta_i$ for $i = 0, 1$. Then the rule becomes the same as that for the white noise case
$$\hat\Theta(y) = \begin{cases} \theta_0 & \text{if } \|\tilde y - \tilde\theta_0\| < \|\tilde y - \tilde\theta_1\| \\ \theta_1 & \text{otherwise} \end{cases}$$
  and can be simplified to the scalar case as before
- Thus the optimal decoder first multiplies $Y$ by $\Sigma_Z^{-1/2}$ to obtain $\tilde Y$ and then applies the optimal rule for the white noise case with the transformed signals $\tilde\theta_i = \Sigma_Z^{-1/2}\theta_i$, $i = 0, 1$
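The whiten-then-minimum-distance decoder can be sketched numerically; $\Sigma_Z^{-1/2}$ is computed here via the eigendecomposition of the symmetric positive definite $\Sigma_Z$ (function and variable names are our own):

```python
import numpy as np

def agn_ml_decode(y, theta0, theta1, Sigma_Z):
    """ML decoder for the vector AGN channel Y = Theta + Z, Z ~ N(0, Sigma_Z):
    whiten by Sigma_Z^{-1/2}, then apply the minimum distance rule.
    Returns 0 or 1, the index of the decoded signal."""
    # Sigma_Z^{-1/2} from the eigendecomposition Sigma_Z = V diag(w) V^T
    w, V = np.linalg.eigh(Sigma_Z)
    W = V @ np.diag(w ** -0.5) @ V.T
    yt, t0, t1 = W @ y, W @ theta0, W @ theta1
    # Minimum distance decoding in the whitened coordinates
    return 0 if np.linalg.norm(yt - t0) < np.linalg.norm(yt - t1) else 1
```

When $\Sigma_Z = NI$ the whitening step is just a scaling and the rule reduces to ordinary minimum distance decoding.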
Vector Linear Estimation

- Let $X \sim f_X(x)$ be a r.v. representing the signal and let $Y$ be an $n$-dimensional random vector representing the observations
- The minimum MSE estimate of $X$ given $Y$ is the conditional expectation $E(X|Y)$. This is often not practical to compute, either because the conditional pdf of $X$ given $Y$ is not known or because of high computational cost
- The MMSE linear (or affine) estimate is easier to find, since it depends only on the means, variances, and covariances of the r.v.s involved
- To find the MMSE linear estimate, first assume that $E(X) = 0$ and $E(Y) = 0$. The problem reduces to finding a real $n$-vector $h$ such that
$$\hat X = h^T Y = \sum_{i=1}^n h_i Y_i \quad \text{minimizes the } \mathrm{MSE} = E\big[(X - \hat X)^2\big]$$
MMSE Linear Estimate via Orthogonality Principle

- To find $\hat X$ we use the orthogonality principle: we view the r.v.s $X, Y_1, Y_2, \ldots, Y_n$ as vectors in the inner product space consisting of all zero mean r.v.s defined over the underlying probability space
- The linear estimation problem reduces to a geometry problem: find the vector $\hat X$ closest to $X$ (in the norm of the error $X - \hat X$)

  [Figure: the signal $X$, its projection $\hat X$ onto the subspace spanned by $Y_1, Y_2, \ldots, Y_n$, and the error vector $X - \hat X$]
- To minimize $\mathrm{MSE} = \|X - \hat X\|^2$, we choose $\hat X$ so that the error vector $X - \hat X$ is orthogonal to the subspace spanned by the observations $Y_1, Y_2, \ldots, Y_n$, i.e.,
$$E\big[(X - \hat X) Y_i\big] = 0, \quad i = 1, 2, \ldots, n,$$
  hence
$$E(Y_i X) = E(Y_i \hat X) = \sum_{j=1}^n h_j E(Y_i Y_j), \quad i = 1, 2, \ldots, n$$
- Define the cross covariance of $Y$ and $X$ as the $n$-vector
$$\Sigma_{YX} = E\big[(Y - E(Y))(X - E(X))\big] = \begin{bmatrix} \sigma_{Y_1 X} \\ \sigma_{Y_2 X} \\ \vdots \\ \sigma_{Y_n X} \end{bmatrix}$$
  For $n = 1$ this is simply the covariance
- The above equations can be written in vector form as $\Sigma_Y h = \Sigma_{YX}$
- If $\Sigma_Y$ is nonsingular, we can solve the equations to obtain $h = \Sigma_Y^{-1} \Sigma_{YX}$
- Thus, if $\Sigma_Y$ is nonsingular, the best linear MSE estimate is
$$\hat X = h^T Y = \Sigma_{YX}^T \Sigma_Y^{-1} Y$$
  Compare this to the scalar case, where $\hat X = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_Y^2}\, Y$
- Now, to find the minimum MSE, consider
$$\begin{aligned}
\mathrm{MSE} = E\big[(X - \hat X)^2\big] &= E\big[(X - \hat X)X\big] - E\big[(X - \hat X)\hat X\big] \\
&= E\big[(X - \hat X)X\big], \quad \text{since by orthogonality } (X - \hat X) \perp \hat X \\
&= E(X^2) - E(\hat X X) = \mathrm{Var}(X) - E\big(\Sigma_{YX}^T \Sigma_Y^{-1} Y X\big) \\
&= \mathrm{Var}(X) - \Sigma_{YX}^T \Sigma_Y^{-1} \Sigma_{YX}
\end{aligned}$$
  Compare this to the scalar case, where the minimum MSE is $\mathrm{Var}(X) - \dfrac{\mathrm{Cov}(X,Y)^2}{\sigma_Y^2}$
- If $X$ or $Y$ have nonzero mean, the MMSE affine estimate $\hat X = h_0 + h^T Y$ is determined by first finding the MMSE linear estimate $\hat X'$ of $X - E(X)$ given $Y - E(Y)$ (the minimum MSEs for $\hat X$ and $\hat X'$ are the same), which is
$$\hat X' = \Sigma_{YX}^T \Sigma_Y^{-1} (Y - E(Y)),$$
  and then setting $\hat X = \hat X' + E(X)$ (since $E(\hat X) = E(X)$ is necessary)
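The affine estimator only needs the second-order statistics, so it is a few lines of linear algebra. A sketch (function and argument names are our own):

```python
import numpy as np

def lmmse(Sigma_Y, Sigma_YX, var_X, mu_Y, mu_X):
    """MMSE affine estimator and its MSE from second-order statistics:
    Xhat = mu_X + Sigma_YX^T Sigma_Y^{-1} (Y - mu_Y)
    MSE  = Var(X) - Sigma_YX^T Sigma_Y^{-1} Sigma_YX
    Returns (estimator function, MSE)."""
    Sigma_YX = np.asarray(Sigma_YX, dtype=float)
    # Solve Sigma_Y h = Sigma_YX rather than forming the inverse explicitly
    h = np.linalg.solve(np.asarray(Sigma_Y, dtype=float), Sigma_YX)
    mse = var_X - Sigma_YX @ h
    return (lambda y: mu_X + h @ (np.asarray(y) - mu_Y)), mse
```

For $n = 1$ this reduces to the scalar formula $\hat X = E(X) + \mathrm{Cov}(X,Y)(Y - E(Y))/\sigma_Y^2$.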
Example

- Let $X$ be the r.v. representing a signal with mean $\mu$ and variance $P$. The observations are $Y_i = X + Z_i$, for $i = 1, 2, \ldots, n$, where the $Z_i$ are zero mean uncorrelated noise r.v.s with variance $N$, and $X$ and the $Z_i$ are also uncorrelated
- Find the MMSE linear estimate of $X$ given $Y$ and its MSE
- For $n = 1$, we already know that
$$\hat X_1 = \frac{P}{P+N}\, Y_1 + \frac{N}{P+N}\, \mu$$
- To find the MMSE linear estimate for general $n$, first let $X' = X - \mu$ and $Y_i' = Y_i - \mu$. Thus $X'$ and $Y'$ are zero mean
- The MMSE linear estimate of $X'$ given $Y'$ is given by $\hat X_n' = h^T Y'$, where $\Sigma_{Y'} h = \Sigma_{Y'X'}$, thus
$$\begin{bmatrix} P+N & P & \cdots & P \\ P & P+N & \cdots & P \\ \vdots & & \ddots & \vdots \\ P & P & \cdots & P+N \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} P \\ P \\ \vdots \\ P \end{bmatrix}$$
- By symmetry, $h_1 = h_2 = \cdots = h_n = \dfrac{P}{nP+N}$. Therefore
$$\hat X_n' = \frac{P}{nP+N} \sum_{i=1}^n (Y_i - \mu),$$
  and thus
$$\hat X_n = \hat X_n' + \mu = \frac{P}{nP+N} \sum_{i=1}^n Y_i + \frac{N}{nP+N}\, \mu$$
- The mean square error of the estimate is
$$\mathrm{MSE}_n = P - E(\hat X_n' X') = \frac{PN}{nP+N}$$
- Thus as $n \to \infty$, $\mathrm{MSE}_n \to 0$, i.e., the linear estimate becomes perfect (even though we don't know the complete statistics of $X$ and $Y$)
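The closed forms above are easy to check numerically by solving $\Sigma_{Y'} h = \Sigma_{Y'X'}$ directly (the values of $P$, $N$, and $n$ below are our own arbitrary test choices):

```python
import numpy as np

P, N, n = 2.0, 1.0, 5
Sigma_Y = P * np.ones((n, n)) + N * np.eye(n)   # Cov(Y'): P + N on diagonal, P off
Sigma_YX = P * np.ones(n)                       # Cov(Y_i', X') = P for each i
h = np.linalg.solve(Sigma_Y, Sigma_YX)          # solve Sigma_Y h = Sigma_YX
mse = P - Sigma_YX @ h                          # Var(X') - Sigma_YX^T h
# By symmetry h_i = P/(nP + N), and MSE_n = PN/(nP + N)
```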
Linear Innovation Sequence

- Let $X$ be the signal and $Y$ be the observation vector (all zero mean)
- Suppose the $Y_i$s are orthogonal, i.e., $E(Y_i Y_j) = 0$ for all $i \ne j$. Let $\hat X(Y)$ be the best linear MSE estimate of $X$ given $Y$, and $\hat X(Y_i)$ be the best linear MSE estimate of $X$ given only $Y_i$, for $i = 1, \ldots, n$. Then we can write
$$\hat X(Y) = \sum_{i=1}^n \hat X(Y_i), \qquad \mathrm{MSE} = \mathrm{Var}(X) - \sum_{i=1}^n \frac{\mathrm{Cov}^2(X, Y_i)}{\mathrm{Var}(Y_i)}$$
- Hence the computation of the best linear MSE estimate and its MSE are very simple
- In fact, writing $Y^i = (Y_1, \ldots, Y_i)$, we can compute the estimates and the MSE causally (recursively):
$$\hat X(Y^{i+1}) = \hat X(Y^i) + \hat X(Y_{i+1}), \qquad \mathrm{MSE}_{i+1} = \mathrm{MSE}_i - \frac{\mathrm{Cov}^2(X, Y_{i+1})}{\mathrm{Var}(Y_{i+1})}$$
- This can be proved by direct evaluation of the MMSE linear estimate or using orthogonality:

  [Figure: geometric picture with orthogonal observations $Y_1 \perp Y_2$; the estimate $\hat X(Y^2)$ is the sum of the projections $\hat X(Y_1)$ and $\hat X(Y_2)$ of $X$ onto $Y_1$ and $Y_2$]
- Now suppose the $Y_i$s are not orthogonal. We can still express the estimate and its MSE as sums
- We first whiten $Y$ to obtain orthogonal observations $Z$. The best linear MSE estimate of $X$ given $Y$ is exactly the same as that given $Z$ (why?)
- The estimate and its MSE can then be computed as
$$\hat X(Y) = \sum_{i=1}^n \hat X(Z_i), \qquad \mathrm{MSE} = \mathrm{Var}(X) - \sum_{i=1}^n \frac{\mathrm{Cov}^2(X, Z_i)}{\mathrm{Var}(Z_i)}$$
- We can compute an orthogonal observation sequence $\tilde Y$ from $Y$ causally: given $Y^i$, we compute the error of the best linear MSE estimate of $Y_{i+1}$,
$$\tilde Y_{i+1}(Y^i) = Y_{i+1} - \hat Y_{i+1}(Y^i)$$
- Clearly, $\tilde Y_{i+1} \perp (\tilde Y_1, \tilde Y_2, \ldots, \tilde Y_i)$, hence we can write
$$\hat Y_{i+1}(Y^i) = \sum_{j=1}^i \hat Y_{i+1}(\tilde Y_j)$$
- Interpretation: $\hat Y_{i+1}$ is the part of $Y_{i+1}$ predictable from $Y^i$, hence it carries no useful new information for estimating $X$ beyond $Y^i$; $\tilde Y_{i+1}$, by comparison, is the unpredictable part, hence it carries the new information
- As such, $\tilde Y$ is called the linear innovation sequence of $Y$
- Remark: If we normalize $\tilde Y$ (by dividing each $\tilde Y_i$ by its standard deviation), we obtain the same sequence as using the Cholesky decomposition in Lecture Notes 3
- Example: Let the observation sequence be $Y_i = X + Z_i$ for $i = 1, 2, \ldots, n$, where $X, Z_1, \ldots, Z_n$ are zero mean, uncorrelated r.v.s with $E(X^2) = P$ and $E(Z_i^2) = N$ for $i = 1, 2, \ldots, n$. Find the linear innovation sequence of $Y$
- Using the innovation sequence, the MMSE linear estimate of $X$ given $\tilde Y^{i+1}$ and its MSE can be computed causally:
$$\hat X(\tilde Y^{i+1}) = \hat X(\tilde Y^i) + \hat X(\tilde Y_{i+1}), \qquad \mathrm{MSE}_{i+1} = \mathrm{MSE}_i - \frac{\mathrm{Cov}^2(X, \tilde Y_{i+1})}{\mathrm{Var}(\tilde Y_{i+1})}$$
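A numerical sketch of the remark: the unnormalized innovations are obtained by row-scaling the inverse Cholesky factor of $\Sigma_Y$ to have unit diagonal (the values of $P$, $N$, and $n$ below are our own test choices):

```python
import numpy as np

P, N, n = 2.0, 1.0, 4
Sigma_Y = P * np.ones((n, n)) + N * np.eye(n)   # covariance of Y_i = X + Z_i
L = np.linalg.cholesky(Sigma_Y)                 # Sigma_Y = L L^T
T = np.linalg.inv(L)                            # T Y gives normalized innovations
T = T / np.diag(T)[:, None]                     # unit diagonal: (T Y)_i = Y_i - Yhat_i(Y^{i-1})
# The innovations are orthogonal: T Sigma_Y T^T is diagonal,
# with diagonal entries Var(Ytilde_i)
C = T @ Sigma_Y @ T.T
```

Since the transform is lower triangular, each innovation depends only on the past observations, matching the causal construction above; the first innovation is $Y_1$ itself, so $\mathrm{Var}(\tilde Y_1) = P + N$.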
Kalman Filter

- The Kalman filter is an efficient, recursive algorithm for computing the MMSE linear estimate and its MSE when the signal $X$ and observations $Y$ evolve according to a state-space model
- Consider a linear dynamical system described by the state-space model
$$X_{i+1} = A_i X_i + U_i, \quad i = 0, 1, \ldots, n,$$
  with noisy observations (output)
$$Y_i = X_i + V_i, \quad i = 0, 1, \ldots, n,$$
  where $X_0, U_0, U_1, \ldots, U_n, V_0, V_1, \ldots, V_n$ are zero mean, uncorrelated random vectors with $\Sigma_{X_0} = P_0$, $\Sigma_{U_i} = Q_i$, $\Sigma_{V_i} = N_i$, and $A_i$ is a known sequence of matrices

  [Figure: block diagram of the state-space model: $U_i$ and $A_i X_i$ sum to $X_{i+1}$, a delay element produces $X_i$, and $V_i$ is added to $X_i$ to form the observation $Y_i$]
- This state space model is used in many applications:
  - Navigation, e.g., of a car:
    - State: location, speed, heading, acceleration, tilt, steering wheel position of the vehicle
    - Observations: inertial (accelerometer, gyroscopes), electronic compass, GPS
  - Phase locked loop:
    - State: phase and frequency offsets
    - Observations: noisy observation of phase
  - Computer vision, e.g., face tracking:
    - State: pose, motion, shape (size, articulation), appearance (light, color)
    - Observations: video frame sequence
  - Economics...
- The goal is to compute the MMSE linear estimate of the state from causal observations:
  - Prediction: find the estimate $\hat X_{i+1|i}$ of $X_{i+1}$ from $Y^i$ and its MSE $\Sigma_{i+1|i}$
  - Filtering: find the estimate $\hat X_{i|i}$ of $X_i$ from $Y^i$ and its MSE $\Sigma_{i|i}$
- The Kalman filter provides clever recursive equations for computing these estimates and their error covariance matrices
Scalar Kalman Filter

- Consider the scalar state space system
$$X_{i+1} = a_i X_i + U_i, \quad i = 0, 1, \ldots, n,$$
  with noisy observations
$$Y_i = X_i + V_i, \quad i = 0, 1, \ldots, n,$$
  where $X_0, U_0, U_1, \ldots, U_n, V_0, V_1, \ldots, V_n$ are zero mean, uncorrelated r.v.s with $\mathrm{Var}(X_0) = P_0$, $\mathrm{Var}(U_i) = Q_i$, $\mathrm{Var}(V_i) = N_i$, and $a_i$ is a known sequence

  [Figure: block diagram of the scalar state-space model, as before with gain $a_i$ in place of the matrix $A_i$]
- Kalman filter (prediction):
- Initialization: $\hat X_{0|-1} = 0$, $\sigma^2_{0|-1} = P_0$
- Update equations: for $i = 0, 1, 2, \ldots, n$, the estimate is
$$\hat X_{i+1|i} = a_i \hat X_{i|i-1} + k_i (Y_i - \hat X_{i|i-1}),$$
  where the filter gain is
$$k_i = \frac{a_i \sigma^2_{i|i-1}}{\sigma^2_{i|i-1} + N_i}$$
  The MSE of $\hat X_{i+1|i}$ is
$$\sigma^2_{i+1|i} = a_i (a_i - k_i)\, \sigma^2_{i|i-1} + Q_i$$
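The scalar prediction recursion is a direct transcription of these update equations. A sketch (function and argument names are our own):

```python
def kalman_predict(y, a, Q, N, P0):
    """Scalar Kalman filter, prediction form.
    y: observations Y_0..Y_n; a, Q, N: the sequences a_i, Q_i, N_i;
    P0 = Var(X_0).  Returns the lists of one-step predictions
    Xhat_{i+1|i} and their MSEs sigma^2_{i+1|i}."""
    xhat, s2 = 0.0, P0                        # Xhat_{0|-1}, sigma^2_{0|-1}
    preds, mses = [], []
    for i, yi in enumerate(y):
        k = a[i] * s2 / (s2 + N[i])           # filter gain k_i
        xhat = a[i] * xhat + k * (yi - xhat)  # Xhat_{i+1|i}
        s2 = a[i] * (a[i] - k) * s2 + Q[i]    # sigma^2_{i+1|i}
        preds.append(xhat); mses.append(s2)
    return preds, mses
```

With $a_i = 1$, $Q_i = 0$ this reproduces the constant-signal example that follows.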
- Example: Let $a_i = 1$, $Q_i = 0$, $N_i = N$, and $P_0 = P$ (so $X_0 = X_1 = X_2 = \cdots = X$), and $Y_i = X + V_i$ (this is the same as the earlier estimation example)
- Kalman filter:
  - Initialization: $\hat X_{0|-1} = 0$ and $\sigma^2_{0|-1} = P$
  - The update in each step is
$$\hat X_{i+1|i} = (1 - k_i)\, \hat X_{i|i-1} + k_i Y_i \quad \text{with} \quad k_i = \frac{\sigma^2_{i|i-1}}{\sigma^2_{i|i-1} + N},$$
    and the MSE is
$$\sigma^2_{i+1|i} = (1 - k_i)\, \sigma^2_{i|i-1}$$
- We can solve for $\sigma^2_{i+1|i}$ explicitly:
$$\sigma^2_{i+1|i} = \left(1 - \frac{\sigma^2_{i|i-1}}{\sigma^2_{i|i-1} + N}\right)\sigma^2_{i|i-1} = \frac{N \sigma^2_{i|i-1}}{\sigma^2_{i|i-1} + N},$$
  i.e.,
$$\frac{1}{\sigma^2_{i+1|i}} = \frac{1}{N} + \frac{1}{\sigma^2_{i|i-1}}$$
  Since $\sigma^2_{0|-1} = P$, this gives
$$\sigma^2_{i+1|i} = \frac{1}{(i+1)/N + 1/P} = \frac{NP}{(i+1)P + N}$$
- The gain is
$$k_i = \frac{P}{(i+1)P + N}$$
- The recursive estimate is
$$\hat X_{i+1|i} = \frac{iP + N}{(i+1)P + N}\, \hat X_{i|i-1} + \frac{P}{(i+1)P + N}\, Y_i$$
- We thus obtain the previous result in a recursive form
- Example: Let $n = 200$, $P_0 = 1$, $N_i = 1$
  - For $i = 1$ to $100$: $a_i = \alpha$, $Q_i = P_0(1 - \alpha^2)$ with $\alpha = 0.95$ (memory factor)
  - For $i = 100$ to $200$: $a_i = 1$, $Q_i = 0$ (i.e., the state remains constant)

  [Figure: sample paths of the state $X_i$, the observations $Y_i$, and the predictions $\hat X_{i+1|i}$ for $i = 0, \ldots, 200$]
  [Figure: the prediction error $X_{i+1} - \hat X_{i+1|i}$, the observation error $Y_i - X_{i+1}$, and the MSE $\sigma^2_{i+1|i}$ versus $i$]
Derivation of the Kalman Filter

- We use innovations. Let $\tilde Y_i$ be the innovation r.v. for $Y_i$. Then we can write
$$\hat X_{i+1|i} = \hat X_{i+1|i-1} + k_i \tilde Y_i, \qquad \sigma^2_{i+1|i} = \sigma^2_{i+1|i-1} - k_i\, \mathrm{Cov}(X_{i+1}, \tilde Y_i),$$
  where $\hat X_{i+1|i-1}$ and $\sigma^2_{i+1|i-1}$ are the MMSE linear estimate of $X_{i+1}$ given $Y^{i-1}$ and its MSE, and
$$k_i = \frac{\mathrm{Cov}(X_{i+1}, \tilde Y_i)}{\mathrm{Var}(\tilde Y_i)}$$
- Now, since $X_{i+1} = a_i X_i + U_i$, by linearity of the MMSE linear estimate we have
$$\hat X_{i+1|i-1} = a_i \hat X_{i|i-1}$$
  and
$$\sigma^2_{i+1|i-1} = a_i^2\, \sigma^2_{i|i-1} + Q_i$$
- Now, the innovation r.v. for $Y_i$ is
$$\tilde Y_i = Y_i - \hat Y_i(Y^{i-1})$$
- Since $Y_i = X_i + V_i$ and $V_i$ is uncorrelated with $Y_j$, $j = 1, 2, \ldots, i-1$,
$$\hat Y_i(Y^{i-1}) = \hat X_{i|i-1}$$
  Hence
$$\tilde Y_i = Y_i - \hat X_{i|i-1}$$
- This yields
$$\hat X_{i+1|i} = a_i \hat X_{i|i-1} + k_i \tilde Y_i = a_i \hat X_{i|i-1} + k_i (Y_i - \hat X_{i|i-1})$$
- Now consider $\sigma^2_{i+1|i} = \sigma^2_{i+1|i-1} - k_i\, \mathrm{Cov}(X_{i+1}, \tilde Y_i)$, where
$$k_i = \frac{\mathrm{Cov}(X_{i+1}, \tilde Y_i)}{\mathrm{Var}(\tilde Y_i)} = \frac{\mathrm{Cov}(a_i X_i + U_i,\; X_i - \hat X_{i|i-1} + V_i)}{\mathrm{Var}(X_i - \hat X_{i|i-1} + V_i)} = \frac{\mathrm{Cov}(a_i X_i,\; X_i - \hat X_{i|i-1})}{\mathrm{Var}(X_i - \hat X_{i|i-1} + V_i)}$$
= a icov(x i,x i ˆX i i 1 ) Var(X i ˆX i i 1 +V i ) = a icov(x i ˆX i i 1,X i ˆX i i 1 ) Var(X i ˆX i i 1 +V i ) since (X i ˆX i i 1 ) ˆX i i 1 The MSE is = a ivar(x i ˆX i i 1 ) Var(X i ˆX i i 1 )+N i = a iσ 2 i i 1 σ 2 i i 1 +N i σ 2 i+1 i = σ2 i+1 i 1 k icov(a i X i +U i,x i ˆX i i 1 +V i ) = σ 2 i+1 i 1 k ia i σ 2 i i 1 = a i (a i k i )σ 2 i i 1 +Q i This completes the derivation of the scalar Kalman filter EE 278B: Random Vectors 4 33
Vector Kalman Filter

- The scalar Kalman filter extends to the vector state space model:
- Initialization: $\hat X_{0|-1} = 0$, $\Sigma_{0|-1} = P_0$
- Update equations: for $i = 0, 1, 2, \ldots, n$, the estimate is
$$\hat X_{i+1|i} = A_i \hat X_{i|i-1} + K_i (Y_i - \hat X_{i|i-1}),$$
  where the filter gain matrix is
$$K_i = A_i \Sigma_{i|i-1} (\Sigma_{i|i-1} + N_i)^{-1}$$
  The covariance of the error is
$$\Sigma_{i+1|i} = A_i \Sigma_{i|i-1} A_i^T - K_i \Sigma_{i|i-1} A_i^T + Q_i$$
- Remark: If $X_0, U_0, U_1, \ldots, U_n$ and $V_0, V_1, \ldots, V_n$ are Gaussian (zero mean, uncorrelated), then the Kalman filter yields the best MSE estimate of $X_i$, $i = 0, \ldots, n$
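The vector recursion for this model (note the identity observation matrix, $Y_i = X_i + V_i$) can be sketched with NumPy; function and argument names are our own:

```python
import numpy as np

def kalman_predict_vec(ys, A, Q, N, P0):
    """Vector Kalman filter, prediction form, for the state-space model
    X_{i+1} = A_i X_i + U_i, Y_i = X_i + V_i.
    ys: observation vectors; A, Q, N: sequences of A_i, Q_i, N_i;
    P0 = Cov(X_0).  Returns (predictions Xhat_{i+1|i}, covariances Sigma_{i+1|i})."""
    xhat = np.zeros(len(ys[0]))                           # Xhat_{0|-1}
    Sig = np.array(P0, dtype=float)                       # Sigma_{0|-1}
    preds, covs = [], []
    for i, y in enumerate(ys):
        K = A[i] @ Sig @ np.linalg.inv(Sig + N[i])        # gain K_i
        xhat = A[i] @ xhat + K @ (y - xhat)               # Xhat_{i+1|i}
        Sig = A[i] @ Sig @ A[i].T - K @ Sig @ A[i].T + Q[i]   # Sigma_{i+1|i}
        preds.append(xhat.copy()); covs.append(Sig.copy())
    return preds, covs
```

In one dimension this reduces exactly to the scalar filter above.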
Filtering

- Now assume the goal is to compute the MMSE linear estimate of $X_i$ given $Y^i$, i.e., instead of predicting the next state, we are interested in estimating the current state
- We denote this estimate by $\hat X_{i|i}$ and its MSE by $\sigma^2_{i|i}$
- The Kalman filter can be adapted to this case as follows:
- Initialization:
$$\hat X_{0|0} = \frac{P_0}{P_0 + N_0}\, Y_0, \qquad \sigma^2_{0|0} = \frac{P_0 N_0}{P_0 + N_0}$$
- Update equations: for $i = 1, 2, \ldots, n$, the estimate is
$$\hat X_{i|i} = a_{i-1}(1 - k_i)\, \hat X_{i-1|i-1} + k_i Y_i$$
  with filter gain
$$k_i = \frac{a_{i-1}^2\, \sigma^2_{i-1|i-1} + Q_{i-1}}{a_{i-1}^2\, \sigma^2_{i-1|i-1} + Q_{i-1} + N_i}$$
  and MSE recursion
$$\sigma^2_{i|i} = (1 - k_i)\left(a_{i-1}^2\, \sigma^2_{i-1|i-1} + Q_{i-1}\right)$$
- Vector case:
  - Initialization:
$$\hat X_{0|0} = P_0 (P_0 + N_0)^{-1} Y_0, \qquad \Sigma_{0|0} = P_0 \big(I - (P_0 + N_0)^{-1} P_0\big)$$
  - Update equations: for $i = 1, 2, \ldots, n$, the estimate is
$$\hat X_{i|i} = (I - K_i) A_{i-1} \hat X_{i-1|i-1} + K_i Y_i$$
    with filter gain
$$K_i = \big(A_{i-1} \Sigma_{i-1|i-1} A_{i-1}^T + Q_{i-1}\big)\big(A_{i-1} \Sigma_{i-1|i-1} A_{i-1}^T + Q_{i-1} + N_i\big)^{-1}$$
    and MSE recursion
$$\Sigma_{i|i} = \big(A_{i-1} \Sigma_{i-1|i-1} A_{i-1}^T + Q_{i-1}\big)\big(I - K_i^T\big)$$
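The scalar filtering form can be sketched the same way as the prediction form (function and argument names are our own):

```python
def kalman_filter_scalar(y, a, Q, N, P0):
    """Scalar Kalman filter, filtering form: estimates the *current*
    state X_i from Y_0..Y_i.  y: observations; a, Q: the sequences
    a_i, Q_i (indexed up to i-1); N: the sequence N_i; P0 = Var(X_0).
    Returns the lists of estimates Xhat_{i|i} and MSEs sigma^2_{i|i}."""
    xhat = P0 / (P0 + N[0]) * y[0]             # Xhat_{0|0}
    s2 = P0 * N[0] / (P0 + N[0])               # sigma^2_{0|0}
    ests, mses = [xhat], [s2]
    for i in range(1, len(y)):
        pred = a[i - 1] ** 2 * s2 + Q[i - 1]   # a_{i-1}^2 sigma^2_{i-1|i-1} + Q_{i-1}
        k = pred / (pred + N[i])               # gain k_i
        xhat = a[i - 1] * (1 - k) * xhat + k * y[i]
        s2 = (1 - k) * pred                    # sigma^2_{i|i}
        ests.append(xhat); mses.append(s2)
    return ests, mses
```

For $a_i = 1$, $Q_i = 0$, $N_i = N$, this recovers the batch estimate $\hat X_{i|i} = \frac{P}{(i+1)P+N}\sum_{j=0}^{i} Y_j$ of the constant-signal example.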