Least Squares and Kalman Filtering

Questions: Email me, namrata@ece.gatech.edu
Recall: Weighted Least Squares

    y = Hx + e

Minimize

    J(x) = (y - Hx)^T W (y - Hx) = \|y - Hx\|_W^2    (1)

Solution:

    \hat{x} = (H^T W H)^{-1} H^T W y    (2)

Given that E[e] = 0 and E[e e^T] = V, the minimum variance unbiased linear estimator of x is obtained by choosing W = V^{-1} in (2).

Minimum variance of a vector estimate: the variance in any direction is minimized.
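A minimal numerical sketch of (1)-(2) in NumPy; the problem sizes and the diagonal noise covariance V below are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3                                  # 20 equations, 3 unknowns (illustrative)
H = rng.standard_normal((n, p))
x_true = np.array([1.0, -2.0, 0.5])
V = np.diag(rng.uniform(0.1, 2.0, size=n))    # noise covariance E[ee^T]
y = H @ x_true + rng.multivariate_normal(np.zeros(n), V)

# Weighted LS with the minimum-variance choice W = V^{-1}
W = np.linalg.inv(V)
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)   # solves (H^T W H) x = H^T W y
print(x_hat)                                        # close to x_true
```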
Recall: Proof

Given \hat{x} = Ly, find L s.t. E[Ly] = E[LHx] = E[x], so LH = I (unbiasedness).

Let L_0 = (H^T V^{-1} H)^{-1} H^T V^{-1}.

Error variance:

    E[(x - \hat{x})(x - \hat{x})^T] = E[(x - LHx - Le)(x - LHx - Le)^T] = E[L e e^T L^T] = L V L^T

Write L = (L - L_0) + L_0. Here LH = I and L_0 H = I, so (L - L_0) H = 0. Then

    L V L^T = L_0 V L_0^T + (L - L_0) V (L - L_0)^T + L_0 V (L - L_0)^T + (L - L_0) V L_0^T

and the cross terms vanish, since

    L_0 V (L - L_0)^T = (H^T V^{-1} H)^{-1} H^T (L - L_0)^T = (H^T V^{-1} H)^{-1} [(L - L_0) H]^T = 0,

so

    L V L^T = L_0 V L_0^T + (L - L_0) V (L - L_0)^T \succeq L_0 V L_0^T

Thus L_0 is the optimal estimator. (Note: \succeq denotes the positive semidefinite ordering for matrices.)
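The conclusion can be sanity-checked numerically: for any other unbiased L (any L with LH = I), the difference L V L^T - L_0 V L_0^T should come out positive semidefinite. A small sketch, using the unweighted pseudo-inverse as the competing unbiased estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
H = rng.standard_normal((n, p))
V = np.diag(rng.uniform(0.1, 2.0, size=n))
Vinv = np.linalg.inv(V)

L0 = np.linalg.solve(H.T @ Vinv @ H, H.T @ Vinv)  # optimal: (H^T V^{-1} H)^{-1} H^T V^{-1}
L = np.linalg.pinv(H)                             # also unbiased: pinv(H) @ H = I

D = L @ V @ L.T - L0 @ V @ L0.T                   # excess error covariance
print(np.linalg.eigvalsh(D).min() >= -1e-10)      # True: D is positive semidefinite
```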
Regularized Least Squares

Minimize

    J(x) = (x - x_0)^T \Pi_0^{-1} (x - x_0) + (y - Hx)^T W (y - Hx)    (3)

Change variables: \tilde{x} = x - x_0, \tilde{y} = y - H x_0. Then

    J(x) = \tilde{x}^T \Pi_0^{-1} \tilde{x} + (\tilde{y} - H\tilde{x})^T W (\tilde{y} - H\tilde{x}) = \|\tilde{z}\|_M^2,

where

    \tilde{z} = \begin{bmatrix} 0 \\ \tilde{y} \end{bmatrix} - \begin{bmatrix} I \\ H \end{bmatrix} \tilde{x}, \qquad M = \begin{bmatrix} \Pi_0^{-1} & 0 \\ 0 & W \end{bmatrix}
Solution: use the least squares formula with data \begin{bmatrix} 0 \\ \tilde{y} \end{bmatrix}, matrix \begin{bmatrix} I \\ H \end{bmatrix}, and weight M.

Get:

    \hat{x} = x_0 + (\Pi_0^{-1} + H^T W H)^{-1} H^T W (y - H x_0)

Advantages: improves the condition number of H^T H, and incorporates prior knowledge about the distance of x from x_0.
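A minimal sketch of the regularized solution; the prior guess x_0 and prior covariance \Pi_0 below are assumed values for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
H = rng.standard_normal((n, p))
y = rng.standard_normal(n)
W = np.eye(n)

x0 = np.zeros(p)          # prior guess for x (assumed)
Pi0 = 10.0 * np.eye(p)    # prior covariance Pi_0 (assumed)

# x_hat = x0 + (Pi0^{-1} + H^T W H)^{-1} H^T W (y - H x0)
A = np.linalg.inv(Pi0) + H.T @ W @ H
x_hat = x0 + np.linalg.solve(A, H.T @ W @ (y - H @ x0))
```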
Recursive Least Squares

Motivation:
- the number of equations is much larger than the number of variables: storage costs, inverting big matrices
- data arrives sequentially

So use a recursive algorithm.

At step i-1, we have \hat{x}_{i-1}, the minimizer of

    (x - x_0)^T \Pi_0^{-1} (x - x_0) + \|H_{i-1} x - Y_{i-1}\|_{W_{i-1}}^2,   Y_{i-1} = [y_1, ..., y_{i-1}]^T

Find \hat{x}_i, the minimizer of

    (x - x_0)^T \Pi_0^{-1} (x - x_0) + \|H_i x - Y_i\|_{W_i}^2,

where H_i = \begin{bmatrix} H_{i-1} \\ h_i \end{bmatrix} (h_i is a row vector) and Y_i = [y_1, ..., y_i]^T (a column vector).
For simplicity of notation, assume x_0 = 0 and W_i = I. Define

    P_i = (\Pi_0^{-1} + H_i^T H_i)^{-1},   P_0 = \Pi_0

Since H_i^T H_i = H_{i-1}^T H_{i-1} + h_i^T h_i,

    \hat{x}_i = (\Pi_0^{-1} + H_i^T H_i)^{-1} H_i^T Y_i = (\Pi_0^{-1} + H_{i-1}^T H_{i-1} + h_i^T h_i)^{-1} (H_{i-1}^T Y_{i-1} + h_i^T y_i)

and

    P_i^{-1} = P_{i-1}^{-1} + h_i^T h_i

Use the matrix inversion identity

    (A + BCD)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}

to get

    P_i = P_{i-1} - \frac{P_{i-1} h_i^T h_i P_{i-1}}{1 + h_i P_{i-1} h_i^T}
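A quick numerical check of the rank-one case of this identity (note the minus sign) against direct inversion; the positive definite P_{i-1} below is randomly generated:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
A = rng.standard_normal((p, p))
P_prev = A @ A.T + p * np.eye(p)          # a random positive definite P_{i-1}
h = rng.standard_normal((1, p))           # row vector h_i

P_direct = np.linalg.inv(np.linalg.inv(P_prev) + h.T @ h)
P_rank1 = P_prev - (P_prev @ h.T @ h @ P_prev) / (1.0 + h @ P_prev @ h.T)
print(np.allclose(P_direct, P_rank1))     # True
```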
Starting from \hat{x}_0 = 0:

    \hat{x}_i = P_i H_i^T Y_i
              = \left[ P_{i-1} - \frac{P_{i-1} h_i^T h_i P_{i-1}}{1 + h_i P_{i-1} h_i^T} \right] (H_{i-1}^T Y_{i-1} + h_i^T y_i)
              = P_{i-1} H_{i-1}^T Y_{i-1} - \frac{P_{i-1} h_i^T}{1 + h_i P_{i-1} h_i^T} h_i P_{i-1} H_{i-1}^T Y_{i-1} + P_{i-1} h_i^T \left( 1 - \frac{h_i P_{i-1} h_i^T}{1 + h_i P_{i-1} h_i^T} \right) y_i
              = \hat{x}_{i-1} + \frac{P_{i-1} h_i^T}{1 + h_i P_{i-1} h_i^T} (y_i - h_i \hat{x}_{i-1})

If W_i \neq I, this modifies to (replace y_i by w_i^{1/2} y_i and h_i by w_i^{1/2} h_i):

    \hat{x}_i = \hat{x}_{i-1} + P_{i-1} h_i^T (w_i^{-1} + h_i P_{i-1} h_i^T)^{-1} (y_i - h_i \hat{x}_{i-1})
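Putting the recursions together, a minimal RLS sketch (assuming x_0 = 0, scalar measurements, and unit weights w_i = 1); the recursive estimate is checked against the batch regularized solution it should reproduce:

```python
import numpy as np

rng = np.random.default_rng(4)
p, T = 3, 50
x_true = np.array([1.0, -2.0, 0.5])
Pi0 = np.eye(p)

x_hat = np.zeros(p)                       # x_hat_0 = 0
P = Pi0.copy()                            # P_0 = Pi_0
rows, ys = [], []
for i in range(T):
    h = rng.standard_normal((1, p))       # row vector h_i
    y = (h @ x_true).item() + 0.1 * rng.standard_normal()
    # Gain P_{i-1} h^T / (1 + h P_{i-1} h^T), then correct by the innovation y - h x_hat
    k = (P @ h.T) / (1.0 + (h @ P @ h.T).item())
    x_hat = x_hat + k.ravel() * (y - (h @ x_hat).item())
    P = P - k @ (h @ P)
    rows.append(h.ravel()); ys.append(y)

# Batch check: minimizer of x^T Pi0^{-1} x + ||H_T x - Y_T||^2
Hm, Y = np.vstack(rows), np.array(ys)
x_batch = np.linalg.solve(np.linalg.inv(Pi0) + Hm.T @ Hm, Hm.T @ Y)
print(np.allclose(x_hat, x_batch))        # True
```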
Here we considered y_i to be a scalar and h_i to be a row vector. In general, y_i can be a k-dimensional vector, and h_i is then a matrix with k rows.

RLS with Forgetting Factor

Weight older data with smaller weights (one forgetting-factor step is sketched below):

    J(x) = \sum_{j=1}^{i} \beta(i, j) (y_j - h_j x)^2

- Exponential forgetting: \beta(i, j) = \lambda^{i-j}, \lambda < 1
- Moving average: \beta(i, j) = 1 if i - j \le \Delta and \beta(i, j) = 0 otherwise (window length \Delta)
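For exponential forgetting, the usual implementation folds \lambda into the gain and inflates the covariance by 1/\lambda at every step, so old data is gradually down-weighted. A sketch of one such step (\lambda is a design parameter, commonly close to 1; values here are illustrative):

```python
import numpy as np

def rls_forgetting_step(x_hat, P, h, y, lam=0.99):
    """One exponential-forgetting RLS step: scalar y, row vector h of shape (1, p)."""
    k = (P @ h.T) / (lam + (h @ P @ h.T).item())          # gain
    x_hat = x_hat + k.ravel() * (y - (h @ x_hat).item())  # correct by the innovation
    P = (P - k @ (h @ P)) / lam                           # dividing by lam forgets old data
    return x_hat, P

# Usage: start from x_hat = np.zeros(p), P = Pi0, call once per new (h, y) pair.
```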
Connection with Kalman Filtering

The above is also the Kalman filter estimate of the state for the following system model:

    x_i = x_{i-1}
    y_i = h_i x_i + v_i,   v_i \sim \mathcal{N}(0, R_i),   R_i = w_i^{-1}    (4)
Kalman Filter

- RLS was for static data: estimate the signal x better and better as more and more data comes in, e.g., estimating the mean intensity of an object from a video sequence.
- RLS with a forgetting factor assumes a slowly time-varying x.
- Kalman filter: for a time-varying signal whose dynamical model we know (statistically), e.g., tracking a moving object:

    x_0 \sim \mathcal{N}(0, \Pi_0)
    x_i = F_i x_{i-1} + v_{x,i},   v_{x,i} \sim \mathcal{N}(0, Q_i)

The observation model is as before: y_i = h_i x_i + v_i, v_i \sim \mathcal{N}(0, R_i).
Goal: get the best (minimum mean square error) estimate of x_i from Y_i.

Cost: J(\hat{x}_i) = E[\|x_i - \hat{x}_i\|^2 \mid Y_i]

Minimizer: the conditional mean, \hat{x}_i = E[x_i \mid Y_i].

Since the posterior p(x_i \mid Y_i) is Gaussian, this is also the MAP estimate, i.e., \hat{x}_i also maximizes p(x_i \mid Y_i).
Kalman Filtering Algorithm

At i = 0: \hat{x}_0 = 0, P_0 = \Pi_0.

For any i, assume that we know \hat{x}_{i-1} = E[x_{i-1} \mid Y_{i-1}]. Then the prediction step is

    E[x_i \mid Y_{i-1}] = F_i \hat{x}_{i-1} =: \hat{x}_{i|i-1}
    Var(x_i \mid Y_{i-1}) = F_i P_{i-1} F_i^T + Q_i =: P_{i|i-1}    (5)
Filtering or correction step: x_i and y_i are jointly Gaussian given Y_{i-1}, with

    x_i \mid Y_{i-1} \sim \mathcal{N}(\hat{x}_{i|i-1}, P_{i|i-1})
    y_i \mid x_i, Y_{i-1} = y_i \mid x_i \sim \mathcal{N}(h_i x_i, R_i)

Using the formula for the conditional distribution of Z_1 \mid Z_2 when Z_1 and Z_2 are jointly Gaussian:

    E[x_i \mid Y_i] = \hat{x}_{i|i-1} + P_{i|i-1} h_i^T (R_i + h_i P_{i|i-1} h_i^T)^{-1} (y_i - h_i \hat{x}_{i|i-1})
    Var(x_i \mid Y_i) = P_{i|i-1} - P_{i|i-1} h_i^T (R_i + h_i P_{i|i-1} h_i^T)^{-1} h_i P_{i|i-1}

Set \hat{x}_i = E[x_i \mid Y_i] and P_i = Var(x_i \mid Y_i).
Summarizing the algorithm:

    \hat{x}_{i|i-1} = F_i \hat{x}_{i-1}
    P_{i|i-1} = F_i P_{i-1} F_i^T + Q_i
    \hat{x}_i = \hat{x}_{i|i-1} + P_{i|i-1} h_i^T (R_i + h_i P_{i|i-1} h_i^T)^{-1} (y_i - h_i \hat{x}_{i|i-1})
    P_i = P_{i|i-1} - P_{i|i-1} h_i^T (R_i + h_i P_{i|i-1} h_i^T)^{-1} h_i P_{i|i-1}

For F_i = I and Q_i = 0, this reduces to the RLS algorithm.
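A compact sketch of one predict/correct cycle, written for the vector-measurement case (y_i a k-vector, h_i a k x n matrix); with F = I and Q = 0 it reduces to the RLS update above:

```python
import numpy as np

def kalman_step(x_hat, P, y, F, Q, h, R):
    """One Kalman filter iteration for x_i = F x_{i-1} + v_x,i,  y_i = h x_i + v_i."""
    # Prediction: x_hat_{i|i-1}, P_{i|i-1}
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + Q
    # Correction: condition on the new measurement y_i
    S = R + h @ P_pred @ h.T                 # innovation covariance
    K = P_pred @ h.T @ np.linalg.inv(S)      # Kalman gain
    x_hat = x_pred + K @ (y - h @ x_pred)
    P = P_pred - K @ h @ P_pred
    return x_hat, P
```

For the tracking application on the next slide, one common choice (an assumption here, not from the slides) is, per coordinate, x = (position, velocity) with F = [[1, dt], [0, 1]] and h = [[1, 0]].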
Example Applications

RLS:
- Adaptive noise cancellation: given a noisy signal d_n assumed to satisfy d_n = u_n^T w + v_n, get the best estimate of the weight vector w. Here y_n = d_n, h_n = u_n^T, x = w.
- Channel equalization using a training sequence.
- Object intensity estimation: x = intensity, y_i = vector of intensities of the object region in frame i, h_i = 1_m (a column vector of m ones).

Kalman filter:
- Track a moving object (estimate its location and velocity at each time).
Suggested Reading

- Chapters 2, 3 & 9 of Linear Estimation, by Kailath, Sayed, Hassibi
- Chapters 4 & 5 of An Introduction to Signal Detection and Estimation, by H. Vincent Poor