11 - The linear model

Size: px

Start display at page:

Download "11 - The linear model"

Vivien Burke
5 years ago
Views:

1 11-1 The linear model S. Lall, Stanford The linear model The linear model The joint pdf and covariance Example: uniform pdfs The importance of the prior Linear measurements with Gaussian noise Example: Gaussian noise The signal-to-noise ratio Scalar systems and the SNR Example: small and large noise Posterior covariance Example: navigation Alternative formulae Weighted least squares

2 11-2 The linear model S. Lall, Stanford The linear model A very important class of estimation problems is the linear model y = Ax + w x and w are independent We have induced pdfs p x for x and p w for w The matrix A is m n We measure y = y meas and would like to estimate x

3 11-3 The linear model S. Lall, Stanford The mean Let µ x = E x and µ w = E w Then E y = Aµ x + µ w Call this µ y

4 11-4 The linear model S. Lall, Stanford The linear map Since y = Ax + w, we have [ ] x y = [ ] I 0 x A I][ w We will measure y = y meas and estimate x To do this, we would like the conditional pdf of x y = y meas For this, we need the joint pdf of x and y

5 11-5 The linear model S. Lall, Stanford The joint pdf The joint pdf p of x and y is p(x, y) = p x (x)p w (y Ax) because the joint pdf of x, w is p 1 ([ x w ]) = p x (x)p w (w) and we know [ ] x y = [ ] I 0 x A I][ w z = Hu implies p z (a) = det H 1 p u (H 1 (u)), so ([ ]) I 0 x p(x, y) = p 1 A I][ y

6 11-6 The linear model S. Lall, Stanford The covariance Let Σ w = cov(w) and Σ x = cov(x). We have [ x cov y] = [ Σx Σ x A T AΣ x AΣ x A T + Σ w ] Call this [ ] [ Σx Σ xy x. Above holds because cov Σ yx Σ y y] = [ ][ ] [ ] T I 0 Σx 0 I 0 A I 0 Σ w A I Then cov(y) = AΣ x A T + Σ w AΣ x A T is signal covariance Σ w is noise covariance

7 11-7 The linear model S. Lall, Stanford Example: uniform pdfs Suppose x U[ 2, 2] and w U[ 1, 1] and we measure y = x + w. The joint pdf and MMSE estimator are Notice the estimator is not x est = y meas, because of the prior information that x [ 2, 2].

8 11-8 The linear model S. Lall, Stanford The importance of the prior x U[ 2, 2] as before w U[ 0.3, 0.3]; signal x is large relative to the noise w The joint pdf and MMSE estimator are The estimator is almost x est = y meas

9 11-9 The linear model S. Lall, Stanford The importance of the prior x U[ 2, 2] as before w U[ 10, 10]; signal x is small relative to the noise w 10 5 The joint pdf and MMSE estimator are shown. The estimator mostly ignores the measurement 0 The estimate is almost the prior mean E x =

10 11-10 The linear model S. Lall, Stanford Linear measurements with Gaussian noise We have the linear model y = Ax + w x N(0, Σ x ) and w N(0, Σ w ) are independent So [ x y] is Gaussian, with mean and covariance [ x E y] = [ 0 0] [ x cov y] = [ Σx Σ x A T AΣ x AΣ x A T + Σ w ]

11 11-11 The linear model S. Lall, Stanford Linear measurements with Gaussian noise The MMSE estimate of x given y = y meas is ˆx mmse = Σ x A T (AΣ x A T + Σ w ) 1 y meas because we know ˆx mmse = Σ xy Σ 1 y y meas The matrix L = Σ x A T (AΣ x A T + Σ w ) 1 is called the estimator gain

12 11-12 The linear model S. Lall, Stanford Example: linear measurements with Gaussian noise Suppose y = 2x + w, with prior covariance cov(x) = 1 noise covariance cov(w) = y=ax x=ly the estimator is x mmse = 2y meas The MMSE estimator gives a smaller answer than just inverting A, x mmse A 1 y meas since we have prior information about x

13 11-13 The linear model S. Lall, Stanford Non-zero means Suppose x N(µ x, Σ x ) and w N(µ w, Σ w ). The MMSE estimate of x given y = y meas is ˆx mmse = µ x + Σ x A T (AΣ x A T + Σ w ) 1 (y meas Aµ x µ w )

14 11-14 The linear model S. Lall, Stanford The signal to noise ratio Suppose where x, y and w are scalar, and y = Ax + w. The signal-to-noise ratio is s = A2 Σ x Σw Commonly used for scalar w, x, y; no use in vector case In terms of s, the MMSE estimate is x mmse = µ x + AΣ x A 2 Σ x + Σ w (y meas Aµ x ) = s 2µ x + s2 1 + s 2A 1 y meas

15 11-15 The linear model S. Lall, Stanford Scalar systems and the SNR The MMSE estimate is x mmse = s 2µ x + s2 1 + s 2A 1 y meas let θ = s 2, then x mmse = θµ x + (1 θ)a 1 y a convex linear combination of the prior mean and the least-squares estimate when s is small, x mmse µ x, the prior mean when s is large, x mmse A 1 y, the least-squares estimate of y

16 11-16 The linear model S. Lall, Stanford Example: small noise Suppose y = 2x + w, with prior covariance cov(x) = 1 noise covariance cov(w) = 0.4; signal is large compared to noise y=ax x=ly Hence SNR s = A2 Σ x Σw Estimate is 2 i.e., close to y meas /2 x mmse = s2 y 1 + s 2A 1 meas 0.9A 1 y meas

17 11-17 The linear model S. Lall, Stanford Example: large noise Suppose y = 2x + w, with prior covariance cov(x) = 1 noise covariance cov(w) = 20; signal is small compared to noise y=ax x=ly Hence SNR s = A2 Σ x Σw Estimate is 2 x mmse = s2 y 1 + s 2A 1 meas 0.17A 1 y meas i.e., closer to 0 for all y meas

18 11-18 The linear model S. Lall, Stanford The posterior covariance The posterior covariance of x given y = y meas is cov ( x y = ymeas ) = Σx Σ x A T (AΣ x A T + Σ w ) 1 AΣ x above follows because cov ( x y = ymeas ) = Σx Σ xy Σ 1 y Σ yx We can use this to compute the MSE since E ( x ˆx mmse 2 y = ymeas ) = tracecov ( x y = ymeas )

19 11-19 The linear model S. Lall, Stanford The posterior covariance and SNR For scalar problems, the posterior covariance of x given y = y meas is cov ( x y = ymeas ) = Σ x 1 + s 2 The uncertainty (covariance) in x is reduced by the factor 1 by measurement 1 + s2

20 11-20 The linear model S. Lall, Stanford Example: navigation [ p x = our location, we measure distances r q] i to m beacons at points (u i, v i ) (u 3 ;v 3 ) r 2 (u 2 ;v 2 ) (u 4 ;v 4 ) r 3 r 1 (p;q) r 4 (u 1 ;v 1 ) assume p, q are small compared to u i, v i. then, approximately y = Ax A R m 2, ith row of A is the transpose of unit vector in the direction of beacon i u v1 2 y = r 1. measured vector of distances u 2 m + vm 2 r m

21 11-21 The linear model S. Lall, Stanford Example: navigation 5 here A R 3 2 with A = b 1 b b 2 b 1 b 3 and y = Ax. Each b i is a unit vector. 2 b 3 ([ 4 Prior information is x N, 4] [ ]) Prior mean actual location y is measured; y i is range measurement in the direction b i with noise w added [ ] [ ] [ ] beacons at,, figure shows prior 90% confidence ellipsoid

22 11-22 The linear model S. Lall, Stanford Example: posterior confidence ellipsoids Posterior confidence ellipsoids for two different possible noise covariances. Σ w = Σ w = b 2 4 b 2 3 b 1 3 b 1 2 b 3 2 b 3 1 actual location MMSE estimate actual location MMSE estimate

23 11-23 The linear model S. Lall, Stanford Alternative formula There is another way to write the posterior covariance: cov ( x y = y meas ) = ( Σ 1 x + A T Σ 1 w A ) 1 follows from the Sherman-Morrison-Woodbury formula (A BD 1 C) 1 = A 1 + A 1 B(D CA 1 B) 1 CA 1 This is very useful when we have fewer unknowns than measurements; i.e., Σ x is smaller that AΣ x A T

24 11-24 The linear model S. Lall, Stanford Alternative formula There is also an alternative formula for the estimator gain L = (Σ 1 x + A T Σ 1 w A) 1 A T Σ 1 w Because L = Σ x A T (AΣ x A T + Σ w ) 1 = Σ x A T (Σ 1 w AΣ x A T + I) 1 Σ 1 w = (Σ x A T Σ 1 w A + I) 1 Σ x A T Σ 1 w by push-through identity = (A T Σ 1 w A + Σ 1 x ) 1 A T Σ 1 w

25 11-25 The linear model S. Lall, Stanford Comparison with least-squares The least-squares approach minimizes y Ax 2 = m ( yi a T i x ) 2 i=1 where A = [ a 1 a 2... a m ] T Suppose instead we minimize m ( w i yi a T i x ) 2 i=1 where w 1, w 2,..., w m are positive weights

26 11-26 The linear model S. Lall, Stanford Weighted norms More generally, let s look at weighted norms 3 2 contours of the 2-norm x 2 = x T x contours of the weighted-norm x W = x T Wx = W 1 2x 2 0 where W = [ ]

27 11-27 The linear model S. Lall, Stanford Weighted least squares the weighted least-squares problem; given y meas R m, minimize y meas Ax W assume A R m n, skinny, full rank, and W R m m and W > 0 then (by differentiating) the optimum x is x wls = (A T WA) 1 A T Wy meas

28 11-28 The linear model S. Lall, Stanford Weighted least squares R m y meas range(a) Ax est if there is no noise, y lies in range A the weighted least-squares estimate x wls minimizes y meas Ax W Ax wls is the closest (in weighted-norm) point in rangea to y meas

29 11-29 The linear model S. Lall, Stanford MMSE and weighted least squares suppose we choose weight W = Σ 1 w ; then WLS solution is x wls = (A T Σ 1 w A) 1 A T Σ 1 w y meas compare with MMSE estimate when x N(0, Σ x ) and w N(0, Σ w ) x mmse = (Σ 1 x + A T Σ 1 w A) 1 A T Σ 1 w y meas as the prior covariance Σ x, the MMSE estimate tends to the WLS estimate if Σ w = I then MMSE tends to usual least-squares solution as Σ x the weighted norm heavily penalizes the residual y Ax in low-noise directions

8 - Continuous random vectors

8-1 Continuous random vectors S. Lall, Stanford 2011.01.25.01 8 - Continuous random vectors Mean-square deviation Mean-variance decomposition Gaussian random vectors The Gamma function The χ 2 distribution