Estimation techniques

March 2, 2006

Contents

1 Problem Statement
2 Bayesian Estimation Techniques
  2.1 Minimum Mean Squared Error (MMSE) estimation
    2.1.1 General formulation
    2.1.2 Important special case: Gaussian model
    2.1.3 Example: Linear Gaussian Model
  2.2 Linear estimation
  2.3 Linear MMSE Estimator
  2.4 Orthogonality Condition
    2.4.1 Principle
    2.4.2 Example: Linear Model
3 Non-Bayesian Estimation Techniques
  3.1 The BLU estimator
    3.1.1 Example: Linear Gaussian Model
  3.2 The LS estimator
  3.3 Orthogonality condition
4 Extension to vector estimation
5 Further problems
  5.1 Problem 1: relations for Linear Gaussian Models
    5.1.1 No a priori information: MMSE = LS
    5.1.2 Uncorrelated noise: BLU = LS
    5.1.3 Zero-noise: MMSE = LS
  5.2 Problem 2: orthogonality
  5.3 Practical example: multi-antenna communication
    5.3.1 Problem
    5.3.2 Solution
1 Problem Statement

We are interested in estimating a continuous scalar parameter¹ a ∈ A from a vector observation r. The observation and the parameter are related through a probabilistic mapping p_{R|A}(r|a). The parameter a may or may not be a sample of a random variable; this leads to Bayesian and non-Bayesian estimators, respectively.

Special cases:

- Noisy observation model: r = f(a, n), where n (the noise component) is independent of a and has a pdf p_N(n).
- Linear model: r = h a + n for a known h, with n independent of a.
- Linear Gaussian model: linear model with N ~ N(0, Σ_N) and A ~ N(m_A, Σ_A).

2 Bayesian Estimation Techniques

Here, a ∈ A has a known a priori distribution p_A(a). The most important Bayesian estimators are

- the MAP (maximum a posteriori) estimator
- the MMSE (minimum mean squared error) estimator
- the linear MMSE estimator

The latter two will be covered in this section. We remind that the MAP estimate is given by

â_MAP(r) = arg max_{a ∈ A} p_{A|R}(a|r).

2.1 Minimum Mean Squared Error (MMSE) estimation

2.1.1 General formulation

The MMSE estimator minimizes the expected squared estimation error

C = ∫_R ∫_A (a − â(r))² p_{A|R}(a|r) p_R(r) da dr = E{(A − â(R))²}.

Taking the derivative (assuming this is possible) w.r.t. â(r) and setting the result to zero yields

â(r) = ∫_A a p_{A|R}(a|r) da = E{A | R = r}.

¹ We will only consider estimation of scalars. Extension to vectors is straightforward.
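To see the conditional mean at work, consider a minimal Monte Carlo sketch in Python (the model and numbers below are illustrative choices, not from the notes): for A ~ N(0, 1) and R = A + N with N ~ N(0, 1), the posterior mean is E{A | R = r} = r/2, and its mean squared error should beat that of any other estimator, e.g. the raw observation:

```python
import random

random.seed(0)

# Scalar Gaussian toy model: A ~ N(0, 1), R = A + N with N ~ N(0, 1).
# The posterior mean is E{A | R = r} = r / 2; it should beat the raw
# observation r (another estimator of A) in mean squared error.
samples = []
for _ in range(50_000):
    a = random.gauss(0, 1)
    samples.append((a, a + random.gauss(0, 1)))

mse_post = sum((a - r / 2) ** 2 for a, r in samples) / len(samples)
mse_raw = sum((a - r) ** 2 for a, r in samples) / len(samples)
print(mse_post, mse_raw)  # roughly 0.5 and 1.0
```

The empirical MSE of the posterior mean is close to the posterior variance 1/2, while the raw observation has MSE close to the noise variance 1.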
2.1.2 Important special case: Gaussian model

When A and R are jointly Gaussian, we can write

[A, R]^T ~ N(m, Σ)

where

m = [m_A; m_R],  Σ = [Σ_A, Σ_AR^T; Σ_AR, Σ_R].

In that case it can be shown (after some straightforward manipulation) that

A | R ~ N(m_A + Σ_AR^T Σ_R^{-1} (r − m_R), Σ_A − Σ_AR^T Σ_R^{-1} Σ_AR)

so that

â_MMSE(r) = m_A + Σ_AR^T Σ_R^{-1} (r − m_R).   (1)

Note that

- A posteriori (after observing r), the uncertainty w.r.t. a is reduced from Σ_A to Σ_A − Σ_AR^T Σ_R^{-1} Σ_AR ≤ Σ_A.
- When A and R are independent, Σ_AR = 0, so that â_MMSE(r) = m_A.
- For A and R jointly Gaussian, â_MAP(r) = â_MMSE(r).
- The estimator is linear in the observation.

2.1.3 Example: Linear Gaussian Model

Problem: Derive the MMSE estimator for the Linear Gaussian Model.

Solution: We know that R = hA + N with A ~ N(m_A, Σ_A), N ~ N(0, Σ_N), and A and N independent. We also know that the MMSE estimator is given by

â_MMSE(r) = m_A + Σ_AR^T Σ_R^{-1} (r − m_R).

Let us compute m_R, Σ_R and Σ_AR:

m_R = E{R} = h m_A

Σ_AR = E{(R − m_R)(A − m_A)}
     = E{(hA + N − h m_A)(A − m_A)}
     = E{(h(A − m_A) + N)(A − m_A)}
     = h E{(A − m_A)²} + E{N(A − m_A)}
     = h Σ_A

Σ_R = E{(R − m_R)(R − m_R)^T}
    = E{(h(A − m_A) + N)(h(A − m_A) + N)^T}
    = h E{(A − m_A)²} h^T + E{N N^T}
    = h Σ_A h^T + Σ_N.
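The moments just computed can be plugged into (1) numerically. Below is a minimal Python sketch (all numbers are illustrative choices, not from the notes) for a scalar a observed through a length-2 vector r; a Monte Carlo run confirms that the mean squared error drops from the prior variance Σ_A to the posterior variance Σ_A − Σ_AR^T Σ_R^{-1} Σ_AR:

```python
import math
import random

def inv2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative linear Gaussian model r = h*a + n with scalar a.
m_A, S_A = 1.0, 2.0                 # prior mean and variance of A
h = [1.0, 0.5]                      # known observation vector
S_N = [[0.3, 0.0], [0.0, 0.5]]      # noise covariance (diagonal)

# S_R = h S_A h^T + S_N and S_AR = h S_A, exactly as derived above.
S_R = [[h[i] * S_A * h[j] + S_N[i][j] for j in range(2)] for i in range(2)]
W = inv2(S_R)

def a_mmse(r):
    d = [r[i] - h[i] * m_A for i in range(2)]       # r - m_R
    return m_A + sum(S_A * h[i] * W[i][j] * d[j]
                     for i in range(2) for j in range(2))

# Posterior variance S_A - S_AR^T S_R^{-1} S_AR:
post_var = S_A - sum(S_A * h[i] * W[i][j] * h[j] * S_A
                     for i in range(2) for j in range(2))

random.seed(1)
errs = []
for _ in range(20_000):
    a = random.gauss(m_A, math.sqrt(S_A))
    r = [h[i] * a + random.gauss(0, math.sqrt(S_N[i][i])) for i in range(2)]
    errs.append((a - a_mmse(r)) ** 2)
print(sum(errs) / len(errs), post_var)  # the two should be close
```

The empirical MSE matches the posterior variance, and both are well below the prior variance Σ_A.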
This leads to

â_MMSE(r) = m_A + Σ_AR^T Σ_R^{-1} (r − m_R) = m_A + Σ_A h^T (h Σ_A h^T + Σ_N)^{-1} (r − h m_A).

2.2 Linear estimation

In some cases, it is preferred to have an estimator which is a linear function of the observation:

â(r) = b^T r + c

so that â(r) is obtained through an affine transformation of the observation. Clearly, when A and R are jointly Gaussian, the MMSE estimator is a linear estimator with b^T = Σ_AR^T Σ_R^{-1} and c = m_A − Σ_AR^T Σ_R^{-1} m_R.

2.3 Linear MMSE Estimator

When A and R are not jointly Gaussian, we can still use (1) as an estimator. This is known as the Linear MMSE (LMMSE) estimator:

â_LMMSE(r) = m_A + Σ_AR^T Σ_R^{-1} (r − m_R).

Theorem: We wish to find an estimator with the following properties:

- the estimator is linear in the observation,
- E{â(R)} = m_A (this can be interpreted as a form of unbiasedness),
- the estimator has the minimum MSE among all linear estimators.

Then the estimator is the LMMSE estimator.

Proof: Given a generic linear estimator

â(r) = b^T r + c   (2)

we would like to find b and c such that the resulting estimator is unbiased and has minimum variance. Clearly E{â(R)} = b^T m_R + c, so that b and c have to satisfy

b^T m_R + c = m_A
c = m_A − b^T m_R
c² = m_A² + b^T m_R m_R^T b − 2 m_A b^T m_R.

We also know that

Σ_R = E{(R − m_R)(R − m_R)^T} = E{R R^T} − m_R m_R^T
Σ_AR = E{(R − m_R)(A − m_A)} = E{R A} − m_R m_A.
Let us now look at the variance of the estimation error (the estimation error is zero-mean):

V = E{(â(R) − A)²}
  = E{(b^T R + c − A)²}
  = b^T E{R R^T} b + E{A²} + c² + 2c b^T m_R − 2c m_A − 2 b^T E{R A}
  = b^T (Σ_R + m_R m_R^T) b + Σ_A + m_A² + c² + 2c (b^T m_R − m_A) − 2 b^T (Σ_AR + m_R m_A)
  = b^T (Σ_R + m_R m_R^T) b + Σ_A + m_A² − c² − 2 b^T (Σ_AR + m_R m_A)
  = b^T Σ_R b + Σ_A − 2 b^T Σ_AR

where we have used the facts that b^T m_R − m_A = −c and that c² = m_A² + b^T m_R m_R^T b − 2 m_A b^T m_R. Taking the derivative w.r.t. b and equating the result to zero yields²

2 Σ_R b − 2 Σ_AR = 0

and thus (since Σ_R is symmetric)

b = Σ_R^{-1} Σ_AR   (3)

while the unbiasedness constraint gives

c = m_A − b^T m_R = m_A − Σ_AR^T Σ_R^{-1} m_R.   (4)

Substitution of (3) and (4) into (2) gives us the final result

â(r) = m_A + Σ_AR^T Σ_R^{-1} (r − m_R)

which is clearly the desired result. QED

2.4 Orthogonality Condition

2.4.1 Principle

The orthogonality condition is useful in deriving (L)MMSE estimators.

Theorem: for LMMSE estimation, the estimation error is (statistically) orthogonal to the observation. When m_R = 0 and m_A = 0:

E{R(â(R) − A)} = 0.

Proof:

E{R(â(R) − A)} = E{R(Σ_AR^T Σ_R^{-1} R − A)}
               = E{R R^T} Σ_R^{-1} Σ_AR − E{R A}
               = Σ_R Σ_R^{-1} Σ_AR − Σ_AR = 0.

QED.

2.4.2 Example: Linear Model

Problem: Verify the orthogonality condition for the linear Gaussian model, assuming m_A = 0.

Solution: For m_A = 0, we need to verify that E{R(â(R) − A)} = 0, i.e. that

E{R â(R)} = E{R A} = h Σ_A.

² To see this requires some knowledge of vector calculus: differentiating w.r.t. x gives x^T A x → 2Ax (for symmetric A) and x^T y → y.
Indeed,

E{R â(R)} = E{R (Σ_A h^T (h Σ_A h^T + Σ_N)^{-1} R)}
          = E{R R^T} (h Σ_A h^T + Σ_N)^{-1} Σ_A h
          = (h Σ_A h^T + Σ_N)(h Σ_A h^T + Σ_N)^{-1} Σ_A h
          = Σ_A h.

3 Non-Bayesian Estimation Techniques

The above techniques cannot be applied when we do not consider A to be a random variable. Another approach is called for, where we would like unbiased estimators with small variances. The most important non-Bayesian estimators are

- the ML (maximum likelihood) estimator
- the BLU (best linear unbiased) estimator
- the LS (least squares) estimator

The latter two will be covered in this section. All expectations are taken for a given value of a. We remind that the ML estimate is given by

â_ML(r) = arg max_{a ∈ A} p_{R|A}(r|a).

3.1 The BLU estimator

As before, we consider a generic linear estimator:

â(r) = b^T r + c.

To obtain an unbiased estimator, we need to assume that c = 0 and that E{R} = h a for some known vector h. We introduce

Σ_R = E{(R − E{R})(R − E{R})^T} = E{R R^T} − a² h h^T

which is assumed to be known to the estimator. Our goal is to find a b that leads to an unbiased estimator with minimal variance for all a.

Unbiasedness: an unbiased estimator must satisfy E{â(R)} = a for all a ∈ A, so that

b^T h = 1.
Variance of the estimation error: the variance of the estimation error is given by

V_a = E{(â(R) − a)²} = E{(b^T R − a)²} = E{(b^T (R − h a))²} = b^T E{(R − h a)(R − h a)^T} b = b^T Σ_R b.

This leads us to the following optimization problem: find b that minimizes V_a, subject to b^T h = 1. Using a Lagrange multiplier technique, we find we need to minimize

b^T Σ_R b − λ (b^T h − 1).

Setting the derivative w.r.t. b to zero leads to

b = Σ_R^{-1} h λ

(absorbing the factor 2 into λ), with the constraint b^T h = 1 giving rise to

λ h^T Σ_R^{-1} h = 1

so that

b = Σ_R^{-1} h (h^T Σ_R^{-1} h)^{-1}

and

V_a = (h^T Σ_R^{-1} h)^{-1} h^T Σ_R^{-1} Σ_R Σ_R^{-1} h (h^T Σ_R^{-1} h)^{-1} = (h^T Σ_R^{-1} h)^{-1}.

3.1.1 Example: Linear Gaussian Model

Problem: derive the BLU estimator for the Linear Gaussian Model.

Solution:

â_BLU(r) = (h^T Σ_R^{-1} h)^{-1} h^T Σ_R^{-1} r

where

Σ_R = E{(R − m_R)(R − m_R)^T} = E{(R − ha)(R − ha)^T} = E{(ha + N − ha)(ha + N − ha)^T} = Σ_N.

Hence

â_BLU(r) = (h^T Σ_N^{-1} h)^{-1} h^T Σ_N^{-1} r.
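As a sanity check, here is a small Python sketch (the variances and parameter value are made up for illustration) of the BLU weights b = Σ_N^{-1} h (h^T Σ_N^{-1} h)^{-1} for h = [1, 1, 1] and independent noise, which reduces to inverse-variance weighting:

```python
import math
import random

# BLU for h = [1, 1, 1] and independent noise with different (made-up)
# variances: b = Sigma_N^{-1} h (h^T Sigma_N^{-1} h)^{-1}, i.e. each
# observation is weighted by its inverse noise variance.
h = [1.0, 1.0, 1.0]
noise_var = [0.2, 1.0, 5.0]

precision = sum(h[k] ** 2 / noise_var[k] for k in range(3))   # h^T S_N^{-1} h
b = [h[k] / noise_var[k] / precision for k in range(3)]

# Unbiasedness constraint b^T h = 1 holds by construction.
assert abs(sum(b[k] * h[k] for k in range(3)) - 1.0) < 1e-12

# Monte Carlo check that the error variance equals (h^T S_N^{-1} h)^{-1}.
random.seed(3)
a_true = 2.5          # fixed, deterministic parameter (non-Bayesian setting)
errs = []
for _ in range(50_000):
    r = [h[k] * a_true + random.gauss(0, math.sqrt(noise_var[k]))
         for k in range(3)]
    errs.append((sum(b[k] * r[k] for k in range(3)) - a_true) ** 2)
emp_var = sum(errs) / len(errs)
print(emp_var, 1.0 / precision)  # both close to 0.161
```

The low-variance observation dominates the weighted sum, and the empirical error variance matches (h^T Σ_N^{-1} h)^{-1}.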
3.2 The LS estimator

If we refer back to our problem statement, the LS estimator tries to find the a that minimizes the distance between the observation and the reconstructed noiseless observation:

d = ||r − h(a, 0)||².

When h(a, 0) = h a, we find that

d = ||r − h a||² = r^T r + a² h^T h − 2 a r^T h.

Minimizing w.r.t. a gives us

2 a h^T h − 2 r^T h = 0

and finally

â_LS(r) = (h^T h)^{-1} h^T r.

3.3 Orthogonality condition

The reconstruction error is orthogonal to any reconstructed observation h a:

(r − h â_LS(r))^T h a = (r^T h − r^T h (h^T h)^{-1} h^T h) a = (r^T h − r^T h) a = 0.

4 Extension to vector estimation

When

r = H a + n

with r and n N×1 vectors, H an N×M matrix, and a an M×1 vector, the LMMSE, LS and BLU estimates are given by

â_LMMSE(r) = m_A + Σ_A H^T (H Σ_A H^T + Σ_N)^{-1} (r − H m_A) = m_A + (H^T Σ_N^{-1} H + Σ_A^{-1})^{-1} H^T Σ_N^{-1} (r − H m_A)
â_LS(r) = (H^T H)^{-1} H^T r
â_BLU(r) = (H^T Σ_N^{-1} H)^{-1} H^T Σ_N^{-1} r.

5 Further problems

5.1 Problem 1: relations for Linear Gaussian Models

How are MMSE, LS and BLU related? Consider the scalar case. The estimators are given by

â_MMSE(r) = m_A + Σ_A h^T (h Σ_A h^T + Σ_N)^{-1} (r − h m_A)
â_LS(r) = (h^T h)^{-1} h^T r
â_BLU(r) = (h^T Σ_N^{-1} h)^{-1} h^T Σ_N^{-1} r.
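Before comparing the estimators, here is a quick numerical illustration of the LS formula and the orthogonality condition of Section 3.3, with made-up numbers:

```python
# LS estimate a_hat = (h^T h)^{-1} h^T r for one toy observation; the
# reconstruction error r - h*a_hat must be orthogonal to h (and hence to
# any reconstructed observation h*a).
h = [2.0, 1.0, -1.0]
r = [4.1, 1.9, -2.2]

a_ls = sum(hi * ri for hi, ri in zip(h, r)) / sum(hi * hi for hi in h)
residual_dot_h = sum((ri - hi * a_ls) * hi for hi, ri in zip(h, r))
print(a_ls, residual_dot_h)  # approximately 2.05 and 0
```

Here h^T r = 12.3 and h^T h = 6, so â_LS = 2.05, and the residual is orthogonal to h up to floating-point rounding.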
5.1.1 No a priori information: MMSE = LS

When m_A = 0 and Σ_A^{-1} → 0 (a non-informative prior), we get

â_MMSE(r) = h^T (h h^T)^{-1} r = (h^T h)^{-1} h^T r = â_LS(r).

5.1.2 Uncorrelated noise: BLU = LS

When Σ_N = σ² I,

â_BLU(r) = (h^T Σ_N^{-1} h)^{-1} h^T Σ_N^{-1} r = (h^T h)^{-1} h^T r = â_LS(r).

5.1.3 Zero-noise: MMSE = LS

For the MMSE estimator, when Σ_N = 0,

â_MMSE(r) = m_A + h^T (h h^T)^{-1} (r − h m_A) = h^T (h h^T)^{-1} r = â_LS(r).

5.2 Problem 2: orthogonality

Problem: Derive the LMMSE estimator, starting from the orthogonality condition.

Solution: We introduce Q = A − m_A and S = R − m_R = R − h m_A, and find the LMMSE estimate of Q from the observation S. The orthogonality principle then tells us that

E{S q̂(S)} = E{S Q} = h Σ_A

with, for some (as yet undetermined) b:

E{S q̂(S)} = E{S b^T S} = (h Σ_A h^T + Σ_N) b

so that

b = (h Σ_A h^T + Σ_N)^{-1} Σ_A h.

Hence

q̂_LMMSE(s) = Σ_A h^T (h Σ_A h^T + Σ_N)^{-1} s.

The estimate of A is then given by

â_LMMSE(r) = q̂_LMMSE(r − m_R) + m_A = m_A + Σ_A h^T (h Σ_A h^T + Σ_N)^{-1} (r − h m_A).

5.3 Practical example: multi-antenna communication

In this problem, we show how LMMSE and LS can be used when ML or MAP leads to algorithms which are too complex to implement.
5.3.1 Problem

We are interested in the following problem: we transmit a vector of n_T iid data symbols a ∈ A = Ω^{n_T} over an AWGN channel. Here n_T is the number of transmit antennas. The receiver is equipped with n_R receive antennas. We can write the observation as

r = H a + n

where H is a (known) n_R × n_T channel matrix and N ~ N(0, σ² I_{n_R}). In any practical communication scheme, Ω is a finite set of size M (e.g., BPSK signaling with Ω = {−1, +1}). The symbols are iid and uniformly distributed over Ω.

1. Determine the ML and MAP estimates of a. Determine the complexity of the receiver (number of operations to estimate a) as a function of n_T.
2. How can LMMSE and LS be used to estimate a? We assume Σ_A = I_{n_T} and m_A = 0. Determine the complexity of the resulting estimators as a function of n_T.

5.3.2 Solution

Solution, part 1: the MAP and ML estimators are considered to be optimal in the case of estimating discrete parameters, in the sense of minimizing the error probability.

â_ML(r) = arg max_a p_{R|A}(r|a) = arg max_a log p_{R|A}(r|a) = arg max_a (−||r − Ha||²/(2σ²)) = arg min_a ||r − Ha||²

â_MAP(r) = arg max_a p_{A|R}(a|r) = arg max_a p_{R|A}(r|a) p_A(a) / p_R(r) = arg max_a p_{R|A}(r|a) = â_ML(r)

since p_A(a) is uniform over Ω^{n_T}.

Complexity: for each a ∈ Ω^{n_T}, we need to compute ||r − Ha||². Hence, the complexity is exponential in the number of transmit antennas (M^{n_T} hypotheses).

Special case: let us assume Ω = {−1, +1}, n_T = 1 and n_R > 1. Then

â_ML(r) = arg max_{a ∈ Ω} (a h^T r).

In this case we combine the observations from the multiple receive antennas. Each observation is weighted with the corresponding channel gain. This means that when the channel on a given antenna is unreliable (so that h_k is small), that antenna has only a small contribution to our decision statistic.
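For this special case (BPSK, one transmit antenna), the ML detector is just the sign of the matched-filter output h^T r. A small simulation sketch, with made-up channel gains and noise level:

```python
import random

random.seed(5)

# BPSK over n_R = 3 receive antennas: decide a_hat = sign(h^T r).
# Channel gains and noise standard deviation are illustrative choices;
# note the weak second antenna barely influences the decision statistic.
h = [1.2, 0.1, 0.8]
a = -1.0                     # transmitted BPSK symbol
sigma = 0.3

errors = 0
trials = 20_000
for _ in range(trials):
    r = [hk * a + random.gauss(0, sigma) for hk in h]
    stat = sum(hk * rk for hk, rk in zip(h, r))      # matched filter h^T r
    a_hat = 1.0 if stat >= 0 else -1.0
    errors += a_hat != a
print(errors / trials)  # very small error rate
```

The decision statistic h^T r has mean a · h^T h and variance σ² h^T h, so strong antennas dominate and errors are rare at this noise level.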
Solution, part 2: let us temporarily forget that a lives in a discrete space. We can then introduce the LMMSE and LS estimates as follows:

â_LMMSE(r) = H^T (H H^T + σ² I_{n_R})^{-1} r
â_LS(r) = (H^T H)^{-1} H^T r.

Note that â_LMMSE(r) and â_LS(r) may not belong to Ω^{n_T}. Hence, we need to quantize these estimates (mapping the estimate to the closest point in Ω^{n_T}); this can be done on a symbol-per-symbol basis. Generally the LS estimator will have poor performance when H^T H is close to singular. Note also that the LMMSE estimator requires knowledge of σ², while the MAP and ML estimators do not.

Complexity: now the complexity is of the order n_R n_T (since any matrix not depending on r can be pre-computed by the receiver), which is linear in the number of transmit antennas.

Special case: (Comment: there was a mistake in the lecture; these are the corrected results.) Let us assume that we use more transmit than receive antennas: n_T > n_R. In that case the n_T × n_T matrix H^T H is necessarily singular, so that LS estimation will not work. The LMMSE estimator requires the inversion of H H^T + σ² I_{n_R}; when σ² > 0, this matrix is always invertible. This is a strange situation, where noise is actually helpful in the estimation process.
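To make the complexity discussion concrete, here is a Python sketch of both receivers for a toy BPSK system with n_T = 2 and n_R = 3 (the channel matrix, noise level and transmitted vector are illustrative choices): the ML detector enumerates all M^{n_T} hypotheses, while the LS detector applies a precomputable linear filter followed by per-symbol quantization.

```python
import itertools
import random

# Toy multi-antenna system: BPSK (Omega = {-1, +1}), n_T = 2, n_R = 3.
H = [[1.0, 0.3], [0.2, 0.9], [-0.5, 0.4]]    # n_R x n_T channel matrix
sigma = 0.4
a_true = [1.0, -1.0]

def ml_detect(r):
    # Exhaustive search over all M^{n_T} hypotheses: exponential in n_T.
    return list(min(
        itertools.product((-1.0, 1.0), repeat=2),
        key=lambda a: sum((r[i] - sum(H[i][j] * a[j] for j in range(2))) ** 2
                          for i in range(3))))

def ls_detect(r):
    # a_hat = (H^T H)^{-1} H^T r, then per-symbol quantization. The filter
    # (H^T H)^{-1} H^T can be precomputed, so detection is linear in n_T.
    G = [[sum(H[k][i] * H[k][j] for k in range(3)) for j in range(2)]
         for i in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    Ginv = [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]
    Htr = [sum(H[k][i] * r[k] for k in range(3)) for i in range(2)]
    a_hat = [sum(Ginv[i][j] * Htr[j] for j in range(2)) for i in range(2)]
    return [1.0 if x >= 0 else -1.0 for x in a_hat]

random.seed(4)
r = [sum(H[i][j] * a_true[j] for j in range(2)) + random.gauss(0, sigma)
     for i in range(3)]
print(ml_detect(r), ls_detect(r))
```

In the noiseless case both detectors recover a_true exactly; the LMMSE variant would use (H H^T + σ² I)^{-1} in place of the LS pseudo-inverse. Under noise, the ML search cost grows as M^{n_T} while the linear detector's per-observation cost is of order n_R n_T.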