ELEG-636: Statistical Signal Processing
Gonzalo R. Arce
Department of Electrical and Computer Engineering, University of Delaware
Spring 2010
Course Objectives & Structure

Objective: Given a discrete-time sequence {x(n)}, develop
- statistical and spectral signal representations
- filtering, prediction, and system identification algorithms
- optimization methods that are statistical and adaptive

Course structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce]
- Periodic homework (theory & Matlab implementations) [15%]
- Midterm & final examinations [85%]

Textbook: Haykin, Adaptive Filter Theory.
Broad applications in communications, imaging, and sensors. Emerging applications in brain imaging: brain-machine interfaces and implantable devices.

Neurofeedback presents real-time physiological signals from MRI, in visual or auditory form, to convey information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction. Traditional feedback based on EEG or other mechanisms has not focused on the brain itself because its resolution is not good enough.
Wiener Filtering: Problem Statement

Produce an estimate of a desired process statistically related to a set of observations.

Historical notes:
- The linear filtering problem was solved by Andrey Kolmogorov for discrete time; his 1938 paper established the basic theorems for smoothing and predicting stationary stochastic processes.
- Norbert Wiener solved the continuous-time case in 1941, not published until the 1949 monograph Extrapolation, Interpolation, and Smoothing of Stationary Time Series.
System restrictions and considerations:
- The filter is linear
- The filter is discrete time
- The filter is finite impulse response (FIR)
- The process is WSS
- Statistical optimization is employed
" # Wiener (MSE) Filtering Theory For the discrete time case Wiener Filtering "!! % $ The filter impulse response is finite and given by { w h k = k for k = 0, 1,, M 1 0 otherwise The output ˆd(n) is an estimate of the desired signal d(n) x(n) and d(n) are statistically related ˆd(n) and d(n) are statistically related Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring 2010 7 / 34
In convolution and vector form,

    d̂(n) = Σ_{k=0}^{M-1} w_k* x(n-k) = w^H x(n)

where

    w = [w_0, w_1, ..., w_{M-1}]^T              [filter coefficient vector]
    x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T      [observation vector]

The error can now be written as

    e(n) = d(n) - d̂(n) = d(n) - w^H x(n)

Question: Under what criterion should the error be minimized?

Selected criterion: mean squared error (MSE)

    J(w) = E{e(n) e*(n)}    (*)

Result: The w that minimizes J(w) is the optimal (Wiener) filter.
Utilizing e(n) = d(n) - w^H x(n) in (*) and expanding,

    J(w) = E{e(n) e*(n)}
         = E{(d(n) - w^H x(n))(d*(n) - x^H(n) w)}
         = E{|d(n)|^2 - d(n) x^H(n) w - w^H x(n) d*(n) + w^H x(n) x^H(n) w}
         = E{|d(n)|^2} - E{d(n) x^H(n)} w - w^H E{x(n) d*(n)} + w^H E{x(n) x^H(n)} w    (**)

Let

    R = E{x(n) x^H(n)}    [autocorrelation matrix of x(n)]
    p = E{x(n) d*(n)}     [cross-correlation between x(n) and d(n)]

Then (**) can be compactly expressed as

    J(w) = σ_d^2 - p^H w - w^H p + w^H R w

where we have assumed x(n) and d(n) are zero mean and jointly WSS.
The MSE criterion as a function of the filter weight vector w:

    J(w) = σ_d^2 - p^H w - w^H p + w^H R w

Observation: The error is a quadratic function of w.

Consequence: The error is an M-dimensional bowl-shaped function of w with a unique minimum.

Result: The optimal weight vector w_0 is determined by differentiating J(w) and setting the result to zero:

    ∇_w J(w) |_{w = w_0} = 0

A closed-form solution exists.
Example: Consider a two-dimensional case, i.e., an M = 2 tap filter. Plot the error surface and error contours.

[Figures: error surface and error contours]
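As a concrete sketch of this bowl-shaped surface, the following evaluates J(w) over a grid of weight pairs for a hypothetical R, p, and σ_d^2 (illustrative values, not from any particular signal), and checks that the grid minimum coincides with the Wiener solution:

```python
import numpy as np

# Hypothetical second-order statistics for an M = 2 tap real-valued filter
R = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # input autocorrelation matrix
p = np.array([1.0, 0.5])         # cross-correlation vector
sigma_d2 = 1.0                   # desired-signal variance

def mse(w):
    # J(w) = sigma_d^2 - p^T w - w^T p + w^T R w  (real case)
    return sigma_d2 - 2 * p @ w + w @ R @ w

# Sample J over a grid of (w_0, w_1) pairs: the 'error surface'
w0_ax = np.linspace(-2.0, 2.0, 201)
w1_ax = np.linspace(-2.0, 2.0, 201)
J = np.array([[mse(np.array([a, b])) for a in w0_ax] for b in w1_ax])

# The grid minimum should coincide with the Wiener solution R^{-1} p
w_opt = np.linalg.solve(R, p)
i, j = np.unravel_index(np.argmin(J), J.shape)
grid_min = (w0_ax[j], w1_ax[i])
print(w_opt, grid_min)
```

The array J is exactly what a surface or contour plot of the error would display; because J is strictly convex in w, the sampled minimum is unique.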
Aside (matrix differentiation): For complex weights w_k = a_k + j b_k, k = 0, 1, ..., M-1, the gradient with respect to w_k is

    ∇_k(J) = ∂J/∂a_k + j ∂J/∂b_k,    k = 0, 1, ..., M-1

The complete gradient is thus given by

    ∇_w(J) = [∇_0(J), ∇_1(J), ..., ∇_{M-1}(J)]^T
           = [∂J/∂a_0 + j ∂J/∂b_0, ∂J/∂a_1 + j ∂J/∂b_1, ..., ∂J/∂a_{M-1} + j ∂J/∂b_{M-1}]^T
Example: Let c and w be M x 1 complex vectors. For g = c^H w, find ∇_w(g).

Note

    g = c^H w = Σ_{k=0}^{M-1} c_k* w_k = Σ_{k=0}^{M-1} c_k*(a_k + j b_k)

Thus

    ∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k = c_k* + j(j c_k*) = 0,    k = 0, 1, ..., M-1

Result: For g = c^H w,

    ∇_w(g) = [∇_0(g), ∇_1(g), ..., ∇_{M-1}(g)]^T = [0, 0, ..., 0]^T = 0
Example: Now suppose g = w^H c. Find ∇_w(g).

In this case,

    g = w^H c = Σ_{k=0}^{M-1} w_k* c_k = Σ_{k=0}^{M-1} c_k(a_k - j b_k)

and

    ∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k = c_k + j(-j c_k) = 2 c_k,    k = 0, 1, ..., M-1

Result: For g = w^H c,

    ∇_w(g) = [2c_0, 2c_1, ..., 2c_{M-1}]^T = 2c
Example: Lastly, suppose g = w^H Q w. Find ∇_w(g).

In this case,

    g = Σ_{i=0}^{M-1} Σ_{j=0}^{M-1} w_i* w_j q_{i,j}
      = Σ_{i=0}^{M-1} Σ_{j=0}^{M-1} (a_i - j b_i)(a_j + j b_j) q_{i,j}

so

    ∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k
           = 2 Σ_{j=0}^{M-1} (a_j + j b_j) q_{k,j}
           = 2 Σ_{j=0}^{M-1} w_j q_{k,j}
Result: For g = w^H Q w,

    ∇_w(g) = 2 [Σ_i q_{0,i} w_i, Σ_i q_{1,i} w_i, ..., Σ_i q_{M-1,i} w_i]^T = 2 Q w

Observation: The differentiation result depends on the matrix ordering.
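The three gradient results above can be checked numerically: approximate ∇_k = ∂/∂a_k + j ∂/∂b_k with central differences on random complex data (a verification sketch; the vectors c, w and matrix Q are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
c = rng.normal(size=M) + 1j * rng.normal(size=M)
Q = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
w = rng.normal(size=M) + 1j * rng.normal(size=M)

def num_grad(g, w, h=1e-6):
    # Gradient del_k = dg/da_k + j dg/db_k via central differences
    grad = np.zeros(len(w), dtype=complex)
    for k in range(len(w)):
        e = np.zeros(len(w))
        e[k] = 1.0
        d_a = (g(w + h * e) - g(w - h * e)) / (2 * h)            # d/da_k
        d_b = (g(w + 1j * h * e) - g(w - 1j * h * e)) / (2 * h)  # d/db_k
        grad[k] = d_a + 1j * d_b
    return grad

g1 = lambda w: np.conj(c) @ w        # g = c^H w   -> gradient 0
g2 = lambda w: np.conj(w) @ c        # g = w^H c   -> gradient 2c
g3 = lambda w: np.conj(w) @ Q @ w    # g = w^H Q w -> gradient 2 Q w

print(np.max(np.abs(num_grad(g1, w))))              # ~0
print(np.max(np.abs(num_grad(g2, w) - 2 * c)))      # ~0
print(np.max(np.abs(num_grad(g3, w) - 2 * Q @ w)))  # ~0
```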
Returning to the MSE performance criterion

    J(w) = σ_d^2 - p^H w - w^H p + w^H R w

Approach: Minimize the error by differentiating with respect to w and setting the result to 0:

    ∇_w(J) = 0 - 0 - 2p + 2Rw = 0
    ⟹  R w_0 = p    [normal equation]

Result: The Wiener filter coefficients are defined by

    w_0 = R^{-1} p

Question: Does R^{-1} always exist? Recall R is positive semidefinite, and usually positive definite.
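A minimal sketch of the normal equations in practice, assuming a hypothetical model in which d(n) is a fixed FIR combination of a white input plus noise; estimating R and p from data and solving R w_0 = p should then recover the true weights, and J_min should match the added noise floor:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 100_000, 3

# Hypothetical model: d(n) = w_true^T [x(n), x(n-1), x(n-2)] + noise
w_true = np.array([0.6, -0.3, 0.1])
x = rng.normal(size=N)
X = np.column_stack([np.roll(x, k) for k in range(M)])  # rows ~ [x(n), x(n-1), x(n-2)]
d = X @ w_true + 0.1 * rng.normal(size=N)

# Sample estimates of R = E{x(n) x^T(n)} and p = E{x(n) d(n)} (real data)
R = X.T @ X / N
p = X.T @ d / N

w0 = np.linalg.solve(R, p)     # normal equations: R w0 = p
sigma_d2 = np.mean(d ** 2)
J_min = sigma_d2 - p @ w0      # minimum MSE: sigma_d^2 - p^T w0
print(w0, J_min)               # w0 ~ w_true, J_min ~ noise variance 0.01
```

Note the use of `np.linalg.solve` rather than forming R^{-1} explicitly; solving the linear system is cheaper and numerically better conditioned.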
Orthogonality Principle

Consider again the normal equation that defines the optimal solution:

    R w_0 = p  ⟹  E{x(n) x^H(n)} w_0 = E{x(n) d*(n)}

Rearranging,

    E{x(n) d*(n)} - E{x(n) x^H(n)} w_0 = 0
    E{x(n) [d*(n) - x^H(n) w_0]} = 0
    E{x(n) e_0*(n)} = 0

Note: e_0(n) is the error when the optimal weights are used, i.e., e_0*(n) = d*(n) - x^H(n) w_0.
Thus

    E{x(n) e_0*(n)} = E{ [x(n) e_0*(n), x(n-1) e_0*(n), ..., x(n-M+1) e_0*(n)]^T } = 0

Orthogonality principle: A necessary and sufficient condition for a filter to be optimal is that the estimation error e_0(n) be orthogonal to each input sample in x(n).

Interpretation: The observation samples and the error are orthogonal; the error carries no information that the filter could still extract from the observations.
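The orthogonality principle is easy to verify in simulation: with the optimal weights the error is uncorrelated with every tap of x(n), while any perturbed weight vector leaves residual correlation (the signal model below is hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200_000, 2

# Hypothetical signals: colored input x(n), desired d(n) statistically related to it
x = np.convolve(rng.normal(size=N), [1.0, 0.5], mode="same")
d = 0.8 * x + 0.2 * np.roll(x, 1) + 0.05 * rng.normal(size=N)

X = np.column_stack([np.roll(x, k) for k in range(M)])   # taps [x(n), x(n-1)]
R = X.T @ X / N
p = X.T @ d / N
w0 = np.linalg.solve(R, p)

e_opt = d - X @ w0            # error with the optimal weights
e_bad = d - X @ (w0 + 0.3)    # error with a perturbed weight vector

print(np.abs(X.T @ e_opt / N).max())   # ~0: error orthogonal to every tap
print(np.abs(X.T @ e_bad / N).max())   # clearly nonzero
```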
Minimum MSE

Objective: Determine the minimum MSE.

Approach: Use the optimal weights w_0 = R^{-1} p in the MSE expression J(w) = σ_d^2 - p^H w - w^H p + w^H R w:

    J_min = σ_d^2 - p^H w_0 - w_0^H p + w_0^H R (R^{-1} p)
          = σ_d^2 - p^H w_0 - w_0^H p + w_0^H p
          = σ_d^2 - p^H w_0

Result:

    J_min = σ_d^2 - p^H R^{-1} p

where the substitution w_0 = R^{-1} p has been employed.
Excess MSE

Objective: Consider the excess MSE introduced by using a weight vector that is not optimal.

    J(w) - J_min = (σ_d^2 - p^H w - w^H p + w^H R w) - (σ_d^2 - p^H w_0 - w_0^H p + w_0^H R w_0)

Using the fact that p = R w_0 and p^H = w_0^H R yields

    J(w) - J_min = -w_0^H R w - w^H R w_0 + w^H R w + w_0^H R w_0 + w_0^H R w_0 - w_0^H R w_0
                 = -w_0^H R w - w^H R w_0 + w^H R w + w_0^H R w_0
                 = (w - w_0)^H R (w - w_0)

    ⟹  J(w) = J_min + (w - w_0)^H R (w - w_0)
Finally, using the eigen-decomposition R = Q Ω Q^H,

    J(w) = J_min + (w - w_0)^H Q Ω Q^H (w - w_0)

or, defining the eigenvector-transformed difference

    v = Q^H (w - w_0)    (*)

Result:

    J(w) = J_min + v^H Ω v
         = J_min + Σ_{k=1}^{M} λ_k v_k v_k*
         = J_min + Σ_{k=1}^{M} λ_k |v_k|^2

Note: (*) shows that v_k is the difference (w - w_0) projected onto the eigenvector q_k.
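A quick numerical check of the eigen-expansion, using a randomly generated positive definite R and arbitrary p and σ_d^2 (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4

# Illustrative quantities: a positive definite R, arbitrary p and sigma_d^2
A = rng.normal(size=(M, M))
R = A @ A.T + M * np.eye(M)
p = rng.normal(size=M)
sigma_d2 = 5.0

w0 = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w0

lam, Q = np.linalg.eigh(R)              # R = Q diag(lam) Q^T

w = w0 + rng.normal(size=M)             # an arbitrary suboptimal weight vector
J_direct = sigma_d2 - 2 * p @ w + w @ R @ w
v = Q.T @ (w - w0)                      # eigenvector-transformed difference
J_eigen = J_min + np.sum(lam * v ** 2)  # J_min + sum_k lambda_k |v_k|^2

print(J_direct, J_eigen)                # the two expressions agree
```

Since every λ_k > 0, the expansion also makes the geometry explicit: any deviation from w_0 can only increase the MSE, fastest along the eigenvectors with the largest eigenvalues.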
Communications Example

Example: Consider the following system.

[Figure: communications system block diagram]

Objective: Determine the optimal filter for a given system and channel.
Specific objective: Determine the optimal order-two filter weights w_0 for

    H_1(z) = 1 / (1 + 0.8458 z^{-1})    [AR process]
    H_2(z) = 1 / (1 - 0.9458 z^{-1})    [communication channel]

where v_1(n) and v_2(n) are zero-mean white noise processes with σ_1^2 = 0.27 and σ_2^2 = 0.1.

Note: To determine w_0, we need:
- R_u, the autocorrelation of the received signal u(n)
- p, the cross-correlation between the received signal u(n) and the desired signal d(n)
Communications Example J H e M Y R S T \ O L M N O ` R [ [ a Y S b X V S R Y P M N O H I J b c X Y \ d K J f \ T S W \ ] T S g Y X d Q R S T U V W X Y T Z R W [ \ ] R ^ T \ W _ X V S R Y K J i h Procedure: Consider R u first Since u(n) = x(n)+v 2 (n), where v 2 (n) is white with σ2 2 = 0.1 and is uncorrelated with x(n) [ ] 0.1 0 R u = R x + R v2 = R x + 0 0.1 Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring 2010 25 / 34
Next, consider the relation between x(n) and v_1(n):

    X(z) = H_1(z) H_2(z) V_1(z)

where, for the given systems,

    H_1(z) H_2(z) = 1 / ( (1 + 0.8458 z^{-1})(1 - 0.9458 z^{-1}) )

Converting to the time domain, we see x(n) is an order-two AR process:

    x(n) - 0.1 x(n-1) - 0.8 x(n-2) = v_1(n)
    x(n) + a_1 x(n-1) + a_2 x(n-2) = v_1(n),    with a_1 = -0.1, a_2 = -0.8
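The AR(2) denominator above can be checked by simply multiplying the two first-order factors:

```python
import numpy as np

# Multiply the denominators of H1(z) and H2(z) to get the AR(2) polynomial
denom = np.polymul([1.0, 0.8458], [1.0, -0.9458])
print(denom)   # ~ [1, -0.1, -0.8]
```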
Since x(n) is a real-valued order-two AR process, the Yule-Walker equations are given by

    [ r(0)  r(1) ] [ a_1 ]  =  - [ r(1) ]
    [ r(1)  r(0) ] [ a_2 ]       [ r(2) ]

Solving for the coefficients:

    a_1 = - r(1) [r(0) - r(2)] / [r^2(0) - r^2(1)]
    a_2 = - [r(0) r(2) - r^2(1)] / [r^2(0) - r^2(1)]

Question: What are the known and unknown terms in this system?
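As a sanity check on these formulas, we can solve the Yule-Walker system numerically with the correlation values this example ultimately yields; here r(2) = 0.85 is not stated on the slide but follows from the recursion r(2) = -a_1 r(1) - a_2 r(0):

```python
import numpy as np

# Correlation values consistent with this example; r(2) = 0.85 follows from
# the Yule-Walker recursion r(2) = -a1 r(1) - a2 r(0)
r0, r1, r2 = 1.0, 0.5, 0.85

# Solve [r0 r1; r1 r0] [a1; a2] = -[r1; r2] for the AR coefficients
a = np.linalg.solve(np.array([[r0, r1],
                              [r1, r0]]),
                    -np.array([r1, r2]))
print(a)   # [-0.1, -0.8]
```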
Note: We must solve the system to obtain the unknown r(·) values. Noting r(0) = σ_x^2 and rearranging to solve for r(1) and r(2):

    r(1) = - a_1 σ_x^2 / (1 + a_2)                   (*)
    r(2) = ( a_1^2 / (1 + a_2) - a_2 ) σ_x^2         (**)

The Yule-Walker equations also stipulate

    σ_{v_1}^2 = r(0) + a_1 r(1) + a_2 r(2)           (***)

Next, utilize the determined r(1), r(2) and the given a_1, a_2, σ_{v_1}^2 values to determine r(0) = σ_x^2.
Substituting (*) and (**) into (***), using r(0) = σ_x^2, and rearranging:

    σ_{v_1}^2 = r(0) + a_1 r(1) + a_2 r(2)
              = σ_x^2 + a_1 ( -a_1 / (1 + a_2) ) σ_x^2 + a_2 ( a_1^2 / (1 + a_2) - a_2 ) σ_x^2
              = σ_x^2 ( 1 - a_2^2 - a_1^2 (1 - a_2) / (1 + a_2) )

    ⟹  σ_x^2 = ( (1 + a_2) / (1 - a_2) ) · σ_{v_1}^2 / ( (1 + a_2)^2 - a_1^2 )

Note: All correlation terms are then determined, since r(0) = σ_x^2.
Using a_1 = -0.1, a_2 = -0.8, and σ_{v_1}^2 = 0.27,

    σ_x^2 = ( (1 + a_2) / (1 - a_2) ) · σ_{v_1}^2 / ( (1 + a_2)^2 - a_1^2 )
          = ( 0.2 / 1.8 ) · 0.27 / ( 0.04 - 0.01 )
          = 1

Thus r(0) = 1. Similarly,

    r(1) = - a_1 σ_x^2 / (1 + a_2) = 0.5

and finally, R_x is

    R_x = [ 1    0.5 ]
          [ 0.5  1   ]
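These closed-form values can be reproduced directly from the formulas above:

```python
# Plug in a1 = -0.1, a2 = -0.8, sigma_v1^2 = 0.27 from this example
a1, a2, var_v1 = -0.1, -0.8, 0.27

var_x = ((1 + a2) / (1 - a2)) * var_v1 / ((1 + a2) ** 2 - a1 ** 2)
r1 = -a1 / (1 + a2) * var_x
print(var_x, r1)   # ~ 1.0, 0.5
```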
Recall the overall system:

[Figure: communications system block diagram, repeated]

Putting the pieces of R_u together:

    R_u = R_x + R_{v_2}
        = [ 1    0.5 ]  +  [ 0.1  0   ]
          [ 0.5  1   ]     [ 0    0.1 ]
        = [ 1.1  0.5 ]
          [ 0.5  1.1 ]
Recall: The Wiener solution is given by w_0 = R^{-1} p.

Note: R_u is known, but p is still to be determined:

    p = E{ [ u(n) d(n), u(n-1) d(n) ]^T }

Recall

    X(z) = H_2(z) D(z) = D(z) / (1 - 0.9458 z^{-1})

or, in the time domain,

    x(n) - 0.9458 x(n-1) = d(n)

Lastly, the observation is corrupted by additive noise:

    u(n) = x(n) + v_2(n)
Thus

    E{u(n) d(n)} = E{ [x(n) + v_2(n)] [x(n) - 0.9458 x(n-1)] }
                 = E{x^2(n)} + E{x(n) v_2(n)} - 0.9458 E{x(n) x(n-1)} - 0.9458 E{v_2(n) x(n-1)}
                 = σ_x^2 + 0 - 0.9458 r(1) - 0
                 = 1 - 0.9458 (0.5)
                 ≈ 0.5272

Similarly,

    E{u(n-1) d(n)} = E{ [x(n-1) + v_2(n-1)] [x(n) - 0.9458 x(n-1)] }
                   = r(1) - 0.9458 r(0)
                   = -0.4458
Thus

    p = [ 0.5272, -0.4458 ]^T

Final solution:

    w_0 = R_u^{-1} p
        = [ 1.1  0.5 ]^{-1} [ 0.5272  ]
          [ 0.5  1.1 ]      [ -0.4458 ]
        = [ 0.8360  ]
          [ -0.7853 ]

The optimal filter weights are a function of the source signal statistics and the communication channel.

Question: How do we optimize a filter when the statistics are not known in closed form or a priori?
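The final numbers can be reproduced in a few lines from the R_u and p derived above:

```python
import numpy as np

# R_u and p as derived in this example
R_u = np.array([[1.1, 0.5],
                [0.5, 1.1]])
p = np.array([0.5272, -0.4458])

w0 = np.linalg.solve(R_u, p)   # Wiener solution: w0 = R_u^{-1} p
print(w0)                      # ~ [0.836, -0.785]
```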