V. Adaptive Filtering: Widrow-Hoff Learning Rule, LMS, and Adaline

Goals:
- Introduce the Wiener-Hopf (W-H) equations
- Introduce the application of the steepest descent (SD) method to the W-H problem
- Derive the Least Mean Square (LMS) algorithm as an approximation to SD
- Describe the Adaline (adaptive linear neuron)
- Applications to adaptive noise cancellation (ANC)

References: [Hagan], [Haykin]
1) Wiener-Hopf Equations

[Figure: array of $q+1$ sensors with inputs $p_0, \dots, p_q$ weighted by $w_0, \dots, w_q$ and summed ($\Sigma$) to produce the output $y$, which is compared against the desired response $d$]

Output: $y(n) = w^T p$, with $p = [p_0, \dots, p_q]^T$ and $w = [w_0, \dots, w_q]^T$.

MS-error criterion: find the weights $w_k$ which minimize the mean-squared (MS) error

$\sigma_e^2 = E[e^2] = E[(d - y)^2].$

$\sigma_e^2$ is minimum for $w$ chosen so that

$\frac{\partial \sigma_e^2}{\partial w_k} = 0 \;\;\forall k \iff E\!\left[e\,\frac{\partial e}{\partial w_k}\right] = 0 \;\;\forall k.$

Assuming the filter is real, $\frac{\partial e}{\partial w_k} = -p_k$, so that:
$E[e_0\, p_k] = 0, \quad k = 0, \dots, q.$

Orthogonality Principle: let the error be $e = d - y$. The weights $w_k$ minimize the MSE iff $w$ is chosen so that

$E[e_0\, p_k] = 0, \quad k = 0, \dots, q$

(the error $e_0$ is orthogonal to the input data $p_k$).

Consequence: the estimate of the desired response at the filter output, $y_0(n)$, and the error $e_0(n)$ are orthogonal. Why?

$E[y_0 e_0] = E\Big[\Big(\sum_{k=0}^{q} w_k p_k\Big)\, e_0\Big] = \sum_{k=0}^{q} w_k\, E[p_k e_0] = 0.$
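To make the principle concrete, here is a small numerical check (a sketch on simulated data; the signal names and sizes are illustrative, not from the lecture): the least-squares weights play the role of the MMSE solution, and both $E[e_0 p_k]$ and $E[y_0 e_0]$ come out near zero.

```python
# Numerical check of the orthogonality principle (illustrative setup):
# fit MMSE weights by least squares on simulated data, then verify that
# the residual is uncorrelated with every input tap and with the output.
import numpy as np

rng = np.random.default_rng(0)
N, q = 10_000, 3                       # samples, filter order (q+1 taps)
P = rng.standard_normal((N, q + 1))    # input data p_0 .. p_q per sample
w_true = np.array([0.5, -1.0, 0.3, 0.8])
d = P @ w_true + 0.1 * rng.standard_normal(N)   # desired response

w_opt, *_ = np.linalg.lstsq(P, d, rcond=None)   # sample MMSE weights
e0 = d - P @ w_opt                              # optimum error e_0

# E[e_0 p_k] should be ~0 for every tap k (orthogonality principle)
print(np.round(P.T @ e0 / N, 6))
# E[y_0 e_0] should also be ~0, since y_0 is a combination of the p_k
print(round(float((P @ w_opt) @ e0 / N), 6))
```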
Optimum Weight Computation: use the orthogonality principle

$\nabla_k \sigma_e^2 = 0 \iff E[e\, p_i] = 0, \quad i = 0, \dots, q$

$\iff E[(d - y)\, p_i] = 0, \quad i = 0, \dots, q.$

Substituting $y = \sum_k w_k p_k$ gives the W-H (normal) equations: $\sum_k w_k\, E[p_k p_i] = E[d\, p_i]$ for $i = 0, \dots, q$, i.e., $R_p\, w_{opt} = r_{dp}$.
Summary:

[Figure: signal $u_k$ corrupted by noise $n_k$ forms the filter input $p_k$; the filter output $y_k$ is compared against the target output / desired response $d_k$]

The W-H equations lead to designing a filter with $y_k = w^T p_k$ so that the MSE $\sigma_e^2 = E\{(d_k - y_k)^2\}$ is minimum.

Operations, depending on the choice of desired response:
- $d_k = p_k$: filtering
- $d_k = p_{k-l}$: smoothing
- $d_k = p_{k+l}$: prediction

W-H optimum solution: $w_{opt} = R_p^{-1}\, r_{dp}$

Minimum MSE (MMSE) given as: $\sigma_e^2\big|_{min} = R_d(0) - r_{dp}^T\, w_{opt}$
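As a sketch of the whole W-H design on simulated data (the signals, noise model, and filter length below are illustrative assumptions), one can estimate $R_p$ and $r_{dp}$ by sample averages and solve the normal equations directly:

```python
# W-H solution on simulated data, assuming wide-sense stationarity:
# build R_p and r_dp from sample averages, then solve R_p w = r_dp.
import numpy as np

rng = np.random.default_rng(1)
N, Q = 20_000, 4
u = rng.standard_normal(N)                                      # clean signal
n = np.convolve(rng.standard_normal(N), [1, 0.9], mode="same")  # colored noise
p = u + n                                                       # filter input p_k
d = u                                                           # desired response d_k

# Tap-delay-line data matrix: row k holds [p_k, p_{k-1}, ..., p_{k-Q+1}]
Z = np.stack([np.roll(p, i) for i in range(Q)], axis=1)[Q:]
dk = d[Q:]

R_p = Z.T @ Z / len(Z)                 # input correlation matrix
r_dp = Z.T @ dk / len(Z)               # cross-correlation vector
w_opt = np.linalg.solve(R_p, r_dp)

mmse = np.mean(dk**2) - r_dp @ w_opt   # R_d(0) - r_dp^T w_opt
print("w_opt =", np.round(w_opt, 3), " MMSE =", round(float(mmse), 4))
```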
Example: assume that you are given
(1) a noisy signal $x(n) = s(n) + w(n)$,
(2) access to $w'(n)$, a noise reference correlated with $w(n)$,
(3) $s(n)$ wideband,
(4) $w(n)$ narrowband.

Goal: minimize the noise effects.
2) Method of Steepest Descent (SD)

The W-H equations require computing the inverse of a matrix, and the solution may need to be recomputed if the input signal changes its behavior. Both issues are addressed by the method of SD, which solves the W-H equations iteratively (leading to an adaptive scheme): the filter progressively learns the correlation and cross-correlation information, and the filter coefficients converge to the optimum values given by the W-H equations; i.e., SD finds the minimum of $\sigma_e^2$ iteratively.

Descent directions: take steps that lead downhill on the MSE surface

$F(w) = E\{e_k^2\} = E\big[(d_k - w^T p_k)^2\big].$
[Figure: bowl-shaped error surface $F(w)$ with minimum $F_{min} = \text{MMSE}$ at the optimum weight; successive updates $w_0, w_1, \dots$ move downhill]

Updated weight: $w_{n+1} = w_n + \gamma\,\big(-\nabla_w F(w_n)\big), \quad \gamma > 0.$
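A minimal sketch of this update in code, assuming $R_p$ and $r_{dp}$ are known (the numerical values are illustrative), using the recursion form of the next slide:

```python
# Steepest descent on the MSE surface: each step moves along the negative
# gradient direction (r_dp - R_p w), scaled by the step size gamma.
import numpy as np

R_p = np.array([[2.0, 0.5],
                [0.5, 1.0]])
r_dp = np.array([1.0, 0.5])
gamma = 0.1                            # well below 2/lambda_max here

w = np.zeros(2)
for n in range(200):
    w = w + gamma * (r_dp - R_p @ w)   # same as (I - gamma R_p) w + gamma r_dp

print("w after SD:", np.round(w, 4))
print("w_opt     :", np.round(np.linalg.solve(R_p, r_dp), 4))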
Stability of Steepest Descent

Iterative solution: $w_{n+1} = (I - \gamma R_p)\, w_n + \gamma\, r_{dp}$

General solution: $w_n = w_{opt} + (I - \gamma R_p)^n\, (w_0 - w_{opt})$

The SD algorithm converges, $w_n \to w_{opt}$,
iff $(I - \gamma R_p)^n \to 0$
iff $|\lambda(I - \gamma R_p)| < 1$
iff $|1 - \gamma\,\lambda(R_p)| < 1$ for every eigenvalue $\lambda(R_p)$
iff $0 < \gamma < \dfrac{2}{\lambda(R_p)}$ for every eigenvalue.

$\Rightarrow$ the SD algorithm converges when $0 < \gamma < \dfrac{2}{\lambda_{max}(R_p)}$.

In practice, use the fact that, for a filter of length $Q$:

$\lambda_{max}(R_p) \le \operatorname{trace}(R_p) = \sum_{i=0}^{Q-1} R_p(i, i) = Q\, R_p(0)$

$\Rightarrow 0 < \gamma < \dfrac{2}{Q\, R_p(0)}, \quad Q = \text{filter length}.$
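A quick numerical comparison of the exact bound $2/\lambda_{max}(R_p)$ against the practical bound $2/(Q\,R_p(0))$, on a sample correlation matrix (the signal model is an illustrative assumption):

```python
# The trace bound 2/(Q * R_p(0)) is always at most the exact bound
# 2/lambda_max, since lambda_max <= trace(R_p) for a PSD matrix.
import numpy as np

rng = np.random.default_rng(2)
Q = 8
p = np.convolve(rng.standard_normal(50_000), [1, 0.7, 0.2], mode="same")
Z = np.stack([np.roll(p, i) for i in range(Q)], axis=1)[Q:]
R_p = Z.T @ Z / len(Z)                   # sample correlation matrix

lam_max = np.linalg.eigvalsh(R_p).max()
exact_bound = 2.0 / lam_max
trace_bound = 2.0 / (Q * R_p[0, 0])      # R_p(0) ~ R_p[0,0] (stationary input)

print(f"2/lambda_max   = {exact_bound:.4f}")
print(f"2/(Q * R_p(0)) = {trace_bound:.4f}  (safe, more conservative)")
```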
Geometrical Significance of Eigenvectors and Eigenvalues

[Worked example: given the input correlation matrix $R_p$, its eigenvectors $u_i$, and the desired-response statistics, find $w_{opt}$ and the MMSE; the numerical values belong to the original figure. The eigenvectors of $R_p$ are the principal axes of the elliptical MSE contours, and the eigenvalues set the curvature along each axis.]
Error surface shape and eigenvalue ratios

[Figure: MSE contour plots for four cases of increasing eigenvalue spread $\chi = \lambda_1 / \lambda_2$:
- $a_1 = 0.195$, $a_2 = 0.95$: $\lambda_1 = 1.1$, $\lambda_2 = 0.9$, $\chi = 1.2$ (nearly circular contours)
- $a_1 = 0.975$, $a_2 = 0.95$: $\lambda_1 = 1.5$, $\lambda_2 = 0.5$, $\chi = 3$
- $\lambda_1 = 1.81$, $\lambda_2 = 0.18$, $\chi = 10$
- $\lambda_1 = 1.957$, $\lambda_2 = 0.0198$, $\chi = 100$ (highly elongated contours)]
Effects of varying the step size on the iterative scheme behavior

[Figure: SD trajectories on the MSE contours for several values of $\gamma$]
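A numerical sketch of this effect, reusing the SD recursion above with illustrative values of $\gamma$ below, within, and above the stability bound:

```python
# Small gamma converges slowly, moderate gamma converges fast, and gamma
# above 2/lambda_max makes the iteration diverge.
import numpy as np

R_p = np.array([[2.0, 0.5], [0.5, 1.0]])
r_dp = np.array([1.0, 0.5])
w_opt = np.linalg.solve(R_p, r_dp)
lam_max = np.linalg.eigvalsh(R_p).max()     # ~2.21, so the bound is ~0.906

for gamma in (0.01, 0.3, 1.0):              # below, within, above the bound
    w = np.zeros(2)
    for _ in range(100):
        w = w + gamma * (r_dp - R_p @ w)
    print(f"gamma={gamma:4}: |w - w_opt| = {np.linalg.norm(w - w_opt):.3e}")
```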
Adaline Network (textbook notation)

[Figure: linear neuron; input $p$ ($R \times 1$), weights $W$ ($S \times R$), bias $b$ ($S \times 1$), net input $n = Wp + b$, output $a = \mathrm{purelin}(Wp + b)$ ($S \times 1$)]

Goal: find $w$ and $b$ which minimize the error between target and network outputs, over all feature vectors and target outputs:

$F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]$, where $x = \begin{bmatrix} w \\ b \end{bmatrix}$, $z = \begin{bmatrix} p \\ 1 \end{bmatrix}$.

Expanding:

$F(x) = E[t^2] - 2\, x^T E[t z] + x^T E[z z^T]\, x = c - 2\, x^T h + x^T R\, x,$

with $c = E[t^2]$, $h = E[t z]$, $R = E[z z^T]$.
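A sketch of this quadratic cost on simulated data (dimensions, targets, and seeds are illustrative): building $R = E[zz^T]$ and $h = E[tz]$ from sample averages and solving $x^* = R^{-1} h$ recovers the weights and bias.

```python
# Adaline cost in the slide's notation: x = [w; b], z = [p; 1],
# F(x) = c - 2 x^T h + x^T R x, minimized (for positive definite R)
# at x* = R^{-1} h.
import numpy as np

rng = np.random.default_rng(3)
N, Rdim = 5_000, 2                       # samples, input dimension R
P = rng.standard_normal((N, Rdim))       # feature vectors p
t = P @ np.array([1.5, -0.7]) + 0.2 + 0.05 * rng.standard_normal(N)

Z = np.hstack([P, np.ones((N, 1))])      # z = [p; 1] in row form
Rmat = Z.T @ Z / N                       # R = E[z z^T]
h = Z.T @ t / N                          # h = E[t z]

x_star = np.linalg.solve(Rmat, h)        # [w; b] minimizing F(x)
print("w =", np.round(x_star[:-1], 3), " b =", round(float(x_star[-1]), 3))
```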
When does $F(x)$ have a minimum? $F(x)$ is quadratic in $x$; it has a unique minimum when the Hessian $2R = 2E[zz^T]$ is positive definite.

In practice, how do we get to the optimum solution? Use one of the iterative optimization techniques seen before:

$x_{k+1} = x_k - \alpha\, \nabla F(x)\big|_{x = x_k}.$

How to simplify the computations? → the LMS algorithm.
Gradient Approximation and Consequence on Convergence Behavior

LMS replaces the expectation in $F(x)$ by the instantaneous estimate $\hat F(x) = e_k^2$, so that $\hat\nabla F(x) = -2\, e_k z_k$ and the update becomes $x_{k+1} = x_k + 2\alpha\, e_k z_k$ (the Widrow-Hoff rule). Because the gradient estimate is noisy, the LMS trajectory jitters around the SD path instead of descending smoothly.

[Figure: LMS trajectory on the MSE contours for $\alpha = 0.1$]
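A minimal LMS sketch (the data setup is an illustrative assumption, not the lecture's example):

```python
# LMS (Widrow-Hoff): the true gradient is replaced by the instantaneous
# estimate -2 e_k z_k, one update per presented sample.
import numpy as np

def lms(Z, t, alpha):
    """One pass of x_{k+1} = x_k + 2*alpha*e_k*z_k over rows z_k of Z."""
    x = np.zeros(Z.shape[1])
    for z_k, t_k in zip(Z, t):
        e_k = t_k - x @ z_k            # instantaneous error e_k = t_k - a_k
        x = x + 2 * alpha * e_k * z_k  # Widrow-Hoff update
    return x

rng = np.random.default_rng(6)
P = rng.standard_normal((5_000, 2))
Z = np.hstack([P, np.ones((5_000, 1))])       # z = [p; 1]
t = P @ np.array([1.5, -0.7]) + 0.2           # targets
print(np.round(lms(Z, t, alpha=0.01), 3))     # -> approx [1.5, -0.7, 0.2]
```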
Bound on the Learning Rate $\alpha$

Recall the steepest descent algorithm on a quadratic $F(x) = \frac{1}{2} x^T A x + d^T x + c$: $x_{k+1} = x_k - \alpha\, \nabla F(x_k)$, stable for $0 < \alpha < 2 / \lambda_{max}(A)$.

Here we have $F(x) = x^T R x - 2 x^T h + c$, so $A = 2R$ and the LMS update $x_{k+1} = x_k + 2\alpha\, e_k z_k$ is stable (in the mean) for

$0 < \alpha < \dfrac{1}{\lambda_{max}(R)}, \quad R = E[z z^T].$

Simplification for an actually usable bound on $\alpha$: $\lambda_{max}(R) \le \operatorname{trace}(R) = E[z^T z]$, so $0 < \alpha < 1/\operatorname{trace}(R)$ is safe.

When the feature-vector magnitude changes dramatically, one needs to normalize the impact of the update (normalized LMS):

$x_{k+1} = x_k + \alpha\, e_k\, z_k \,/\, (\|z_k\|^2 + \varepsilon).$
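A matching normalized-LMS sketch, with input magnitudes deliberately varied over several orders to show why the normalization matters (the setup is illustrative):

```python
# Normalized LMS matching the slide's update: dividing by ||z_k||^2 + eps
# keeps the effective step size stable when the input magnitude varies.
import numpy as np

def nlms(Z, t, alpha=0.5, eps=1e-8):
    x = np.zeros(Z.shape[1])
    for z_k, t_k in zip(Z, t):
        e_k = t_k - x @ z_k
        x = x + alpha * e_k * z_k / (z_k @ z_k + eps)  # normalized update
    return x

rng = np.random.default_rng(7)
scale = 10.0 ** rng.integers(-2, 3, size=5_000)        # wildly varying magnitudes
Z = rng.standard_normal((5_000, 3)) * scale[:, None]
t = Z @ np.array([0.5, -1.0, 0.25])
print(np.round(nlms(Z, t), 3))                         # -> approx [0.5, -1.0, 0.25]
```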
Application to Adaptive Noise Cancellation (with reference)

Assume you receive a noisy signal $x(n) + w(n)$, where $x(n)$ is the desired speech-only signal and $w(n)$ is cockpit noise. Goal: extract $x(n)$ to make the signal easier to understand.

Example: pilot-with-mask case.
- noisy voice, received inside the mask: $x(n) + w(n)$
- noise reference, collected outside the mask: $w'(n)$

Adaptive filter setup:

[Figure: $w'(n)$ → adaptive filter → $\hat d(n)$; the primary input $d(n) = x(n) + w(n)$ enters a summing node, and $e(n) = d(n) - \hat d(n)$ is the system output, i.e., the cleaned signal]
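An end-to-end sketch of this setup on simulated signals (the speech stand-in, the noise path, the filter length, and the use of the normalized update are all illustrative assumptions):

```python
# ANC with a noise reference: the in-mask noise w(n) is assumed to be a
# filtered version of the outside reference w'(n); the adaptive filter
# learns that filtering, its output approximates w(n), and the "error"
# e(n) is the cleaned speech.
import numpy as np

rng = np.random.default_rng(8)
N, Q, alpha = 20_000, 8, 0.05
x = np.sin(2 * np.pi * 0.01 * np.arange(N))       # stand-in for speech x(n)
w_ref = rng.standard_normal(N)                    # reference w'(n)
w_in = np.convolve(w_ref, [0.8, -0.4, 0.2])[:N]   # w(n) inside the mask
d = x + w_in                                      # primary: x(n) + w(n)

h, e = np.zeros(Q), np.zeros(N)
for n in range(Q, N):
    z = w_ref[n - Q + 1:n + 1][::-1]              # current reference taps
    d_hat = h @ z                                 # estimate of the noise w(n)
    e[n] = d[n] - d_hat                           # e(n): cleaned signal
    h = h + alpha * e[n] * z / (z @ z + 1e-8)     # normalized LMS update

print("noise power before:", round(float(np.var(d - x)), 4))
print("noise power after :", round(float(np.var(e[Q:] - x[Q:])), 4))
```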
Application to Adaptive Noise Cancellation (without reference)

Assume you receive a noisy signal $x(n) + w(n)$, where $x(n)$ is the speech-only signal and $w(n)$ is cockpit noise. Goal: extract $x(n)$, now without access to a separate noise reference. (A sketch of one common setup follows.)
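The slide leaves the setup open; one standard possibility, consistent with the wideband/narrowband example earlier in the section, is the adaptive line enhancer: use a delayed copy of the noisy input itself as the reference, since the delay decorrelates the wideband signal but not the narrowband noise. This is a hedged sketch of that idea, not necessarily the lecture's intended solution:

```python
# ANC without a separate reference (adaptive line enhancer assumption):
# s plays the role of the wideband x(n), w is narrowband noise. The filter
# predicts the narrowband part from a delayed input, so e(n) ~ s(n).
import numpy as np

rng = np.random.default_rng(9)
N, Q, D, alpha = 20_000, 16, 5, 0.05
s = rng.standard_normal(N)                       # wideband signal
w = np.sin(2 * np.pi * 0.05 * np.arange(N))      # narrowband noise
p = s + w                                        # only the sum is observed

h, e = np.zeros(Q), np.zeros(N)
for n in range(Q + D, N):
    z = p[n - D - Q + 1:n - D + 1][::-1]         # delayed input as "reference"
    y = h @ z                                    # tracks the narrowband part
    e[n] = p[n] - y                              # e(n) ~ s(n)
    h = h + alpha * e[n] * z / (z @ z + 1e-8)

print("corr(e, s) =", round(float(np.corrcoef(e[Q+D:], s[Q+D:])[0, 1]), 3))
```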