
Ch2: Wiener Filters

Optimal filters for stationary stochastic models are reviewed and derived in this presentation.

Contents:
- Linear optimum filtering
- Principle of orthogonality
- Minimum mean-square error
- Wiener-Hopf equations
- Error-performance surface
- Multiple linear regressor model
- Numerical example
- Channel equalization
- Linearly constrained minimum-variance filter
- Summary
- Generalized sidelobe cancellers

Linear Optimum Filtering: Statement

Complex-valued stationary (at least wide-sense stationary) stochastic processes. Linear discrete-time filter with coefficients w_0, w_1, w_2, ... (IIR, or FIR, which is inherently stable). The filter output y(n) is the estimate of the desired response d(n); e(n) is the estimation error, i.e., the difference between the desired response and the filter output.

Linear Optimum Filtering: Statement

Problem statement: given
- the filter input u(n), and
- the desired response d(n),
find the optimum filter coefficients w_0, w_1, w_2, ... that make the estimation error as small as possible. How? As an optimization problem.

Linear Optimum Filtering: Statement

Optimization (minimization) criteria:
1. Expectation of the absolute value of the estimation error,
2. Expectation (mean) of the squared value of the estimation error,
3. Expectation of higher powers of the absolute value of the estimation error.

Minimization of the mean-square value of the error (MSE) is mathematically tractable. The problem becomes: design a linear discrete-time filter whose output y(n) provides an estimate of a desired response d(n), given a set of input samples u(0), u(1), u(2), ..., such that the mean-square value of the estimation error e(n), defined as the difference between the desired response d(n) and the actual response y(n), is minimized.

Principle of Orthogonality

The filter output is the convolution of the filter impulse response and the input,

  y(n) = \sum_{k=0}^{\infty} w_k^* u(n-k),  n = 0, 1, 2, ...,

where the asterisk denotes complex conjugation. Note that, in complex terminology, the term w_k^* u(n-k) represents the scalar version of an inner product of the filter coefficient w_k and the filter input u(n-k).


Principle of Orthogonality

Error: e(n) = d(n) - y(n). MSE (mean-square error) criterion:

  J = E[ e(n) e^*(n) ] = E[ |e(n)|^2 ].

The squared magnitude makes J a quadratic, hence convex, function of the filter coefficients, so the minimum is attained where the gradient of J with respect to the optimization variable w is zero.

Derivative in Complex Variables

Let w_k = a_k + j b_k, k = 0, 1, 2, .... Then the derivative (gradient) with respect to w_k is

  \nabla_k J = \partial J / \partial a_k + j \, \partial J / \partial b_k,  k = 0, 1, 2, ....

For the minimum, \nabla_k J = 0 for all k. The cost function J is a scalar independent of time n.

Principle of Orthogonality

The partial derivative of J = E[ e(n) e^*(n) ] is

  \nabla_k J = E[ \partial e(n)/\partial a_k \, e^*(n) + \partial e^*(n)/\partial a_k \, e(n) + j \, \partial e(n)/\partial b_k \, e^*(n) + j \, \partial e^*(n)/\partial b_k \, e(n) ].

Using e(n) = d(n) - \sum_k w_k^* u(n-k) and

  \partial e(n)/\partial a_k = -u(n-k),   \partial e(n)/\partial b_k = j u(n-k),
  \partial e^*(n)/\partial a_k = -u^*(n-k),  \partial e^*(n)/\partial b_k = -j u^*(n-k),

we obtain

  \nabla_k J = -2 E[ u(n-k) e^*(n) ].

Principle of Orthogonality

Since \nabla_k J = 0 at the minimum,

  E[ u(n-k) e_o^*(n) ] = 0,  k = 0, 1, 2, ....

The necessary and sufficient condition for the cost function J to attain its minimum value is that the corresponding value of the estimation error e_o(n) be orthogonal to each input sample that enters into the estimation of the desired response at time n. The error at the minimum is uncorrelated with the filter input! This is a good basis for testing whether a linear filter is operating in its optimum condition.
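
The test suggested above is easy to run numerically. This sketch is not from the slides: it fits a least-squares (sample-MMSE) filter to arbitrary toy data, an assumed AR(1) input and a hypothetical 2-tap filter, and verifies that the residual is uncorrelated with every tap input.

```python
import numpy as np

# Numerical check of the principle of orthogonality.
# The AR(1) input, toy desired response, and filter length are
# illustrative assumptions, not taken from the slides.
rng = np.random.default_rng(0)
N, M = 100_000, 2

u = np.zeros(N)
for n in range(1, N):                    # AR(1) input (arbitrary choice)
    u[n] = 0.5 * u[n - 1] + rng.standard_normal()
d = np.roll(u, 1) + 0.1 * rng.standard_normal(N)  # toy desired response

# Tap-input matrix: row n holds [u(n), u(n-1), ..., u(n-M+1)]
U = np.column_stack([np.roll(u, k) for k in range(M)])[M:]
dd = d[M:]

w = np.linalg.lstsq(U, dd, rcond=None)[0]  # sample-optimal filter
e = dd - U @ w                             # estimation error e_o(n)

# Orthogonality: sample mean of u(n-k) e_o(n) is ~0 for k = 0, ..., M-1
print(U.T @ e / len(e))
```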

Principle of Orthogonality

Corollary: if the filter is operating in optimum (MSE) conditions, then

  E[ y_o(n) e_o^*(n) ] = 0.

When the filter operates in its optimum condition, the estimate of the desired response defined by the filter output, y_o(n), and the corresponding estimation error, e_o(n), are orthogonal to each other.

Minimum Mean-Square Error

Let \hat{d}(n | U_n) denote the estimate of the desired response that is optimized in the MSE sense, given the inputs that span the space U_n, i.e., \hat{d}(n | U_n) = y_o(n). Then the error in optimal conditions is

  e_o(n) = d(n) - \hat{d}(n | U_n),  or  d(n) = \hat{d}(n | U_n) + e_o(n).

Also, let the minimum MSE be

  J_min = E[ |e_o(n)|^2 ] = \sigma_d^2 - \sigma_{\hat{d}}^2  (\ge 0).

HW: try to derive this relation from the corollary.

Minimum Mean-Square Error

Normalized MSE: let

  \varepsilon = J_min / \sigma_d^2,  0 \le \varepsilon \le 1.

Meaning:
- If \varepsilon is zero, the optimum filter operates perfectly, in the sense that there is complete agreement between d(n) and \hat{d}(n | U_n) (optimum case).
- If \varepsilon is unity, there is no agreement whatsoever between d(n) and \hat{d}(n | U_n) (worst case).

Wiener-Hopf Equations

We have (principle of orthogonality)

  E[ u(n-k) e_o^*(n) ] = 0,  k = 0, 1, 2, ....

Substituting e_o(n) = d(n) - \sum_i w_{oi}^* u(n-i) and rearranging,

  \sum_{i=0}^{\infty} w_{oi} r(i-k) = p(-k),  k = 0, 1, 2, ...,

where r(k) = E[ u(n) u^*(n-k) ] is the autocorrelation of the input and p(k) = E[ u(n+k) d^*(n) ] is the cross-correlation between the input and the desired response. These are the Wiener-Hopf equations (a set of infinitely many equations).

Wiener-Hopf Equations

Solution of the Wiener-Hopf equations for a linear transversal (FIR) filter: the Wiener-Hopf equations reduce to M simultaneous equations,

  \sum_{i=0}^{M-1} w_{oi} r(i-k) = p(-k),  k = 0, 1, ..., M-1.

The transversal filter involves a combination of three operations: storage, multiplication, and addition, as described here:

1. The storage is represented by a cascade of M-1 one-sample delays, with the block for each such unit labeled z^{-1}. We refer to the various points at which the one-sample delays are accessed as tap points. The tap inputs are denoted by u(n), u(n-1), ..., u(n-M+1). Thus, with u(n) viewed as the current value of the filter input, the remaining M-1 tap inputs, u(n-1), ..., u(n-M+1), represent past values of the input.

2. The scalar inner products of the tap inputs u(n), u(n-1), ..., u(n-M+1) and the tap weights w_0, w_1, ..., w_{M-1} are formed by using a corresponding set of multipliers. In particular, the multiplication involved in forming the scalar product of u(n) and w_0 is represented by a block labeled w_0^*, and so on for the other inner products.

3. The function of the adders is to sum the multiplier outputs to produce an overall output for the filter.

Wiener-Hopf Equations (Matrix Form)

Let u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T be the tap-input vector. Then the M-by-M correlation matrix of the tap inputs is

  R = E[ u(n) u^H(n) ],

and the M-by-1 cross-correlation vector between the tap inputs and the desired response is

  p = E[ u(n) d^*(n) ] = [p(0), p(-1), ..., p(1-M)]^T.

Wiener-Hopf Equations (Matrix Form)

Then the Wiener-Hopf equations can be written in matrix form as

  R w_o = p,

where w_o = [w_{o0}, w_{o1}, ..., w_{o,M-1}]^T is composed of the optimum (FIR) filter coefficients. The solution is found to be

  w_o = R^{-1} p.

Note that R is almost always positive definite (hence invertible).
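
As a minimal sketch (with placeholder correlation values, not taken from the slides), the matrix form can be solved directly once r(k) and p(k) are known:

```python
import numpy as np
from scipy.linalg import toeplitz, solve

# Solve the matrix-form Wiener-Hopf equations R w_o = p.
# The correlation values below are assumed, for illustration only.
r = np.array([1.0, 0.5, 0.25])      # r(0), r(1), r(2)   (assumed)
p = np.array([0.5, -0.4, 0.2])      # p(0), p(-1), p(-2) (assumed)

R = toeplitz(r)                     # Hermitian Toeplitz correlation matrix
w_o = solve(R, p, assume_a='pos')   # w_o = R^{-1} p; R is positive definite
print(w_o)
```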

Error-Performance Surface

Substitute y(n) = w^H u(n) into J = E[ |d(n) - y(n)|^2 ] and rewrite:

  J(w) = \sigma_d^2 - w^H p - p^H w + w^H R w.

Error-Performance Surface

J(w) is a quadratic function of the filter coefficients, hence a convex function. Setting the gradient to zero,

  \nabla_w J = 2 (R w - p) = 0,  or  R w_o = p,

which are the Wiener-Hopf equations again.

Minimum Value of the Mean-Square Error

We calculated that d(n) = \hat{d}(n | U_n) + e_o(n). The estimate of the desired response is \hat{d}(n | U_n) = w_o^H u(n); hence its variance is

  \sigma_{\hat{d}}^2 = w_o^H R w_o = p^H R^{-1} p.

Then

  J_min = \sigma_d^2 - \sigma_{\hat{d}}^2 = \sigma_d^2 - p^H R^{-1} p = \sigma_d^2 - p^H w_o

at w = w_o. (J_min is independent of w.)

Canonical Form of the Error-Performance Surface

Rewrite the cost function in matrix form:

  J(w) = \sigma_d^2 - w^H p - p^H w + w^H R w.

Next, express J(w) as a perfect square in w:

  J(w) = \sigma_d^2 - p^H R^{-1} p + (w - R^{-1} p)^H R (w - R^{-1} p).

Then, by substituting w_o = R^{-1} p and J_min = \sigma_d^2 - p^H R^{-1} p,

  J(w) = J_min + (w - w_o)^H R (w - w_o).

In other words, the excess MSE is determined entirely by the deviation of w from w_o.

Canonical Form of the Error-Performance Surface

Observations:
- J(w) is quadratic in w,
- the minimum is attained at w = w_o,
- J_min is bounded below and is always a positive quantity, J_min > 0.

Canonical Form of the Error-Performance Surface

Transformations may significantly simplify the analysis. Use the eigendecomposition of R,

  R = Q \Lambda Q^H.

Then

  J(w) = J_min + (w - w_o)^H Q \Lambda Q^H (w - w_o).

Let v = Q^H (w - w_o), a transformed version of the difference between the tap-weight vector w and the optimum solution w_o. Substituting back into J gives the canonical form

  J(v) = J_min + v^H \Lambda v = J_min + \sum_{k=1}^{M} \lambda_k |v_k|^2.

The components of the transformed vector v define the principal axes of the surface.
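
The agreement between the direct and canonical forms is easy to verify numerically. The following sketch uses an assumed real-valued 2-by-2 R, p, and \sigma_d^2 (illustrative values only):

```python
import numpy as np

# Verify the canonical form J(v) = J_min + sum_k lambda_k |v_k|^2.
# R, p, sigma_d^2, and the trial weight vector are assumed values.
R = np.array([[2.0, 0.6], [0.6, 2.0]])
p = np.array([1.0, 0.4])
sigma_d2 = 2.0

w_o = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w_o

lam, Q = np.linalg.eigh(R)          # R = Q diag(lam) Q^H
w = np.array([0.3, -0.2])           # an arbitrary trial weight vector
v = Q.T @ (w - w_o)                 # principal-axis coordinates

J_direct = sigma_d2 - 2 * (w @ p) + w @ R @ w
J_canon = J_min + np.sum(lam * v ** 2)
print(J_direct, J_canon)            # the two forms agree
```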

Canonical Form of the Error-Performance Surface w 2 w o J(w o )=J min J(w)=c curve v 2 (λ 2 ) J(v)=c curve J min Q Transformation v 1 (λ 1 ) w 1 25

Multiple Linear Regressor Model

The Wiener filter tries to match the filter coefficients to the model of the desired response d(n). The desired response is generated by
1. a linear model with parameter vector a,
2. observed as noisy data, d(n) = a^H u_m(n) + v(n),
3. where the noise v(n) is additive and white.

The model order is m, i.e., a = [a_0, a_1, ..., a_{m-1}]^T. What should the length of the Wiener filter be to achieve minimum MSE?

Multiple Linear Regressor Model

The variance of the desired response is

  \sigma_d^2 = a^H R_m a + \sigma_v^2,  where  R_m = E[ u_m(n) u_m^H(n) ].

But we know that

  J_min(M) = \sigma_d^2 - p^H w_o,

where w_o is the filter of length M optimized with respect to the MSE (the Wiener filter).

1. Underfitted model, M < m: performance improves quadratically with increasing M through the term p^H w_o, the only adjustable term. Worst case: M = 0, where J_min = \sigma_d^2.
2. Critically fitted model, M = m: w_o = a, R = R_m, and J_min reduces to \sigma_v^2.

Multiple Linear Regressor Model

3. Overfitted model, M > m: a filter longer than the model does not improve performance. The tap-input vector partitions as

  u_M(n) = [ u_m(n)^T, u_{M-m}(n)^T ]^T,

where u_{M-m}(n) is an (M-m)-by-1 vector made up of the past data samples immediately preceding the m-by-1 vector u_m(n). See Example 2.7, pp. 108 or 110.

Numerical Example (Ch2:P11)

The desired response d(n) is modeled as an AR process of order 1; that is, it may be produced by applying a white-noise process v_1(n) of zero mean and variance \sigma_1^2 = 0.27 to the input of an all-pole filter of order 1,

  H_1(z) = 1 / (1 + 0.8458 z^{-1}).

The process d(n) is applied to a communication channel modeled by the all-pole transfer function

  H_2(z) = 1 / (1 - 0.9458 z^{-1}).

The channel output x(n) is corrupted by an additive white-noise process v_2(n) of zero mean and variance \sigma_2^2 = 0.1, so a sample of the received signal u(n) equals

  u(n) = x(n) + v_2(n).

[Figure: (a) Autoregressive model of the desired response d(n); (b) model of the noisy communication channel.]

The requirement is to specify a Wiener filter consisting of a transversal filter with two taps, which operates on the received signal u(n) so as to produce an estimate of the desired response that is optimum in the mean-square sense.

Statistical characterization of the desired response d(n) and the received signal u(n):

  d(n) + a_1 d(n-1) = v_1(n),

where a_1 = 0.8458. The variance of the process d(n) equals

  \sigma_d^2 = \sigma_1^2 / (1 - a_1^2) = 0.27 / (1 - 0.8458^2) = 0.9486.

The process d(n) acts as input to the channel. Hence, from Fig. (b), we find that the channel output x(n) is related to the channel input d(n) by the first-order difference equation

  x(n) + b_1 x(n-1) = d(n),

where b_1 = -0.9458. We also observe from the two parts of the figure that the channel output x(n) may be generated by applying the white-noise process v_1(n) to a second-order all-pole filter whose transfer function equals

  H(z) = H_1(z) H_2(z) = 1 / ( (1 + 0.8458 z^{-1})(1 - 0.9458 z^{-1}) ),

so that X(z) = H(z) V_1(z). Accordingly, x(n) is a second-order AR process described by the difference equation

  x(n) + a_1 x(n-1) + a_2 x(n-2) = v_1(n),

where a_1 = -0.1 and a_2 = -0.8. Note that both AR processes d(n) and x(n) are wide-sense stationary.

Since the processes x(n) and v_2(n) are uncorrelated, it follows that the correlation matrix R equals the correlation matrix of x(n) plus the correlation matrix of v_2(n):

  R = R_x + R_2,  with  R_x = [ r_x(0)  r_x(1) ; r_x(1)  r_x(0) ].

For the AR(2) process,

  r_x(0) = ( (1 + a_2) / (1 - a_2) ) \cdot \sigma_1^2 / ( (1 + a_2)^2 - a_1^2 ) = (0.2 / 1.8) \cdot 0.27 / (0.2^2 - 0.1^2) = 1,

  r_x(1) = ( -a_1 / (1 + a_2) ) \, r_x(0) = (0.1 / 0.2) \cdot 1 = 0.5.

See Section 1.9 (Computer Experiments).

Hence

  R_x = [ 1  0.5 ; 0.5  1 ].

Next, we observe that since v_2(n) is a white-noise process of zero mean and variance \sigma_2^2 = 0.1,

  R_2 = [ 0.1  0 ; 0  0.1 ],

so

  R = R_x + R_2 = [ 1.1  0.5 ; 0.5  1.1 ].

From x(n) + b_1 x(n-1) = d(n) and u(n) = x(n) + v_2(n), the cross-correlation vector is

  p = [ p(0), p(-1) ]^T.

Since these two processes are real valued,

  p(-k) = p(k) = E[ u(n-k) d(n) ],  k = 0, 1,

and

  p(k) = r_x(k) + b_1 r_x(k-1),  k = 0, 1,  with b_1 = -0.9458.

Hence

  p(0) = r_x(0) + b_1 r_x(-1) = 1 - 0.9458 \cdot 0.5 = 0.5272,
  p(1) = r_x(1) + b_1 r_x(0) = 0.5 - 0.9458 \cdot 1 = -0.4458,

so

  p = [ 0.5272, -0.4458 ]^T.

Error-Performance Surface, Wiener Filter, and MMSE

  R^{-1} = (1 / (r(0)^2 - r(1)^2)) [ r(0)  -r(1) ; -r(1)  r(0) ] = [ 1.1456  -0.5208 ; -0.5208  1.1456 ],

  w_o = R^{-1} p = [ 0.8360, -0.7853 ]^T,

  J_min = \sigma_d^2 - p^H w_o = 0.9486 - [0.5272, -0.4458] [ 0.8360 ; -0.7853 ] = 0.1579.
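
The numbers above can be reproduced in a few lines of numpy (a sketch following the formulas of this example):

```python
import numpy as np

# Reproduce the numerical example: R, p, w_o, and J_min.
sigma1_2, sigma2_2 = 0.27, 0.1
a1, a2 = -0.1, -0.8             # AR(2) coefficients of x(n)
b1 = -0.9458                    # channel coefficient

rx0 = ((1 + a2) / (1 - a2)) * sigma1_2 / ((1 + a2) ** 2 - a1 ** 2)
rx1 = (-a1 / (1 + a2)) * rx0
R = np.array([[rx0, rx1], [rx1, rx0]]) + sigma2_2 * np.eye(2)

p = np.array([rx0 + b1 * rx1,   # p(0) = r_x(0) + b1 r_x(-1)
              rx1 + b1 * rx0])  # p(1) = r_x(1) + b1 r_x(0)

sigma_d2 = sigma1_2 / (1 - 0.8458 ** 2)
w_o = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w_o
print(R)        # [[1.1, 0.5], [0.5, 1.1]]
print(w_o)      # ~[0.836, -0.785]
print(J_min)    # ~0.158
```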


Canonical Error-Performance Surface

We know that

  J(v) = J_min + v^H \Lambda v,

where \Lambda = diag(\lambda_1, \lambda_2) for M = 2. Then

  J(v) = J_min + \lambda_1 |v_1|^2 + \lambda_2 |v_2|^2.

[Figure: canonical error surface over the principal axes v_1 (\lambda_1) and v_2 (\lambda_2), with minimum J_min at the origin.]

Application: Channel Equalization

We consider a temporal signal-processing problem, namely that of channel equalization. When data are transmitted over the channel by means of discrete pulse-amplitude modulation combined with a linear modulation scheme (e.g., quadriphase-shift keying), the number of detectable levels that the telephone channel can support is essentially limited by intersymbol interference (ISI) rather than by additive noise.

Criteria:
1. Zero forcing (ZF)
2. Minimum mean-square error (MMSE)

[Figure: cascade of the channel and the equalizer.]

Let h_k denote the impulse response of the equalizer and c_n the impulse response of the channel. Ignoring the effect of channel noise, the cascade connection of the channel and the equalizer is equivalent to a single tapped-delay-line filter whose weight sequence w_l equals the convolution of the sequences c_n and h_k, i.e.,

  w_l = \sum_k h_k c_{l-k}.

Let the data sequence u(n) applied to the channel input consist of a white-noise sequence of zero mean and unit variance. Accordingly, we may express the elements of the correlation matrix R of the channel input as

  r(l) = 1 for l = 0,  and  r(l) = 0 for l \ne 0.

For the desired response d(n) supplied to the equalizer, we assume the availability of a delayed "replica" of the transmitted sequence. This d(n) may be generated by using another feedback shift register of identical design to that used to supply the original data sequence u(n). The two feedback shift registers are synchronized with each other such that we may set d(n) = u(n). Thus, the cross-correlation between the transmitted sequence u(n) and the desired response d(n) is defined by

  p(l) = 1 for l = 0,  and  p(l) = 0 for l = ±1, ±2, ..., ±N.

Substituting into the Wiener-Hopf equations, the optimum cascade response satisfies

  w_l = \sum_{k=-N}^{N} h_k c_{l-k} = { 1, l = 0 ;  0, l = ±1, ±2, ..., ±N }.

Given the impulse response of the channel, characterized by the coefficients c_{-N}, ..., c_{-1}, c_0, c_1, ..., c_N, we may use the above equation to solve for the unknown tap weights h_{-N}, ..., h_{-1}, h_0, h_1, ..., h_N of the equalizer. In the literature on digital communications, an equalizer designed in accordance with the above equation is referred to as a zero-forcing equalizer. The equalizer is so called because, with a single pulse transmitted over the channel, it "forces" the receiver output to be zero at all the sampling instants, except for the time instant that corresponds to the transmitted pulse.
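
A minimal sketch of the zero-forcing design, with an assumed short channel c and N = 2 (the coefficients are chosen only for illustration):

```python
import numpy as np

# Zero-forcing equalizer: solve sum_k h_k c_{l-k} = delta(l)
# for l = -N, ..., N. The channel c is an assumed example.
N = 2
c = {-1: 0.2, 0: 1.0, 1: 0.3}          # c_n, zero outside this support

ls = np.arange(-N, N + 1)
A = np.array([[c.get(l - k, 0.0) for k in ls] for l in ls])  # A[l,k] = c_{l-k}
delta = (ls == 0).astype(float)

h = np.linalg.solve(A, delta)          # equalizer taps h_{-N}, ..., h_N
w = np.convolve([c.get(n, 0.0) for n in ls], h)   # cascade response
print(np.round(w[2 * N - N: 2 * N + N + 1], 6))   # ~[0, 0, 1, 0, 0] for |l| <= N
```

Note that the forced zeros hold only for |l| <= N; outside that window a finite-length equalizer generally leaves residual ISI.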

Application: Channel Equalization - MMSE

[Block diagram: x(n) → channel h (length L) → adder with noise v(n) → y(n) → filter w (length M) → z(n); a delay \delta produces x(n-\delta), and the error is \varepsilon(n) = x(n-\delta) - z(n).]

The transmitted signal passes through the dispersive channel, and a corrupted version (both channel and noise) of x(n) arrives at the receiver. Problem: design a receiver filter so that we can obtain a delayed version of the transmitted signal at its output.

Application: Channel Equalization

The MMSE cost function is

  J(\delta) = E[ |\varepsilon(n)|^2 ] = E[ |x(n-\delta) - z(n)|^2 ].

The filter output is a convolution,

  z(n) = \sum_{k=0}^{M-1} w_k^* y(n-k),

and the filter input is itself a convolution of the channel and the data, plus noise,

  y(n) = \sum_{l=0}^{L-1} h_l x(n-l) + v(n).

Application: Channel Equalization

Combining the last two equations and stacking y(n), ..., y(n-M+1) into a vector gives

  y(n) = H x(n) + v(n),

where H is the M-by-(M+L-1) Toeplitz matrix built from h_0, ..., h_{L-1} (a Toeplitz matrix performs convolution), x(n) = [x(n), ..., x(n-M-L+2)]^T, and v(n) = [v(n), ..., v(n-M+1)]^T. The compact form of the filter output is

  z(n) = w^H y(n).

The desired signal is x(n-\delta), or, in vector form, x(n-\delta) = e_\delta^T x(n), where e_\delta is the unit vector with a 1 in position \delta.

Application: Channel Equalization

Rewrite the MMSE cost function:

  J(\delta) = E[ |x(n-\delta) - w^H y(n)|^2 ].

Expanding (the data and the noise are uncorrelated, E[x(n) v(k)] = 0 for all n, k) and re-expressing the expectations,

  J(\delta) = \sigma_x^2 - w^H p_\delta - p_\delta^H w + w^H R_y w,

where, for white data and noise, R_y = E[ y(n) y^H(n) ] = \sigma_x^2 H H^H + \sigma_v^2 I and p_\delta = E[ y(n) x^*(n-\delta) ] = \sigma_x^2 H e_\delta.

Application: Channel Equalization

J is a quadratic function of w, so the gradient is zero at the minimum:

  \nabla_w J = 2 (R_y w - p_\delta) = 0.

The solution is found as

  w_o = R_y^{-1} p_\delta,

and J_min is

  J_min(\delta) = \sigma_x^2 - p_\delta^H R_y^{-1} p_\delta.

J_min depends on the design parameter \delta.
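
A sketch of the MMSE design under the assumptions used here (white unit-variance data, white noise); the channel taps, filter length M, noise variance, and delay \delta are illustrative values, not from the slides:

```python
import numpy as np
from scipy.linalg import toeplitz

# MMSE (Wiener) equalizer w_o = R_y^{-1} p_delta for white unit-variance
# data and white noise. Channel h, M, sigma_v^2, and delta are assumed.
h = np.array([0.5, 1.0, 0.3])          # channel impulse response (assumed)
M, L = 11, len(h)
sigma_v2, delta = 0.01, 6

# M-by-(M+L-1) Toeplitz convolution matrix: y(n) = H x(n) + v(n)
H = toeplitz(np.r_[h[0], np.zeros(M - 1)], np.r_[h, np.zeros(M - 1)])
R_y = H @ H.T + sigma_v2 * np.eye(M)   # sigma_x^2 = 1
p = H[:, delta]                        # p_delta = H e_delta

w_o = np.linalg.solve(R_y, p)
J_min = 1.0 - p @ w_o                  # sigma_x^2 - p^H R_y^{-1} p
print(w_o, J_min)                      # try several delta and pick the best
```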

Application: Linearly Constrained Minimum-Variance (LCMV) Filter

Problem 1: we want to design an FIR filter which suppresses all frequency components of the filter input except \omega_o, with a gain of g at \omega_o.

Application: Linearly Constrained Minimum-Variance Filter

Problem 2: we want to design a beamformer which can resolve an incident wave coming from angle \theta_o (with a scaling factor g), while at the same time suppressing all waves coming from other directions.

[Fig. 2.10: plane wave incident on a linear-array antenna.]

Application: Linearly Constrained Minimum-Variance Filter

Although these problems are physically different, they are mathematically equivalent. They can be expressed as follows: suppress all components (frequency \omega or direction \theta) of a signal while holding the gain of a certain component (\omega_o or \theta_o) constant. They can be formulated as a constrained optimization problem:
- Cost function: the variance of all components (to be minimized).
- Constraint (equality): the gain of a single component has to be g.

Observe that there is no desired response!

Application: Linearly Constrained Minimum-Variance Filter

Mathematical model. Filter output:

  y(n) = \sum_{k=0}^{M-1} w_k^* u(n-k).

Beamformer output (snapshot of M array elements):

  y(n) = w^H u(n).

Minimize the mean-square value of y(n) subject to the linear constraint

  w^H s(\omega_o) = g  (filter)  or  w^H s(\theta_o) = g  (beamformer),

where \omega_o is the normalized angular frequency with respect to the sampling rate, \theta_o is the electrical angle of the incident wave, and g is a complex-valued gain.

Application: Linearly Constrained Minimum-Variance Filter

The cost function (the output power w^H R w) is quadratic and convex, and the constraint is linear, so the method of Lagrange multipliers can be utilized to solve the problem:

  J = w^H R w + Re[ \lambda^* (w^H s(\theta_0) - g) ],

where the first term is the output power and the second term enforces the constraint. Solution: set the gradient of J to zero. The optimum beamformer weights are found from the set of equations

  R w_o = -(\lambda^*/2) s(\theta_0),

similar to the Wiener-Hopf equations.

Application: Linearly Constrained Minimum-Variance Filter

Rewrite the equations in matrix form:

  R w_o = -(\lambda^*/2) s(\theta_0),  where  s(\theta_0) = [1, e^{-j\theta_0}, ..., e^{-j\theta_0 (M-1)}]^T.

How do we find \lambda? Use the linear constraint

  w_o^H s(\theta_0) = g

to find it. Therefore, the solution becomes

  w_o = \frac{g^* R^{-1} s(\theta_0)}{s^H(\theta_0) R^{-1} s(\theta_0)}.

For \theta_o, w_o is the linearly constrained minimum-variance (LCMV) beamformer; for \omega_o, w_o is the linearly constrained minimum-variance (LCMV) filter.
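
A sketch of the LCMV solution for a uniform linear array; the array size, look angle \theta_0, and the jammer scenario used to build R are assumptions for illustration:

```python
import numpy as np

# LCMV weights w_o = g* R^{-1} s / (s^H R^{-1} s) for an assumed scenario.
M = 8
n = np.arange(M)
def steer(theta):                      # s(theta), electrical angle theta
    return np.exp(-1j * theta * n)

theta0, theta1 = 0.3, 1.2              # look and interference angles (assumed)
s0, s1 = steer(theta0), steer(theta1)

# Covariance: unit-power source at theta0, strong jammer at theta1, noise
R = np.outer(s0, s0.conj()) + 100 * np.outer(s1, s1.conj()) + 0.1 * np.eye(M)

g = 1.0
Ri_s0 = np.linalg.solve(R, s0)
w_o = np.conj(g) * Ri_s0 / (s0.conj() @ Ri_s0)

print(abs(w_o.conj() @ s0))            # = |g| = 1 (distortionless look direction)
print(abs(w_o.conj() @ s1))            # ~0: the jammer is suppressed
```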

Minimum-Variance Distortionless Response Beamformer/Filter

Distortionless: set g = 1; then

  w_o = \frac{R^{-1} s(\theta_0)}{s^H(\theta_0) R^{-1} s(\theta_0)}.

We can show that (HW)

  J_min = \frac{1}{s^H(\theta_0) R^{-1} s(\theta_0)}.

J_min represents an estimate of the variance of the signal impinging on the antenna array along the direction \theta_0. Generalizing the result to any direction \theta (or angular frequency \omega) gives the minimum-variance distortionless response (MVDR) spectrum:

  S_{MVDR}(\theta) = \frac{1}{s^H(\theta) R^{-1} s(\theta)},

an estimate of the power of the signal coming from direction \theta (or, in temporal terms, from frequency \omega).
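
Sweeping the steering vector over \theta turns the same quantity into a spectrum estimate. This sketch continues the assumed two-source scenario of the previous snippet:

```python
import numpy as np

# MVDR spectrum S(theta) = 1 / (s^H(theta) R^{-1} s(theta)) for the
# assumed two-source scenario (look/jammer angles as before).
M = 8
n = np.arange(M)
theta0, theta1 = 0.3, 1.2
s0 = np.exp(-1j * theta0 * n)
s1 = np.exp(-1j * theta1 * n)
R = np.outer(s0, s0.conj()) + 100 * np.outer(s1, s1.conj()) + 0.1 * np.eye(M)
Ri = np.linalg.inv(R)

thetas = np.linspace(-np.pi, np.pi, 721)
S = [1.0 / np.real(np.exp(-1j * t * n).conj() @ Ri @ np.exp(-1j * t * n))
     for t in thetas]
print(thetas[np.argmax(S)])            # peaks near theta1, the strongest source
```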

Minimum-Variance Distortionless Response Spectrum

In addition to spectrum estimation, constrained optimization is popular in array signal processing, in the spatial rather than the temporal domain. Therefore, one can include multiple constraints, which leads to the generalized sidelobe canceller.

Summary

For stationary signals, the MSE is a quadratic function of the linear filter coefficients. The optimal linear filter in the MMSE sense is found by setting the gradients to zero (orthogonality principle): the Wiener filter. It depends only on second-order statistics. It can be used as an approximation if the signals are locally stationary. A competing optimization criterion is to minimize the filter output mean power (variance) given constraints on desired outputs; that problem is solved by the method of Lagrange multipliers.

Generalized Sidelobe Cancellers

Continuing with the discussion of the LCMV narrowband beamformer defined by the linear constraint of Eq. (2.76), we note that this constraint represents the inner product

  w^H s(\theta_0) = g,

in which w is the weight vector and s(\theta_0) is the steering vector pointing along the electrical angle \theta_0. The steering vector is an M-by-1 vector, where M is the number of antenna elements in the beamformer. We may generalize the notion of a linear constraint by introducing multiple linear constraints, defined by

  C^H w = g.  (2.91)

Generalized Sidelobe Cancellers

The matrix C is termed the constraint matrix, and the vector g, termed the gain vector, has constant elements. Assuming that there are L linear constraints, C is an M-by-L matrix and g is an L-by-1 vector; each column of the matrix C represents a single linear constraint. Furthermore, it is assumed that the constraint matrix C has linearly independent columns. For example, with

  [ s(\theta_0), s(\theta_1) ]^H w = [ 1, 0 ]^T,

the narrowband beamformer is constrained to preserve a signal of interest impinging on the array along the electrical angle \theta_0 and, at the same time, to suppress an interference known to originate along the electrical angle \theta_1.

Generalized Sidelobe Cancellers

Let the columns of an M-by-(M-L) matrix C_a be defined as a basis for the orthogonal complement of the space spanned by the columns of matrix C. Using the definition of an orthogonal complement, we may thus write

  C^H C_a = 0,  (2.92)

or, just as well,

  C_a^H C = 0.  (2.93)

The null matrix 0 in Eq. (2.92) is L-by-(M-L), whereas in Eq. (2.93) it is (M-L)-by-L; we naturally have M > L. We now define the M-by-M partitioned matrix

  U = [ C  C_a ],  (2.94)

whose columns span the entire M-dimensional signal space. The inverse matrix U^{-1} exists by virtue of the fact that the determinant of matrix U is nonzero.

Generalized Sidelobe Cancellers

Next, let the M-by-1 weight vector of the beamformer be written in terms of the matrix U as

  w = U q.  (2.95)

Equivalently, the M-by-1 vector q is defined by

  q = U^{-1} w.  (2.96)

Let q be partitioned in a manner compatible with that in Eq. (2.94), as shown by

  q = [ v ; -w_a ],  (2.97)

where v is an L-by-1 vector and the (M-L)-by-1 vector w_a is that portion of the weight vector w that is not affected by the constraints. We may then use the definitions of Eqs. (2.94) and (2.97) in Eq. (2.95) to write

  w = [ C  C_a ] [ v ; -w_a ] = C v - C_a w_a.  (2.98)

Generalized Sidelobe Cancellers

We may now apply the multiple linear constraints of Eq. (2.91), C^H w = g, obtaining

  C^H C v - C^H C_a w_a = g.  (2.99)

But, from Eq. (2.92), we know that C^H C_a is zero; hence, Eq. (2.99) reduces to

  C^H C v = g.  (2.100)

Solving for the vector v, we thus get

  v = (C^H C)^{-1} g,  (2.101)

which shows that the multiple linear constraints do not affect w_a.

Generalized Sidelobe Cancellers

Next, we define a fixed beamformer component represented by

  w_q = C v = C (C^H C)^{-1} g,  (2.102)

which is orthogonal to the columns of matrix C_a by virtue of the property described in Eq. (2.93); the rationale for using the subscript q in w_q will become apparent later. From this definition, we may use Eq. (2.98) to express the overall weight vector of the beamformer as

  w = w_q - C_a w_a.  (2.103)

Substituting Eq. (2.103) into Eq. (2.91) yields

  C^H w_q - C^H C_a w_a = g,

Generalized Sidelobe Cancellers

which, by virtue of Eq. (2.92), reduces to

  C^H w_q = g.  (2.104)

Equation (2.104) shows that the weight vector w_q is that part of the weight vector w which satisfies the constraints. In contrast, the vector w_a is unaffected by the constraints. Thus, in light of Eq. (2.103), the beamformer may be represented by the block diagram shown in Fig. 2.11(a). The beamformer described herein is referred to as a generalized sidelobe canceller (GSC).

Generalized Sidelobe Cancellers

In light of Eq. (2.102), we may now perform an unconstrained minimization of the mean-square value of the beamformer output y(n) with respect to the adjustable weight vector w_a. According to Eq. (2.75), the beamformer output is defined by the inner product

  y(n) = w^H u(n),  (2.105)

where u(n) is the M-by-1 vector of antenna-element signals.

FIGURE 2.11 (a) Block diagram of the generalized sidelobe canceller. (b) Reformulation of the generalized sidelobe cancelling problem as a standard optimum filtering problem.

Generalized Sidelobe Cancellers

Here u(n) is the input signal vector, in which the electrical angle is defined by the direction of arrival of the incoming plane wave, and u_0(n) is the electrical signal picked up by antenna element 0 of the linear array in Fig. 2.10 at time n. Hence, substituting Eq. (2.103) into Eq. (2.105) yields

  y(n) = w_q^H u(n) - w_a^H C_a^H u(n).  (2.107)

Generalized Sidelobe Cancellers

If we now define

  d(n) = w_q^H u(n)  (2.108)

and

  x(n) = C_a^H u(n),  (2.109)

we may rewrite Eq. (2.107) in a form that resembles the standard Wiener filter exactly, as shown by

  y(n) = d(n) - w_a^H x(n),

where d(n) plays the role of a desired response for the GSC and x(n) plays the role of the input vector, as depicted in Fig. 2.11(b).

Generalized Sidelobe Cancellers

We thus see that the combined use of the vector w_q and the matrix C_a has converted the linearly constrained optimization problem into a standard optimum filtering problem. In particular, we now have an unconstrained optimization problem involving the adjustable portion w_a of the weight vector, which may be formally written as

  min_{w_a} E[ |d(n) - w_a^H x(n)|^2 ],

where the (M-L)-by-1 cross-correlation vector is

  p_x = E[ x(n) d^*(n) ]

and the (M-L)-by-(M-L) correlation matrix is

  R_x = E[ x(n) x^H(n) ].

Generalized Sidelobe Cancellers

The cost function of Eq. (2.111) is quadratic in the unknown vector w_a, which, as previously stated, embodies the available degrees of freedom in the GSC. Most importantly, this cost function has exactly the same mathematical form as that of the standard Wiener filter defined in Eq. (2.50). Accordingly, we may readily use our previous results to obtain the optimum value of w_a as

  w_{ao} = R_x^{-1} p_x.  (2.112)

Using the definitions of Eqs. (2.108) and (2.109) in Eq. (2.112), we may express the vector p_x as

  p_x = C_a^H R w_q,  (2.113)

where R is the correlation matrix of the incoming data vector u(n).

Generalized Sidelobe Cancellers

Similarly, using the definition of Eq. (2.109) in Eq. (2.113), we may express the matrix R_x as

  R_x = C_a^H R C_a.  (2.114)

The matrix C_a has full rank, and the correlation matrix R is positive definite, since the incoming data always contain some form of additive sensor noise, with the result that R_x is nonsingular. Accordingly, we may rewrite the optimum solution of Eq. (2.114) as

  w_{ao} = (C_a^H R C_a)^{-1} C_a^H R w_q.  (2.115)
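
The chain (2.102)-(2.115) can be checked numerically. In this sketch the constraint is the single MVDR constraint C = s(\theta_0), g = 1, the scenario is the assumed one from the LCMV snippet, and C_a is taken as an orthonormal null-space basis; the GSC weight vector reproduces the direct MVDR solution:

```python
import numpy as np
from scipy.linalg import null_space

# GSC decomposition w = w_q - C_a w_a for one MVDR constraint.
# Array size and source/jammer scenario are assumed, as before.
M = 8
n = np.arange(M)
theta0, theta1 = 0.3, 1.2
s0 = np.exp(-1j * theta0 * n)
s1 = np.exp(-1j * theta1 * n)
R = np.outer(s0, s0.conj()) + 100 * np.outer(s1, s1.conj()) + 0.1 * np.eye(M)

C = s0.reshape(M, 1)
g = np.array([1.0])
w_q = C @ np.linalg.solve(C.conj().T @ C, g)   # (2.102) quiescent vector
C_a = null_space(C.conj().T)                   # basis with C^H C_a = 0

Rx = C_a.conj().T @ R @ C_a                    # (2.114)
px = C_a.conj().T @ R @ w_q                    # (2.113)
w_ao = np.linalg.solve(Rx, px)                 # (2.115)
w = w_q - C_a @ w_ao                           # (2.103)

# Compare against the direct LCMV/MVDR solution
Ri_s0 = np.linalg.solve(R, s0)
w_mvdr = Ri_s0 / (s0.conj() @ Ri_s0)
print(np.allclose(w, w_mvdr))                  # True
```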

Generalized Sidelobe Cancellers

Let P_o denote the minimum output power of the GSC attained by using the optimum solution w_{ao}. Then, adapting the previous result derived in Eq. (2.49) for the standard Wiener filter and proceeding in a manner similar to that just described, we may express P_o as

  P_o = w_q^H R w_q - w_q^H R C_a (C_a^H R C_a)^{-1} C_a^H R w_q.  (2.117)

Now consider the special case of a quiet environment, for which the received signal consists of white noise acting alone. Let the corresponding value of the correlation matrix R be written as

  R = \sigma^2 I,  (2.119)

Generalized Sidelobe Cancellers

where I is the M-by-M identity matrix and \sigma^2 is the noise variance. Under this condition, we readily find from Eq. (2.117) that

  w_{ao} = (C_a^H C_a)^{-1} C_a^H w_q.

By definition, the weight vector w_q is orthogonal to the columns of matrix C_a. It follows, therefore, that the optimum weight vector w_{ao} is identically zero for the quiet environment described by Eq. (2.119). Thus, with w_{ao} equal to zero, we find from Eq. (2.103) that

  w = w_q.

It is for this reason that w_q is often referred to as the quiescent weight vector, hence the use of the subscript q to denote it.

Generalized Sidelobe Cancellers

Filtering interpretations of w_q and C_a: the quiescent weight vector w_q and the matrix C_a play critical roles of their own in the operation of the GSC. To develop physical interpretations of them, consider an MVDR spectrum estimator (formulated in temporal terms), for which we have

  C = s(\omega_0)  and  g = 1.

Hence, the use of these values in Eq. (2.102) yields the corresponding value of the quiescent weight vector, viz.,

  w_q = s(\omega_0) ( s^H(\omega_0) s(\omega_0) )^{-1} = \frac{1}{M} s(\omega_0),

Generalized Sidelobe Cancellers

which represents an FIR filter of length M. The frequency response of this filter is given by

  w_q^H s(\omega) = \frac{1}{M} s^H(\omega_0) s(\omega) = \frac{1}{M} \sum_{k=0}^{M-1} e^{j k (\omega_0 - \omega)}.

Figure 2.12(a) shows the amplitude response of this filter for M = 4 and \omega_0 = 1. From this figure, we clearly see that the FIR filter representing the quiescent weight vector w_q acts like a bandpass filter tuned to the angular frequency \omega_0, for which the MVDR spectrum estimator is constrained to produce a distortionless response.
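
A sketch of this amplitude response for the stated values M = 4 and \omega_0 = 1 (the steering-vector convention follows the earlier snippets):

```python
import numpy as np

# Amplitude response |w_q^H s(omega)| of the quiescent filter,
# with M = 4 and omega_0 = 1 as in Fig. 2.12(a).
M, omega0 = 4, 1.0
n = np.arange(M)
w_q = np.exp(-1j * omega0 * n) / M      # w_q = s(omega_0) / M

omegas = np.linspace(-np.pi, np.pi, 513)
resp = [abs(w_q.conj() @ np.exp(-1j * om * n)) for om in omegas]
print(omegas[np.argmax(resp)])          # peak at omega ~ omega_0 = 1
print(max(resp))                        # unity gain at the peak (distortionless)
```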

Generalized Sidelobe Cancellers

Consider next a physical interpretation of the matrix C_a. The use of Eq. (2.120) in Eq. (2.92) yields

  s^H(\omega_0) C_a = 0,  (2.123)

i.e., each column of C_a has zero response at \omega_0. According to Eq. (2.123), each of the (M-L) columns of matrix C_a represents an FIR filter with an amplitude response that is zero at \omega_0, as illustrated in Fig. 2.12(b) for \omega_0 = 1, M = 4, and L = 1. In other words, the matrix C_a is represented by a bank of band-rejection filters, each of which is tuned to \omega_0. Thus, C_a is referred to as a signal-blocking matrix, since it blocks (rejects) the received signal at the angular frequency \omega_0. The function of the matrix C_a is to cancel interference that leaks through the sidelobes of the bandpass filter representing the quiescent weight vector w_q.


Generalized Sidelobe Cancellers

FIGURE 2.12 (a) Interpretation of w_q^H s(\omega) as the response of an FIR filter. (b) Interpretation of each column of matrix C_a as a band-rejection filter. In both parts of the figure, it is assumed that \omega_0 = 1.

HW2: Ch. 2; Problems 2, 5, 7, 10, 13, 15.