ELEG-636: Statistical Signal Processing


1 ELEG-636: Statistical Signal Processing Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware Spring 2010 Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

2 Course Objectives & Structure Course Objectives & Structure Objective: Given a discrete time sequence {x(n)}, develop Statistical and spectral signal representation Filtering, prediction, and system identification algorithms Optimization methods that are Statistical Adaptive Course Structure: Weekly lectures [notes: arce] Periodic homework (theory & Matlab implementations) [15%] Midterm & Final examinations [85%] Textbook: Haykin, Adaptive Filter Theory. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

3 Course Objectives & Structure Course Objectives & Structure Broad Applications in Communications, Imaging, Sensors. Emerging application in Brain-imaging techniques Brain-machine interfaces, Implantable devices. Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction. Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

4 Motivation Adaptive Optimization and Filtering Methods Motivation Adaptive optimization and filtering methods are appropriate, advantageous, or necessary when: Signal statistics are not known a priori and must be learned from observed or representative samples; Signal statistics evolve over time; Time or computational restrictions dictate that simple, if repetitive, operations be employed rather than solving more complex, closed form expressions. To be considered are the following algorithms: Steepest Descent (SD) [deterministic], Least Mean Squares (LMS) [stochastic], Recursive Least Squares (RLS) [deterministic]. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

5 Steepest Descent Definition (Steepest Descent (SD)) Steepest descent, also known as gradient descent, is an iterative technique for finding a local minimum of a function. Approach: Given an arbitrary starting point, the current location (value) is moved in steps proportional to the negative of the gradient at the current point. SD is an old, deterministic method that is the basis for stochastic gradient based methods. SD is a feedback approach to finding a local minimum of an error performance surface. The error surface must be known a priori. In the MSE case, SD converges to the optimal solution, w_0 = R^{-1} p, without inverting a matrix. Question: Why in the MSE case does this converge to the global minimum rather than a local minimum? Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

6 Steepest Descent Example Consider a well structured cost function with a single minimum. The optimization proceeds as follows: Contour plot showing the evolution of the optimization. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

7 Steepest Descent Example Consider a gradient ascent example in which there are multiple minima/maxima Surface plot showing the multiple minima and maxima Contour plot illustrating that the final result depends on starting value Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

8 Steepest Descent To derive the approach, consider the FIR case: {x(n)} the WSS input samples; {d(n)} the WSS desired output; {d̂(n)} the estimate of the desired signal given by d̂(n) = w^H(n) x(n), where x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T [obs. vector] and w(n) = [w_0(n), w_1(n), ..., w_{M-1}(n)]^T [time indexed filter coefs.] Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

9 Steepest Descent Then, similarly to previously considered cases, e(n) = d(n) - d̂(n) = d(n) - w^H(n) x(n), and the MSE at time n is J(n) = E{|e(n)|^2} = σ_d^2 - w^H(n) p - p^H w(n) + w^H(n) R w(n), where σ_d^2 is the variance of the desired signal, p is the cross-correlation between x(n) and d(n), and R is the correlation matrix of x(n). Note: The weight vector and cost function are time indexed (functions of time). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

10 Steepest Descent When w(n) is set to the (optimal) Wiener solution, w(n) = w_0 = R^{-1} p and J(n) = J_min = σ_d^2 - p^H w_0. Use the method of steepest descent to iteratively find w_0. The optimal result is achieved since the cost function is a second order polynomial with a single unique minimum. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

11 Steepest Descent Example Let M = 2. The MSE is a bowl shaped surface, which is a function of the 2-D weight vector w(n). [Surface plot and contour plot of J(w) over (w_1, w_2), with minimum J_min at w_0.] Imagine dropping a marble at any point on the bowl-shaped surface. The ball will reach the minimum point by going through the path of steepest descent. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

12 Steepest Descent Observation: Set the direction of filter update as -∇J(n). Resulting update: w(n+1) = w(n) + (1/2) µ[-∇J(n)], or, since ∇J(n) = -2p + 2Rw(n), w(n+1) = w(n) + µ[p - Rw(n)], n = 0, 1, 2, ..., where w(0) = 0 (or other appropriate value) and µ is the step size. Observation: SD uses feedback, which makes it possible for the system to be unstable. Bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
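A minimal sketch of the SD recursion above in Python/NumPy, assuming R and p are known a priori (as SD requires); the 2-tap values of R and p are illustrative, not taken from the slides:

```python
import numpy as np

# Hypothetical 2-tap example: R and p are assumed known a priori (an SD requirement).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])           # input correlation matrix (assumed)
p = np.array([0.5, 0.25])            # cross-correlation vector (assumed)

w_opt = np.linalg.solve(R, p)        # Wiener solution w_0 = R^{-1} p, for reference only

mu = 0.1                             # step size; must satisfy 0 < mu < 2/lambda_max
w = np.zeros(2)                      # w(0) = 0
for n in range(200):
    w = w + mu * (p - R @ w)         # w(n+1) = w(n) + mu [p - R w(n)]

print(w, w_opt)                      # the SD iterate approaches the Wiener solution
```

Note that the update itself uses only p and R w(n); the explicit inverse is computed above only to verify convergence.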

13 Convergence Analysis Convergence Analysis Define the error vector for the tap weights as c(n) = w(n) - w_0. Then, using p = R w_0 in the update, w(n+1) = w(n) + µ[p - R w(n)] = w(n) + µ[R w_0 - R w(n)] = w(n) - µ R c(n), and subtracting w_0 from both sides, w(n+1) - w_0 = w(n) - w_0 - µ R c(n), i.e., c(n+1) = c(n) - µ R c(n) = [I - µR] c(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

14 Convergence Analysis Using the unitary similarity transform R = QΩQ^H, we have c(n+1) = [I - µR] c(n) = [I - µQΩQ^H] c(n), so Q^H c(n+1) = [Q^H - µQ^H QΩQ^H] c(n) = [I - µΩ] Q^H c(n) (*). Define the transformed coefficients as v(n) = Q^H c(n) = Q^H (w(n) - w_0). Then (*) becomes v(n+1) = [I - µΩ] v(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

15 Convergence Analysis Consider the initial condition of v(n): v(0) = Q^H (w(0) - w_0) = -Q^H w_0 [if w(0) = 0]. Consider the k-th term (mode) in v(n+1) = [I - µΩ] v(n). Note [I - µΩ] is diagonal, thus all modes are independently updated. The update for the k-th term can be written as v_k(n+1) = (1 - µλ_k) v_k(n), k = 1, 2, ..., M, or, using the recursion, v_k(n) = (1 - µλ_k)^n v_k(0). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

16 Convergence Analysis Observation: Convergence to the optimal solution requires lim_{n→∞} w(n) = w_0, i.e., lim_{n→∞} c(n) = lim_{n→∞} w(n) - w_0 = 0, lim_{n→∞} v(n) = lim_{n→∞} Q^H c(n) = 0, and lim_{n→∞} v_k(n) = 0, k = 1, 2, ..., M (*). Result: According to the recursion v_k(n) = (1 - µλ_k)^n v_k(0), the limit in (*) holds if and only if |1 - µλ_k| < 1 for all k. Thus, since the eigenvalues are nonnegative, 0 < µλ_max < 2, or 0 < µ < 2/λ_max. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

17 Convergence Analysis Observation: The k-th mode has geometric decay, v_k(n) = (1 - µλ_k)^n v_k(0). The rate of decay is characterized by the time it takes to decay to e^{-1} of the initial value. Let τ_k denote this time for the k-th mode: v_k(τ_k) = (1 - µλ_k)^{τ_k} v_k(0) = e^{-1} v_k(0), so e^{-1} = (1 - µλ_k)^{τ_k} and τ_k = -1/ln(1 - µλ_k) ≈ 1/(µλ_k) for µ ≪ 1. Result: The overall rate of decay is bounded as -1/ln(1 - µλ_max) ≤ τ ≤ -1/ln(1 - µλ_min). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

18 Convergence Analysis Example Consider the typical behavior of a single mode Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

19 Error Analysis Convergence Analysis Recall that J(n) = J_min + (w(n) - w_0)^H R (w(n) - w_0) = J_min + (w(n) - w_0)^H QΩQ^H (w(n) - w_0) = J_min + v(n)^H Ω v(n) = J_min + Σ_{k=1}^{M} λ_k |v_k(n)|^2 [sub in v_k(n) = (1 - µλ_k)^n v_k(0)] = J_min + Σ_{k=1}^{M} λ_k (1 - µλ_k)^{2n} |v_k(0)|^2. Result: If 0 < µ < 2/λ_max, then lim_{n→∞} J(n) = J_min. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

20 Example: Predictor Example Consider a two tap predictor for real valued input. Analyze the effects of the following cases: Varying the eigenvalue spread χ(R) = λ_max/λ_min while keeping µ fixed; Varying µ and keeping the eigenvalue spread χ(R) fixed. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

21 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [v 1 (n), v 2 (n)] for step-size µ = 0.3 Eigenvalue spread: χ(r) =1.22 Small eigenvalue spread modes converge at a similar rate Eigenvalue spread: χ(r) =3 Moderate eigenvalue spread modes converge at moderately similar rates Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

22 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [v 1 (n), v 2 (n)] for step-size µ = 0.3 Eigenvalue spread: χ(r) =10 Large eigenvalue spread modes converge at different rates Eigenvalue spread: χ(r) =100 Very large eigenvalue spread modes converge at very different rates Principal direction convergence is fastest Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

23 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [w 1 (n), w 2 (n)] for step-size µ = 0.3 Eigenvalue spread: χ(r) =1.22 Small eigenvalue spread modes converge at a similar rate Eigenvalue spread: χ(r) =3 Moderate eigenvalue spread modes converge at moderately similar rates Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

24 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [w 1 (n), w 2 (n)] for step-size µ = 0.3 Eigenvalue spread: χ(r) =10 Large eigenvalue spread modes converge at different rates Eigenvalue spread: χ(r) =100 Very large eigenvalue spread modes converge at very different rates Principal direction convergence is fastest Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

25 Example: Predictor Learning curves of steepest-descent algorithm with step-size parameter µ = 0.3 and varying eigenvalue spread. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

26 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [v 1 (n), v 2 (n)] with χ(r) =10 and varying step sizes Step sizes: µ = 0.3 This is over damped slow convergence Step sizes: µ = 1 This is under damped fast (erratic) convergence Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

27 Example: Predictor SD loci plots (with shown J(n) contours) as a function of [w 1 (n), w 2 (n)] with χ(r) =10 and varying step sizes Step sizes: µ = 0.3 This is over damped slow convergence Step sizes: µ = 1 This is under damped fast (erratic) convergence Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

28 Example: Predictor Example Consider a system identification problem {x(n)} w(n) system d ˆ( n ) _ + d(n) e(n) Suppose M = 2and R x = [ ] P = [ ] Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

29 Example: Predictor From eigen analysis we have λ 1 = 1.8,λ 2 = 0.2 µ< also and Also, q 1 = 1 2 [ 1 1 ] Q = 1 2 [ w 0 = R 1 p = q 2 = 1 2 [ 1 1 [ ] ] ] Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

30 Example: Predictor Thus v(n) =Q H [w(n) w 0 ] Noting that v(0) = Q H w 0 = 1 2 [ ][ ] = [ ] and v 1 (n) =(1 µ(1.8)) n 0.51 v 2 (n) =(1 µ(0.2)) n 1.06 Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

31 Example: Predictor SD convergence properties for two µ values Step sizes: µ = 0.5 This is over damped slow convergence Step sizes: µ = 1 This is under damped fast (erratic) convergence Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

32 Least Mean Squares (LMS) Least Mean Squares (LMS) Definition (Least Mean Squares (LMS) Algorithm) Motivation: The error performance surface used by the SD method is not always known a priori. Solution: Use estimated values. We will use the following instantaneous estimates: R̂(n) = x(n) x^H(n), p̂(n) = x(n) d*(n). Result: The estimates are RVs and thus this leads to a stochastic optimization. Historical Note: Invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

33 Least Mean Squares (LMS) Recall the SD update w(n+1) = w(n) + (1/2) µ[-∇J(n)], where the gradient of the error surface at w(n) was shown to be ∇J(n) = -2p + 2Rw(n). Using the instantaneous estimates, ∇̂J(n) = -2x(n)d*(n) + 2x(n)x^H(n)w(n) = -2x(n)[d*(n) - x^H(n)w(n)] = -2x(n)[d*(n) - d̂*(n)] = -2x(n)e*(n), where e*(n) is the complex conjugate of the estimate error. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

34 Least Mean Squares (LMS) Utilizing ∇̂J(n) = -2x(n)e*(n) in the update, w(n+1) = w(n) + (1/2) µ[-∇̂J(n)] = w(n) + µ x(n) e*(n) [LMS Update]. The LMS algorithm belongs to the family of stochastic gradient algorithms. The update is extremely simple. Although the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates. The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
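A minimal LMS sketch following the update above; the FIR system identification setup (filter h, white input, small observation noise) is an assumed test case, not one of the slide examples:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, mu = 4, 5000, 0.01
h = np.array([1.0, -0.5, 0.25, 0.1])        # unknown system to identify (assumed)

x = rng.standard_normal(N)                   # white input
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)   # desired = system output + noise

w = np.zeros(M)
for n in range(M, N):
    xv = x[n:n-M:-1]                         # x(n), x(n-1), ..., x(n-M+1)
    e = d[n] - w @ xv                        # e(n) = d(n) - w^H(n) x(n)  (real-valued case)
    w = w + mu * xv * e                      # w(n+1) = w(n) + mu x(n) e*(n)

print(w)                                     # should approach h
```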

35 Convergence Analysis Convergence Analysis Independence Theorem: The following conditions hold: (1) the vectors x(1), x(2), ..., x(n) are statistically independent; (2) x(n) is independent of d(1), d(2), ..., d(n-1); (3) d(n) is statistically dependent on x(n), but is independent of d(1), d(2), ..., d(n-1); (4) x(n) and d(n) are mutually Gaussian. The independence theorem is invoked in the LMS algorithm analysis. The independence theorem is justified in some cases, e.g., beamforming, where we receive independent vector observations. In other cases it is not well justified, but allows the analysis to proceed (i.e., when all else fails, invoke simplifying assumptions). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

36 Convergence Analysis We will invoke the independence theorem to show that w(n) converges to the optimal solution in the mean, lim_{n→∞} E{w(n)} = w_0. To prove this, evaluate the update: w(n+1) = w(n) + µ x(n) e*(n), so w(n+1) - w_0 = w(n) - w_0 + µ x(n) e*(n), i.e., c(n+1) = c(n) + µ x(n)(d*(n) - x^H(n) w(n)) = c(n) + µ x(n) d*(n) - µ x(n) x^H(n)[w(n) - w_0 + w_0] = c(n) + µ x(n) d*(n) - µ x(n) x^H(n) c(n) - µ x(n) x^H(n) w_0 = [I - µ x(n) x^H(n)] c(n) + µ x(n)[d*(n) - x^H(n) w_0] = [I - µ x(n) x^H(n)] c(n) + µ x(n) e_0*(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

37 Convergence Analysis Take the expectation of the update, noting that w(n) is based on past inputs and desired values, so w(n), and consequently c(n), are independent of x(n) (Independence Theorem). Thus from c(n+1) = [I - µ x(n) x^H(n)] c(n) + µ x(n) e_0*(n), E{c(n+1)} = (I - µR) E{c(n)} + µ E{x(n) e_0*(n)} = (I - µR) E{c(n)}, since E{x(n) e_0*(n)} = 0 (why?). Using arguments similar to the SD case we have lim_{n→∞} E{c(n)} = 0 if 0 < µ < 2/λ_max, or equivalently lim_{n→∞} E{w(n)} = w_0 if 0 < µ < 2/λ_max. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

38 Convergence Analysis Noting that Σ_{i=1}^{M} λ_i = trace[R], we have λ_max ≤ trace[R] = M r(0) = M σ_x^2. Thus a more conservative bound (and one easier to determine) is 0 < µ < 2/(M σ_x^2). Convergence in the mean, lim_{n→∞} E{w(n)} = w_0, is a weak condition that says nothing about the variance, which may even grow. A stronger condition is convergence in the mean square, which says lim_{n→∞} E{|c(n)|^2} = constant. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
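A small sketch contrasting the exact bound 2/λ_max with the conservative bound 2/trace[R] = 2/(M σ_x^2); the correlation matrix below is an assumed example, not from the slides:

```python
import numpy as np

# Hypothetical correlation matrix of an M-tap input with exponentially decaying correlations.
M = 4
r = 0.8 ** np.arange(M)                      # r(0), ..., r(M-1), with sigma_x^2 = r(0) = 1
R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])

lam_max = np.linalg.eigvalsh(R).max()
print(2 / lam_max)                           # exact bound:        0 < mu < 2 / lambda_max
print(2 / np.trace(R))                       # conservative bound: 2 / (M sigma_x^2) <= 2 / lambda_max
```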

39 Convergence Analysis Proving convergence in the mean square is equivalent to showing that lim_{n→∞} J(n) = lim_{n→∞} E{|e(n)|^2} = constant. To evaluate the limit, write e(n) as e(n) = d(n) - d̂(n) = d(n) - w^H(n) x(n) = d(n) - w_0^H x(n) - [w^H(n) - w_0^H] x(n) = e_0(n) - c^H(n) x(n). Thus J(n) = E{|e(n)|^2} = E{(e_0(n) - c^H(n) x(n))(e_0*(n) - x^H(n) c(n))} = J_min + E{c^H(n) x(n) x^H(n) c(n)} = J_min + J_ex(n) [cross terms → 0, why?]. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

40 Convergence Analysis Since J_ex(n) is a scalar, J_ex(n) = E{c^H(n) x(n) x^H(n) c(n)} = E{trace[c^H(n) x(n) x^H(n) c(n)]} = E{trace[x(n) x^H(n) c(n) c^H(n)]} = trace[E{x(n) x^H(n) c(n) c^H(n)}]. Invoking the independence theorem, J_ex(n) = trace[E{x(n) x^H(n)} E{c(n) c^H(n)}] = trace[R K(n)], where K(n) = E{c(n) c^H(n)}. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

41 Convergence Analysis Thus J(n) = J_min + J_ex(n) = J_min + trace[R K(n)]. Recall Q^H R Q = Ω, or R = QΩQ^H. Set S(n) = Q^H K(n) Q, where S(n) need not be diagonal. Then K(n) = Q Q^H K(n) Q Q^H [since Q^{-1} = Q^H] = Q S(n) Q^H. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

42 Convergence Analysis Utilizing R = QΩQ^H and K(n) = Q S(n) Q^H in the excess error expression, J_ex(n) = trace[R K(n)] = trace[QΩQ^H Q S(n) Q^H] = trace[QΩS(n)Q^H] = trace[Q^H QΩS(n)] = trace[ΩS(n)]. Since Ω is diagonal, J_ex(n) = trace[ΩS(n)] = Σ_{i=1}^{M} λ_i s_i(n), where s_1(n), s_2(n), ..., s_M(n) are the diagonal elements of S(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

43 Convergence Analysis The previously derived recursion E{c(n+1)} = (I - µR) E{c(n)} can be modified to yield a recursion on S(n): S(n+1) = (I - µΩ) S(n)(I - µΩ) + µ^2 J_min Ω, which for the diagonal elements is s_i(n+1) = (1 - µλ_i)^2 s_i(n) + µ^2 J_min λ_i, i = 1, 2, ..., M. Suppose J_ex(n) converges; then s_i(n+1) = s_i(n), so s_i(n) = (1 - µλ_i)^2 s_i(n) + µ^2 J_min λ_i, giving s_i(n) = µ^2 J_min λ_i / [1 - (1 - µλ_i)^2] = µ^2 J_min λ_i / [2µλ_i - µ^2 λ_i^2] = µ J_min / (2 - µλ_i), i = 1, 2, ..., M. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

44 Convergence Analysis Consider again J_ex(n) = trace[ΩS(n)] = Σ_{i=1}^{M} λ_i s_i(n). Taking the limit and utilizing s_i(n) = µ J_min / (2 - µλ_i), lim_{n→∞} J_ex(n) = J_min Σ_{i=1}^{M} µλ_i / (2 - µλ_i). The LMS misadjustment is defined as MA = lim_{n→∞} J_ex(n) / J_min = Σ_{i=1}^{M} µλ_i / (2 - µλ_i). Note: A misadjustment of 10% or less is generally considered acceptable. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
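A short evaluation of the misadjustment formula; the eigenvalues are taken from the earlier M = 2 system identification example (λ_1 = 1.8, λ_2 = 0.2), and µ = 0.05 is an assumed value:

```python
import numpy as np

lam = np.array([1.8, 0.2])                   # eigenvalues of R (from the M = 2 example)
mu = 0.05                                    # assumed step size

MA = np.sum(mu * lam / (2 - mu * lam))       # MA = sum_i mu*lambda_i / (2 - mu*lambda_i)
print(MA)                                    # about 0.05, i.e., roughly 5% excess MSE
```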

45 Example: First-Order Predictor Example This is a one tap predictor, x̂(n) = w(n) x(n-1). Take the underlying process to be a real order one AR process, x(n) = -a x(n-1) + v(n). The weight update is w(n+1) = w(n) + µ x(n-1) e(n) [LMS update for obs. x(n-1)] = w(n) + µ x(n-1)[x(n) - w(n) x(n-1)]. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

46 Example: First-Order Predictor Since x(n) = -a x(n-1) + v(n) [AR model] and x̂(n) = w(n) x(n-1) [one tap predictor], we have w_0 = -a. Note that E{x(n-1) e_0(n)} = E{x(n-1) v(n)} = 0, which proves the optimality. Set µ = 0.05 and consider two cases for a and σ_x^2. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
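A simulation sketch of this predictor; since the slide's case values did not survive the transcription, the AR coefficient a, the noise level, and the run length below are assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
a, sigma_v, mu, N = -0.99, 0.14, 0.05, 5000   # a and sigma_v are assumed, mu = 0.05 as on the slide

# Generate the AR(1) process x(n) = -a x(n-1) + v(n)
v = sigma_v * rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = -a * x[n - 1] + v[n]

# One-tap LMS predictor: xhat(n) = w(n) x(n-1), optimal weight w_0 = -a
w = 0.0
for n in range(1, N):
    e = x[n] - w * x[n - 1]                   # prediction error e(n)
    w = w + mu * x[n - 1] * e                 # LMS update for observation x(n-1)

print(w, -a)                                  # w(n) should hover around w_0 = -a
```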

47 Example: First-Order Predictor Figure: Transient behavior of adaptive first-order predictor weight ŵ(n) for µ = Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

48 Example: First-Order Predictor Figure: Transient behavior of adaptive first-order predictor squared error for µ = Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

49 Example: First-Order Predictor Figure: Mean-squared error learning curves for an adaptive first-order predictor with varying step-size parameter µ. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

50 Example: First-Order Predictor Consider the expected trajectory of w(n). Recall w(n+1) = w(n) + µ x(n-1) e(n) = w(n) + µ x(n-1)[x(n) - w(n) x(n-1)] = [1 - µ x(n-1) x(n-1)] w(n) + µ x(n-1) x(n). In this example, x(n) = -a x(n-1) + v(n). Substituting in: w(n+1) = [1 - µ x(n-1) x(n-1)] w(n) + µ x(n-1)[-a x(n-1) + v(n)] = [1 - µ x(n-1) x(n-1)] w(n) - µ a x(n-1) x(n-1) + µ x(n-1) v(n). Taking the expectation and invoking the independence theorem, E{w(n+1)} = (1 - µσ_x^2) E{w(n)} - µσ_x^2 a. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
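The mean-weight recursion can be iterated directly and compared against w_0 = -a; a small sketch using the same assumed a, σ_v, and µ as the simulation above:

```python
import numpy as np

a, sigma_v, mu, N = -0.99, 0.14, 0.05, 200    # assumed values, same as the earlier sketch
sigma_x2 = sigma_v**2 / (1 - a**2)            # stationary AR(1) input power

Ew = np.zeros(N)                              # E{w(0)} = 0
for n in range(N - 1):
    Ew[n + 1] = (1 - mu * sigma_x2) * Ew[n] - mu * sigma_x2 * a   # mean-weight recursion

print(Ew[-1], -a)                             # E{w(n)} converges toward w_0 = -a
```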

51 Example: First-Order Predictor Figure: Comparison of experimental results with theory, based on ŵ(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

52 Example: First-Order Predictor Next, derive a theoretical expression for J(n). Note that the initial value of J(n) is J(0) = E{(x(0) - w(0) x(-1))^2} = E{(x(0))^2} = σ_x^2, and the final value is J(∞) = J_min + J_ex = E{(x(n) - w_0 x(n-1))^2} + J_ex = E{(v(n))^2} + J_ex = σ_v^2 + J_min µλ_1/(2 - µλ_1). Note λ_1 = σ_x^2. Thus J(∞) = σ_v^2 + σ_v^2 µσ_x^2/(2 - µσ_x^2) = σ_v^2 (1 + µσ_x^2/(2 - µσ_x^2)). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

53 Example: First-Order Predictor And if µ is small, J(∞) = σ_v^2 (1 + µσ_x^2/(2 - µσ_x^2)) ≈ σ_v^2 (1 + µσ_x^2/2). Putting all the components together: J(n) = [σ_x^2 - σ_v^2(1 + (µ/2)σ_x^2)] (1 - µσ_x^2)^{2n} + σ_v^2(1 + (µ/2)σ_x^2), where the bracketed term is J(0) - J(∞), the geometric factor decays to 0, and the last term is J(∞). Also, the time constant is τ = -1/[2 ln(1 - µλ_1)] = -1/[2 ln(1 - µσ_x^2)] ≈ 1/(2µσ_x^2). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

54 Example: First-Order Predictor Figure: Comparison of experimental results with theory for the adaptive predictor, based on the mean-square error for µ = Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

55 Example: Adaptive Equalization Example (Adaptive Equalization) Objective: Pass a known signal through an unknown channel to invert the effects the channel and noise have on the signal Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

56 Example: Adaptive Equalization The signal is a Bernoulli sequence, x_n = +1 with probability 1/2 and -1 with probability 1/2. The additive noise is N(0, 0.001). The channel has a raised cosine response, h_n = (1/2)[1 + cos((2π/W)(n - 2))] for n = 1, 2, 3, and h_n = 0 otherwise. W controls the eigenvalue spread χ(R). h_n is symmetric about n = 2 and thus introduces a delay of 2. We will use an M = 11 tap filter, which is symmetric about n = 5 and introduces a delay of 5. Thus an overall delay of δ = 2 + 5 = 7 is added to the system. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
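A sketch generating this experiment's data from the description above; the value of W and the record length are assumptions, and the raised-cosine expression follows the reconstruction given here:

```python
import numpy as np

rng = np.random.default_rng(2)
N, W = 2000, 3.1                          # W controls the eigenvalue spread (value assumed)

x = rng.choice([-1.0, 1.0], size=N)       # Bernoulli sequence, +/-1 with probability 1/2

# Raised-cosine channel, nonzero for n = 1, 2, 3 and symmetric about n = 2
n = np.arange(1, 4)
h = 0.5 * (1 + np.cos(2 * np.pi / W * (n - 2)))

noise = np.sqrt(0.001) * rng.standard_normal(N)
u = np.convolve(x, h)[:N] + noise         # received signal (equalizer input)

delay = 7                                  # channel delay (2) plus equalizer delay (5)
d = np.concatenate([np.zeros(delay), x[:-delay]])   # desired response d(n) = x(n - 7)
```

The pair (u, d) can then be fed to the M = 11 tap LMS equalizer exactly as in the earlier LMS sketch.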

57 Example: Adaptive Equalization Channel response and Filter response Figure: (a) Impulse response of channel; (b) impulse response of optimum transversal equalizer. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

58 Example: Adaptive Equalization Consider three W values. Note the step size is bounded by the W = 3.5 case: µ ≤ 2/(M r(0)) = 2/(11 × 1.3022) = 0.14. Choose µ = 0.075 in all cases. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

59 Example: Adaptive Equalization Figure: Learning curves of the LMS algorithm for an adaptive equalizer with number of taps M = 11, step-size parameter µ = 0.075, and varying eigenvalue spread χ(r). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

60 Example: Adaptive Equalization Ensemble-average impulse response of the adaptive equalizer (after 1000 iterations) for each of four different eigenvalue spreads. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

61 Example: Adaptive Equalization Figure: Learning curves of the LMS algorithm for an adaptive equalizer with the number of taps M = 11, fixed eigenvalue spread, and varying step-size parameter µ. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

62 Example: LMS Directionality Example Directionality of the LMS algorithm: The speed of convergence of the LMS algorithm is faster in certain directions in the weight space. If the convergence is in the appropriate direction, the convergence can be accelerated by increased eigenvalue spread. To investigate this phenomenon, consider the deterministic signal x(n) = A_1 cos(ω_1 n) + A_2 cos(ω_2 n). Even though it is deterministic, a correlation matrix can be determined: R = (1/2) [[A_1^2 + A_2^2, A_1^2 cos(ω_1) + A_2^2 cos(ω_2)], [A_1^2 cos(ω_1) + A_2^2 cos(ω_2), A_1^2 + A_2^2]]. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

63 Example: LMS Directionality Determining the eigenvalues and eigenvectors yields λ_1 = (1/2) A_1^2 (1 + cos(ω_1)) + (1/2) A_2^2 (1 + cos(ω_2)), λ_2 = (1/2) A_1^2 (1 - cos(ω_1)) + (1/2) A_2^2 (1 - cos(ω_2)), and q_1 = [1, 1]^T, q_2 = [1, -1]^T. Case 1: A_1 = 1, A_2 = 0.5, ω_1 = 1.2, ω_2 = 0.1, x_a(n) = cos(1.2n) + 0.5 cos(0.1n), and χ(R) = 2.9. Case 2: A_1 = 1, A_2 = 0.5, ω_1 = 0.6, ω_2 = 0.23, x_b(n) = cos(0.6n) + 0.5 cos(0.23n), and χ(R) = 12.9. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
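A small check of these expressions: the sketch below builds R from the formula on the previous slide and computes the eigenvalue spread for the two cases; it reproduces χ(R) ≈ 2.9 and ≈ 12.9:

```python
import numpy as np

def corr_and_spread(A1, A2, w1, w2):
    # R for x(n) = A1 cos(w1 n) + A2 cos(w2 n), per the slide's expression
    r0 = 0.5 * (A1**2 + A2**2)
    r1 = 0.5 * (A1**2 * np.cos(w1) + A2**2 * np.cos(w2))
    R = np.array([[r0, r1], [r1, r0]])
    lam = np.linalg.eigvalsh(R)                 # ascending order
    return R, lam[-1] / lam[0]                  # eigenvalue spread chi(R) = lam_max / lam_min

print(corr_and_spread(1.0, 0.5, 1.2, 0.1)[1])   # case 1: approximately 2.9
print(corr_and_spread(1.0, 0.5, 0.6, 0.23)[1])  # case 2: approximately 12.9
```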

64 Example: LMS Directionality Since p is undefined, set p = λ_i q_i. Then, since p = R w_0, we see (two cases): p = λ_1 q_1 ⟹ R w_0 = λ_1 q_1 ⟹ w_0 = q_1 = [1, 1]^T; p = λ_2 q_2 ⟹ R w_0 = λ_2 q_2 ⟹ w_0 = q_2 = [1, -1]^T. Utilize 200 iterations of the algorithm. Consider the minimum eigenfilter first, w_0 = q_2 = [1, -1]^T. Consider the maximum eigenfilter second, w_0 = q_1 = [1, 1]^T. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

65 Example: LMS Directionality Convergence of the LMS algorithm, for a deterministic sinusoidal process, along slow eigenvector (i.e., minimum eigenfilter). For input x a (n) (χ(r) =2.9) For input x b (n) (χ(r) =12.9) Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

66 Example: LMS Directionality Convergence of the LMS algorithm, for a deterministic sinusoidal process, along fast eigenvector (i.e., maximum eigenfilter). For input x a (n) (χ(r) =2.9) For input x b (n) (χ(r) =12.9) Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

67 Normalized LMS Algorithm Observation: The LMS correction is proportional to µ x(n) e*(n): w(n+1) = w(n) + µ x(n) e*(n). If x(n) is large, the LMS update suffers from gradient noise amplification. The normalized LMS algorithm seeks to avoid gradient noise amplification. The step size is made time varying, µ(n), and optimized to minimize the next step error: w(n+1) = w(n) + (1/2) µ(n)[-∇J(n)] = w(n) + µ(n)[p - R w(n)]. Choose µ(n) such that w(n+1) produces the minimum MSE, J(n+1) = E{|e(n+1)|^2}. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

68 Normalized LMS Algorithm Let ∇(n) ≜ ∇J(n) and note e(n+1) = d(n+1) - w^H(n+1) x(n+1). Objective: Choose µ(n) such that it minimizes J(n+1). The optimal step size, µ_0(n), will be a function of R and ∇(n); use instantaneous estimates of these values. To determine µ_0(n), expand J(n+1): J(n+1) = E{e(n+1) e*(n+1)} = E{(d(n+1) - w^H(n+1) x(n+1))(d*(n+1) - x^H(n+1) w(n+1))} = σ_d^2 - w^H(n+1) p - p^H w(n+1) + w^H(n+1) R w(n+1). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

69 Normalized LMS Algorithm Now use the fact that w(n+1) = w(n) - (1/2) µ(n) ∇(n): J(n+1) = σ_d^2 - [w(n) - (1/2) µ(n) ∇(n)]^H p - p^H [w(n) - (1/2) µ(n) ∇(n)] + [w(n) - (1/2) µ(n) ∇(n)]^H R [w(n) - (1/2) µ(n) ∇(n)], where the last term expands to w^H(n) R w(n) - (1/2) µ(n) w^H(n) R ∇(n) - (1/2) µ(n) ∇^H(n) R w(n) + (1/4) µ^2(n) ∇^H(n) R ∇(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

70 Normalized LMS Algorithm J(n+1) = σ_d^2 - [w(n) - (1/2) µ(n) ∇(n)]^H p - p^H [w(n) - (1/2) µ(n) ∇(n)] + w^H(n) R w(n) - (1/2) µ(n) w^H(n) R ∇(n) - (1/2) µ(n) ∇^H(n) R w(n) + (1/4) µ^2(n) ∇^H(n) R ∇(n). Differentiating with respect to µ(n), ∂J(n+1)/∂µ(n) = (1/2) ∇^H(n) p + (1/2) p^H ∇(n) - (1/2) w^H(n) R ∇(n) - (1/2) ∇^H(n) R w(n) + (1/2) µ(n) ∇^H(n) R ∇(n) (*). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

71 Normalized LMS Algorithm Setting (*) equal to 0: µ_0(n) ∇^H(n) R ∇(n) = w^H(n) R ∇(n) - p^H ∇(n) + ∇^H(n) R w(n) - ∇^H(n) p, so µ_0(n) = {[w^H(n) R - p^H] ∇(n) + ∇^H(n)[R w(n) - p]} / [∇^H(n) R ∇(n)] = {[R w(n) - p]^H ∇(n) + ∇^H(n)[R w(n) - p]} / [∇^H(n) R ∇(n)] = [(1/2) ∇^H(n) ∇(n) + (1/2) ∇^H(n) ∇(n)] / [∇^H(n) R ∇(n)] = ∇^H(n) ∇(n) / [∇^H(n) R ∇(n)]. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

72 Normalized LMS Algorithm Using the instantaneous estimates R̂ = x(n) x^H(n) and p̂ = x(n) d*(n), ∇̂(n) = 2[R̂ w(n) - p̂] = 2[x(n) x^H(n) w(n) - x(n) d*(n)] = 2 x(n)[d̂*(n) - d*(n)] = -2 x(n) e*(n). Thus µ_0(n) = ∇^H(n) ∇(n) / [∇^H(n) R ∇(n)] = 4 x^H(n) e(n) x(n) e*(n) / [2 x^H(n) e(n) x(n) x^H(n) 2 x(n) e*(n)] = |e(n)|^2 x^H(n) x(n) / [|e(n)|^2 (x^H(n) x(n))^2] = 1/[x^H(n) x(n)] = 1/‖x(n)‖^2. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

73 Normalized LMS Algorithm Result: The NLMS update is w(n+1) = w(n) + (µ/‖x(n)‖^2) x(n) e*(n), i.e., µ(n) = µ/‖x(n)‖^2, where µ is introduced to scale the update. To avoid problems when ‖x(n)‖^2 ≈ 0 we add an offset: w(n+1) = w(n) + (µ/(a + ‖x(n)‖^2)) x(n) e*(n), where a > 0. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79
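A minimal NLMS sketch using the offset form above; the system identification setup mirrors the earlier assumed LMS sketch, and µ̄ and a are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, mu_bar, a_reg = 4, 3000, 0.5, 1e-6     # 0 < mu_bar < 2; a_reg is the small offset a > 0
h = np.array([1.0, -0.5, 0.25, 0.1])         # unknown system (assumed, as in the LMS sketch)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

w = np.zeros(M)
for n in range(M, N):
    xv = x[n:n-M:-1]
    e = d[n] - w @ xv
    w = w + (mu_bar / (a_reg + xv @ xv)) * xv * e   # NLMS: mu(n) = mu_bar / (a + ||x(n)||^2)

print(w)                                            # should approach h
```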

74 NLMS Convergence Objective: Analyze the NLMS convergence, w(n+1) = w(n) + (µ/‖x(n)‖^2) x(n) e*(n). Substituting e(n) = d(n) - w^H(n) x(n): w(n+1) = w(n) + (µ/‖x(n)‖^2) x(n)[d*(n) - x^H(n) w(n)] = [I - µ x(n) x^H(n)/‖x(n)‖^2] w(n) + (µ/‖x(n)‖^2) x(n) d*(n). Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

75 NLMS Convergence Objective: Compare the NLMS and LMS algorithms. NLMS: w(n+1) = [I - µ x(n) x^H(n)/‖x(n)‖^2] w(n) + µ x(n) d*(n)/‖x(n)‖^2. LMS: w(n+1) = [I - µ x(n) x^H(n)] w(n) + µ x(n) d*(n). By observation, we see the following corresponding terms: LMS µ ↔ NLMS µ; LMS x(n) x^H(n) ↔ NLMS x(n) x^H(n)/‖x(n)‖^2; LMS x(n) d*(n) ↔ NLMS x(n) d*(n)/‖x(n)‖^2. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

76 NLMS Convergence Recall the term-by-term correspondence between LMS and NLMS above. LMS case: 0 < µ < 2/trace[E{x(n) x^H(n)}] = 2/trace[R] guarantees stability. By analogy, 0 < µ < 2/trace[E{x(n) x^H(n)/‖x(n)‖^2}] guarantees stability of the NLMS. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

77 NLMS Convergence To analyze the bound, make the following approximation: E{x(n) x^H(n)/‖x(n)‖^2} ≈ E{x(n) x^H(n)}/E{‖x(n)‖^2}. Then trace[E{x(n) x^H(n)/‖x(n)‖^2}] ≈ trace[E{x(n) x^H(n)}]/E{‖x(n)‖^2} = E{trace[x(n) x^H(n)]}/E{‖x(n)‖^2} = E{trace[x^H(n) x(n)]}/E{‖x(n)‖^2} = E{‖x(n)‖^2}/E{‖x(n)‖^2} = 1. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

78 NLMS Convergence Thus 0 < µ < 2/trace[E{x(n) x^H(n)/‖x(n)‖^2}] = 2. Final Result: The NLMS update w(n+1) = w(n) + (µ/‖x(n)‖^2) x(n) e*(n) will converge if 0 < µ < 2. Note: The NLMS has a simpler convergence criterion than the LMS. The NLMS generally converges faster than the LMS algorithm. Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical Signal Processing Spring / 79

79 ELEG Statistical Signal Processing Gonzalo R. Arce Variants of the LMS algorithm Department of Electrical and Computer Engineering University of Delaware Newark, DE, Fall 2013 (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

80 Standard LMS Algorithm FIR filters: y(n) = w_0(n) u(n) + w_1(n) u(n-1) + ... + w_{M-1}(n) u(n-M+1) = Σ_{k=0}^{M-1} w_k(n) u(n-k) = w(n)^T u(n), n = 0, 1, .... Error between the filter output y(n) and a desired signal d(n): e(n) = d(n) - y(n) = d(n) - w(n)^T u(n). Update the filter parameters according to w(n+1) = w(n) + µ u(n) e(n). (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

81 1. Normalized LMS Algorithm Modify at time n the parameter vector from w(n) to w(n+1), where λ will result from the constraint d(n) = Σ_{i=0}^{M-1} w_i(n+1) u(n-i). In order to add an extra degree of freedom to the adaptation strategy, a constant, µ, controlling the step size is introduced: w_j(n+1) = w_j(n) + (µ/Σ_{i=0}^{M-1} (u(n-i))^2) e(n) u(n-j) = w_j(n) + (µ/‖u(n)‖^2) e(n) u(n-j). (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

82 To overcome possible numerical difficulties when ‖u(n)‖ is close to zero, a constant a > 0 is used: w_j(n+1) = w_j(n) + (µ/(a + ‖u(n)‖^2)) e(n) u(n-j). This is the update used in the Normalized LMS algorithm. The Normalized LMS algorithm converges if 0 < µ < 2. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

83 Comparison of LMS and NLMS (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

84 Comparison of LMS and NLMS The LMS was run with three different step-sizes: µ = [0.075; 0.025; ]. The NLMS was run with four different step-sizes: µ = [1.0; 0.5; 0.1]. The larger the step-size, the faster the convergence. The smaller the step-size, the better the steady state square error. LMS with µ = and NLMS with µ = 0.1 achieved similar average steady state square error; however, NLMS was faster. LMS with µ = and NLMS with µ = 1.0 had a similar convergence speed; however, NLMS achieved a lower steady state average square error. Conclusion: NLMS offers better trade-offs than LMS. The computational complexity of NLMS is slightly higher than that of LMS. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

85 2. LMS Algorithm with Time Variable Adaptation Step Heuristics: combine the benefits of two different situations: the convergence time constant is small for large µ; the mean-square error in steady state is low for small µ. The initial adaptation step µ is kept large, then it is monotonically reduced: µ(n) = 1/(n + c). Disadvantage for non-stationary data: the algorithm will not react to changes in the optimum solution for large values of n. Variable Step algorithm: w(n+1) = w(n) + M(n) u(n) e(n), where M(n) = diag(µ_0(n), µ_1(n), ..., µ_{M-1}(n)). Each filter parameter w_i(n) is updated using an independent adaptation step µ_i(n). (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15
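A sketch of the time-varying-step idea with a single decaying step size µ(n) = µ_0/(0.01 n + c), as in the comparison that follows; the system, µ_0, and c are assumed illustrative values (the slide's per-coefficient diagonal M(n) is a straightforward extension):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, mu0, c = 4, 3000, 1.0, 20.0             # mu0 and c are assumed illustrative values
h = np.array([1.0, -0.5, 0.25, 0.1])          # unknown system (assumed)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

w = np.zeros(M)
for n in range(M, N):
    xv = x[n:n-M:-1]
    e = d[n] - w @ xv
    mu_n = mu0 / (0.01 * n + c)               # step size decays with time: large early, small late
    w = w + mu_n * xv * e

print(w)                                      # should approach h with a low steady-state error
```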

86 Comparison of LMS and variable step-size LMS, with µ(n) = µ/(0.01 n + c) and c = [10; 20; 50]. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

87 3. Sign algorithms In high speed communication time is critical, thus faster adaptation processes are needed. Define sgn(a) = 1 for a > 0, 0 for a = 0, and -1 for a < 0. The Sign algorithm (other names: pilot LMS, or Sign Error): w(n+1) = w(n) + µ u(n) sgn(e(n)). The Clipped LMS (or Signed Regressor): w(n+1) = w(n) + µ sgn(u(n)) e(n). The Zero forcing LMS (or Sign Sign): w(n+1) = w(n) + µ sgn(u(n)) sgn(e(n)). The Sign algorithm can be derived as an LMS algorithm for minimizing the Mean Absolute Error (MAE) criterion J(w) = E[|e(n)|] = E[|d(n) - w^T u(n)|]. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15
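A sketch running the three sign variants side by side on an assumed system identification setup; step size and run length are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, mu = 4, 5000, 0.005
h = np.array([1.0, -0.5, 0.25, 0.1])          # unknown system (assumed)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

w_sign, w_sr, w_ss = np.zeros(M), np.zeros(M), np.zeros(M)
for n in range(M, N):
    xv = x[n:n-M:-1]
    # Sign algorithm (pilot LMS / sign-error): quantize the error
    w_sign = w_sign + mu * xv * np.sign(d[n] - w_sign @ xv)
    # Clipped LMS (signed regressor): quantize the input vector
    w_sr = w_sr + mu * np.sign(xv) * (d[n] - w_sr @ xv)
    # Zero-forcing LMS (sign-sign): quantize both
    w_ss = w_ss + mu * np.sign(xv) * np.sign(d[n] - w_ss @ xv)

print(w_sign, w_sr, w_ss, sep="\n")           # all drift toward h, more slowly than plain LMS
```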

88 Properties of sign algorithms Fast computation: if µ is constrained to the form µ = 2^{-m}, only shifting and addition operations are required. Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates; the steady state error will increase and the convergence rate decreases. The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for bps system. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

89 Comparison of LMS and Sign LMS The Sign LMS algorithm should be operated at smaller step-sizes to get behavior similar to that of the standard LMS algorithm. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

90 4. Linear smoothing of LMS gradient estimates Lowpass filtering the noisy gradient: rename the noisy gradient g(n) = ∇̂_w J = -2 u(n) e(n), with components g_i(n) = -2 e(n) u(n-i). Passing the signals g_i(n) through low pass filters will prevent large fluctuations of direction during the adaptation process: b_i(n) = LPF(g_i(n)). The updating process will use the filtered noisy gradient: w(n+1) = w(n) - µ b(n). The following versions are well known: Averaged LMS algorithm, where the LPF is the filter with impulse response h(m) = 1/N, m = 0, 1, ..., N-1, giving w(n+1) = w(n) + (µ/N) Σ_{j=n-N+1}^{n} e(j) u(j). (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

91 Momentum LMS algorithm The LPF is an IIR filter of first order, h(0) = 1 - γ, h(1) = γ h(0), h(2) = γ^2 h(0), ..., so b_i(n) = LPF(g_i(n)) = γ b_i(n-1) + (1 - γ) g_i(n), i.e., b(n) = γ b(n-1) + (1 - γ) g(n). The resulting algorithm can be written as a second order recursion: w(n+1) = w(n) - µ b(n); γ w(n) = γ w(n-1) - γ µ b(n-1); w(n+1) - γ w(n) = w(n) - γ w(n-1) - µ b(n) + γ µ b(n-1). (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15

92 Continuing, w(n+1) - γ w(n) = w(n) - γ w(n-1) - µ b(n) + γ µ b(n-1), so w(n+1) = w(n) + γ(w(n) - w(n-1)) - µ(b(n) - γ b(n-1)) = w(n) + γ(w(n) - w(n-1)) - µ(1 - γ) g(n) = w(n) + γ(w(n) - w(n-1)) + 2µ(1 - γ) e(n) u(n), i.e., w(n+1) - w(n) = γ(w(n) - w(n-1)) + 2µ(1 - γ) e(n) u(n). Drawback: The convergence rate may decrease. Advantage: The momentum term keeps the algorithm active even in regions close to the minimum. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15
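A sketch of the momentum (first-order IIR smoothed gradient) LMS following the recursion above; γ, µ, and the test system are assumed values:

```python
import numpy as np

rng = np.random.default_rng(6)
M, N, mu, gamma = 4, 3000, 0.01, 0.9
h = np.array([1.0, -0.5, 0.25, 0.1])          # unknown system (assumed)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

w = np.zeros(M)
b = np.zeros(M)                               # low-pass filtered (smoothed) gradient estimate
for n in range(M, N):
    xv = x[n:n-M:-1]
    e = d[n] - w @ xv
    g = -2.0 * e * xv                         # instantaneous noisy gradient g(n) = -2 e(n) u(n)
    b = gamma * b + (1 - gamma) * g           # first-order IIR smoothing b(n) = gamma b(n-1) + (1-gamma) g(n)
    w = w - mu * b                            # w(n+1) = w(n) - mu b(n)

print(w)                                      # should approach h
```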

93 5. Nonlinear smoothing of LMS gradient estimates Impulsive interference in either d(n) or u(n) drastically degrades LMS performance. Solution: smooth the noisy gradient components using a nonlinear filter. The Median LMS Algorithm: the adaptation equation can be implemented as w_i(n+1) = w_i(n) + µ med(e(n)u(n-i), e(n-1)u(n-1-i), ..., e(n-N)u(n-N-i)). The smoothing effect in an impulsive noise environment is very strong. If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS. The convergence rate is slower than in LMS. (Variants of the LMS algorithm) Gonzalo R. Arce Spring, / 15
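A sketch of the median-smoothed update with impulsive interference added to d(n); the window length, impulse model, and test system are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
M, N, mu, win = 4, 5000, 0.01, 5              # win = median window length (assumed)
h = np.array([1.0, -0.5, 0.25, 0.1])          # unknown system (assumed)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)
# Inject sparse impulsive interference into the desired signal
imp = rng.random(N) < 0.01
d[imp] += 20.0 * rng.standard_normal(np.count_nonzero(imp))

w = np.zeros(M)
hist = np.zeros((win, M))                     # recent gradient terms e(n-k) u(n-k-i)
for n in range(M, N):
    xv = x[n:n-M:-1]
    e = d[n] - w @ xv
    hist = np.roll(hist, 1, axis=0)
    hist[0] = e * xv
    w = w + mu * np.median(hist, axis=0)      # median-smoothed update, one median per coefficient

print(w)                                      # stays close to h despite the impulses
```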


More information

MMSE System Identification, Gradient Descent, and the Least Mean Squares Algorithm

MMSE System Identification, Gradient Descent, and the Least Mean Squares Algorithm MMSE System Identification, Gradient Descent, and the Least Mean Squares Algorithm D.R. Brown III WPI WPI D.R. Brown III 1 / 19 Problem Statement and Assumptions known input x[n] unknown system (assumed

More information

EEL 6502: Adaptive Signal Processing Homework #4 (LMS)

EEL 6502: Adaptive Signal Processing Homework #4 (LMS) EEL 6502: Adaptive Signal Processing Homework #4 (LMS) Name: Jo, Youngho Cyhio@ufl.edu) WID: 58434260 The purpose of this homework is to compare the performance between Prediction Error Filter and LMS

More information

ACTIVE noise control (ANC) ([1], [2]) is an established

ACTIVE noise control (ANC) ([1], [2]) is an established 286 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 2, MARCH 2005 Convergence Analysis of a Complex LMS Algorithm With Tonal Reference Signals Mrityunjoy Chakraborty, Senior Member, IEEE,

More information

Machine Learning and Data Mining. Linear regression. Kalev Kask

Machine Learning and Data Mining. Linear regression. Kalev Kask Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

Regular paper. Adaptive System Identification Using LMS Algorithm Integrated with Evolutionary Computation

Regular paper. Adaptive System Identification Using LMS Algorithm Integrated with Evolutionary Computation Ibraheem Kasim Ibraheem 1,* Regular paper Adaptive System Identification Using LMS Algorithm Integrated with Evolutionary Computation System identification is an exceptionally expansive topic and of remarkable

More information

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS Gary A. Ybarra and S.T. Alexander Center for Communications and Signal Processing Electrical and Computer Engineering Department North

More information

Recursive Least Squares for an Entropy Regularized MSE Cost Function

Recursive Least Squares for an Entropy Regularized MSE Cost Function Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University

More information

13. Power Spectrum. For a deterministic signal x(t), the spectrum is well defined: If represents its Fourier transform, i.e., if.

13. Power Spectrum. For a deterministic signal x(t), the spectrum is well defined: If represents its Fourier transform, i.e., if. For a deterministic signal x(t), the spectrum is well defined: If represents its Fourier transform, i.e., if jt X ( ) = xte ( ) dt, (3-) then X ( ) represents its energy spectrum. his follows from Parseval

More information

FSAN/ELEG815: Statistical Learning

FSAN/ELEG815: Statistical Learning : Statistical Learning Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware 3. Eigen Analysis, SVD and PCA Outline of the Course 1. Review of Probability 2. Stationary

More information

Convergence Evaluation of a Random Step-Size NLMS Adaptive Algorithm in System Identification and Channel Equalization

Convergence Evaluation of a Random Step-Size NLMS Adaptive Algorithm in System Identification and Channel Equalization Convergence Evaluation of a Random Step-Size NLMS Adaptive Algorithm in System Identification and Channel Equalization 1 Shihab Jimaa Khalifa University of Science, Technology and Research (KUSTAR) Faculty

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science : Discrete-Time Signal Processing

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science : Discrete-Time Signal Processing Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.34: Discrete-Time Signal Processing OpenCourseWare 006 ecture 8 Periodogram Reading: Sections 0.6 and 0.7

More information

Gradient Descent. Sargur Srihari

Gradient Descent. Sargur Srihari Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors

More information

Lecture 19 IIR Filters

Lecture 19 IIR Filters Lecture 19 IIR Filters Fundamentals of Digital Signal Processing Spring, 2012 Wei-Ta Chu 2012/5/10 1 General IIR Difference Equation IIR system: infinite-impulse response system The most general class

More information

Performance Analysis and Enhancements of Adaptive Algorithms and Their Applications

Performance Analysis and Enhancements of Adaptive Algorithms and Their Applications Performance Analysis and Enhancements of Adaptive Algorithms and Their Applications SHENGKUI ZHAO School of Computer Engineering A thesis submitted to the Nanyang Technological University in partial fulfillment

More information

Binary Step Size Variations of LMS and NLMS

Binary Step Size Variations of LMS and NLMS IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume, Issue 4 (May. Jun. 013), PP 07-13 e-issn: 319 400, p-issn No. : 319 4197 Binary Step Size Variations of LMS and NLMS C Mohan Rao 1, Dr. B

More information

BLOCK LMS ADAPTIVE FILTER WITH DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS

BLOCK LMS ADAPTIVE FILTER WITH DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS BLOCK LMS ADAPTIVE FILTER WIT DETERMINISTIC REFERENCE INPUTS FOR EVENT-RELATED SIGNALS S. Olmos, L. Sörnmo, P. Laguna Dept. of Electroscience, Lund University, Sweden Dept. of Electronics Eng. and Communications,

More information

Error Vector Normalized Adaptive Algorithm Applied to Adaptive Noise Canceller and System Identification

Error Vector Normalized Adaptive Algorithm Applied to Adaptive Noise Canceller and System Identification American J. of Engineering and Applied Sciences 3 (4): 710-717, 010 ISSN 1941-700 010 Science Publications Error Vector Normalized Adaptive Algorithm Applied to Adaptive Noise Canceller and System Identification

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

On the Use of A Priori Knowledge in Adaptive Inverse Control

On the Use of A Priori Knowledge in Adaptive Inverse Control 54 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL 47, NO 1, JANUARY 2000 On the Use of A Priori Knowledge in Adaptive Inverse Control August Kaelin, Member,

More information

A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases

A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases A Derivation of the Steady-State MSE of RLS: Stationary and Nonstationary Cases Phil Schniter Nov. 0, 001 Abstract In this report we combine the approach of Yousef and Sayed [1] with that of Rupp and Sayed

More information

ECE534, Spring 2018: Solutions for Problem Set #5

ECE534, Spring 2018: Solutions for Problem Set #5 ECE534, Spring 08: s for Problem Set #5 Mean Value and Autocorrelation Functions Consider a random process X(t) such that (i) X(t) ± (ii) The number of zero crossings, N(t), in the interval (0, t) is described

More information

Wiener Filtering. EE264: Lecture 12

Wiener Filtering. EE264: Lecture 12 EE264: Lecture 2 Wiener Filtering In this lecture we will take a different view of filtering. Previously, we have depended on frequency-domain specifications to make some sort of LP/ BP/ HP/ BS filter,

More information

Adaptive Stereo Acoustic Echo Cancelation in reverberant environments. Amos Schreibman

Adaptive Stereo Acoustic Echo Cancelation in reverberant environments. Amos Schreibman Adaptive Stereo Acoustic Echo Cancelation in reverberant environments Amos Schreibman Adaptive Stereo Acoustic Echo Cancelation in reverberant environments Research Thesis As Partial Fulfillment of the

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017 Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient

More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

FSAN815/ELEG815: Foundations of Statistical Learning

FSAN815/ELEG815: Foundations of Statistical Learning FSAN815/ELEG815: Foundations of Statistical Learning Gonzalo R. Arce Chapter 14: Logistic Regression Fall 2014 Course Objectives & Structure Course Objectives & Structure The course provides an introduction

More information

PMR5406 Redes Neurais e Lógica Fuzzy Aula 3 Single Layer Percetron

PMR5406 Redes Neurais e Lógica Fuzzy Aula 3 Single Layer Percetron PMR5406 Redes Neurais e Aula 3 Single Layer Percetron Baseado em: Neural Networks, Simon Haykin, Prentice-Hall, 2 nd edition Slides do curso por Elena Marchiori, Vrije Unviersity Architecture We consider

More information

Optimal and Adaptive Filtering

Optimal and Adaptive Filtering Optimal and Adaptive Filtering Murat Üney M.Uney@ed.ac.uk Institute for Digital Communications (IDCOM) 26/06/2017 Murat Üney (IDCOM) Optimal and Adaptive Filtering 26/06/2017 1 / 69 Table of Contents 1

More information

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C. Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning

More information

Reading Group on Deep Learning Session 1

Reading Group on Deep Learning Session 1 Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

More information

Machine Learning and Adaptive Systems. Lectures 5 & 6

Machine Learning and Adaptive Systems. Lectures 5 & 6 ECE656- Lectures 5 & 6, Professor Department of Electrical and Computer Engineering Colorado State University Fall 2015 c. Performance Learning-LMS Algorithm (Widrow 1960) The iterative procedure in steepest

More information

IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN ABSTRACT

IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN ABSTRACT IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN Jared K. Thomas Brigham Young University Department of Mechanical Engineering ABSTRACT The application of active noise control (ANC)

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Equalization Prof. David Johns University of Toronto (

Equalization Prof. David Johns University of Toronto ( Equalization Prof. David Johns (johns@eecg.toronto.edu) (www.eecg.toronto.edu/~johns) slide 1 of 70 Adaptive Filter Introduction Adaptive filters are used in: Noise cancellation Echo cancellation Sinusoidal

More information