ELEG-636: Statistical Signal Processing
1 ELEG-636: Statistical Signal Processing. Gonzalo R. Arce, Department of Electrical and Computer Engineering, University of Delaware, Spring 2010. Gonzalo R. Arce (ECE, Univ. of Delaware), ELEG-636: Statistical Signal Processing, Spring 2010.
2 Course Objectives & Structure. Objective: given a discrete-time sequence {x(n)}, develop: statistical and spectral signal representations; filtering, prediction, and system identification algorithms; and optimization methods that are statistical and adaptive. Course structure: weekly lectures [notes: arce]; periodic homework (theory & Matlab implementations) [15%]; midterm & final examinations [85%]. Textbook: Haykin, Adaptive Filter Theory.
3 Course Objectives & Structure. Broad applications in communications, imaging, and sensors. Emerging applications in brain-imaging techniques: brain-machine interfaces, implantable devices. Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction. Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.
4 Motivation: Adaptive Optimization and Filtering Methods. Adaptive optimization and filtering methods are appropriate, advantageous, or necessary when: signal statistics are not known a priori and must be learned from observed or representative samples; signal statistics evolve over time; or time or computational restrictions dictate that simple, if repetitive, operations be employed rather than solving more complex, closed-form expressions. To be considered are the following algorithms: Steepest Descent (SD) [deterministic], Least Mean Squares (LMS) [stochastic], and Recursive Least Squares (RLS) [deterministic].
5 Steepest Descent. Definition (Steepest Descent (SD)): steepest descent, also known as gradient descent, is an iterative technique for finding a local minimum of a function. Approach: given an arbitrary starting point, the current location (value) is moved in steps proportional to the negative of the gradient at the current point. SD is an old, deterministic method that is the basis for stochastic gradient-based methods. SD is a feedback approach to finding a local minimum of an error performance surface; the error surface must be known a priori. In the MSE case, SD converges to the optimal solution, w_0 = R^(-1)p, without inverting a matrix. Question: why, in the MSE case, does this converge to the global minimum rather than a local minimum?
6 Steepest Descent Example. Consider a well-structured cost function with a single minimum. [Contour plot showing the evolution of the optimization.]
7 Steepest Descent Example. Consider a gradient-ascent example in which there are multiple minima and maxima. [Surface plot showing the multiple minima and maxima; contour plot illustrating that the final result depends on the starting value.]
8 Steepest Descent. To derive the approach, consider the FIR case: {x(n)} are the WSS input samples, {d(n)} is the WSS desired output, and {d̂(n)} is the estimate of the desired signal, given by
d̂(n) = w^H(n)x(n)
where
x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T  [observation vector]
w(n) = [w_0(n), w_1(n), ..., w_{M-1}(n)]^T  [time-indexed filter coefficients]
9 Steepest Descent. Then, similarly to previously considered cases,
e(n) = d(n) - d̂(n) = d(n) - w^H(n)x(n)
and the MSE at time n is
J(n) = E{|e(n)|²} = σ_d² - w^H(n)p - p^H w(n) + w^H(n)Rw(n)
where σ_d² is the variance of the desired signal, p is the cross-correlation between x(n) and d(n), and R is the correlation matrix of x(n). Note: the weight vector and cost function are time-indexed (functions of time).
10 Steepest Descent. When w(n) is set to the (optimal) Wiener solution, w(n) = w_0 = R^(-1)p and J(n) = J_min = σ_d² - p^H w_0. Use the method of steepest descent to iteratively find w_0. The optimal result is achieved since the cost function is a second-order polynomial with a single unique minimum.
11 Steepest Descent Example. Let M = 2. The MSE is a bowl-shaped surface, which is a function of the 2-D weight vector w(n). [Surface plot and contour plot of J(w) over (w_1, w_2), with the minimum at w_0.] Imagine dropping a marble at any point on the bowl-shaped surface. The ball will reach the minimum point by going through the path of steepest descent.
12 Steepest Descent. Observation: set the direction of the filter update as -∇J(n). Since ∇J(n) = -2p + 2Rw(n), the resulting update is
w(n+1) = w(n) + (1/2)µ[-∇J(n)] = w(n) + µ[p - Rw(n)],  n = 0, 1, 2, ...
where w(0) = 0 (or another appropriate value) and µ is the step size. Observation: SD uses feedback, which makes it possible for the system to be unstable. Bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970).
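The recursion above can be checked numerically in a few lines. This is a minimal sketch: the matrix R and vector p below are hypothetical values chosen only for illustration, not an example from the lecture.

```python
import numpy as np

# Steepest-descent sketch for the Wiener problem: w(n+1) = w(n) + mu*(p - R w(n)).
# R and p are hypothetical, illustrative values.
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.5, 0.2])

mu = 0.5                          # must satisfy 0 < mu < 2/lambda_max (here 2/1.8)
w = np.zeros(2)                   # w(0) = 0
for n in range(200):
    w = w + mu * (p - R @ w)      # feedback update; no matrix inversion needed

w0 = np.linalg.solve(R, p)        # Wiener solution w0 = R^{-1} p, for comparison
print(np.allclose(w, w0, atol=1e-8))
```

The iterate reaches the Wiener solution without ever forming R^(-1), which is the point of the method.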
13 Convergence Analysis. Define the error vector for the tap weights as c(n) = w(n) - w_0. Then, using p = Rw_0 in the update,
w(n+1) = w(n) + µ[p - Rw(n)] = w(n) + µ[Rw_0 - Rw(n)] = w(n) - µRc(n)
and subtracting w_0 from both sides,
w(n+1) - w_0 = w(n) - w_0 - µRc(n)
c(n+1) = c(n) - µRc(n) = [I - µR]c(n)
14 Convergence Analysis. Using the unitary similarity transform R = QΩQ^H, we have
c(n+1) = [I - µR]c(n) = [I - µQΩQ^H]c(n)
Q^H c(n+1) = [Q^H - µΩQ^H]c(n) = [I - µΩ]Q^H c(n)  (*)
Define the transformed coefficients as v(n) = Q^H c(n) = Q^H(w(n) - w_0). Then (*) becomes
v(n+1) = [I - µΩ]v(n)
15 Convergence Analysis. Consider the initial condition of v(n): v(0) = Q^H(w(0) - w_0) = -Q^H w_0 [if w(0) = 0]. Consider the k-th term (mode) in v(n+1) = [I - µΩ]v(n). Note that [I - µΩ] is diagonal; thus all modes are updated independently. The update for the k-th term can be written as
v_k(n+1) = (1 - µλ_k)v_k(n),  k = 1, 2, ..., M
or, using the recursion,
v_k(n) = (1 - µλ_k)^n v_k(0)
16 Convergence Analysis. Observation: convergence to the optimal solution requires
lim_{n→∞} w(n) = w_0  ⟺  lim_{n→∞} c(n) = lim_{n→∞} [w(n) - w_0] = 0
⟺ lim_{n→∞} v(n) = lim_{n→∞} Q^H c(n) = 0
⟺ lim_{n→∞} v_k(n) = 0,  k = 1, 2, ..., M  (**)
Result: according to the recursion v_k(n) = (1 - µλ_k)^n v_k(0), the limit in (**) holds if and only if |1 - µλ_k| < 1 for all k. Thus, since the eigenvalues are nonnegative, 0 < µλ_max < 2, or 0 < µ < 2/λ_max.
17 Convergence Analysis. Observation: the k-th mode has geometric decay, v_k(n) = (1 - µλ_k)^n v_k(0). The rate of decay is characterized by the time it takes to decay to e^(-1) of the initial value. Let τ_k denote this time for the k-th mode:
v_k(τ_k) = (1 - µλ_k)^{τ_k} v_k(0) = e^(-1) v_k(0)
⟹ τ_k = -1/ln(1 - µλ_k) ≈ 1/(µλ_k)  for µ ≪ 1
Result: the overall rate of decay is bounded by
-1/ln(1 - µλ_max) ≤ τ ≤ -1/ln(1 - µλ_min)
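The small-step approximation τ_k ≈ 1/(µλ_k) is easy to verify numerically. The µ and λ below are illustrative values, not numbers from the lecture.

```python
import math

# Mode time constant: tau_k = -1/ln(1 - mu*lambda_k), approximately 1/(mu*lambda_k)
# for small step sizes (illustrative mu and lambda).
mu, lam = 0.01, 2.0
tau_exact = -1.0 / math.log(1.0 - mu * lam)
tau_approx = 1.0 / (mu * lam)
print(tau_exact, tau_approx)      # ~49.5 vs 50.0

# After about tau_exact iterations a mode has decayed to roughly e^{-1} of its start.
n = round(tau_exact)
decay = (1.0 - mu * lam) ** n
print(decay)                      # close to 1/e
```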
18 Convergence Analysis Example. Consider the typical behavior of a single mode.
19 Error Analysis. Recall that
J(n) = J_min + (w(n) - w_0)^H R (w(n) - w_0)
= J_min + (w(n) - w_0)^H QΩQ^H (w(n) - w_0)
= J_min + v(n)^H Ω v(n)
= J_min + Σ_{k=1}^{M} λ_k |v_k(n)|²  [substitute v_k(n) = (1 - µλ_k)^n v_k(0)]
= J_min + Σ_{k=1}^{M} λ_k (1 - µλ_k)^{2n} |v_k(0)|²
Result: if 0 < µ < 2/λ_max, then lim_{n→∞} J(n) = J_min.
20 Example: Predictor. Consider a two-tap predictor for real-valued input. Analyze the effects of the following cases: varying the eigenvalue spread χ(R) = λ_max/λ_min while keeping µ fixed; and varying µ while keeping the eigenvalue spread χ(R) fixed.
21 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [v_1(n), v_2(n)] for step size µ = 0.3. Eigenvalue spread χ(R) = 1.22: small eigenvalue spread, modes converge at a similar rate. Eigenvalue spread χ(R) = 3: moderate eigenvalue spread, modes converge at moderately similar rates.
22 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [v_1(n), v_2(n)] for step size µ = 0.3. Eigenvalue spread χ(R) = 10: large eigenvalue spread, modes converge at different rates. Eigenvalue spread χ(R) = 100: very large eigenvalue spread, modes converge at very different rates; principal-direction convergence is fastest.
23 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [w_1(n), w_2(n)] for step size µ = 0.3. Eigenvalue spread χ(R) = 1.22: small eigenvalue spread, modes converge at a similar rate. Eigenvalue spread χ(R) = 3: moderate eigenvalue spread, modes converge at moderately similar rates.
24 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [w_1(n), w_2(n)] for step size µ = 0.3. Eigenvalue spread χ(R) = 10: large eigenvalue spread, modes converge at different rates. Eigenvalue spread χ(R) = 100: very large eigenvalue spread, modes converge at very different rates; principal-direction convergence is fastest.
25 Example: Predictor. Learning curves of the steepest-descent algorithm with step-size parameter µ = 0.3 and varying eigenvalue spread.
26 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [v_1(n), v_2(n)] with χ(R) = 10 and varying step sizes. Step size µ = 0.3: overdamped, slow convergence. Step size µ = 1: underdamped, fast (erratic) convergence.
27 Example: Predictor. SD loci plots (with J(n) contours shown) as a function of [w_1(n), w_2(n)] with χ(R) = 10 and varying step sizes. Step size µ = 0.3: overdamped, slow convergence. Step size µ = 1: underdamped, fast (erratic) convergence.
28 Example: Predictor. Consider a system identification problem: the input {x(n)} drives both the adaptive filter w(n) and the unknown system; the filter output d̂(n) is subtracted from the system output d(n) to form the error e(n). [Block diagram of the system-identification setup.] Suppose M = 2, with R_x and p given.
29 Example: Predictor. From eigen-analysis we have λ_1 = 1.8 and λ_2 = 0.2, so stability requires µ < 2/λ_max = 2/1.8 ≈ 1.11. Also,
q_1 = (1/√2)[1, 1]^T,  q_2 = (1/√2)[1, -1]^T,  Q = (1/√2)[[1, 1], [1, -1]]
and w_0 = R^(-1)p.
30 Example: Predictor. Thus v(n) = Q^H[w(n) - w_0]. Noting that v(0) = -Q^H w_0, the modes evolve as
v_1(n) = (1 - 1.8µ)^n (0.51)
v_2(n) = (1 - 0.2µ)^n (1.06)
31 Example: Predictor. SD convergence properties for two µ values. Step size µ = 0.5: overdamped, slow convergence. Step size µ = 1: underdamped, fast (erratic) convergence.
32 Least Mean Squares (LMS). Definition (Least Mean Squares (LMS) Algorithm). Motivation: the error performance surface used by the SD method is not always known a priori. Solution: use estimated values. We will use the following instantaneous estimates:
R̂(n) = x(n)x^H(n),  p̂(n) = x(n)d*(n)
Result: the estimates are random variables, and thus this leads to a stochastic optimization. Historical note: invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.
33 Least Mean Squares (LMS). Recall the SD update w(n+1) = w(n) + (1/2)µ[-∇J(n)], where the gradient of the error surface at w(n) was shown to be ∇J(n) = -2p + 2Rw(n). Using the instantaneous estimates,
∇̂J(n) = -2x(n)d*(n) + 2x(n)x^H(n)w(n)
= -2x(n)[d*(n) - x^H(n)w(n)]
= -2x(n)[d*(n) - d̂*(n)]
= -2x(n)e*(n)
where e*(n) is the complex conjugate of the estimation error.
34 Least Mean Squares (LMS). Utilizing ∇̂J(n) = -2x(n)e*(n) in the update,
w(n+1) = w(n) + (1/2)µ[-∇̂J(n)] = w(n) + µx(n)e*(n)  [LMS update]
The LMS algorithm belongs to the family of stochastic gradient algorithms. The update is extremely simple. Although the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates. The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged.
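The LMS update can be sketched as a short simulation. This is a minimal system-identification example: the "unknown" FIR system h, the noise level, and the step size are all hypothetical choices for illustration.

```python
import numpy as np

# LMS sketch: identify a short FIR system from noisy observations.
rng = np.random.default_rng(0)
h = np.array([0.8, -0.4, 0.2])            # hypothetical unknown system
M, N = 3, 20000
x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

mu = 0.01
w = np.zeros(M)
for n in range(M, N):
    xn = x[n:n-M:-1]                      # x(n), x(n-1), ..., x(n-M+1)
    e = d[n] - w @ xn                     # estimation error e(n)
    w = w + mu * xn * e                   # LMS update: w(n+1) = w(n) + mu x(n) e(n)

print(np.allclose(w, h, atol=0.05))
```

Despite the noisy instantaneous gradient, the recursion averages the estimates and the weights settle near the true system.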
35 Convergence Analysis. Independence Theorem: the following conditions hold: (1) the vectors x(1), x(2), ..., x(n) are statistically independent; (2) x(n) is independent of d(1), d(2), ..., d(n-1); (3) d(n) is statistically dependent on x(n), but independent of d(1), d(2), ..., d(n-1); (4) x(n) and d(n) are mutually Gaussian. The independence theorem is invoked in the LMS algorithm analysis. It is justified in some cases, e.g., beamforming, where we receive independent vector observations. In other cases it is not well justified, but it allows the analysis to proceed (i.e., when all else fails, invoke simplifying assumptions).
36 Convergence Analysis. We will invoke the independence theorem to show that w(n) converges to the optimal solution in the mean: lim_{n→∞} E{w(n)} = w_0. To prove this, evaluate the update:
w(n+1) = w(n) + µx(n)e*(n)
w(n+1) - w_0 = w(n) - w_0 + µx(n)e*(n)
c(n+1) = c(n) + µx(n)(d*(n) - x^H(n)w(n))
= c(n) + µx(n)d*(n) - µx(n)x^H(n)[w(n) - w_0 + w_0]
= c(n) + µx(n)d*(n) - µx(n)x^H(n)c(n) - µx(n)x^H(n)w_0
= [I - µx(n)x^H(n)]c(n) + µx(n)[d*(n) - x^H(n)w_0]
= [I - µx(n)x^H(n)]c(n) + µx(n)e_0*(n)
37 Convergence Analysis. Take the expectation of the update, noting that w(n) is based on past inputs and desired values, so w(n), and consequently c(n), are independent of x(n) (independence theorem). Thus
c(n+1) = [I - µx(n)x^H(n)]c(n) + µx(n)e_0*(n)
E{c(n+1)} = (I - µR)E{c(n)} + µE{x(n)e_0*(n)}  [the last expectation is 0; why?]
= (I - µR)E{c(n)}
Using arguments similar to the SD case, lim_{n→∞} E{c(n)} = 0 if 0 < µ < 2/λ_max, or equivalently, lim_{n→∞} E{w(n)} = w_0 if 0 < µ < 2/λ_max.
38 Convergence Analysis. Noting that Σ_{i=1}^{M} λ_i = trace[R], we have λ_max ≤ trace[R] = M r(0) = M σ_x². Thus a more conservative bound (and one easier to determine) is
0 < µ < 2/(M σ_x²)
Convergence in the mean, lim_{n→∞} E{w(n)} = w_0, is a weak condition that says nothing about the variance, which may even grow. A stronger condition is convergence in the mean square, which says lim_{n→∞} E{|c(n)|²} = constant.
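The relation λ_max ≤ trace[R], and the fact that the trace-based bound is the more conservative of the two, can be checked numerically. The correlation matrix below is an illustrative AR(1)-style Toeplitz matrix, not one from the lecture.

```python
import numpy as np

# Compare the exact stability bound 2/lambda_max with the conservative bound
# 2/trace(R) = 2/(M * sigma_x^2) for an illustrative correlation matrix.
R = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])
lam_max = np.linalg.eigvalsh(R).max()
bound_exact = 2.0 / lam_max
bound_conservative = 2.0 / np.trace(R)    # trace(R) = M * r(0) >= lambda_max

print(bound_conservative, bound_exact)    # conservative bound is the smaller one
```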
39 Convergence Analysis. Proving convergence in the mean square is equivalent to showing that lim_{n→∞} J(n) = lim_{n→∞} E{|e(n)|²} = constant. To evaluate the limit, write e(n) as
e(n) = d(n) - d̂(n) = d(n) - w^H(n)x(n)
= d(n) - w_0^H x(n) - [w^H(n) - w_0^H]x(n)
= e_0(n) - c^H(n)x(n)
Thus
J(n) = E{|e(n)|²} = E{(e_0(n) - c^H(n)x(n))(e_0*(n) - x^H(n)c(n))}
= J_min + E{c^H(n)x(n)x^H(n)c(n)} = J_min + J_ex(n)  [cross terms are 0; why?]
where J_ex(n) denotes the excess MSE.
40 Convergence Analysis. Since J_ex(n) is a scalar,
J_ex(n) = E{c^H(n)x(n)x^H(n)c(n)}
= E{trace[c^H(n)x(n)x^H(n)c(n)]}
= E{trace[x(n)x^H(n)c(n)c^H(n)]}
= trace[E{x(n)x^H(n)c(n)c^H(n)}]
Invoking the independence theorem,
J_ex(n) = trace[E{x(n)x^H(n)}E{c(n)c^H(n)}] = trace[RK(n)]
where K(n) = E{c(n)c^H(n)}.
41 Convergence Analysis. Thus J(n) = J_min + J_ex(n) = J_min + trace[RK(n)]. Recall Q^H RQ = Ω, or R = QΩQ^H. Set S(n) = Q^H K(n)Q, where S(n) need not be diagonal. Then
K(n) = QQ^H K(n)QQ^H  [since Q^(-1) = Q^H]
= QS(n)Q^H
42 Convergence Analysis. Utilizing R = QΩQ^H and K(n) = QS(n)Q^H in the excess-error expression,
J_ex(n) = trace[RK(n)] = trace[QΩQ^H QS(n)Q^H] = trace[QΩS(n)Q^H] = trace[Q^H QΩS(n)] = trace[ΩS(n)]
Since Ω is diagonal,
J_ex(n) = trace[ΩS(n)] = Σ_{i=1}^{M} λ_i s_i(n)
where s_1(n), s_2(n), ..., s_M(n) are the diagonal elements of S(n).
43 Convergence Analysis. The previously derived recursion E{c(n+1)} = (I - µR)E{c(n)} can be modified to yield a recursion on S(n):
S(n+1) = (I - µΩ)S(n)(I - µΩ) + µ²J_min Ω
which, for the diagonal elements, is
s_i(n+1) = (1 - µλ_i)² s_i(n) + µ²J_min λ_i,  i = 1, 2, ..., M
Suppose J_ex(n) converges; then s_i(n+1) = s_i(n), and
s_i(n) = (1 - µλ_i)² s_i(n) + µ²J_min λ_i
⟹ s_i(n) = µ²J_min λ_i / [1 - (1 - µλ_i)²] = µ²J_min λ_i / (2µλ_i - µ²λ_i²) = µJ_min / (2 - µλ_i),  i = 1, 2, ..., M
44 Convergence Analysis. Consider again J_ex(n) = trace[ΩS(n)] = Σ_{i=1}^{M} λ_i s_i(n). Taking the limit and utilizing s_i(n) = µJ_min/(2 - µλ_i),
lim_{n→∞} J_ex(n) = J_min Σ_{i=1}^{M} µλ_i/(2 - µλ_i)
The LMS misadjustment is defined as
MA = lim_{n→∞} J_ex(n)/J_min = Σ_{i=1}^{M} µλ_i/(2 - µλ_i)
Note: a misadjustment of 10% or less is generally considered acceptable.
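The misadjustment formula is straightforward to evaluate. The eigenvalues and step size below are illustrative; with a small enough µ the excess MSE stays under the roughly 10% guideline.

```python
import numpy as np

# Misadjustment MA = sum_i mu*lambda_i / (2 - mu*lambda_i) for illustrative
# eigenvalues of R and a small step size.
lams = np.array([1.8, 0.2])
mu = 0.05
ma = np.sum(mu * lams / (2.0 - mu * lams))
print(ma)   # roughly 5% excess MSE relative to J_min
```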
45 Example: First-Order Predictor. This is a one-tap predictor, x̂(n) = w(n)x(n-1). Take the underlying process to be a real first-order AR process, x(n) = -a x(n-1) + v(n). The weight update is
w(n+1) = w(n) + µx(n-1)e(n)  [LMS update for observation x(n-1)]
= w(n) + µx(n-1)[x(n) - w(n)x(n-1)]
46 Example: First-Order Predictor. Since x(n) = -a x(n-1) + v(n) [AR model] and x̂(n) = w(n)x(n-1) [one-tap predictor], the optimal weight is w_0 = -a. Note that E{x(n-1)e_0(n)} = E{x(n-1)v(n)} = 0, which proves the optimality. Set µ = 0.05 and consider two cases of a and σ_x².
47 Example: First-Order Predictor. Figure: transient behavior of the adaptive first-order predictor weight ŵ(n) for µ = 0.05.
48 Example: First-Order Predictor. Figure: transient behavior of the adaptive first-order predictor squared error for µ = 0.05.
49 Example: First-Order Predictor. Figure: mean-squared error learning curves for an adaptive first-order predictor with varying step-size parameter µ.
50 Example: First-Order Predictor. Consider the expected trajectory of w(n). Recall
w(n+1) = w(n) + µx(n-1)e(n)
= w(n) + µx(n-1)[x(n) - w(n)x(n-1)]
= [1 - µx(n-1)x(n-1)]w(n) + µx(n-1)x(n)
In this example, x(n) = -a x(n-1) + v(n). Substituting in:
w(n+1) = [1 - µx(n-1)x(n-1)]w(n) + µx(n-1)[-a x(n-1) + v(n)]
= [1 - µx(n-1)x(n-1)]w(n) - µa x(n-1)x(n-1) + µx(n-1)v(n)
Taking the expectation and invoking the independence theorem,
E{w(n+1)} = (1 - µσ_x²)E{w(n)} - µσ_x² a
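The mean recursion can be compared against an ensemble average of actual LMS runs. This is a minimal sketch: a, µ, and the run counts are hypothetical choices, and the theory curve relies on the independence approximation, so the two only agree approximately.

```python
import numpy as np

# One-tap LMS predictor for x(n) = -a x(n-1) + v(n); compare the ensemble-average
# weight with the recursion E{w(n+1)} = (1 - mu*sx2) E{w(n)} - mu*sx2*a.
rng = np.random.default_rng(1)
a, mu, N, runs = 0.5, 0.05, 400, 200
sv2 = 1.0 - a**2                       # chosen so that sigma_x^2 = 1
sx2 = sv2 / (1.0 - a**2)

w_avg = np.zeros(N + 1)                # ensemble average of the weight trajectory
for _ in range(runs):
    x = np.zeros(N + 1)
    w = 0.0
    for n in range(1, N + 1):
        x[n] = -a * x[n-1] + np.sqrt(sv2) * rng.standard_normal()
        e = x[n] - w * x[n-1]
        w = w + mu * x[n-1] * e        # LMS update
        w_avg[n] += w / runs

w_th = np.zeros(N + 1)                 # theoretical mean trajectory
for n in range(N):
    w_th[n+1] = (1 - mu * sx2) * w_th[n] - mu * sx2 * a

print(w_avg[-1], w_th[-1])             # both settle near w_0 = -a
```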
51 Example: First-Order Predictor. Figure: comparison of experimental results with theory, based on ŵ(n).
52 Example: First-Order Predictor. Next, derive a theoretical expression for J(n). The initial value of J(n) is
J(0) = E{(x(0) - w(0)x(-1))²} = E{(x(0))²} = σ_x²
and the final value is
J(∞) = J_min + J_ex = E{(v(n))²} + J_ex = σ_v² + J_min µλ_1/(2 - µλ_1)
Note that λ_1 = σ_x². Thus,
J(∞) = σ_v² + σ_v² µσ_x²/(2 - µσ_x²) = σ_v² (1 + µσ_x²/(2 - µσ_x²))
53 Example: First-Order Predictor. If µ is small,
J(∞) = σ_v²(1 + µσ_x²/(2 - µσ_x²)) ≈ σ_v²(1 + µσ_x²/2)
Putting all the components together:
J(n) = [σ_x² - σ_v²(1 + µσ_x²/2)](1 - µσ_x²)^{2n} + σ_v²(1 + µσ_x²/2)
where the bracketed term is J(0) - J(∞), the geometric factor decays to 0, and the final term is J(∞). Also, the time constant is
τ = -1/[2 ln(1 - µσ_x²)] ≈ 1/(2µσ_x²)
54 Example: First-Order Predictor. Figure: comparison of experimental results with theory for the adaptive predictor, based on the mean-square error for µ = 0.05.
55 Example: Adaptive Equalization. Objective: pass a known signal through an unknown channel to invert the effects that the channel and noise have on the signal.
56 Example: Adaptive Equalization. The signal is a Bernoulli sequence: x_n = +1 with probability 1/2, and -1 with probability 1/2. The additive noise is N(0, 0.001). The channel has a raised-cosine response:
h_n = (1/2)[1 + cos((2π/W)(n - 2))],  n = 1, 2, 3;  0 otherwise
W controls the eigenvalue spread χ(R). h_n is symmetric about n = 2 and thus introduces a delay of 2. We will use an M = 11 tap filter, which is symmetric about n = 5 and introduces a delay of 5. Thus an overall delay of δ = 2 + 5 = 7 is added to the system.
57 Example: Adaptive Equalization. [Channel response and filter response.] Figure: (a) impulse response of the channel; (b) impulse response of the optimum transversal equalizer.
58 Example: Adaptive Equalization. Consider three W values. Note the step size is bounded by the W = 3.5 case:
µ ≤ 2/(M r(0)) = 2/(11 × 1.3022) = 0.14
Choose µ = 0.075 in all cases.
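The equalization experiment can be sketched end to end. This is a minimal simulation: W = 3.1 is an assumed channel setting (the specific W values behind the lecture figures are not all given here), and the sequence length is an illustrative choice.

```python
import numpy as np

# LMS equalizer following the slide setup: Bernoulli +/-1 input, raised-cosine
# channel h(n) = 0.5*(1 + cos(2*pi/W*(n-2))) for n = 1,2,3, M = 11 taps, delay 7.
rng = np.random.default_rng(2)
W = 3.1                                   # assumed channel parameter (controls chi(R))
h = np.array([0.5 * (1 + np.cos(2*np.pi/W * (n - 2))) for n in (1, 2, 3)])

N, M, delay, mu = 5000, 11, 7, 0.075
x = rng.choice([-1.0, 1.0], size=N)       # Bernoulli sequence
u = np.convolve(x, h)[:N] + np.sqrt(0.001) * rng.standard_normal(N)

w = np.zeros(M)
err = np.zeros(N)
for n in range(M, N):
    un = u[n:n-M:-1]
    e = x[n - delay] - w @ un             # desired signal is the delayed input
    w = w + mu * un * e                   # LMS update
    err[n] = e * e

# the squared error should drop well below its initial level
print(err[M:M+500].mean(), err[-500:].mean())
```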
59 Example: Adaptive Equalization. Figure: learning curves of the LMS algorithm for an adaptive equalizer with number of taps M = 11, step-size parameter µ = 0.075, and varying eigenvalue spread χ(R).
60 Example: Adaptive Equalization. Ensemble-average impulse response of the adaptive equalizer (after 1000 iterations) for each of four different eigenvalue spreads.
61 Example: Adaptive Equalization. Figure: learning curves of the LMS algorithm for an adaptive equalizer with the number of taps M = 11, fixed eigenvalue spread, and varying step-size parameter µ.
62 Example: LMS Directionality. The speed of convergence of the LMS algorithm is faster in certain directions in the weight space. If the convergence is in the appropriate direction, the convergence can be accelerated by increased eigenvalue spread. To investigate this phenomenon, consider the deterministic signal x(n) = A_1 cos(ω_1 n) + A_2 cos(ω_2 n). Even though it is deterministic, a correlation matrix can be determined:
R = (1/2) [[A_1² + A_2², A_1² cos(ω_1) + A_2² cos(ω_2)], [A_1² cos(ω_1) + A_2² cos(ω_2), A_1² + A_2²]]
63 Example: LMS Directionality. Determining the eigenvalues and eigenvectors yields
λ_1 = (1/2)[A_1²(1 + cos(ω_1)) + A_2²(1 + cos(ω_2))]
λ_2 = (1/2)[A_1²(1 - cos(ω_1)) + A_2²(1 - cos(ω_2))]
and q_1 = [1, 1]^T, q_2 = [1, -1]^T.
Case 1: A_1 = 1, A_2 = 0.5, ω_1 = 1.2, ω_2 = 0.1, giving x_a(n) = cos(1.2n) + 0.5cos(0.1n) and χ(R) = 2.9.
Case 2: A_1 = 1, A_2 = 0.5, ω_1 = 0.6, ω_2 = 0.23, giving x_b(n) = cos(0.6n) + 0.5cos(0.23n) and χ(R) = 12.9.
64 Example: LMS Directionality. Since p is undefined, set p = λ_i q_i. Then, since p = Rw_0, we see (two cases):
p = λ_1 q_1 ⟹ Rw_0 = λ_1 q_1 ⟹ w_0 = q_1 = [1, 1]^T
p = λ_2 q_2 ⟹ Rw_0 = λ_2 q_2 ⟹ w_0 = q_2 = [1, -1]^T
Utilize 200 iterations of the algorithm. Consider the minimum eigenfilter first, w_0 = q_2 = [1, -1]^T, and the maximum eigenfilter second, w_0 = q_1 = [1, 1]^T.
65 Example: LMS Directionality. Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the slow eigenvector (i.e., the minimum eigenfilter): for input x_a(n) (χ(R) = 2.9) and for input x_b(n) (χ(R) = 12.9).
66 Example: LMS Directionality. Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the fast eigenvector (i.e., the maximum eigenfilter): for input x_a(n) (χ(R) = 2.9) and for input x_b(n) (χ(R) = 12.9).
67 Normalized LMS Algorithm. Observation: the LMS correction is proportional to x(n): w(n+1) = w(n) + µx(n)e*(n). If x(n) is large, the LMS update suffers from gradient noise amplification. The normalized LMS algorithm seeks to avoid gradient noise amplification. The step size is made time-varying, µ(n), and optimized to minimize the next-step error:
w(n+1) = w(n) + (1/2)µ(n)[-∇J(n)] = w(n) + µ(n)[p - Rw(n)]
Choose µ(n) such that w(n+1) produces the minimum MSE, J(n+1) = E{|e(n+1)|²}.
68 Normalized LMS Algorithm. Let ∇(n) ≡ ∇J(n), and note e(n+1) = d(n+1) - w^H(n+1)x(n+1). Objective: choose µ(n) such that it minimizes J(n+1). The optimal step size, µ_0(n), will be a function of R and ∇(n); use instantaneous estimates of these values. To determine µ_0(n), expand J(n+1):
J(n+1) = E{e(n+1)e*(n+1)}
= E{(d(n+1) - w^H(n+1)x(n+1))(d*(n+1) - x^H(n+1)w(n+1))}
= σ_d² - w^H(n+1)p - p^H w(n+1) + w^H(n+1)Rw(n+1)
69 Normalized LMS Algorithm. Now use the fact that w(n+1) = w(n) - (1/2)µ(n)∇(n):
J(n+1) = σ_d² - [w(n) - (1/2)µ(n)∇(n)]^H p - p^H[w(n) - (1/2)µ(n)∇(n)] + [w(n) - (1/2)µ(n)∇(n)]^H R [w(n) - (1/2)µ(n)∇(n)]
where the last (quadratic) term expands to
w^H(n)Rw(n) - (1/2)µ(n)w^H(n)R∇(n) - (1/2)µ(n)∇^H(n)Rw(n) + (1/4)µ²(n)∇^H(n)R∇(n)
70 Normalized LMS Algorithm. Differentiating with respect to µ(n),
∂J(n+1)/∂µ(n) = (1/2)∇^H(n)p + (1/2)p^H∇(n) - (1/2)w^H(n)R∇(n) - (1/2)∇^H(n)Rw(n) + (1/2)µ(n)∇^H(n)R∇(n)  (*)
71 Normalized LMS Algorithm. Setting (*) equal to 0,
µ_0(n)∇^H(n)R∇(n) = w^H(n)R∇(n) - p^H∇(n) + ∇^H(n)Rw(n) - ∇^H(n)p
µ_0(n) = ([w^H(n)R - p^H]∇(n) + ∇^H(n)[Rw(n) - p]) / (∇^H(n)R∇(n))
= ((1/2)∇^H(n)∇(n) + (1/2)∇^H(n)∇(n)) / (∇^H(n)R∇(n))
= ∇^H(n)∇(n) / (∇^H(n)R∇(n))
72 Normalized LMS Algorithm. Using the instantaneous estimates R̂ = x(n)x^H(n) and p̂ = x(n)d*(n),
∇̂(n) = 2[R̂w(n) - p̂] = 2[x(n)x^H(n)w(n) - x(n)d*(n)] = 2x(n)[d̂*(n) - d*(n)] = -2x(n)e*(n)
Thus
µ_0(n) = ∇̂^H(n)∇̂(n) / (∇̂^H(n)R̂∇̂(n))
= 4x^H(n)e(n)x(n)e*(n) / (2x^H(n)e(n) x(n)x^H(n) 2x(n)e*(n))
= |e(n)|² x^H(n)x(n) / (|e(n)|² (x^H(n)x(n))²)
= 1/(x^H(n)x(n)) = 1/‖x(n)‖²
73 Normalized LMS Algorithm. Result: the NLMS update is
w(n+1) = w(n) + (µ/‖x(n)‖²) x(n)e*(n)
where the time-varying step is µ(n) = µ/‖x(n)‖², and the constant µ is introduced to scale the update. To avoid problems when ‖x(n)‖² ≈ 0, we add an offset:
w(n+1) = w(n) + (µ/(a + ‖x(n)‖²)) x(n)e*(n),  where a > 0
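The NLMS update can be sketched in the same style as plain LMS. This is a minimal, hypothetical system-identification example: the FIR system h, the noise level, and the scaling constant µ = 0.5 are illustrative choices.

```python
import numpy as np

# NLMS sketch: w(n+1) = w(n) + mu/(a + ||x(n)||^2) * x(n) * e(n), with 0 < mu < 2.
rng = np.random.default_rng(3)
h = np.array([0.5, -0.3, 0.1])            # hypothetical unknown system
M, N = 3, 10000
x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

mu, a = 0.5, 1e-6                         # 0 < mu < 2; a > 0 guards small ||x(n)||^2
w = np.zeros(M)
for n in range(M, N):
    xn = x[n:n-M:-1]
    e = d[n] - w @ xn
    w = w + (mu / (a + xn @ xn)) * xn * e # normalized update

print(np.allclose(w, h, atol=0.05))
```

The update is insensitive to the scale of the input: dividing by ‖x(n)‖² removes the gradient noise amplification that a large x(n) would otherwise cause.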
74 NLMS Convergence. Objective: analyze the NLMS convergence. Substituting e(n) = d(n) - w^H(n)x(n) into w(n+1) = w(n) + (µ/‖x(n)‖²)x(n)e*(n):
w(n+1) = w(n) + (µ/‖x(n)‖²)x(n)[d*(n) - x^H(n)w(n)]
= [I - µ x(n)x^H(n)/‖x(n)‖²]w(n) + µ x(n)d*(n)/‖x(n)‖²
75 NLMS Convergence. Objective: compare the NLMS and LMS algorithms.
NLMS: w(n+1) = [I - µ x(n)x^H(n)/‖x(n)‖²]w(n) + µ x(n)d*(n)/‖x(n)‖²
LMS: w(n+1) = [I - µx(n)x^H(n)]w(n) + µx(n)d*(n)
By observation, we see the following corresponding terms: µ ↔ µ; x(n)x^H(n) ↔ x(n)x^H(n)/‖x(n)‖²; x(n)d*(n) ↔ x(n)d*(n)/‖x(n)‖².
76 NLMS Convergence. In the LMS case, 0 < µ < 2/trace[E{x(n)x^H(n)}] = 2/trace[R] guarantees stability. By analogy, 0 < µ < 2/trace[E{x(n)x^H(n)/‖x(n)‖²}] guarantees stability of the NLMS.
77 NLMS Convergence. To analyze the bound, make the following approximation:
E{x(n)x^H(n)/‖x(n)‖²} ≈ E{x(n)x^H(n)}/E{‖x(n)‖²}
Then
trace[E{x(n)x^H(n)/‖x(n)‖²}] ≈ trace[E{x(n)x^H(n)}]/E{‖x(n)‖²} = E{trace[x(n)x^H(n)]}/E{‖x(n)‖²} = E{trace[x^H(n)x(n)]}/E{‖x(n)‖²} = E{‖x(n)‖²}/E{‖x(n)‖²} = 1
78 NLMS Convergence. Thus 0 < µ < 2/trace[E{x(n)x^H(n)/‖x(n)‖²}] ≈ 2. Final result: the NLMS update w(n+1) = w(n) + (µ/‖x(n)‖²)x(n)e*(n) will converge if 0 < µ < 2. Note: the NLMS has a simpler convergence criterion than the LMS, and the NLMS generally converges faster than the LMS algorithm.
79 ELEG-636: Statistical Signal Processing
Variants of the LMS Algorithm
Gonzalo R. Arce
Department of Electrical and Computer Engineering
University of Delaware, Newark, DE
Fall 2013
80 Standard LMS Algorithm
FIR filters:
y(n) = w_0(n)u(n) + w_1(n)u(n−1) + ... + w_{M−1}(n)u(n−M+1)
     = Σ_{k=0}^{M−1} w_k(n) u(n−k) = w(n)^T u(n),   n = 0, 1, ...
Error between the filter output y(n) and a desired signal d(n):
e(n) = d(n) − y(n) = d(n) − w(n)^T u(n).
Update the filter parameters according to
w(n+1) = w(n) + µ u(n) e(n).
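The standard LMS recursion above can be sketched as follows (Python/NumPy used for illustration; the system-identification test setup, filter length, and step size are illustrative choices, not from the slides):

```python
import numpy as np

def lms(u, d, M, mu):
    """Standard LMS: y(n) = w(n)^T u(n), e(n) = d(n) - y(n), w <- w + mu*u(n)*e(n)."""
    w = np.zeros(M)
    e = np.zeros(len(u))
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]   # regressor [u(n), u(n-1), ..., u(n-M+1)]
        e[n] = d[n] - w @ un
        w = w + mu * un * e[n]
    return w, e
```

For example, driving an unknown FIR system with white noise and feeding its output as d(n) makes w converge to the system's coefficients.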
81 1. Normalized LMS Algorithm
Modify at time n the parameter vector from w(n) to w(n+1) so as to satisfy the constraint
d(n) = Σ_{i=0}^{M−1} w_i(n+1) u(n−i),
from which the Lagrange multiplier λ will result. In order to add an extra degree of freedom to the adaptation strategy, one constant, µ, controlling the step size will be introduced:
w_j(n+1) = w_j(n) + (µ / Σ_{i=0}^{M−1} u(n−i)²) e(n) u(n−j) = w_j(n) + (µ/‖u(n)‖²) e(n) u(n−j).
82 To overcome the possible numerical difficulties when ‖u(n)‖ is close to zero, a constant a > 0 is used:
w_j(n+1) = w_j(n) + (µ / (a + ‖u(n)‖²)) e(n) u(n−j)
This is the update used in the Normalized LMS algorithm.
The Normalized LMS algorithm converges if 0 < µ < 2.
83 Comparison of LMS and NLMS
84 Comparison of LMS and NLMS
The LMS was run with three different step-sizes: µ = [0.075; 0.025; ].
The NLMS was run with step-sizes: µ = [1.0; 0.5; 0.1].
The larger the step-size, the faster the convergence.
The smaller the step-size, the better the steady-state square error.
LMS with µ = and NLMS with µ = 0.1 achieved a similar average steady-state square error; however, NLMS was faster.
LMS with µ = and NLMS with µ = 1.0 had a similar convergence speed; however, NLMS achieved a lower steady-state average square error.
Conclusion: NLMS offers better trade-offs than LMS.
The computational complexity of NLMS is slightly higher than that of LMS.
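An experiment of the kind behind this comparison can be sketched as below (Python/NumPy; the unknown system, noise level, and step sizes here are illustrative stand-ins, since some of the slide's step-size values did not survive extraction):

```python
import numpy as np

def run_filter(u, d, M, step, normalized, a=1e-8):
    """Run LMS (normalized=False) or NLMS (normalized=True) and record e(n)^2."""
    w = np.zeros(M)
    err = []
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        g = step / (a + un @ un) if normalized else step
        w = w + g * un * e
        err.append(e ** 2)
    return w, np.array(err)

rng = np.random.default_rng(2)
u = rng.standard_normal(4000)
w_true = np.array([1.0, 0.5, -0.25])
d = np.convolve(u, w_true)[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_lms, e_lms = run_filter(u, d, 3, step=0.025, normalized=False)
w_nlms, e_nlms = run_filter(u, d, 3, step=0.5, normalized=True)
print("LMS final MSE:", e_lms[-500:].mean())
print("NLMS final MSE:", e_nlms[-500:].mean())
```

Plotting `e_lms` and `e_nlms` (smoothed) over several step sizes reproduces the qualitative trade-offs listed above.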
85 2. LMS Algorithm with Time-Variable Adaptation Step
Heuristics: combine the benefits of two different situations:
The convergence time constant is small for large µ.
The mean-square error in steady state is low for small µ.
The initial adaptation step µ is kept large, then it is monotonically reduced:
µ(n) = 1/(n + c).
Disadvantage for non-stationary data: for large values of n, the algorithm will not react to changes in the optimum solution.
Variable Step algorithm:
w(n+1) = w(n) + M(n) u(n) e(n)
where M(n) = diag(µ_0(n), µ_1(n), ..., µ_{M−1}(n)).
Each filter parameter w_i(n) is updated using an independent adaptation step µ_i(n).
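A minimal sketch of the decreasing-step variant (Python/NumPy; for simplicity a single step µ(n) = 1/(n + c) is shared by all coefficients, whereas the slide's diagonal M(n) allows an independent µ_i(n) per coefficient):

```python
import numpy as np

def variable_step_lms(u, d, M, c=50.0):
    """LMS with monotonically decreasing step mu(n) = 1/(n + c)."""
    w = np.zeros(M)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]   # regressor [u(n), ..., u(n-M+1)]
        e = d[n] - w @ un
        mu_n = 1.0 / (n + c)            # large step early, small step late
        w = w + mu_n * un * e
    return w
```

The constant c trades off early speed against how quickly the step shrinks; as the slide notes, once µ(n) is tiny the filter can no longer track a drifting optimum.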
86 Comparison of LMS and variable-step LMS
µ(n) = 1/(0.01 n + c) with c = [10; 20; 50]
87 3. Sign Algorithms
In high-speed communication time is critical, thus a faster adaptation process is needed.
sgn(a) = { 1, a > 0; 0, a = 0; −1, a < 0 }
The Sign algorithm (other names: pilot LMS, or Sign Error):
w(n+1) = w(n) + µ u(n) sgn(e(n)).
The Clipped LMS (or Signed Regressor):
w(n+1) = w(n) + µ sgn(u(n)) e(n).
The Zero-Forcing LMS (or Sign-Sign):
w(n+1) = w(n) + µ sgn(u(n)) sgn(e(n)).
The Sign algorithm can be derived as an LMS algorithm for minimizing the mean absolute error (MAE) criterion
J(w) = E[|e(n)|] = E[|d(n) − w^T u(n)|].
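The three variants differ only in where sgn(·) is applied, which a compact sketch makes explicit (Python/NumPy; the `variant` keyword and its labels are illustrative names for the three updates above):

```python
import numpy as np

def sign_lms_step(w, un, dn, mu, variant="sign-error"):
    """One update of the sign LMS variants; np.sign matches the slide's sgn()."""
    e = dn - w @ un
    if variant == "sign-error":          # pilot LMS: w + mu * u(n) * sgn(e(n))
        w = w + mu * un * np.sign(e)
    elif variant == "signed-regressor":  # clipped LMS: w + mu * sgn(u(n)) * e(n)
        w = w + mu * np.sign(un) * e
    elif variant == "sign-sign":         # zero-forcing LMS: w + mu * sgn(u(n)) * sgn(e(n))
        w = w + mu * np.sign(un) * np.sign(e)
    return w, e
```

Since each update moves the coefficients by at most µ per component, the sign variants hover in a µ-sized neighborhood of the solution rather than converging exactly.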
88 Properties of Sign Algorithms
Fast computation: if µ is constrained to the form µ = 2^(−m), only shifting and addition operations are required.
Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates:
The steady-state error will increase.
The convergence rate decreases.
The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the bps system.
89 Comparison of LMS and Sign LMS
The Sign LMS algorithm should be operated at smaller step-sizes to obtain behavior similar to that of the standard LMS algorithm.
90 4. Linear Smoothing of LMS Gradient Estimates
Lowpass filtering the noisy gradient: rename the noisy gradient
g(n) = ∇̂_w J = −2 u(n) e(n),   g_i(n) = −2 e(n) u(n−i).
Passing the signals g_i(n) through lowpass filters will prevent the large fluctuations of direction during the adaptation process:
b_i(n) = LPF(g_i(n)).
The updating process will use the filtered noisy gradient:
w(n+1) = w(n) − µ b(n).
The following versions are well known:
Averaged LMS algorithm: the LPF is the filter with impulse response h(m) = 1/N, m = 0, 1, ..., N−1, giving
w(n+1) = w(n) + (µ/N) Σ_{j=n−N+1}^{n} e(j) u(j).
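One straightforward reading of the averaged update can be sketched as below (Python/NumPy; note this sketch stores each e(j)u(j) as computed with the weights of time j, i.e. the past gradient terms are "stale", which is how the running sum is realized in practice):

```python
import numpy as np

def averaged_lms(u, d, M, mu, N):
    """Averaged LMS: w(n+1) = w(n) + (mu/N) * sum_{j=n-N+1}^{n} e(j) u(j)."""
    w = np.zeros(M)
    hist = []                            # last N gradient terms e(j) * u(j)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        hist.append(e * un)
        if len(hist) > N:
            hist.pop(0)
        w = w + (mu / N) * np.sum(hist, axis=0)
    return w
```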
91 Momentum LMS algorithm: the LPF is a first-order IIR filter,
h(0) = 1 − γ,  h(1) = γ h(0),  h(2) = γ² h(0),  ...
then
b_i(n) = LPF(g_i(n)) = γ b_i(n−1) + (1−γ) g_i(n)
b(n) = γ b(n−1) + (1−γ) g(n)
The resulting algorithm can be written as a second-order recursion:
w(n+1) = w(n) − µ b(n)
γ w(n) = γ w(n−1) − γ µ b(n−1)
w(n+1) − γ w(n) = w(n) − γ w(n−1) − µ b(n) + γ µ b(n−1)
92
w(n+1) − γ w(n) = w(n) − γ w(n−1) − µ b(n) + γ µ b(n−1)
w(n+1) = w(n) + γ (w(n) − w(n−1)) − µ (b(n) − γ b(n−1))
w(n+1) = w(n) + γ (w(n) − w(n−1)) − µ (1−γ) g(n)
w(n+1) = w(n) + γ (w(n) − w(n−1)) + 2µ (1−γ) e(n) u(n)
w(n+1) − w(n) = γ (w(n) − w(n−1)) + 2µ (1−γ) e(n) u(n)
Drawback: the convergence rate may decrease.
Advantage: the momentum term keeps the algorithm active even in the regions close to the minimum.
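The final second-order recursion can be sketched directly (Python/NumPy; step size and momentum constant are illustrative, and the same small-step stability caveats as for LMS apply):

```python
import numpy as np

def momentum_lms(u, d, M, mu, gamma):
    """Momentum LMS: w(n+1) = w(n) + gamma*(w(n)-w(n-1)) + 2*mu*(1-gamma)*e(n)*u(n)."""
    w = np.zeros(M)
    w_prev = np.zeros(M)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        # apply the momentum term and the scaled instantaneous gradient in one step
        w, w_prev = w + gamma * (w - w_prev) + 2 * mu * (1 - gamma) * e * un, w
    return w
```

Keeping only the pair (w(n), w(n−1)) avoids storing the filtered gradient b(n) explicitly, which is exactly what the derivation above establishes.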
93 5. Nonlinear Smoothing of LMS Gradient Estimates
Impulsive interference in either d(n) or u(n) drastically degrades LMS performance.
Idea: smooth the noisy gradient components using a nonlinear filter.
The Median LMS Algorithm: the adaptation equation can be implemented as
w_i(n+1) = w_i(n) + µ med(e(n)u(n−i), e(n−1)u(n−1−i), ..., e(n−N)u(n−N−i))
The smoothing effect in an impulsive noise environment is very strong.
If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS.
The convergence rate is slower than in LMS.
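The median smoothing can be sketched as follows (Python/NumPy; the impulsive-noise test setup, window length, and step size are illustrative choices used to show the robustness property, not values from the slides):

```python
import numpy as np

def median_lms(u, d, M, mu, N):
    """Median LMS: each coefficient is updated with the median of the last
    N+1 gradient terms e(j)*u(j-i), which suppresses impulsive outliers."""
    w = np.zeros(M)
    hist = []                                  # rows: e(j) * regressor(j)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        hist.append(e * un)
        if len(hist) > N + 1:
            hist.pop(0)
        w = w + mu * np.median(hist, axis=0)   # componentwise median over the window
    return w
```

A single impulsive sample in d(n) produces one outlier row in the window, which the componentwise median simply ignores; plain LMS would take the full hit of that corrupted gradient.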
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationEqualization Prof. David Johns University of Toronto (
Equalization Prof. David Johns (johns@eecg.toronto.edu) (www.eecg.toronto.edu/~johns) slide 1 of 70 Adaptive Filter Introduction Adaptive filters are used in: Noise cancellation Echo cancellation Sinusoidal
More information