Computer exercise 1: Steepest descent

In this computer exercise you will investigate the method of steepest descent using Matlab. The topics covered in this computer exercise are coupled with the material of exercise 1. The goal is, on the one hand, consolidation of the theory presented in the course and, on the other hand, implementation of the algorithms. The implementation is especially important, since the properties of adaptive filters are often investigated by means of simulation.

Computer exercise 1.1

Consider the steepest descent method's update equation for the filter coefficients and the corresponding MSE:

    w(n+1) = w(n) + µ(p - Rw(n))
    J(n) = J_{min} + (w(n) - R^{-1}p)^T R (w(n) - R^{-1}p).

Create a function in Matlab that calculates the filter coefficients and the corresponding MSE as a function of the iteration number n = [0, ..., N-1].

function [w,J]=sd(mu,winit,N,R,p,Jmin)

Call:
  [w,J]=sd(mu,winit,N,R,p,Jmin)

Input arguments:
  mu    = step size, dim 1x1
  winit = start values for filter coefficients, dim Mx1
  N     = no. of iterations, n=[0,...,N-1], dim 1x1
  R     = autocorrelation matrix, dim MxM
  p     = cross-correlation vector, dim Mx1
  Jmin  = minimum mean square error, dim 1x1

Output arguments:
  w     = matrix containing the filter coefficients as a function of n
          along the rows, dim MxN
  J     = mean square error as a function of n, dim 1xN

Computer exercise 1.2

Test your algorithm using the exercise example 3.1, where

    R = [2 1; 1 2],   p = [6; 4].

Apply J_{min} = 5 and w(0) = [0, 0]^T. Investigate how the filter coefficients approach the optimum w_o = R^{-1}p for different step sizes µ. What happens for µ = 2/λ_{max}? What happens for µ > 2/λ_{max}? Try other initial values w(0) and step sizes µ!
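For example, the investigation can be set up along the following lines (a sketch, assuming the function sd from the program code section at the end; the names lambda, mu_max and wo are not prescribed by the exercise):

    R = [2 1; 1 2];  p = [6; 4];  Jmin = 5;  N = 50;
    lambda = eig(R);                     % eigenvalues of R (here 1 and 3)
    mu_max = 2/max(lambda);              % stability limit 2/lambda_max
    [w,J] = sd(0.5*mu_max,[0;0],N,R,p,Jmin);
    wo = R\p;                            % optimum w_o = R^(-1)*p
    plot(0:N-1,w'); grid on;             % each coefficient should approach wo
    % repeat with mu = mu_max and with mu > mu_max and compare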

Computer exercise 1.3

Verify exercise 3.1(c). Call steepest descent with the parameters given in exercise 3.1(c), i.e. µ = (1/2)µ_{max} and w(0) = [0, 0]^T. Set J_{min} = 5, and observe the first N = 50 iterations. Plot J(n) calculated by the function together with the analytical solution

    J(n) = J_{min} + 2 (4/9)^n.

Is there a difference? Why are the solutions different for n = 0? (Tip: 0^0)

Computer exercise 1.4

The time constant of the k-th mode, τ_k, is defined such that it equals the number of iterations within which the amplitude of the mode decreases by the factor e^{-1}, i.e.,

    ν_k(n) = ν_k(0) e^{-n/τ_k}.

The relation between the filter weights w(n) and the modes ν(n) is given by

    ν(n) = Q^T (w(n) - w_o) = Q^T c(n),        (1)

where Q is a matrix containing the eigenvectors of R. In the current example the eigenvalue decomposition R = QΛQ^T yields

    Q = (1/√2) [1 1; 1 -1],   Λ = [3 0; 0 1].

Determine the time constant τ_k for the modes by transforming the filter weights according to (1),

    v=Q'*(w-R\p*ones(1,N));

and plotting the result,

    plot([0:N-1],log(abs(v)));

Now you can determine τ_k by inspection of the curve. Compare with the result given by Haykin (equation 4.23),

    τ_k = -1 / ln(1 - µλ_k).

Investigate µ = 1/6 and µ = 1/3.
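One way to make this comparison is sketched below (assuming sd and the variables R and p from above; fitting the slope over the first 30 iterations is an arbitrary choice that keeps both modes well above machine precision):

    mu = 1/6;  N = 100;  Jmin = 5;
    [Q,L] = eig(R);                         % R = Q*L*Q', modes ordered as in eig
    lambda = diag(L);
    tau_theory = -1./log(1 - mu*lambda)     % Haykin (4.23)
    [w,J] = sd(mu,[0;0],N,R,p,Jmin);
    v = Q'*(w - R\p*ones(1,N));             % modes according to (1)
    n = 0:N-1;
    plot(n,log(abs(v))); grid on;           % straight lines with slope -1/tau_k
    tau_est = zeros(1,2);
    for k = 1:2
        c = polyfit(n(1:30),log(abs(v(k,1:30))),1);
        tau_est(k) = -1/c(1);               % time constant estimated from the slope
    end
    tau_est

For µ = 1/3 the fast mode is cancelled after a single iteration (1 - µλ_{max} = 0), so only the slow mode gives a useful straight line.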

Computer exercise 1.5

In exercise 3.5 the effective time constant τ_{e,MSE} has been defined. It equals the number of iterations within which the MSE is reduced by a factor e^{-1},

    J(n) ≈ J_{min} + (J(0) - J_{min}) e^{-n/τ_{e,MSE}}.

This is an approximation valid for the initial part of the curve, and under the condition that µ is small. After the quickest mode has decayed, the convergence becomes slow, and the time constant tends towards the time constant of the slowest mode. The time constant with which a mode contributes to the MSE is given by Haykin (equation 4.31):

    τ_{k,MSE} ≈ -1 / (2 ln(1 - µλ_k)).        (2)

Investigate the time constants in exercise 3.1 for two cases: µ = 1/10 and µ = 1/20! Use N = 100 iterations and w(0) = [0, 0]^T. Write

    plot([0:N-1],log(J-Jmin));grid on;

and determine the time constant by inspection. How does it compare to the values obtained by applying the expression in 3.5 and (2), respectively? (Hint: The curve can be approximated by two straight lines.)
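The two slopes can, for example, be read off as sketched below (assuming sd, R and p from above; the index ranges of the two line fits are rough choices, not prescribed by the exercise):

    mu = 1/10;  N = 100;  Jmin = 5;
    [w,J] = sd(mu,[0;0],N,R,p,Jmin);
    lambda = eig(R);
    tau_mse_theory = -1./(2*log(1 - mu*lambda))   % Haykin (4.31)
    n = 0:N-1;
    logJ = log(J - Jmin);
    plot(n,logJ); grid on;
    c1 = polyfit(n(1:10),logJ(1:10),1);           % initial, steep part of the curve
    c2 = polyfit(n(60:N),logJ(60:N),1);           % final part (slowest mode)
    tau_initial = -1/c1(1)
    tau_final = -1/c2(1)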

Computer exercise 1.6

Investigate the cost function J(w) for M = 2 coefficients. Create a function that calculates the MSE as a function of the filter coefficients w = [w_1, w_2]^T.

function [Jm]=jmat(w1,w2,R,p,Jmin)

Call:
  [Jm]=jmat(w1,w2,R,p,Jmin)

Input arguments:
  w1   = vector with K values of w1, dim 1xK
  w2   = vector with K values of w2, dim 1xK
  R    = autocorrelation matrix, dim MxM
  p    = cross-correlation vector, dim Mx1
  Jmin = minimum mean square error, dim 1x1

Output arguments:
  Jm   = matrix containing J(w1,w2) with w1 along the rows and w2 along
         the columns, dim KxK

Computer exercise 1.7

Test your code by investigating exercise 3.1. Use J_{min} = 5. Create a matrix J(w) for the coefficients in an interval around the optimal value:

    R=[2 1;1 2];p=[6;4];Jmin=5;wo=R\p;
    w1=[(wo(1)-4):0.1:(wo(1)+4)];
    w2=[(wo(2)-4):0.1:(wo(2)+4)];
    Jm=jmat(w1,w2,R,p,Jmin);

This matrix can be evaluated for different values. One alternative is a 3-dimensional plot,

    surf(w1,w2,Jm);xlabel('w1');ylabel('w2');zlabel('J(w1,w2)');

another alternative is to plot the contours,

    contour(w1,w2,Jm,50);xlabel('w1');ylabel('w2');colorbar

where 50 is the number of contour ellipses to be drawn. Try both alternatives!

Let us concentrate on the contour plot. In which direction do we observe the maximum change of the cost? Is that direction related to the largest or the smallest eigenvalue? (Hint: Consider the modes!)

Computer exercise 1.8

Plot the filter coefficients in the same figure as the contour plot. First calculate the filter coefficients using the function developed at the beginning of this exercise. Then type:

    contour(w1,w2,Jm,50);xlabel('w1');ylabel('w2');colorbar
    hold on;
    plot(w(1,:),w(2,:),'-x');

Investigate how the filter weights converge to the optimal solution! The iterations are marked with x. Verify that each step is normal to the contour curve, i.e. it points along the (negative) gradient! Try other start values w(0) and step sizes µ.

What happens if µ is too large? What happens for µ = 2/λ_{max}?

Computer exercise 1.9

Investigate the convergence of the filter coefficients for the start values w(0) = R^{-1}p + a[1, 1]^T and w(0) = R^{-1}p + a[1, -1]^T, where a is a scalar. How large can the step size µ be chosen when the start value w(0) lies on the major axis of the ellipsoid? Why? (Hint: Think of the modes!) What happens if the start value is located on the minor axis of the ellipsoid?
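A small experiment along these lines (a sketch; the values a = 3 and µ = 0.9 are arbitrary examples with 2/λ_{max} < µ < 2/λ_{min}, and R, p are as above):

    a = 3;  mu = 0.9;  N = 50;  Jmin = 5;
    wo = R\p;
    w0_major = wo + a*[1;-1];    % major axis: eigenvector of lambda_min = 1
    w0_minor = wo + a*[1; 1];    % minor axis: eigenvector of lambda_max = 3
    [wA,JA] = sd(mu,w0_major,N,R,p,Jmin);
    [wB,JB] = sd(mu,w0_minor,N,R,p,Jmin);
    semilogy(0:N-1,JA-Jmin,0:N-1,JB-Jmin); grid on;
    legend('start on major axis','start on minor axis');
    % Only the mode along [1,1] is unstable for this mu; ideally it is not
    % excited when starting on the major axis, although rounding errors will
    % eventually excite it as well.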

Program code

Steepest descent

function [w,J]=sd(mu,winit,N,R,p,Jmin)
% Call:
%   [w,J]=sd(mu,winit,N,R,p,Jmin)
%
% Input arguments:
%   mu    = step size, dim 1x1
%   winit = start values for filter coefficients, dim Mx1
%   N     = no. of iterations, n=[0,...,N-1], dim 1x1
%   R     = autocorrelation matrix, dim MxM
%   p     = cross-correlation vector, dim Mx1
%   Jmin  = minimum mean square error, dim 1x1
%
% Output arguments:
%   w     = matrix containing the filter coefficients as a function of n
%           along the rows, dim MxN
%   J     = mean square error as a function of n, dim 1xN

w=zeros(length(winit),N);
J=zeros(1,N);
w(:,1)=winit(:);
for n=1:N-1
    % steepest descent update: w(n+1) = w(n) + mu*(p - R*w(n))
    w(:,n+1)=w(:,n)+mu*(p-R*w(:,n));
    % MSE of the current weights: J(n) = Jmin + (w(n)-wo)'*R*(w(n)-wo)
    J(n)=Jmin+(w(:,n)-R\p)'*R*(w(:,n)-R\p);
end
J(N)=Jmin+(w(:,N)-R\p)'*R*(w(:,N)-R\p);
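A quick check of the function against exercise 1.3 might look as follows (a sketch; µ = µ_{max}/2 = 1/3 for this R):

    R = [2 1; 1 2];  p = [6; 4];  N = 50;
    [w,J] = sd(1/3,[0;0],N,R,p,5);                % mu = mu_max/2 = 1/3, Jmin = 5
    n = 0:N-1;
    plot(n,J,'-x',n,5+2*(4/9).^n,'-o'); grid on;  % simulated vs. analytical J(n)
    xlabel('n'); ylabel('J(n)'); legend('sd','analytical');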

Cost function

function [Jm]=jmat(w1,w2,R,p,Jmin)
% Call:
%   [Jm]=jmat(w1,w2,R,p,Jmin)
%
% Input arguments:
%   w1   = vector with K values of w1, dim 1xK
%   w2   = vector with K values of w2, dim 1xK
%   R    = autocorrelation matrix, dim MxM
%   p    = cross-correlation vector, dim Mx1
%   Jmin = minimum mean square error, dim 1x1
%
% Output arguments:
%   Jm   = matrix containing J(w1,w2) with w1 along the rows and w2 along
%          the columns, dim KxK

Jm=zeros(length(w2),length(w1));
for k=1:length(w1)
    for l=1:length(w2)
        % MSE for the weight vector [w1(k); w2(l)]
        Jm(l,k)=Jmin+([w1(k);w2(l)]-R\p)'*R*([w1(k);w2(l)]-R\p);
    end
end
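For M = 2 the cost is the quadratic form J(w) = J_{min} + (w - w_o)^T R (w - w_o), so the same matrix can also be computed without loops. The vectorized sketch below (not part of the original exercise; it uses the grid from exercise 1.7) should give the same Jm:

    R=[2 1;1 2]; p=[6;4]; Jmin=5; wo=R\p;
    w1=[(wo(1)-4):0.1:(wo(1)+4)];
    w2=[(wo(2)-4):0.1:(wo(2)+4)];
    [W1,W2] = meshgrid(w1,w2);           % W1(l,k) = w1(k), W2(l,k) = w2(l)
    C1 = W1 - wo(1);  C2 = W2 - wo(2);   % weight error c = w - wo
    Jm = Jmin + R(1,1)*C1.^2 + (R(1,2)+R(2,1))*C1.*C2 + R(2,2)*C2.^2;
    contour(w1,w2,Jm,50); colorbar;      % same contours as obtained with jmat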