Ch4: Method of Steepest Descent
|
|
- Martha Spencer
- 6 years ago
- Views:
Transcription
1 Ch4: Method of Steepest Descent The method of steepest descent is recursive in the sense that starting from some initial (arbitrary) value for the tap-weight vector, it improves with the increased number of iterations. The final value so computed for the tap-weight vector converges to the Wiener solution. This method is descriptive of a deterministic feedback system that finds the minimum point of the ensembleaveraged error-performance surface without knowledge of the surface itself. 1
2 Mean Square Error (Revisited) For a transversal filter (of length M), the output is written as H y ( n) w ( n) u( n) and the error term wrt. a certain desired response is 2
3 Mean Square Error (Revisited) Following these terms, the MSE criterion is defined as Substituting e(n) and manupulating the expression, we get Quadratic in w! where 3
4 Mean Square Error (Revisited) For notational simplicity, express MSE in terms of vector/matrices where H R E u( n) u ( n) r(0) r(1) r( M 1) * r (1) r(0) r( M 2) * * r ( M 1) r ( M 2) r(0) note r k r k * : ( ) ( ) 2 d is variance of the desired response d (n) p * E u( n) d ( n) 4 p(0) p( 1) p( ( M 1))
5 Mean Square Error (Revisited) We found that the solution (optimum filter coef.s w o ) is given by the Wiener-Hopf eqn.s 2 J min - H d p w o Inversion of R can be very costly. J(w) is quadratic in w convex in w for w o, Surface has a single minimum and it is global, then Can we reach to w o, i.e. with a less demanding algorithm? 5
6 Basic Idea of the Method of Steepest Descent Can we find w o in an iterative manner? 6
7 Basic Idea of the Method of Steepest Descent Starting from w(0), generate a sequence {w(n)} with the property J w( n 1) J w( n) Many sequences can be found following different rules. Method of steepest descent generates points using the gradient Gradient of J at point w, i.e. the function increases most. gives the direction at which Then gives the direction at which the function decreases most. Release a tiny ball on the surface of J it follows negative gradient of the surface. 7
8 8
9 9
10 10
11 11
12 Basic Idea of the Method of Steepest Descent For notational simplicity, let then going in the direction given by the negative gradient How far should we go in g defined by the step size param. μ Optimum step size can be obtained by line search - difficult Generally a constant step size is taken for simplicity. 12
13 ξ J 13
14 Application of SD to Wiener Filter For w(n) From the theory of Wiener Filter we know that J( n) J ( n) J ( n) j a0( n) b0( n) J ( n) J ( n) j a1( n) b1( n) J ( n) J ( n) j a M 1( n) bm 1( n) 14
15 Then the update eqn. becomes w( n 1) w( n) [ p Rw( n)] n 0, 1, 2, which defines a feedback connection. The correction δw(n) applied to the tap-weight vector at time n + 1 is equal to μ[p - Rw(n)]. This correction may also be expressed as μ times the expectation of the inner product of the tap-input vector u(n) and the estimation error e(n); e( n) d ( n) u H ( n) w( n) w * ( n 1) E u( n) e ( n), This suggests that we may use a bank of cross-correlators to compute the correction δw(n) applied to the tap-weight vector w(n) 15
16 16
17 Feedback model The transmittance of each branch of the graph is a scalar or a square matrix. For each branch of the graph, the signal vector flowing out equals the signal vector flowing in multiplied by the transmittance matrix of the branch. Parallel sum Cascade product w( n 1) w( n) [ p Rw( n)] 17
18 Convergence Analysis Feedback may cause stability problems under certain conditions. Depends on The step size, μ The autocorrelation matrix, R Does SD converge? Under which conditions? What is the rate of convergence? We may use the canonical representation. Let the weight-error vector be then the update eqn. becomes c( n) w w( n) o 18
19 Convergence Analysis Let be the eigen-decomposition of R. (the unitary similarity transformation) Then Using QQ H =I Apply the change of coordinates H H v( n) Q c( n) Q [ w w( n)] Then, the update eqn. becomes o 19
20 Convergence Analysis We know that Λ is diagonal, then the k-th natural mode is or, with the initial values v k (0), we have Note the geometric series 20
21 Convergence Analysis Obviously for stability or, simply or Why? Since the eigenvalues of the correlation matrix R are all real and positive Geometric series results in an exponentially decaying curve with time constant τ k, where letting 21
22 Convergence Analysis We have c( n) w w( n) w( n) w c( n) but o o then We know that Q is composed of the eigenvectors of R, then or Each filter coefficient decays exponentially. The overall rate of convergence is limited by the slowest and fastest modes 22
23 Convergence Analysis For small step size What is v(0)? The initial value v(0) is H v(0) Q [ w w(0)] For simplicity assume that w(0)=0, then o H v(0) Q w o 23
24 Convergence Analysis Transient behaviour: From the canonical form we know that then As long as the upper limit on the step size parameter μ is satisfied, regardless of the initial point 24
25 Convergence Analysis The progress of J(n) for n =0,1,... is called the learning curve. The learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the problem. # natural modes = # filter taps 25
26 Example A predictor with 2 taps (w 1 (n) and w 2 (n ) is used to find the params. of the AR process Examine the transient behaviour for Fixed step size, varying eigenvalue spread Fixed eigenvalue spread, varying step size. σ v2 is adjusted so that σ u2 =1. a 1 and a 2 are chosen to have complex roots 26
27 Example The AR process: We had Two eigenmodes Condition number 27
28 Example (Experiment 1) Experiment 1: Keep the step size fixed at Change the eigenvalue spread 28
29 v ( n) (1 ) v (0) v the optimum tap-weight vector equals: w n ( n) n 1, 2, n v 2( n) (1 2) v 2(0) o a a and 1 2 min v 2 H using v(0) Q w we have J o v1(0) a1 1 a1 a2 v(0) v 2(0) a 2 2 a1 a 2 when = and n is fixed J ( n) J v ( n) v ( n) represents min a circle with center at the origin and radius equal to the square root of [ J ( n) J ]. When eqn. represents (for fixed n) an ellipse with min 1 2 major axis equal to the square root of [ J ( n) J ] and minor axis equal to the square root of [ J ( n) J ]. min 1 min 2 29
30 Loci of v 1 (n) versus v 2 (n) for the steepest-descent algorithm with step-size parameter μ=0.3 and varying eigenvalue spread: (a) X(R) =1.22; (b) X(R)=3; (c) X(R)=10; (d) X(R)=
31 31
32 w 1( n) a1 ( v 1( n) v 2( n)) / 2 w( n) w 2( n) a2 ( v 1( n) v 2( n)) / 2 Loci of w 1 (n) versus w 2 (n) for the steepest-descent algorithm with step-size parameter μ=0.3 and varying eigenvalue spread: (a) X(R) =1.22; (b) X(R)=3; (c) X(R)=10; (d) X(R)=
33 33
34 We see that as the eigenvalue spread increases (and the input process becomes more correlated), the minimum meansquared error J min decreases. 34
35 Example (Experiment 2) Keep the eigenvalue spread fixed at Change the step size (μ max =1.1) Loci of v 1 (n) versus v 2 (n) for the steepest-descent algorithm with eigenvalue X (R)=10 and varying step-size parameters: (a) overdamped, μ=0.3 ; (b) underdamped, μ=
36 Loci of w 1 (n) versus w 2 (n) for the steepest-descent algorithm with eigenvalue X (R)=10 and varying step-size parameters: (a) overdamped, μ=0.3 ; (b) underdamped, μ=1.0. Depending on the value of μ, the learning curve can be Overdamped, moves smoothly to the min. ((very) small μ) Underdamped, oscillates towards the min. (large μ< μ max ) Critically damped Generally rate of convergence is slow for the first two. 36
37 Observations SD is a deterministic algorithm, i.e. we assume that R and p are known exactly. In practice they can only be estimated Sample average? Can have high computational complexity. SD is a local search algorithm, but for Wiener filtering, the cost surface is convex (quadratic) convergence is guaranteed as long as μ< μ max is satisfied. 37
38 38
39 Observations The origin of SD comes from the Taylor series expansion (as many other local search optimization algorithms) Convergence can be very slow. To speed up the process, second term can also be included as in the Newton s Method نکته: روش نيوتن در اثبات NLMS نيز استفاده خواهد شد. H=2R, Hessian Differentiation with respect to w and setting the result to zero 39
40 1 w w R p Rw 2 1 R p w 1 ( n 1) ( n) 2 2 ( n) o Optimum solution in a single iteration! High computational complexity (inversion), numerical stability problems. Hw4: Ch4, p 2, 4, 7, 10, 14 40
Ch6-Normalized Least Mean-Square Adaptive Filtering
Ch6-Normalized Least Mean-Square Adaptive Filtering LMS Filtering The update equation for the LMS algorithm is wˆ wˆ u ( n 1) ( n) ( n) e ( n) Step size Filter input which is derived from SD as an approximation
More informationSGN Advanced Signal Processing: Lecture 4 Gradient based adaptation: Steepest Descent Method
SGN 21006 Advanced Signal Processing: Lecture 4 Gradient based adaptation: Steepest Descent Method Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 20 Adaptive filtering:
More informationLecture 3: Linear FIR Adaptive Filtering Gradient based adaptation: Steepest Descent Method
1 Lecture 3: Linear FIR Adaptive Filtering Gradient based adaptation: Steepest Descent Method Adaptive filtering: Problem statement Consider the family of variable parameter FIR filters, computing their
More informationAdaptive Filtering Part II
Adaptive Filtering Part II In previous Lecture we saw that: Setting the gradient of cost function equal to zero, we obtain the optimum values of filter coefficients: (Wiener-Hopf equation) Adaptive Filtering,
More informationCh5: Least Mean-Square Adaptive Filtering
Ch5: Least Mean-Square Adaptive Filtering Introduction - approximating steepest-descent algorithm Least-mean-square algorithm Stability and performance of the LMS algorithm Robustness of the LMS algorithm
More informationComputer exercise 1: Steepest descent
1 Computer exercise 1: Steepest descent In this computer exercise you will investigate the method of steepest descent using Matlab. The topics covered in this computer exercise are coupled with the material
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More informationLinear Optimum Filtering: Statement
Ch2: Wiener Filters Optimal filters for stationary stochastic models are reviewed and derived in this presentation. Contents: Linear optimal filtering Principle of orthogonality Minimum mean squared error
More informationAdaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL.
Adaptive Filtering Fundamentals of Least Mean Squares with MATLABR Alexander D. Poularikas University of Alabama, Huntsville, AL CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is
More information2.6 The optimum filtering solution is defined by the Wiener-Hopf equation
.6 The optimum filtering solution is defined by the Wiener-opf equation w o p for which the minimum mean-square error equals J min σ d p w o () Combine Eqs. and () into a single relation: σ d p p 1 w o
More informationELEG-636: Statistical Signal Processing
ELEG-636: Statistical Signal Processing Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware Spring 2010 Gonzalo R. Arce (ECE, Univ. of Delaware) ELEG-636: Statistical
More informationV. Adaptive filtering Widrow-Hopf Learning Rule LMS and Adaline
V. Adaptive filtering Widrow-Hopf Learning Rule LMS and Adaline Goals Introduce Wiener-Hopf (WH) equations Introduce application of the steepest descent method to the WH problem Approximation to the Least
More informationLeast Mean Square Filtering
Least Mean Square Filtering U. B. Desai Slides tex-ed by Bhushan Least Mean Square(LMS) Algorithm Proposed by Widrow (1963) Advantage: Very Robust Only Disadvantage: It takes longer to converge where X(n)
More informationPerformance Surfaces and Optimum Points
CSC 302 1.5 Neural Networks Performance Surfaces and Optimum Points 1 Entrance Performance learning is another important class of learning law. Network parameters are adjusted to optimize the performance
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationLMS and eigenvalue spread 2. Lecture 3 1. LMS and eigenvalue spread 3. LMS and eigenvalue spread 4. χ(r) = λ max λ min. » 1 a. » b0 +b. b 0 a+b 1.
Lecture Lecture includes the following: Eigenvalue spread of R and its influence on the convergence speed for the LMS. Variants of the LMS: The Normalized LMS The Leaky LMS The Sign LMS The Echo Canceller
More informationCHAPTER 4 ADAPTIVE FILTERS: LMS, NLMS AND RLS. 4.1 Adaptive Filter
CHAPTER 4 ADAPTIVE FILTERS: LMS, NLMS AND RLS 4.1 Adaptive Filter Generally in most of the live applications and in the environment information of related incoming information statistic is not available
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/
More informationECE580 Fall 2015 Solution to Midterm Exam 1 October 23, Please leave fractions as fractions, but simplify them, etc.
ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, 2015 1 Name: Solution Score: /100 This exam is closed-book. You must show ALL of your work for full credit. Please read the questions carefully.
More informationStable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems
Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Thore Graepel and Nicol N. Schraudolph Institute of Computational Science ETH Zürich, Switzerland {graepel,schraudo}@inf.ethz.ch
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/
More informationADAPTIVE FILTER THEORY
ADAPTIVE FILTER THEORY Fourth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada Front ice Hall PRENTICE HALL Upper Saddle River, New Jersey 07458 Preface
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationA METHOD OF ADAPTATION BETWEEN STEEPEST- DESCENT AND NEWTON S ALGORITHM FOR MULTI- CHANNEL ACTIVE CONTROL OF TONAL NOISE AND VIBRATION
A METHOD OF ADAPTATION BETWEEN STEEPEST- DESCENT AND NEWTON S ALGORITHM FOR MULTI- CHANNEL ACTIVE CONTROL OF TONAL NOISE AND VIBRATION Jordan Cheer and Stephen Daley Institute of Sound and Vibration Research,
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationSIMON FRASER UNIVERSITY School of Engineering Science
SIMON FRASER UNIVERSITY School of Engineering Science Course Outline ENSC 810-3 Digital Signal Processing Calendar Description This course covers advanced digital signal processing techniques. The main
More informationNon-linear least squares
Non-linear least squares Concept of non-linear least squares We have extensively studied linear least squares or linear regression. We see that there is a unique regression line that can be determined
More informationAdaptive Beamforming Algorithms
S. R. Zinka srinivasa_zinka@daiict.ac.in October 29, 2014 Outline 1 Least Mean Squares 2 Sample Matrix Inversion 3 Recursive Least Squares 4 Accelerated Gradient Approach 5 Conjugate Gradient Method Outline
More information26. Filtering. ECE 830, Spring 2014
26. Filtering ECE 830, Spring 2014 1 / 26 Wiener Filtering Wiener filtering is the application of LMMSE estimation to recovery of a signal in additive noise under wide sense sationarity assumptions. Problem
More informationSearching The Performance Surface
5 Searching The Performance Surface Assoc. Prof. Dr. Peerapol Yuvapoositanon Dept. of Electronic Engineering ASP-1 A Single Weight Filter From Ch 3 ASP-2 Cost Function J 140 120 100 80 60 40 20 0-10 -8-6
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationMachine Learning and Adaptive Systems. Lectures 3 & 4
ECE656- Lectures 3 & 4, Professor Department of Electrical and Computer Engineering Colorado State University Fall 2015 What is Learning? General Definition of Learning: Any change in the behavior or performance
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationVideo 6.1 Vijay Kumar and Ani Hsieh
Video 6.1 Vijay Kumar and Ani Hsieh Robo3x-1.6 1 In General Disturbance Input + - Input Controller + + System Output Robo3x-1.6 2 Learning Objectives for this Week State Space Notation Modeling in the
More informationADAPTIVE FILTER THEORY
ADAPTIVE FILTER THEORY Fifth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada International Edition contributions by Telagarapu Prabhakar Department
More informationAdaptive Filter Theory
0 Adaptive Filter heory Sung Ho Cho Hanyang University Seoul, Korea (Office) +8--0-0390 (Mobile) +8-10-541-5178 dragon@hanyang.ac.kr able of Contents 1 Wiener Filters Gradient Search by Steepest Descent
More informationMATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year
MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2 1 Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year 2013-14 OUTLINE OF WEEK 8 topics: quadratic optimisation, least squares,
More information, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are
Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)
More informationLecture Notes: Geometric Considerations in Unconstrained Optimization
Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections
More informationLecture Notes in Adaptive Filters
Lecture Notes in Adaptive Filters Second Edition Jesper Kjær Nielsen jkn@es.aau.dk Aalborg University Søren Holdt Jensen shj@es.aau.dk Aalborg University Last revised: September 19, 2012 Nielsen, Jesper
More informationAdaptive Filters. un [ ] yn [ ] w. yn n wun k. - Adaptive filter (FIR): yn n n w nun k. (1) Identification. Unknown System + (2) Inverse modeling
Adaptive Filters - Statistical digital signal processing: in many problems of interest, the signals exhibit some inherent variability plus additive noise we use probabilistic laws to model the statistical
More informationEEL 6502: Adaptive Signal Processing Homework #4 (LMS)
EEL 6502: Adaptive Signal Processing Homework #4 (LMS) Name: Jo, Youngho Cyhio@ufl.edu) WID: 58434260 The purpose of this homework is to compare the performance between Prediction Error Filter and LMS
More informationChapter 8 Gradient Methods
Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point
More informationIntroduction to gradient descent
6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our
More informationECE 680 Modern Automatic Control. Gradient and Newton s Methods A Review
ECE 680Modern Automatic Control p. 1/1 ECE 680 Modern Automatic Control Gradient and Newton s Methods A Review Stan Żak October 25, 2011 ECE 680Modern Automatic Control p. 2/1 Review of the Gradient Properties
More informationNumerical computation II. Reprojection error Bundle adjustment Family of Newtonʼs methods Statistical background Maximum likelihood estimation
Numerical computation II Reprojection error Bundle adjustment Family of Newtonʼs methods Statistical background Maximum likelihood estimation Reprojection error Reprojection error = Distance between the
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationStatistical and Adaptive Signal Processing
r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory
More informationMachine Learning. Lecture 04: Logistic and Softmax Regression. Nevin L. Zhang
Machine Learning Lecture 04: Logistic and Softmax Regression Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology This set
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationAdvanced Signal Processing Adaptive Estimation and Filtering
Advanced Signal Processing Adaptive Estimation and Filtering Danilo Mandic room 813, ext: 46271 Department of Electrical and Electronic Engineering Imperial College London, UK d.mandic@imperial.ac.uk,
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationComparative Performance Analysis of Three Algorithms for Principal Component Analysis
84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More informationChapter 2 Wiener Filtering
Chapter 2 Wiener Filtering Abstract Before moving to the actual adaptive filtering problem, we need to solve the optimum linear filtering problem (particularly, in the mean-square-error sense). We start
More informationDominant Pole Localization of FxLMS Adaptation Process in Active Noise Control
APSIPA ASC 20 Xi an Dominant Pole Localization of FxLMS Adaptation Process in Active Noise Control Iman Tabatabaei Ardekani, Waleed H. Abdulla The University of Auckland, Private Bag 9209, Auckland, New
More informationNewton s laws. Chapter 1. Not: Quantum Mechanics / Relativistic Mechanics
PHYB54 Revision Chapter 1 Newton s laws Not: Quantum Mechanics / Relativistic Mechanics Isaac Newton 1642-1727 Classical mechanics breaks down if: 1) high speed, v ~ c 2) microscopic/elementary particles
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationOptimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization
5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationAdaptiveFilters. GJRE-F Classification : FOR Code:
Global Journal of Researches in Engineering: F Electrical and Electronics Engineering Volume 14 Issue 7 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
More informationIS NEGATIVE STEP SIZE LMS ALGORITHM STABLE OPERATION POSSIBLE?
IS NEGATIVE STEP SIZE LMS ALGORITHM STABLE OPERATION POSSIBLE? Dariusz Bismor Institute of Automatic Control, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland, e-mail: Dariusz.Bismor@polsl.pl
More informationLeast Mean Squares Regression. Machine Learning Fall 2018
Least Mean Squares Regression Machine Learning Fall 2018 1 Where are we? Least Squares Method for regression Examples The LMS objective Gradient descent Incremental/stochastic gradient descent Exercises
More informationData Mining (Mineria de Dades)
Data Mining (Mineria de Dades) Lluís A. Belanche belanche@lsi.upc.edu Soft Computing Research Group Dept. de Llenguatges i Sistemes Informàtics (Software department) Universitat Politècnica de Catalunya
More informationMachine Learning CS 4900/5900. Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Machine Learning CS 4900/5900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Machine Learning is Optimization Parametric ML involves minimizing an objective function
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More information4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion:
Unconstrained Convex Optimization 21 4 Newton Method H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: f(x + p) f(x)+p T f(x)+ 1 2 pt H(x)p ˆf(p) In general, ˆf(p) won
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationOften, in this class, we will analyze a closed-loop feedback control system, and end up with an equation of the form
ME 32, Spring 25, UC Berkeley, A. Packard 55 7 Review of SLODEs Throughout this section, if y denotes a function (of time, say), then y [k or y (k) denotes the k th derivative of the function y, y [k =
More informationNumerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore
Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series
More informationPETROV-GALERKIN METHODS
Chapter 7 PETROV-GALERKIN METHODS 7.1 Energy Norm Minimization 7.2 Residual Norm Minimization 7.3 General Projection Methods 7.1 Energy Norm Minimization Saad, Sections 5.3.1, 5.2.1a. 7.1.1 Methods based
More informationIMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN ABSTRACT
IMPROVEMENTS IN ACTIVE NOISE CONTROL OF HELICOPTER NOISE IN A MOCK CABIN Jared K. Thomas Brigham Young University Department of Mechanical Engineering ABSTRACT The application of active noise control (ANC)
More information1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that
Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationSNR lidar signal improovement by adaptive tecniques
SNR lidar signal improovement by adaptive tecniques Aimè Lay-Euaille 1, Antonio V. Scarano Dipartimento di Ingegneria dell Innovazione, Univ. Degli Studi di Lecce via Arnesano, Lecce 1 aime.lay.euaille@unile.it
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More information7.2 Steepest Descent and Preconditioning
7.2 Steepest Descent and Preconditioning Descent methods are a broad class of iterative methods for finding solutions of the linear system Ax = b for symmetric positive definite matrix A R n n. Consider
More informationLecture 7: Linear Prediction
1 Lecture 7: Linear Prediction Overview Dealing with three notions: PREDICTION, PREDICTOR, PREDICTION ERROR; FORWARD versus BACKWARD: Predicting the future versus (improper terminology) predicting the
More informationConvex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University
Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization
More informationTwo hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx.
Two hours To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER CONVEX OPTIMIZATION - SOLUTIONS xx xxxx 27 xx:xx xx.xx Answer THREE of the FOUR questions. If
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationScientific Computing II
Technische Universität München ST 008 Institut für Informatik Dr. Miriam Mehl Scientific Computing II Final Exam, July, 008 Iterative Solvers (3 pts + 4 extra pts, 60 min) a) Steepest Descent and Conjugate
More informationLecture 7. Logistic Regression. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 11, 2016
Lecture 7 Logistic Regression Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 11, 2016 Luigi Freda ( La Sapienza University) Lecture 7 December 11, 2016 1 / 39 Outline 1 Intro Logistic
More informationSome definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts
Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationStochastic Analogues to Deterministic Optimizers
Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured
More informationDay 3 Lecture 3. Optimizing deep networks
Day 3 Lecture 3 Optimizing deep networks Convex optimization A function is convex if for all α [0,1]: f(x) Tangent line Examples Quadratics 2-norms Properties Local minimum is global minimum x Gradient
More informationLecture 19 IIR Filters
Lecture 19 IIR Filters Fundamentals of Digital Signal Processing Spring, 2012 Wei-Ta Chu 2012/5/10 1 General IIR Difference Equation IIR system: infinite-impulse response system The most general class
More informationLecture 15: Ordinary Differential Equations: Second Order
Lecture 15: Ordinary Differential Equations: Second Order 1. Key points Simutaneous 1st order ODEs and linear stability analysis. 2nd order linear ODEs (homogeneous and inhomogeneous. Maple DEplot Eigenvectors
More informationPrincipal Component Analysis (PCA) for Sparse High-Dimensional Data
AB Principal Component Analysis (PCA) for Sparse High-Dimensional Data Tapani Raiko, Alexander Ilin, and Juha Karhunen Helsinki University of Technology, Finland Adaptive Informatics Research Center Principal
More informationLMS Algorithm Summary
LMS Algorithm Summary Step size tradeoff Other Iterative Algorithms LMS algorithm with variable step size: w(k+1) = w(k) + µ(k)e(k)x(k) When step size µ(k) = µ/k algorithm converges almost surely to optimal
More information