Intelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera

1 Intelligent Control
Module I: Neural Networks, Lecture 7: Adaptive Learning Rate
Laxmidhar Behera
Department of Electrical Engineering, Indian Institute of Technology, Kanpur

2 Subjects to be covered
Motivation for adaptive learning rate
Lyapunov Stability Theory
Training Algorithm based on Lyapunov Stability Theory
Simulations and discussion
Conclusion

3 Training of a Feed-Forward Network
Figure 1: A feed-forward network (inputs $x_1, x_2$, weight layers $W$, output $y$)
Here, $W \in \mathbb{R}^M$ is the weight vector. The training data consist of, say, $N$ patterns $\{x_p, y_p\}$, $p = 1, 2, \ldots, N$.
Weight update law:
$$W(t+1) = W(t) - \eta \frac{\partial E}{\partial W}, \qquad \eta: \text{learning rate}$$
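To make the update law concrete, here is a minimal sketch (not part of the lecture; the linear model, data, and learning rate are illustrative assumptions) that applies the fixed-rate rule $W(t+1) = W(t) - \eta\, \partial E / \partial W$ to a least-squares fit.

```python
import numpy as np

# Illustrative data: N patterns (x_p, y_p) and a linear model y_hat = X @ W,
# used only to exercise the update law W(t+1) = W(t) - eta * dE/dW.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                     # N x M inputs
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=100)

W = np.zeros(2)                                   # weight vector W in R^M
eta = 0.05                                        # fixed learning rate

for t in range(500):
    y_hat = X @ W                                 # network output for all patterns
    E = 0.5 * np.sum((y - y_hat) ** 2)            # quadratic cost E
    dE_dW = -X.T @ (y - y_hat)                    # gradient dE/dW of the linear model
    W = W - eta * dE_dW / len(X)                  # fixed-learning-rate update
print("fitted W:", W, " final cost:", E)
```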

4 Motivation for adaptive learning rate
Figure 2: Convergence to the global minimum (actual vs. adaptive learning-rate behaviour on a cost $f(x)$)
With an adaptive learning rate, one can employ a higher learning rate when the error is far from the global minimum and a smaller learning rate when it is near to it.

5 Adaptive Learning Rate
The objective is to achieve global convergence for a non-quadratic, non-convex nonlinear cost function without increasing the computational complexity.
In gradient descent (GD), the learning rate is fixed. If one can use a larger learning rate at points far from the global minimum and a smaller learning rate at points close to it, then it becomes possible to avoid local minima and ensure global convergence. This necessitates an adaptive learning rate.

6 Lyapunov Stability Theory
Used extensively in control system problems. If we choose a Lyapunov function candidate $V(x(t), t)$ such that
$V(x(t), t)$ is positive definite, and
$\dot{V}(x(t), t)$ is negative definite,
then the system is asymptotically stable.
Local Invariant Set Theorem (La Salle). Consider an autonomous system of the form $\dot{x} = f(x)$ with $f$ continuous, and let $V(x)$ be a scalar function with continuous partial derivatives. Assume that
* for some $l > 0$, the region $\Omega_l$ defined by $V(x) < l$ is bounded;

7 Lyapunov stability theory: contd...
* $\dot{V}(x) \le 0$ for all $x$ in $\Omega_l$.
Let $R$ be the set of all points within $\Omega_l$ where $\dot{V}(x) = 0$, and let $M$ be the largest invariant set in $R$. Then every solution $x(t)$ originating in $\Omega_l$ tends to $M$ as $t \to \infty$.
The problem lies in choosing a proper Lyapunov function candidate.

8 Weight update law using a Lyapunov-based approach
The network output is given by
$$\hat{y}_p = f(W, x_p), \qquad p = 1, 2, \ldots, N \qquad (1)$$
The usual quadratic cost function is
$$E = \frac{1}{2} \sum_{p=1}^{N} (y_p - \hat{y}_p)^2 \qquad (2)$$
Let us choose a Lyapunov function candidate for the system as
$$V = \frac{1}{2} \tilde{y}^T \tilde{y} \qquad (3)$$
where $\tilde{y} = [\,y_1 - \hat{y}_1, \ldots, y_p - \hat{y}_p, \ldots, y_N - \hat{y}_N\,]^T$.
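The quantities that drive everything below are the error vector $\tilde{y}$ and the Jacobian $J = \partial \hat{y}/\partial W$. The following sketch shows one way to assemble them; the forward-difference Jacobian and the toy single-tanh model named `f` are assumptions made purely for illustration (in practice $J$ would come from backpropagation).

```python
import numpy as np

def residual_and_jacobian(f, W, X, y, h=1e-6):
    """Return ytilde = y - f(W, X) and J = d f / d W (N x M), the latter by
    forward differences; a backprop-computed Jacobian would normally be used."""
    y_hat = f(W, X)
    ytilde = y - y_hat                      # error vector ytilde
    J = np.zeros((len(y), len(W)))
    for k in range(len(W)):
        Wk = W.copy()
        Wk[k] += h
        J[:, k] = (f(Wk, X) - y_hat) / h    # k-th column of d y_hat / d W
    return ytilde, J

# Toy model assumed only for illustration: a single tanh unit.
def f(W, X):
    return np.tanh(X @ W)

X = np.random.default_rng(1).normal(size=(5, 3))
y = np.ones(5)
ytilde, J = residual_and_jacobian(f, np.full(3, 0.1), X, y)
V = 0.5 * ytilde @ ytilde                   # Lyapunov candidate V = 1/2 ytilde^T ytilde
```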

9 LF I Algorithm
The time derivative of the Lyapunov function $V$ is
$$\dot{V} = -\tilde{y}^T \frac{\partial \hat{y}}{\partial W} \dot{W} = -\tilde{y}^T J \dot{W} \qquad (4)$$
where $J = \frac{\partial \hat{y}}{\partial W}$, $J \in \mathbb{R}^{N \times M}$.
Theorem 1. If an arbitrary initial weight $W(0)$ is updated by
$$W(t') = W(0) + \int_0^{t'} \dot{W}\, dt \qquad (5)$$
where
$$\dot{W} = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \epsilon}\, J^T \tilde{y} \qquad (6)$$
and $\epsilon$ is a small positive constant, then $\tilde{y}$ converges to zero under the condition that $\dot{W}$ exists along the convergence trajectory.
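A minimal sketch of the batch LF-I velocity of Eq. (6), given a Jacobian `J` and error vector `ytilde` such as those built above (the function name and the value of `eps` are assumptions):

```python
import numpy as np

def lf1_direction(J, ytilde, eps=1e-8):
    """Batch LF-I weight velocity: Wdot = ||ytilde||^2 / (||J^T ytilde||^2 + eps) * J^T ytilde.
    Substituted into Eq. (4) this gives Vdot = -||ytilde||^2 ||J^T ytilde||^2 / (||J^T ytilde||^2 + eps) <= 0."""
    g = J.T @ ytilde                                   # J^T ytilde
    return (ytilde @ ytilde) / (g @ g + eps) * g
```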

10 Proof of LF-I Algorithm
Proof. Substitution of Eq. (6) into Eq. (4) yields
$$\dot{V}_1 = -\frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \epsilon}\, \|J^T \tilde{y}\|^2 \le 0 \qquad (7)$$
where $\dot{V}_1 < 0$ for all $\tilde{y} \ne 0$. If $\dot{V}_1$ is uniformly continuous and bounded, then, according to Barbalat's lemma, as $t \to \infty$, $\dot{V}_1 \to 0$ and $\tilde{y} \to 0$.

11 LF-I Algorithm: contd...
The weight update law above is a batch update law. The instantaneous LF I learning algorithm can be derived as
$$\dot{W} = \frac{\tilde{y}^2}{\|J_i^T \tilde{y}\|^2}\, J_i^T \tilde{y} \qquad (8)$$
where $\tilde{y} = y_p - \hat{y}_p \in \mathbb{R}$ and $J_i = \frac{\partial \hat{y}_p}{\partial W} \in \mathbb{R}^{1 \times M}$.
The difference-equation form of the weight update is
$$\hat{W}(t+1) = \hat{W}(t) + \mu \dot{W}(t) \qquad (9)$$
Here $\mu$ is a constant.
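An illustrative per-pattern version of Eqs. (8) and (9) follows; this is a sketch, and the small `guard` term in the denominator is a numerical safeguard added here, not part of Eq. (8).

```python
import numpy as np

def lf1_step(W, J_i, e, mu, guard=1e-12):
    """One instantaneous LF-I update for a single pattern p.
    J_i : (M,) row of the Jacobian d y_hat_p / d W;  e : scalar error y_p - y_hat_p.
    Implements W(t+1) = W(t) + mu * (e^2 / ||J_i^T e||^2) * J_i^T e."""
    g = J_i * e                                        # J_i^T ytilde (ytilde is scalar here)
    return W + mu * (e ** 2) / (g @ g + guard) * g
```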

12 Comparison with the BP Algorithm
In the gradient-descent method we have
$$\Delta W = -\eta \frac{\partial E}{\partial W} = \eta J_i^T \tilde{y}, \qquad \hat{W}(t+1) = \hat{W}(t) + \eta J_i^T \tilde{y} \qquad (10)$$
The update equation for the LF-I algorithm:
$$\hat{W}(t+1) = \hat{W}(t) + \left( \mu \frac{\tilde{y}^2}{\|J_i^T \tilde{y}\|^2} \right) J_i^T \tilde{y}$$
Comparing the two equations, we find that the fixed learning rate $\eta$ of the BP algorithm is replaced by its adaptive version $\eta_a$:
$$\eta_a = \mu \frac{\tilde{y}^2}{\|J_i^T \tilde{y}\|^2} \qquad (11)$$
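The identification can be checked numerically: the LF-I increment equals a BP-shaped step whose learning rate is the $\eta_a$ of Eq. (11). The numbers below are arbitrary illustrative values.

```python
import numpy as np

J_i = np.array([0.3, -1.2, 0.5])          # pattern Jacobian (illustrative values)
e, mu = 0.8, 0.55                         # scalar error y_p - y_hat_p, step constant

g = J_i * e                               # J_i^T ytilde
eta_a = mu * e**2 / (g @ g)               # adaptive learning rate, Eq. (11)
delta_W_bp_form = eta_a * g               # BP-shaped step using eta_a
delta_W_lf1 = mu * e**2 / (g @ g) * g     # LF-I increment from Eqs. (8)-(9)
print(np.allclose(delta_W_bp_form, delta_W_lf1))   # True: the two updates coincide
```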

13 Adaptive Learning Rate of LF-I
Figure: Adaptive learning rate of LF-I for the XOR problem, plotted against the number of iterations (4 x number of epochs).
The learning rate is not fixed, unlike in the BP algorithm.
The learning rate goes to zero as the error goes to zero.

14 Convergence of LF-I
The theorem states that global convergence of LF-I is guaranteed provided $\dot{W}$ exists along the convergence trajectory. This, in turn, requires $\frac{\partial V_1}{\partial W} = -J^T \tilde{y} \ne 0$.
$\frac{\partial V_1}{\partial W} = 0$ indicates a local minimum of the error function.
Thus, the theorem only says that the global minimum is reached when local minima are avoided during training. Since the instantaneous update rule introduces noise, it may be possible to reach the global minimum in some cases; however, global convergence is not guaranteed.

15 LF II Algorithm
We consider the following Lyapunov function:
$$V_2 = \frac{1}{2}\left( \tilde{y}^T \tilde{y} + \lambda \dot{W}^T \dot{W} \right) = V_1 + \frac{\lambda}{2} \dot{W}^T \dot{W} \qquad (12)$$
where $\lambda$ is a positive constant. Its time derivative is
$$\dot{V}_2 = -\tilde{y}^T \frac{\partial \hat{y}}{\partial W} \dot{W} + \lambda \ddot{W}^T \dot{W} = -\tilde{y}^T (J - D) \dot{W} \qquad (13)$$
where $J = \frac{\partial \hat{y}}{\partial W} \in \mathbb{R}^{N \times m}$ is the Jacobian matrix and $D = \lambda \frac{1}{\|\tilde{y}\|^2}\, \tilde{y}\, \ddot{W}^T \in \mathbb{R}^{N \times m}$.

16 LF II Algorithm: contd...
Theorem 2. If the update law for the weight vector $W$ follows the dynamics given by the nonlinear differential equation
$$\dot{W} = \alpha(W) J^T \tilde{y} - \lambda\, \alpha(W) \ddot{W} \qquad (14)$$
where $\alpha(W) = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \epsilon}$ is a scalar function of the weight vector $W$ and $\epsilon$ is a small positive constant, then $\tilde{y}$ converges to zero under the condition that $(J - D)^T \tilde{y}$ is non-zero along the convergence trajectory.

17 Proof of LF II algorithm
Proof. The update law $\dot{W} = \alpha(W) J^T \tilde{y} - \lambda\,\alpha(W) \ddot{W}$ may be rewritten as
$$\dot{W} = \frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \epsilon}\, (J - D)^T \tilde{y} \qquad (15)$$
Substituting this expression for $\dot{W}$ into $\dot{V}_2 = -\tilde{y}^T (J - D) \dot{W}$, we get
$$\dot{V}_2 = -\frac{\|\tilde{y}\|^2}{\|J^T \tilde{y}\|^2 + \epsilon}\, \|(J - D)^T \tilde{y}\|^2 \le 0 \qquad (16)$$
Since $(J - D)^T \tilde{y}$ is non-zero, $\dot{V}_2 < 0$ for all $\tilde{y} \ne 0$ and $\dot{V}_2 = 0$ iff $\tilde{y} = 0$. If $\dot{V}_2$ is uniformly continuous and bounded, then, according to Barbalat's lemma, as $t \to \infty$, $\dot{V}_2 \to 0$ and $\tilde{y} \to 0$.

18 Proof of LF II algorithm: contd...
The instantaneous weight update equation for the LF II algorithm can finally be expressed in difference-equation form as
$$W(t+1) = W(t) + \mu \frac{\tilde{y}^2}{\|J^{pT} \tilde{y}\|^2 + \epsilon}\, (J^p - D)^T \tilde{y}
        = W(t) + \mu \frac{\tilde{y}^2}{\|J^{pT} \tilde{y}\|^2 + \epsilon}\, J^{pT} \tilde{y} - \frac{\mu_1}{\|J^{pT} \tilde{y}\|^2 + \epsilon}\, \ddot{W}(t) \qquad (17)$$
where $\mu_1 = \mu \lambda$ and the acceleration $\ddot{W}(t)$ is computed as
$$\ddot{W}(t) = \frac{1}{(\Delta t)^2} \left[ W(t) - 2 W(t-1) + W(t-2) \right]$$
with $\Delta t$ taken to be one time unit in the simulations.
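A sketch of one instantaneous LF-II step per Eq. (17), keeping the last three weight vectors so that the acceleration $\ddot{W}(t)$ can be formed by finite differences; the function and variable names are assumptions, and $\Delta t = 1$ as in the lecture's simulations.

```python
import numpy as np

def lf2_step(W_hist, J_p, e, mu, lam, eps=1e-8):
    """One instantaneous LF-II update, following Eq. (17).
    W_hist : [W(t-2), W(t-1), W(t)];  J_p : (M,) pattern Jacobian;  e : scalar error."""
    W_tm2, W_tm1, W_t = W_hist
    Wddot = W_t - 2.0 * W_tm1 + W_tm2              # acceleration with (Delta t)^2 = 1
    g = J_p * e                                    # J_p^T ytilde
    denom = g @ g + eps
    return W_t + mu * (e ** 2) / denom * g - (mu * lam) / denom * Wddot

# Usage: roll a three-element weight history forward each step.
W_hist = [np.zeros(3), np.zeros(3), np.zeros(3)]
J_p, e = np.array([0.3, -1.2, 0.5]), 0.8
W_next = lf2_step(W_hist, J_p, e, mu=0.65, lam=0.015)
W_hist = [W_hist[1], W_hist[2], W_next]
```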

19 Comparison with the BP Algorithm
Applying gradient descent to $V_2 = V_1 + \frac{\lambda}{2} \dot{W}^T \dot{W}$:
$$\Delta W = -\eta \left( \frac{\partial V_2}{\partial W} \right)^T
          = -\eta \left( \frac{\partial V_1}{\partial W} \right)^T - \eta \left[ \frac{d}{dW} \left( \frac{\lambda}{2} \dot{W}^T \dot{W} \right) \right]^T
          = \eta \left( \frac{\partial \hat{y}}{\partial W} \right)^T \tilde{y} - \eta \lambda \ddot{W}$$
Thus, the weight update equation for the gradient-descent method may be written as
$$W(t+1) = W(t) + \eta J^{pT} \tilde{y} - \mu \ddot{W} \qquad (18)$$
where the last term is the acceleration term.

20 Adaptive learning rate and adaptive acceleration
Comparing the two update laws, the adaptive learning rate in this case is given by
$$\eta_a = \mu \frac{\tilde{y}^2}{\|J^{pT} \tilde{y}\|^2 + \epsilon} \qquad (19)$$
and the adaptive acceleration rate is given by
$$\mu_a = \frac{\lambda}{\|J^{pT} \tilde{y}\|^2 + \epsilon} \qquad (20)$$

21 Convergence of LF II
The global minimum of $V_2$ is given by $\tilde{y} = 0$, $\dot{W} = 0$ ($\tilde{y} \in \mathbb{R}^n$, $W \in \mathbb{R}^m$).
The global minimum can be reached provided $\dot{W}$ does not vanish along the convergence trajectory.
Analyzing the local-minima conditions, $\dot{W}$ vanishes under the following conditions.
1. First condition: $J = D$ ($J, D \in \mathbb{R}^{n \times m}$). In the case of neural networks it is very unlikely that each element of $J$ would be equal to the corresponding element of $D$, so this possibility can easily be ruled out for a multi-layer perceptron network.

22 Convergence of LF II: contd...
2. Second condition: $\dot{W}$ vanishes whenever $(J - D)^T \tilde{y} = 0$. Assuming $J \ne D$, rank $\rho(J - D) = n$ ensures global convergence.
3. Third condition: $J^T \tilde{y} = D^T \tilde{y} = \lambda \ddot{W}$. Solutions of this equation represent local minima. A solution exists for every vector $\ddot{W} \in \mathbb{R}^m$ whenever rank $\rho(J) = m$.

23 Convergence of LF II: contd...
For a neural network, $n \le m$ and $\rho(J) \le n$. Hence there are at least $m - n$ vectors $\ddot{W} \in \mathbb{R}^m$ for which solutions do not exist, and hence local minima do not occur.
Thus, by increasing the number of hidden layers or hidden neurons (i.e., increasing $m$), the chance of encountering local minima can be reduced.
Increasing the number of output neurons increases both $m$ and $n$, as well as $n/m$. Thus, for MIMO systems there are more local minima (for a fixed number of weights) than for single-output systems.

24 Avoiding local minima
Figure: The error $V_1$ plotted against $W$, showing a local minimum and the global minimum; the points C, B, A, D mark the weights at times $t-2$, $t-1$, $t$, $t+1$ (i.e. up to $W(t+1)$), with point A at the local minimum.

25 Avoiding local minima: contd...
Rewrite the update law for LF-II as
$$W(t+1) = W(t) + \Delta W(t+1) = W(t) - \eta \frac{\partial V_1}{\partial W}(t) - \mu \ddot{W}(t)$$
Consider point B (at time $t-1$). The weight update for the interval $(t-1, t]$, computed at this instant, is $\Delta W(t) = \Delta W_1(t-1) + \Delta W_2(t-1)$, where
$$\Delta W_1(t-1) = -\eta \frac{\partial V_1}{\partial W}(t-1) > 0$$
$$\Delta W_2(t-1) = -\mu \ddot{W}(t-1) = -\mu\left( \Delta W(t-1) - \Delta W(t-2) \right) > 0$$
It is to be noted that $\Delta W(t-1) < \Delta W(t-2)$, since the velocity decreases as the weight approaches the local minimum. Hence $\Delta W(t) > 0$ and the speed increases.

26 Avoiding local minima: contd...
Consider point A (at time $t$). The weight increments are
$$\Delta W_1(t) = -\eta \frac{\partial V_1}{\partial W}(t) = 0$$
$$\Delta W_2(t) = -\mu \ddot{W}(t) = -\mu\left( \Delta W(t) - \Delta W(t-1) \right) > 0 \quad \text{since } \Delta W(t) < \Delta W(t-1)$$
$$\Delta W(t+1) = \Delta W_1(t) + \Delta W_2(t) > 0$$
This helps in avoiding the local minimum.

27 Avoiding local minima: contd...
Consider point D (at instant $t+1$). The weight contributions are
$$\Delta W_1(t+1) = -\eta \frac{\partial V_1}{\partial W}(t+1) < 0$$
$$\Delta W_2(t+1) = -\mu \ddot{W}(t+1) = -\mu\left( \Delta W(t+1) - \Delta W(t) \right) > 0$$
The contribution of the BP term becomes negative because the slope $\frac{\partial V_1}{\partial W} > 0$ on the right-hand side of the local minimum, while $\Delta W(t+1) < \Delta W(t)$.
$$\Delta W(t+2) = \Delta W_1(t+1) + \Delta W_2(t+1) > 0 \quad \text{if } \Delta W_2(t+1) > |\Delta W_1(t+1)|$$
Thus it is possible to avoid local minima by properly choosing $\mu$.
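A one-dimensional toy sketch of the mechanism above: it iterates $W(t+1) = W(t) - \eta\, \partial V_1/\partial W(t) - \mu \ddot{W}(t)$ on an assumed double-well cost. The cost, starting point, and parameter values are all illustrative; whether the acceleration term actually carries the iterate past the local minimum depends on the choice of $\mu$, which is exactly the point of the argument above.

```python
import numpy as np

def V1(w):  return 0.25 * w**4 - 0.5 * w**2 + 0.2 * w   # toy cost: local min near w ~ 0.88, global min near w ~ -1.09
def dV1(w): return w**3 - w + 0.2

def run(eta, mu, w0=2.0, steps=300):
    """Iterate W(t+1) = W(t) - eta*dV1(W(t)) - mu*Wddot(t), with Wddot(t) = dW(t) - dW(t-1)."""
    w_prev2 = w_prev1 = w = w0
    for _ in range(steps):
        wddot = (w - w_prev1) - (w_prev1 - w_prev2)     # finite-difference acceleration
        w_new = w - eta * dV1(w) - mu * wddot
        w_prev2, w_prev1, w = w_prev1, w, w_new
    return w

print("plain GD (mu=0) ends at w =", run(eta=0.05, mu=0.0))   # settles in the nearer minimum
print("with -mu*Wddot  ends at w =", run(eta=0.05, mu=0.3))   # behaviour depends on mu; try other values
```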

28 Simulation results - LF-I vs LF-II: XOR
Figure 3: Performance comparison for XOR (training epochs vs. runs); LF I (λ = 0.0, µ = 0.55), LF II (λ = 0.015, µ = 0.65).
Observation: LF II provides a tangible improvement over LF I, both in convergence time and in training epochs.

29 LF I vs LF II: 3-bit parity
Figure 4: Performance comparison for 3-bit parity (training epochs vs. runs); LF I (λ = 0.0, µ = 0.47), LF II (λ = 0.03, µ = 0.47).
Observation: LF II performs better than LF I, both in computation time and in training epochs.

30 LF I vs LF II: 8-3 Encoder
Figure 5: Performance comparison for the 8-3 encoder (training epochs vs. runs); LF I (λ = 0.0, µ = 0.46), LF II (λ = 0.01, µ = 0.465).
Observation: LF II takes the minimum number of epochs in most of the runs.

31 LF I vs LF II: 2D Gabor function
Figure 6: Performance comparison for the 2D Gabor function (rms training error vs. iterations over the training data points); LF I (µ = 0.8, λ = 0.0), LF II (µ = 0.8, λ = 0.6).
Observation: With increasing iterations, the performance of LF II improves relative to LF I.

32 Simulation Results - Comparison: contd...
XOR
Algorithm   epochs   time (sec)   parameters
BP          -        -            η = 0.5
BP          -        -            η = 0.95
EKF         -        -            λ = 0.9
LF-I        -        -            µ = 0.55
LF-II       -        -            µ = 0.65, λ = 0.01

33 Comparison among BP, EKF and LF-II
Figure: Convergence time (seconds) versus run for BP, EKF and LF-II.
Observation: LF takes almost the same time for any arbitrary initial condition.

34 Comparison among BP, EKF and LF: contd...
3-bit Parity
Algorithm   epochs   time (sec)   parameters
BP          -        -            η = 0.5
BP          -        -            η = 0.95
EKF         -        -            λ = 0.9
LF-I        -        -            µ = 0.47
LF-II       -        -            µ = 0.47, λ = 0.03

35 Comparison among BP, EKF and LF: contd...
8-3 Encoder
Algorithm   epochs   time (sec)   parameters
BP          -        -            η = 0.7
BP          -        -            η = 0.9
LF-I        -        -            µ = 0.46
LF-II       -        -            µ = 0.465, λ = 0.01

36 Comparison among BP, EKF and LF: contd...
2D Gabor function
Algorithm   No. of centers   rms error/run   parameters
BP          -                -               η_{1,2} = 0.2
BP          -                -               η_{1,2} = 0.2
LF-I        -                -               µ = 0.8
LF-II       -                -               µ = 0.8, λ = 0.3

37 Discussion
Global convergence of Lyapunov-based learning algorithms. Consider the following Lyapunov function candidate:
$$V_2 = \mu V_1 + \frac{\sigma}{2} \left\| \frac{\partial V_1}{\partial W} \right\|^2, \qquad \text{where } V_1 = \frac{1}{2} \tilde{y}^T \tilde{y} \qquad (21)$$
The objective is to select a weight update law $\dot{W}$ such that the global minimum ($V_1 = 0$ and $\frac{\partial V_1}{\partial W} = 0$) is reached.
The time derivative of the Lyapunov function $V_2$ is
$$\dot{V}_2 = \frac{\partial V_1}{\partial W} \left[ \mu I + \sigma \frac{\partial^2 V_1}{\partial W \partial W^T} \right] \dot{W} \qquad (22)$$

38 If the weight update law is selected as
$$\dot{W} = -\left[ \mu I + \sigma \frac{\partial^2 V_1}{\partial W \partial W^T} \right]^{-1} \frac{\left( \frac{\partial V_1}{\partial W} \right)^T}{\left\| \frac{\partial V_1}{\partial W} \right\|^2} \left( \zeta \left\| \frac{\partial V_1}{\partial W} \right\|^2 + \eta V_1^2 \right) \qquad (23)$$
with $\zeta > 0$ and $\eta > 0$, then
$$\dot{V}_2 = -\zeta \left\| \frac{\partial V_1}{\partial W} \right\|^2 - \eta V_1^2 \qquad (24)$$
which is negative definite with respect to $V_1$ and $\frac{\partial V_1}{\partial W}$. Thus $V_2$ will finally converge to its equilibrium point, given by $V_1 = 0$ and $\left( \frac{\partial V_1}{\partial W} \right)^T = 0$.

39 But the implementation of this weight update algorithm becomes very difficult due to the presence of the Hessian term $\frac{\partial^2 V_1}{\partial W \partial W^T}$. Thus, the above algorithm is of theoretical interest.
The above weight update algorithm is similar to the BP learning algorithm with a fixed learning rate.

40 Conclusion
LF algorithms perform better than both the EKF and BP algorithms in terms of speed and accuracy.
LF II avoids local minima to a greater extent than LF I.
It is seen that, by choosing a proper network architecture, it is possible to reach the global minimum.
The LF-I algorithm has an interesting parallel with the conventional BP algorithm, in which the fixed learning rate of BP is replaced by an adaptive learning rate.
