
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising input vectors $\{x_n\}$, $n = 1, \ldots, N$, together with corresponding targets $\{t_n\}$, we minimize the error function
$$ E(w) = \frac{1}{2} \sum_{n=1}^{N} \lVert y(x_n, w) - t_n \rVert^2. \tag{1} $$

Batch gradient descent: Gradient descent is a method for minimizing a given cost or objective function $E(w)$,
$$ w^{(\tau+1)} = w^{(\tau)} - \eta \, \nabla E\!\left(w^{(\tau)}\right), $$
where $\eta > 0$ is the learning rate. After each such update, the gradient is re-evaluated and the process is repeated. Note that each step requires the whole data set to be processed; this is called the batch method.
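As an illustration of the batch update rule, here is a minimal NumPy sketch for the special case of a linear model $y(x, w) = w^{T} x$, so that the gradient of (1) has a simple closed form; the data X, T, the learning rate and the iteration count are made-up placeholders, not part of the original notes.

import numpy as np

def batch_gradient_descent(X, T, eta=0.1, n_iters=500):
    """Minimise E(w) = 0.5 * sum_n ||y(x_n, w) - t_n||^2 for a linear model y = X w."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        y = X @ w              # predictions for all N data points
        grad = X.T @ (y - T)   # gradient of E(w), summed over the whole data set
        w = w - eta * grad     # batch update: w <- w - eta * grad E(w)
    return w

# toy usage with made-up data
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])
w_hat = batch_gradient_descent(X, T)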

On-line gradient descent: Based on a set of independent observations, the error function comprises a sum of terms, one for each data point,
$$ E(w) = \sum_{n=1}^{N} E_n(w). \tag{2} $$
The weight vector is updated one data point at a time,
$$ w^{(\tau+1)} = w^{(\tau)} - \eta \, \nabla E_n\!\left(w^{(\tau)}\right). $$
The process is repeated by cycling through the data either in sequence or by selecting points at random. Gradient descent always moves downhill in a hilly error landscape, so local minima can trap the movement.
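The on-line variant differs only in that the gradient of a single term $E_n$ is used per update. A minimal sketch under the same linear-model assumption as above, with an illustrative random visiting order:

import numpy as np

def online_gradient_descent(X, T, eta=0.05, n_epochs=50, seed=0):
    """Update w using the gradient of one E_n(w) at a time."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_epochs):
        for n in rng.permutation(N):            # visit the data points in random order
            y_n = X[n] @ w                      # prediction for one data point
            w = w - eta * (y_n - T[n]) * X[n]   # w <- w - eta * grad E_n(w)
    return w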

- Initialisation is important in order to avoid poor local minima.
- The choice of the learning rate η is crucial for the speed of convergence.
- For the on-line gradient descent algorithm, the direction of movement fluctuates strongly from step to step, but the average direction is approximately the steepest-descent direction of the batch version.
- On-line updates have a lower computational cost per step, but are slower in the sense that more iterations are needed to converge.

Example: Rosenblatt's Perceptron. Rosenblatt's Perceptron is a linear discriminant function classifier whose activation function takes the form
$$ \varphi(v) = \operatorname{sgn}(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases} \tag{3} $$
The training data set comprises N input vectors $x_1, \ldots, x_N$, with corresponding target values $t_1, \ldots, t_N$, where $t_n \in \{-1, +1\}$.

1. Initialization: Set $w(0) = 0$, then perform the following computation for steps $n = 1, 2, \ldots$
2. Compute the response of the Perceptron as
$$ y_n = \operatorname{sgn}\!\left( w(n)^{T} x_n \right) \tag{4} $$
3. Apply the on-line gradient descent update
$$ w(n+1) = w(n) + \eta \, [\, t_n - y_n \,] \, x_n \tag{5} $$
or, equivalently,
$$ w(n+1) = \begin{cases} w(n) & \text{if } t_n = y_n \\ w(n) + 2 \eta t_n x_n & \text{if } t_n \ne y_n \end{cases} \tag{6} $$
4. Continuation: increment the time step by one and go back to step 2.

The Perceptron is guaranteed to converge in a finite number of steps if the data set is linearly separable.
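A short NumPy sketch of the algorithm above, implementing update (5) (equivalently (6)); the toy AND-style data set, the learning rate and the epoch limit are illustrative assumptions, not from the original notes.

import numpy as np

def train_perceptron(X, T, eta=1.0, n_epochs=100):
    """Rosenblatt update: w <- w + eta * (t_n - y_n) * x_n (nonzero only on mistakes)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        mistakes = 0
        for x_n, t_n in zip(X, T):
            y_n = 1 if w @ x_n >= 0 else -1      # sgn activation, equation (4)
            if y_n != t_n:
                w = w + eta * (t_n - y_n) * x_n  # equivalently w + 2*eta*t_n*x_n
                mistakes += 1
        if mistakes == 0:                        # converged: all points classified correctly
            break
    return w

# toy linearly separable data (first column is a bias input fixed to 1)
X = np.array([[1, 0.0, 0.0], [1, 0.0, 1.0], [1, 1.0, 0.0], [1, 1.0, 1.0]])
T = np.array([-1, -1, -1, 1])                    # AND-style labels in {-1, +1}
w_hat = train_perceptron(X, T)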

Backpropagation: Backpropagation is the application of the gradient descent algorithm to training an MLP. One needs to calculate the derivative of the squared error function with respect to the weights of the network. Consider the squared error function
$$ E(w) = \sum_{n=1}^{N} E_n(w), \tag{7} $$
from which we consider evaluating $\nabla E_n(w)$ as used in the sequential optimization,
$$ E_n = \frac{1}{2} \sum_{k} \left( y_{nk} - t_{nk} \right)^2, \tag{8} $$
where $y_{nk} = y_k(x_n, w)$. If the model is a simple linear one,
$$ y_k = \sum_{i} w_{ki} x_i, \tag{9} $$
the gradient of this error function with respect to a weight $w_{ji}$ is given by
$$ \frac{\partial E_n}{\partial w_{ji}} = \left( y_{nj} - t_{nj} \right) x_{ni}, \tag{10} $$

which can be interpreted as a local computation involving the product of an error signal $(y_{nj} - t_{nj})$ (at the output end of the link $w_{ji}$) and the variable $x_{ni}$ (at the input end of the link).

In a general feed-forward network, each unit computes
$$ a_j = \sum_{i} w_{ji} z_i, \qquad z_j = h(a_j), \tag{11} $$
where $h(\cdot)$ is a nonlinear activation function. This step is often called forward propagation, and can be regarded as a forward flow of information through the network. Applying the chain rule for partial derivatives gives
$$ \frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}} = \delta_j z_i, \tag{12} $$

where $\delta_j \equiv \partial E_n / \partial a_j$ is referred to as the error. Equation (12) tells us that the required derivative is obtained by multiplying the value of $\delta$ for the unit at the output end of the weight by the value of $z$ at the input end of the weight. For the output units, we have
$$ \delta_k = y_k - t_k. \tag{13} $$
For the hidden units, we again use the chain rule,
$$ \delta_j = \frac{\partial E_n}{\partial a_j} = \sum_{k} \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}, \tag{14} $$
where the sum runs over all units $k$ to which unit $j$ sends connections. We obtain the backpropagation formula
$$ \delta_j = h'(a_j) \sum_{k} w_{kj} \delta_k, \tag{15} $$
which tells us that the value of $\delta$ for a particular hidden unit can be obtained by propagating the $\delta$'s backwards from units higher up in the network.
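As a small worked illustration of (15) with made-up numbers: take a hidden unit $j$ with $h(a) = \tanh(a)$ and $a_j = 0$, so that $h'(a_j) = 1 - \tanh^2(0) = 1$, sending connections to two output units with weights $w_{1j} = 0.5$, $w_{2j} = -0.25$ and errors $\delta_1 = 0.2$, $\delta_2 = -0.4$. Then
$$ \delta_j = h'(a_j) \left( w_{1j} \delta_1 + w_{2j} \delta_2 \right) = 1 \times \left( 0.5 \times 0.2 + (-0.25) \times (-0.4) \right) = 0.2 . $$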

Summary of Backpropagation
1. Apply an input vector $x_n$ to the network and forward propagate through the network to find the activations of all hidden and output units.
2. Evaluate $\delta_k$ for all the output units using
$$ \delta_k = y_k - t_k. \tag{16} $$
3. Backpropagate the $\delta$'s to obtain $\delta_j$ for each hidden unit in the network, using
$$ \delta_j = \frac{\partial E_n}{\partial a_j} = \sum_{k} \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}. \tag{17} $$
4. Evaluate the derivatives using
$$ \frac{\partial E_n}{\partial w_{ji}} = \delta_j z_i. \tag{18} $$
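The four steps above can be written as a compact NumPy sketch for a network with a single hidden layer and linear output units; the function names and the generic activation h with derivative h_prime are illustrative assumptions, not from the original notes.

import numpy as np

def backprop_single_pattern(x, t, W1, W2, h, h_prime):
    """One pass of the four-step summary for a single training pattern (x, t)."""
    # 1. Forward propagate to find the activations of all hidden and output units.
    a = W1 @ x               # hidden pre-activations a_j = sum_i w_ji x_i
    z = h(a)                 # hidden activations z_j = h(a_j)
    y = W2 @ z               # linear output units y_k

    # 2. Evaluate delta_k for all output units: delta_k = y_k - t_k, equation (16).
    delta_k = y - t

    # 3. Backpropagate to obtain delta_j for each hidden unit, via the backprop formula (15).
    delta_j = h_prime(a) * (W2.T @ delta_k)

    # 4. Evaluate the derivatives, equation (18).
    grad_W1 = np.outer(delta_j, x)   # dE_n/dw_ji = delta_j * x_i
    grad_W2 = np.outer(delta_k, z)   # dE_n/dw_kj = delta_k * z_j
    return grad_W1, grad_W2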

Example: A two-layer network
$$ y_k(x, w) = \sum_{j=0}^{M} w^{(2)}_{kj} \, h\!\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right), $$
where the output activation function is linear, and the hidden units have a sigmoidal activation function given by
$$ h(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}. \tag{19} $$
A useful feature of this function is that its derivative can be expressed in a simple form,
$$ h'(a) = 1 - h(a)^2. \tag{20} $$
For each pattern in the training set, we first perform a forward propagation using
$$ a_j = \sum_{i=0}^{D} w^{(1)}_{ji} x_i, \qquad z_j = \tanh(a_j), \qquad y_k = \sum_{j=0}^{M} w^{(2)}_{kj} z_j. $$

Next we compute the $\delta$ for each output unit using
$$ \delta_k = y_k - t_k. \tag{21} $$
Then we backpropagate to obtain the $\delta$'s for the hidden units using
$$ \delta_j = \left( 1 - z_j^2 \right) \sum_{k=1}^{K} w^{(2)}_{kj} \delta_k. \tag{22} $$
Finally, the derivatives with respect to the first-layer and second-layer weights are given by
$$ \frac{\partial E_n}{\partial w^{(1)}_{ji}} = \delta_j x_i, \qquad \frac{\partial E_n}{\partial w^{(2)}_{kj}} = \delta_k z_j. $$
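Putting the two-layer example together, a minimal NumPy sketch that evaluates (19)-(22) and the final derivatives for a single pattern; the layer sizes, the toy input/target and the random initialisation are made-up placeholders.

import numpy as np

def two_layer_gradients(x, t, W1, W2):
    """Forward and backward pass for the tanh/linear two-layer network, one pattern."""
    # forward propagation
    a = W1 @ x                                   # a_j = sum_i w1_ji x_i
    z = np.tanh(a)                               # z_j = tanh(a_j), equation (19)
    y = W2 @ z                                   # linear outputs y_k

    # backward pass
    delta_k = y - t                              # equation (21)
    delta_j = (1.0 - z**2) * (W2.T @ delta_k)    # equation (22), using h'(a) = 1 - h(a)^2

    grad_W1 = np.outer(delta_j, x)               # dE_n/dw1_ji = delta_j x_i
    grad_W2 = np.outer(delta_k, z)               # dE_n/dw2_kj = delta_k z_j
    return grad_W1, grad_W2

# toy usage: D = 3 inputs (including a bias input of 1), M = 4 hidden units, K = 2 outputs
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((4, 3))
W2 = 0.1 * rng.standard_normal((2, 4))
x = np.array([1.0, 0.5, -0.3])
t = np.array([0.2, -0.1])
g1, g2 = two_layer_gradients(x, t, W1, W2)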