The Multi-Layer Perceptron

EC 6430 Pattern Recognition and Analysis, Monsoon 2011
Lecture Notes 6

Single-layer networks are limited in the range of functions they can represent. Multi-layer networks are capable of approximating any continuous function. Restricting attention to feed-forward networks (no feedback loops) ensures that the network outputs can be calculated as explicit functions of the inputs and the weights.

Feed-forward Network Functions

The idea is to construct networks having successive layers of processing units, with connections running from every unit in one layer to every unit in the next layer; no other connections are permitted. Figure 1 shows an example of a two-layer network. Units which are not treated as input or output units are called hidden units. There are D inputs (excluding the bias), M hidden units (again excluding the bias) and K output units.

Figure 1: Two-layer network

The activation of the j-th hidden unit is obtained as

a_j = \sum_{i=1}^{D} w^1_{ji} x_i + w^1_{j0}    (1)

where w^1_{ji} denotes a weight in the first layer (indicated by the superscript), going from the i-th input to the j-th hidden unit. We can set x_0 = 1 (as in single-layer networks) and rewrite the above equation as

a_j = \sum_{i=0}^{D} w^1_{ji} x_i    (2)

We can define an activation function for the outputs of the hidden units,

z_j = g(a_j)    (3)

The outputs of the hidden units are transformed by a second layer of processing elements,

b_k = \sum_{j=1}^{M} w^2_{kj} z_j + w^2_{k0}    (4)

where k = 1, \ldots, K. The k-th network output is then obtained by applying an activation function,

y_k = g(b_k)    (5)

Heaviside, or step, activation function:

g(a) = \begin{cases} 0 & \text{when } a < 0 \\ 1 & \text{when } a \ge 0 \end{cases}    (6)

Example: binary inputs with x_i = 0 or 1, and network outputs that are also 0 or 1, so the network computes a Boolean function. We can implement any Boolean function, provided we have enough hidden units. For D inputs there are 2^D possible binary patterns. We take one hidden unit for each input pattern whose target output is 1, so every pattern producing a 1 has a hidden unit associated with it, and each hidden unit responds only to its corresponding pattern. For such a unit, the weight from an input is set to +1 if the corresponding pattern has a 1 for that input, and to -1 if the pattern has a 0 for that input. The bias for the hidden unit is set to 1 - b, where b is the number of non-zero inputs for that pattern (the bias is pattern dependent). Each hidden unit is connected to the output with a weight of +1, and the output bias is set to -1.
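To make equations (1)-(6) concrete, here is a minimal sketch of the two-layer forward pass in Python/NumPy. The notes themselves give no code, so the function and variable names below are illustrative only.

```python
import numpy as np

def heaviside(a):
    """Step activation of equation (6): 0 for a < 0, 1 for a >= 0."""
    return (a >= 0).astype(float)

def forward_pass(x, W1, W2, g=heaviside):
    """Two-layer forward pass, equations (1)-(5). Illustrative sketch.

    x  : input vector of length D
    W1 : (M, D+1) first-layer weights, column 0 holding the biases w^1_{j0}
    W2 : (K, M+1) second-layer weights, column 0 holding the biases w^2_{k0}
    """
    x_ext = np.concatenate(([1.0], x))   # x_0 = 1 absorbs the bias, as in eq. (2)
    a = W1 @ x_ext                       # hidden activations a_j, eq. (2)
    z = g(a)                             # hidden-unit outputs z_j, eq. (3)
    z_ext = np.concatenate(([1.0], z))   # z_0 = 1 absorbs the second-layer bias
    b = W2 @ z_ext                       # output activations b_k, eq. (4)
    return g(b)                          # network outputs y_k, eq. (5)
```

Any other activation, such as the logistic sigmoid introduced below, can be passed in place of the step function; a differentiable activation is needed once the network is to be trained by back-propagation.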

Assignment 7a: Solve the 2-bit XOR problem
1. Using a single-layer perceptron.
2. Using the two-layer network procedure described above.

Sigmoidal functions

The logistic sigmoid function is given by

g(a) = \frac{1}{1 + \exp(-a)}    (7)

We also use the tanh activation function, given by

g(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}    (8)

The two functions are related by a linear transformation, so the choice does not really affect the end result; tanh tends to give faster convergence during training.
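The linear relationship mentioned above is \tanh(a) = 2g(2a) - 1, where g is the logistic sigmoid. The short check below (an illustrative sketch, not part of the original notes) verifies this numerically, together with the derivative identities g'(a) = g(a)(1 - g(a)) for the sigmoid and \tanh'(a) = 1 - \tanh(a)^2 that are used in the back-propagation equations that follow.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, equation (7)."""
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 101)

# Linear relationship between the two activations: tanh(a) = 2*sigmoid(2a) - 1.
assert np.allclose(np.tanh(a), 2.0 * sigmoid(2.0 * a) - 1.0)

# Derivative identities used later: sigmoid' = g(1 - g), tanh' = 1 - tanh^2.
eps = 1e-6
num_dsig = (sigmoid(a + eps) - sigmoid(a - eps)) / (2.0 * eps)
num_dtanh = (np.tanh(a + eps) - np.tanh(a - eps)) / (2.0 * eps)
assert np.allclose(num_dsig, sigmoid(a) * (1.0 - sigmoid(a)), atol=1e-6)
assert np.allclose(num_dtanh, 1.0 - np.tanh(a) ** 2, atol=1e-6)
```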

Back-propagation algorithm

A back-propagation network learns by example: you give the algorithm examples of what you want the network to do, and it adjusts the network's weights so that, when training is finished, it gives the required output for a particular input. To train the network you therefore need to supply example inputs together with the output you want (the target) for each of them. Consider the input-target set in Figure 2.

Figure 2: Back-propagation input

This is given to a 2-layer MLP as shown in Figure 3.

Figure 3: 2-layer MLP

The network is first initialised by setting all its weights to small random numbers, say between -1 and +1. Next, the input pattern is applied and the output calculated (this is called the forward pass). Then calculate the error.

Using the standard sum-of-squares error function,

E^n = \frac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2    (9)

where y_k is the response of output unit k and t_k is the corresponding target. Now consider the connection between a hidden-layer neuron and an output neuron. We define a variable delta as

\delta_k = \frac{\partial E^n}{\partial b_k} = g'(b_k) \frac{\partial E^n}{\partial y_k}    (10)

For the sum-of-squares error function, \partial E^n / \partial y_k = y_k - t_k. With a linear output-unit activation (g'(b_k) = 1) this gives

\delta_k = y_k - t_k    (11)

while for a logistic-sigmoid output unit,

\delta_k = g(b_k)\,(1 - g(b_k))\,(y_k - t_k), \quad \text{if } g(a) = \frac{1}{1 + \exp(-a)}    (12)

Now we have to evaluate \delta_j for the first (input-to-hidden) layer:

\delta_j = g'(a_j) \sum_k w_{kj} \delta_k    (13)

Note that the \delta_k values from the second layer are used to compute the \delta_j values of the first layer. With a linear hidden-unit activation (g'(a_j) = 1) this reduces to

\delta_j = \sum_{k=1}^{K} w_{kj} \delta_k    (14)

while for logistic-sigmoid hidden units,

\delta_j = g(a_j)\,(1 - g(a_j)) \sum_{k=1}^{K} w_{kj} \delta_k, \quad \text{if } g(a) = \frac{1}{1 + \exp(-a)}    (15)
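Before turning to the weight updates, here is a minimal sketch of how the deltas of equations (12) and (15) can be computed for a network with logistic-sigmoid hidden and output units. The notes provide no code, so the function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def backprop_deltas(y, t, z, W2):
    """Deltas of equations (12) and (15) for sigmoid units. Illustrative sketch.

    y  : network outputs y_k = g(b_k)      (length K)
    t  : targets t_k                       (length K)
    z  : hidden-unit outputs z_j = g(a_j)  (length M)
    W2 : (K, M) second-layer weights w_{kj}, biases excluded
    """
    # delta_k = g(b_k) (1 - g(b_k)) (y_k - t_k), equation (12)
    delta_k = y * (1.0 - y) * (y - t)
    # delta_j = g(a_j) (1 - g(a_j)) sum_k w_{kj} delta_k, equation (15)
    delta_j = z * (1.0 - z) * (W2.T @ delta_k)
    return delta_k, delta_j
```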

We now use the \delta values to compute the change in the weights. There are two options available here:

Batch mode, where you present all inputs together.
Online mode, where you present the inputs one by one and update the weights after each input.

For online mode we have

\Delta w_{ji} = -\eta \, \delta_j x_i    (16)

\Delta w_{kj} = -\eta \, \delta_k z_j    (17)

and for batch mode,

\Delta w_{ji} = -\eta \sum_n \delta_j^n x_i^n    (18)

\Delta w_{kj} = -\eta \sum_n \delta_k^n z_j^n    (19)

where \eta is the learning-rate parameter.

Note: if g(a) = \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}, then g'(a) = 1 - g(a)^2.
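Putting the pieces together, a single online-mode step (forward pass, deltas, and the updates of equations (16) and (17)) could look like the sketch below. It assumes logistic-sigmoid units in both layers and bias weights stored in column 0 of each weight matrix; these are illustrative assumptions, not details fixed by the notes.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def online_step(x, t, W1, W2, eta=0.1):
    """One forward/backward pass and online update, equations (16)-(17).

    W1 : (M, D+1) first-layer weights, column 0 = biases
    W2 : (K, M+1) second-layer weights, column 0 = biases
    Returns the updated (W1, W2).
    """
    # Forward pass, equations (2)-(5), with sigmoid activations.
    x_ext = np.concatenate(([1.0], x))
    z = sigmoid(W1 @ x_ext)
    z_ext = np.concatenate(([1.0], z))
    y = sigmoid(W2 @ z_ext)

    # Deltas, equations (12) and (15); bias column excluded from the back-projection.
    delta_k = y * (1.0 - y) * (y - t)
    delta_j = z * (1.0 - z) * (W2[:, 1:].T @ delta_k)

    # Gradient-descent updates: Delta w = -eta * delta * (input to that unit).
    W2 = W2 - eta * np.outer(delta_k, z_ext)
    W1 = W1 - eta * np.outer(delta_j, x_ext)
    return W1, W2
```

In batch mode (equations (18) and (19)) the same delta terms are accumulated over all n training patterns and the weights are changed once per pass through the training set.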

Figure 4: Back-propagation example

Number of hidden neurons?

Broadly, set the hidden-layer configuration using just two rules:
1. the number of hidden layers equals one; and
2. the number of neurons in that layer is the mean of the number of neurons in the input and output layers.

What is happening here? We are generating decision boundaries by a quadratic discriminant of the form

y(x) = w_2 x^2 + w_1 x + w_0    (20)

Keep in mind that we can generalise this to higher dimensions and need not limit ourselves to quadratic equations; this then leads to higher-order processing units. Remember that the order of the polynomial was a problem earlier: a high-order polynomial can result in over-fitting.

Growing and pruning algorithms

One approach is to start with an initially small network and allow new units and connections to be added during training; such techniques are called growing algorithms. The alternative is to start with a relatively large network and gradually remove either connections or complete hidden units; these are the pruning algorithms.

Assignment 7b: Analysis of the MLP
1. Fit using an MLP, with a varying number of hidden units.
(a) Define three 2-dimensional Gaussians G1, G2 and G3 (use the MATLAB function mvnrnd; an illustrative NumPy version of this data generation is sketched at the end of these notes).
(b) G1 has \mu_1 = [0.25, 0.3] and \Sigma_1 = [0.2, 0.25; 0.25, 0.4].
(c) G2 has \mu_2 = [0.5, 0.6] and \Sigma_2 = [0.3, 0.1; 0.1, 0.4].
(d) G3 has \mu_3 = [0.7, 0.75] and \Sigma_3 = [0.3, 0.1; 0.1, 0.4].
(e) Generate 100 data points from each of the three.

(f) Use the first 80 data points from each class as the training set and the remaining 20 from each as the test set.
(g) You now have 2-dimensional inputs with 3 classes. Train an MLP with 3 hidden units.
(h) What is the training accuracy? What is the testing accuracy?
2. How does variation in the number of hidden units affect accuracy?
(a) Vary the number of hidden units to 2 and to 4.
(b) How does the accuracy change? Is it higher or lower?
(c) Vary it to 10; how does the testing accuracy change?

Assignment 7c: MLP for classification with FLD
1. Application of the MLP for classification.
(a) Define three 4-dimensional Gaussians.
(b) Generate 100 data points for each class using appropriate means and covariance matrices.
(c) Use FLD to reduce the data to 3 dimensions.
(d) Classify using a 2-layer MLP with 3 hidden units.
(e) What are the training and testing accuracies?
(f) How do the accuracies compare with the Euclidean classification you did before?
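As referenced in Assignment 7b, item 1(a), the data generation could be done along the following lines. This is an illustrative sketch that uses NumPy's multivariate_normal in place of MATLAB's mvnrnd; the means, covariances and the 80/20 split are taken from the assignment, while everything else (variable names, seed) is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability

# Means and covariance matrices from Assignment 7b, items (b)-(d).
classes = [
    ([0.25, 0.30], [[0.20, 0.25], [0.25, 0.40]]),   # G1
    ([0.50, 0.60], [[0.30, 0.10], [0.10, 0.40]]),   # G2
    ([0.70, 0.75], [[0.30, 0.10], [0.10, 0.40]]),   # G3
]

train_x, train_t, test_x, test_t = [], [], [], []
for label, (mu, sigma) in enumerate(classes):
    pts = rng.multivariate_normal(mu, sigma, size=100)  # 100 points per class
    train_x.append(pts[:80])                            # first 80 for training
    train_t += [label] * 80
    test_x.append(pts[80:])                             # remaining 20 for testing
    test_t += [label] * 20

train_x, test_x = np.vstack(train_x), np.vstack(test_x)
# train_x/train_t and test_x/test_t can now be fed to a 2-layer MLP with
# 3 hidden units, as asked for in item 1(g).
```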