Multi-layer Perceptron Networks

Our task is to tackle a class of problems that the simple perceptron cannot solve. Complex perceptron networks not only carry some important information for neuroscience, but are also useful in medical diagnostics, robotics, and the US Postal Service, where they decode (even handwritten) mailing addresses.

We recall the binary output structure of the single-layer perceptron:

$$o = f\left(\sum_{i=1}^{n} w_i x_i\right), \qquad \text{where} \quad f(\theta) = \begin{cases} 1, & \text{if } \theta > 0, \\ 0, & \text{if } \theta \le 0, \end{cases} \tag{1}$$

and first ask whether the thresholding action of $f(\theta)$ could be replaced with a more accommodating function. We then outfit the network with a hidden layer that allows it to solve more difficult classification problems.

Sigmoidal Neurons

Instead of returning a 1 for input $> 0$ and a 0 for input $\le 0$, a sigmoidal threshold returns non-binary output for inputs in a particular range. In the sigmoidal functions we will consider, the parameter $\beta$ determines the steepness of the sigmoid, or in other words, the width of the range for which the output will lie somewhere between 1 and 0.

[Figure 1: Comparison of binary and sigmoidal thresholding functions. Left panel ("Binary Output"): $f(\theta)$ for $\theta$ from $-5$ to $5$. Right panel ("Sigmoidal Output"): $\sigma(\theta)$ for $\beta = 2$ and $\beta = 0.5$ over the same range. Notice in the sigmoidal case that the range of inputs for which the output is not 1 or 0 is smaller for larger $\beta$.]

The function

$$\sigma(\theta, \beta) = \frac{e^{\beta\theta}}{1 + e^{\beta\theta}} \tag{2}$$

is one example of a sigmoid, and produces the sigmoidal thresholding function illustrated above.
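To make the role of $\beta$ concrete, here is a minimal MATLAB sketch (not part of the required project code) that plots the sigmoid of Equation (2) for the two $\beta$ values shown in Figure 1:

```matlab
% Sigmoid of Equation (2): sigma(theta, beta) = e^(beta*theta) / (1 + e^(beta*theta)).
% A larger beta gives a steeper transition, so fewer inputs map strictly between 0 and 1.
sigma = @(theta, beta) exp(beta*theta) ./ (1 + exp(beta*theta));

theta = -5:0.1:5;
plot(theta, sigma(theta, 2), theta, sigma(theta, 0.5));
xlabel('\theta'); ylabel('\sigma(\theta)');
legend('\beta = 2', '\beta = 0.5', 'Location', 'northwest');
```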

Hidden Layer

We now add a hidden layer to our perceptron. Notice that the hidden neurons and the final output neuron are sigmoidal and not binary.

[Figure 2: Two-layer perceptron with a (sigmoidal) hidden layer and a sigmoidal output neuron $o$; input neurons $1$ through $n$ feed the hidden layer (layer 1), which feeds the output neuron (layer 2).]

The output at each layer (hidden and output) is calculated by passing a weighted sum of inputs through the sigmoid function. Each hidden neuron sums its inputs multiplied by their weights ($\theta_j = \sum_{i=1}^{n} w_{i,j} x_i$ for hidden neuron $j$), then returns $\sigma(\theta_j)$ as its output. The process is repeated at the next layer: the output neuron sums the contributions of the hidden neurons (the outputs calculated at the last step) multiplied by their weights, and passes that sum through the sigmoid. A minimal sketch of this forward pass appears below.

Next we ask how the weights are updated at each iteration of training. Recall that we are working under the umbrella of supervised learning, where desired outputs are known and we use the difference between current and desired outputs to update the weight values. The process in a multilayer network is more difficult than in a single-layer network. With a single layer, we could compare the actual output $o$ to the desired output $y$, which is known. In this more complicated scenario, we can compare $y$ to the guess of the final output neuron, but we cannot perform this comparison at the hidden layer, because we do not know what the output of each hidden neuron should be.

How do we handle the hidden neurons and the layer 1 weights? We employ the method of gradient descent. Because this technique allows us to consider the effect of changing weights in layer 1 on the output of layer 2, it is known in this context as back propagation. The mathematics of the derivation using gradient descent on the square error are omitted from the algorithm you will find on the following page. Inquire for details.
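Before turning to the training steps, here is a minimal MATLAB sketch of the forward pass described above. The names and sizes (W1 as an n-by-j input-to-hidden matrix, W2 as a j-by-1 hidden-to-output vector, illustrative values of -1) are assumptions for the sketch, not requirements:

```matlab
% Forward pass through the two-layer network for one input pattern.
sigma = @(theta) 1 ./ (1 + exp(-theta));   % sigmoid with beta = 1

x  = [0 1];          % 1-by-n input row (n = 2 here)
W1 = -ones(2, 2);    % n-by-j input-to-hidden weights (illustrative values)
W2 = -ones(2, 1);    % j-by-1 hidden-to-output weights (illustrative values)

h = sigma(x * W1);   % 1-by-j hidden outputs: sigma of each weighted sum theta_j
o = sigma(h * W2)    % scalar network output (no semicolon, so MATLAB displays it)
```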

Step One: for input $x = [x_1 \ldots x_n]$, calculate the output $h_j$ at each hidden-layer neuron:

$$h_j = \sigma\left(\sum_{i=1}^{n} w^1_{i,j} x_i\right), \qquad \text{where } \sigma(\theta) = \frac{1}{1 + e^{-\beta\theta}}. \tag{3}$$

Use $\beta = 1$.

Step Two: for input $h = [h_1 \ldots h_j]$ from the hidden layer, calculate the output of the output neuron:

$$o = \sigma\left(\sum_j w^2_j h_j\right), \tag{4}$$

where $\sigma(\theta)$ is the same as above. Note that $w^2_j$ indicates weights in layer 2, not squares.

Step Three: calculate $\delta^2$, which will be used to update the layer 2 weights:

$$\delta^2 = (o - y)(o)(1 - o). \tag{5}$$

Because there is one output, $\delta^2$ is a scalar.

Step Four: calculate $\delta^1$, which will be used to update the layer 1 weights:

$$\delta^1_j = (\delta^2)\left(w^2_j \, h_j (1 - h_j)\right). \tag{6}$$

Because there are $j$ hidden neuron outputs, $\delta^1$ is a $j$-vector.

Step Five: calculate $\Delta w^2$:

$$\Delta w^2_j = l \, \delta^2 h_j, \tag{7}$$

where $l$ is the learning rate. Use $l = 0.1$.

Step Six: calculate $\Delta w^1$:

$$\Delta w^1_{i,j} = l \, \delta^1_j x_i. \tag{8}$$

$\Delta w^1_{i,j}$ can be conveniently stored in matrix form. If $\delta^1$ is a $j$-by-$1$ vector and $x$ is a $1$-by-$i$ vector, $\delta^1 x$ will produce a matrix with $j$ rows and $i$ columns.

Step Seven: update the layer 2 weights:

$$w^2_j = w^2_j - \Delta w^2_j. \tag{9}$$

Step Eight: update the layer 1 weights:

$$w^1_{i,j} = w^1_{i,j} - \Delta w^1_{i,j}. \tag{10}$$
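Collected into code, one training iteration (Steps One through Eight) might look like the following MATLAB sketch. It assumes a single input pattern per iteration and stores the layer 1 weights input-by-hidden (the transpose of the j-by-i layout mentioned in Step Six); all variable names and the illustrative starting values are assumptions, not part of the required interface:

```matlab
% One iteration of the training algorithm, mirroring Equations (3)-(10).
sigma = @(theta) 1 ./ (1 + exp(-theta));   % sigmoid with beta = 1
l  = 0.1;                                  % learning rate (Step Five)
x  = [0 1];                                % 1-by-n input row
y  = 1;                                    % desired output for this input
W1 = -ones(2, 2);                          % n-by-j layer 1 weights (illustrative)
W2 = -ones(2, 1);                          % j-by-1 layer 2 weights (illustrative)

% Steps One and Two: forward pass.
h = sigma(x * W1);                         % 1-by-j hidden outputs, Eq. (3)
o = sigma(h * W2);                         % scalar output, Eq. (4)

% Steps Three and Four: deltas.
delta2 = (o - y) * o * (1 - o);            % scalar, Eq. (5)
delta1 = delta2 * (W2' .* h .* (1 - h));   % 1-by-j row vector, Eq. (6)

% Steps Five and Six: weight changes.
dW2 = l * delta2 * h';                     % j-by-1, Eq. (7)
dW1 = l * x' * delta1;                     % n-by-j outer product, Eq. (8)

% Steps Seven and Eight: gradient-descent updates.
W2 = W2 - dW2;                             % Eq. (9)
W1 = W1 - dW1;                             % Eq. (10)
```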

Project: Boolean Automator

You will train a perceptron with a sigmoidal hidden layer to solve all 16 Boolean operations on two input arguments. With a single-layer perceptron, we were able to solve the AND and OR problems, but not the nonlinearly separable XOR problem. Recall the structure of XOR:

    x1  x2  |  y
    --------+---
     0   0  |  0
     0   1  |  1
     1   0  |  1
     1   1  |  0

    Table 1: XOR with two inputs

In their 1969 book Perceptrons: An Introduction to Computational Geometry, Marvin Minsky and Seymour Papert showed that single-layer perceptrons cannot solve such problems; adding a hidden layer removes this limitation. In fact, the multi-layer perceptron described here can solve all possible Boolean operations, with inputs $x_{1,2}$ and desired outputs $y_i$ for $i = 1, \ldots, 16$:

    x1  x2  |  y1  y2  y3  y4  y5  y6  y7  y8  y9  y10  y11  y12  y13  y14  y15  y16
    --------+------------------------------------------------------------------------
     0   0  |   0   0   0   0   0   0   0   0   1    1    1    1    1    1    1    1
     0   1  |   0   0   0   0   1   1   1   1   0    0    0    0    1    1    1    1
     1   0  |   0   0   1   1   0   0   1   1   0    0    1    1    0    0    1    1
     1   1  |   0   1   0   1   0   1   0   1   0    1    0    1    0    1    0    1

    Table 2: All 16 Boolean operations on two inputs

You will write two functions:

booletron: a driver that runs multiperceptron on all 16 Boolean operations and checks its accuracy by comparing its output [guess] to the desired output [y].

guess = multiperceptron(x,y), where the input matrix [x] remains the same for all 16 operations and [y] varies. multiperceptron trains the perceptron to perform the Boolean operation specified by [y] using the algorithm detailed previously and returns the perceptron's output after training.

A network with two input neurons, a single hidden layer containing two neurons, and a single output neuron can solve the problem. However, you again need to add a bias bit in the form of another input neuron that always receives an input of 1. This can be done by adding another column of ones to the matrix x. You may want to keep track of the weights for layers 1 and 2 in two separate matrices.
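As a starting point, the input matrix and driver loop might be set up as in the sketch below. Everything here is illustrative: the ordering of the 16 operations follows the binary counting used in Table 2, and multiperceptron is the function you will write.

```matlab
% booletron-style driver skeleton (illustrative).
x = [0 0; 0 1; 1 0; 1 1];            % the four input patterns, one per row
x = [x, ones(4, 1)];                 % append the bias column of ones

Y = zeros(4, 16);                    % column k holds the desired outputs y_k
for k = 1:16
    Y(:, k) = bitget(k - 1, 4:-1:1)';   % 4-bit truth table of the k-th operation
end

for k = 1:16
    y = Y(:, k);
    guess = multiperceptron(x, y);   % the function you will write
    fprintf('Operation %2d: MSE = %g\n', k, mean((guess - y).^2));
end
```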

You should initialize the weights using W1 = W2 = -1 (every weight in both layers set to -1). Implementing the training algorithm for 10^5 iterations will suffice. Because we are using a non-binary thresholding function and a reasonable number of iterations to train, we do not expect the perceptron's guess to perfectly match the desired (binary) output y. In application, outputs at a particular level of accuracy could be rounded to match y.

To check the accuracy of multiperceptron, use the Mean Squared Error, via the function mse. mse(guess-y) will do the trick. For each Boolean operator, display the perceptron's output, the desired output, and the Mean Squared Error. For example, for the XOR output, booletron could print:

    Perceptron output = 0.064756  0.94630  0.94387  0.068749
    Desired output    = 0  1  1  0
    MSE = 0.0037385

Furthermore, for only the XOR problem, you will keep track of the perceptron's accuracy at each iteration of training. This can be achieved by calculating the absolute error of the perceptron's output (read: the absolute value of the difference between the final output and the desired output). This will be a scalar value: remember that the desired output changes based on the inputs at each iteration. After the final trial, create a figure like the one below displaying the convergence of the perceptron's guess to the actual output over the training period.

[Figure 3: Absolute error of the perceptron's output over XOR training. Title: "Perceptron Absolute Error: XOR"; Error (y-axis) vs. Trial (x-axis, up to 100,000).]
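For the accuracy check and the convergence figure, something along these lines would work. The values below are placeholders purely so the snippet runs on its own; in booletron, guess comes from multiperceptron, and abserr is a hypothetical vector of |o - y| values you would record inside the XOR training loop (the log x-axis is one reasonable choice, not a requirement):

```matlab
% Placeholder values, for illustration only.
guess  = [0.06; 0.94; 0.94; 0.07];            % stand-in for multiperceptron output
y      = [0; 1; 1; 0];                        % desired XOR output
abserr = 0.5 * exp(-(1:1e5) / 2e4) + 0.01;    % stand-in learning curve, 1-by-1e5

fprintf('MSE = %g\n', mean((guess - y).^2));  % same quantity as mse(guess - y)

semilogx(abserr);                             % absolute error vs. training trial
xlabel('Trial'); ylabel('Error');
title('Perceptron Absolute Error: XOR');
```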

Your work will be graded as follows:

     5 pts for a header containing detailed usage of each function
     5 pts for further comments in code
     2 pts for indentation
     8 pts for a correct driver booletron
    10 pts for display of correct outputs and errors
     5 pts for a correct multiperceptron function
     5 pts for a correct error figure for XOR