Multilayer Perceptron


Aprendizagem Automática: Multilayer Perceptron. Ludwig Krippahl.

Aprendizagem Automática: Summary. Perceptron and linear discrimination. Multilayer Perceptron and nonlinear discrimination. Backpropagation and training the MLP. Regularization in the MLP.

Aprendizagem Automática: Perceptron.

Perceptron: Inspired by the neuron. (Image: BruceBlaus, CC-BY, source: Wikipedia.)

Perceptron: The action potential is "all or nothing". (Image: Chris 73, CC-BY, source: Wikipedia.)

Perceptron: Model the neuron as a linear combination of the inputs followed by a non-linear function. (Image: Chrislb, CC-BY, source: Wikipedia.)

Perceptron: The original perceptron combined a linear combination of the inputs with a threshold function:

$$y = \sum_{j=1}^{d} w_j x_j + w_0, \qquad s(y) = \begin{cases} 1, & y > 0 \\ 0, & y \le 0 \end{cases}$$

Note: we can include $w_0$ in $\mathbf{w}$ as before. Training rule for the original perceptron:

$$w_i = w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i$$

Adjust the weights slightly to correct misclassified examples, with greater adjustment for those with larger inputs. Problem: the threshold function is not differentiable. But there are alternatives.
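
To make the rule concrete, here is a minimal sketch of the classical perceptron training rule in Python/numpy, using the OR function as data; the function name, parameter defaults, and training setup are illustrative assumptions, not part of the original slides.

    import numpy as np

    def perceptron_train(X, t, eta=0.1, epochs=100):
        # Classical perceptron: threshold activation, update w_i = w_i + eta*(t - o)*x_i
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a constant input so w_0 is included in w
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, target in zip(X, t):
                o = 1.0 if np.dot(w, x) > 0 else 0.0   # threshold function s(y)
                w += eta * (target - o) * x            # only misclassified examples change w
        return w

    # Example: the OR function (linearly separable)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    t = np.array([0, 1, 1, 1])
    w = perceptron_train(X, t)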

Perceptron: To use neurons in multiple layers, we need an activation function with a continuous derivative, e.g. the sigmoid:

$$s(y) = \frac{1}{1 + e^{-y}} = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$$

Perceptron: Strictly speaking, the perceptron is the original, single-layer network model by Frank Rosenblatt (1957), with threshold activation. However, the term is often also used for neurons with continuous activation functions, such as in the multilayer perceptron. In this course, we will not worry about the name.

Aprendizagem Automática: One Neuron.

Single Neuron: Can classify linearly separable sets, such as the OR function.

Single Neuron: Training a single neuron. Minimize the quadratic error between class and output:

$$E = \frac{1}{2}\sum_{j=1}^{N}\left(t^j - s^j\right)^2$$

Like the perceptron, present each example and adjust the weights; the error on example $j$ is

$$E^j = \frac{1}{2}\left(t^j - s^j\right)^2$$

Single Neuron: Training a single neuron. Gradient of the error w.r.t. $w_i$:

$$\frac{\partial E^j}{\partial w_i} = \frac{\partial E^j}{\partial s^j}\,\frac{\partial s^j}{\partial net^j}\,\frac{\partial net^j}{\partial w_i}$$

$$\frac{\partial E^j}{\partial s^j} = \frac{\partial}{\partial s^j}\,\frac{1}{2}\left(t^j - s^j\right)^2 = -\left(t^j - s^j\right)$$

$$s^j = \frac{1}{1 + e^{-net^j}} \;\Rightarrow\; \frac{\partial s^j}{\partial net^j} = s^j\left(1 - s^j\right)$$

$$net^j = w_0 + \sum_{i=1}^{M} w_i x_i^j \;\Rightarrow\; \frac{\partial net^j}{\partial w_i} = x_i^j$$

Update rule ($\eta$ generally between 0.1 and 0.5):

$$\Delta w_i^j = -\eta\,\frac{\partial E^j}{\partial w_i} = \eta\left(t^j - s^j\right) s^j\left(1 - s^j\right) x_i^j$$
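
A minimal sketch of this update rule for a single sigmoid neuron, assuming stochastic gradient descent over the examples; the function names and defaults are illustrative.

    import numpy as np

    def sigmoid(y):
        return 1.0 / (1.0 + np.exp(-y))

    def train_single_neuron(X, t, eta=0.3, epochs=1000, seed=0):
        # Stochastic gradient descent on the squared error of one sigmoid neuron
        rng = np.random.default_rng(seed)
        X = np.hstack([np.ones((X.shape[0], 1)), X])        # constant input for w_0
        w = rng.normal(scale=0.1, size=X.shape[1])           # small random initial weights
        for _ in range(epochs):
            for i in rng.permutation(len(X)):                # one step per example, random order
                s = sigmoid(np.dot(w, X[i]))                 # forward pass
                w += eta * (t[i] - s) * s * (1 - s) * X[i]   # delta w_i = eta (t - s) s (1 - s) x_i
        return w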

Single Neuron: Intuitive explanation:

$$\Delta w_i^j = -\eta\,\frac{\partial E^j}{\partial w_i} = \eta\left(t^j - s^j\right) s^j\left(1 - s^j\right) x_i^j$$

Single Neuron: Stochastic Gradient Descent. Online learning: one step per example, in random order.

Single Neuron: Stochastic Gradient Descent. Batch training: accumulate the $\Delta w_i^j$ over the batch, then update.
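
For contrast with the online updates above, a sketch of the batch variant, reusing numpy and the sigmoid function from the previous sketch and assuming w has already been initialized; the function name is illustrative.

    def train_batch_epochs(X, t, w, eta=0.3, epochs=1000):
        # Batch training: sum the updates of all examples, then apply them once per epoch
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        for _ in range(epochs):
            s = sigmoid(Xb @ w)                              # outputs for all examples
            w = w + eta * ((t - s) * s * (1 - s)) @ Xb       # summed delta w over the batch
        return w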

Single Neuron: OR function, error per epoch and resulting classifier (figure).

Single Neuron: Cannot classify non-linearly separable sets (e.g. the XOR function).

Single Neuron: XOR function, error per epoch and resulting classifier (figure).

Aprendizagem Automática: Multilayer Perceptron.

Multilayer Perceptron: We can solve the XOR problem with a hidden layer.

Multilayer Perceptron: Fully connected, feed-forward. Input layer: $x_1, x_2$. Hidden layer: $H_1, H_2$. Output layer: $O_1$ (all neurons sigmoidal).

Multilayer Perceptron: Training a Multilayer Perceptron. Output neuron $n$ of layer $k$ receives input from neuron $m$ of layer $i$ through weight $w_{mkn}$. Same as the single neuron, but using the output of the previous layer instead of $x$:

$$\Delta w_{mkn}^j = -\eta\,\frac{\partial E^j}{\partial s_{kn}^j}\,\frac{\partial s_{kn}^j}{\partial net_{kn}^j}\,\frac{\partial net_{kn}^j}{\partial w_{mkn}} = \eta\left(t^j - s_{kn}^j\right) s_{kn}^j\left(1 - s_{kn}^j\right) s_{im}^j = \eta\,\delta_{kn}\, s_{im}^j$$

Compute, for each output neuron,

$$\delta_{kn} = \left(t^j - s_{kn}^j\right) s_{kn}^j\left(1 - s_{kn}^j\right)$$

Multilayer Perceptron: For a weight $w_{min}$ on hidden layer $i$, we must propagate the output error backwards from all the neurons ahead. Propagate back the errors of all forward neurons $p$ (and compute $\delta_{in}$), where $w_{nkp}$ is the weight from neuron $n$ of layer $i$ to neuron $p$ of layer $k$:

$$\Delta w_{min}^j = -\eta \sum_p \left(\frac{\partial E^j}{\partial s_{kp}^j}\,\frac{\partial s_{kp}^j}{\partial net_{kp}^j}\,\frac{\partial net_{kp}^j}{\partial s_{in}^j}\right)\frac{\partial s_{in}^j}{\partial net_{in}^j}\,\frac{\partial net_{in}^j}{\partial w_{min}} = \eta\left(\sum_p \delta_{kp}\, w_{nkp}\right) s_{in}^j\left(1 - s_{in}^j\right) x_m^j = \eta\,\delta_{in}\, x_m^j$$

Multilayer Perceptron: Intuitive explanation:

$$\Delta w_{min}^j = \eta\left(\sum_p \delta_{kp}\, w_{nkp}\right) s_{in}^j\left(1 - s_{in}^j\right) x_m^j = \eta\,\delta_{in}\, x_m^j$$

Multilayer Perceptron: Backpropagation Algorithm.
1. Propagate the input forward through all layers: $s(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$.
2. For the output neurons, compute $\delta_k = s_k\left(1 - s_k\right)\left(t - s_k\right)$.
3. Backpropagate the errors to the earlier layers to compute all the $\delta_i = s_i\left(1 - s_i\right)\sum_p \delta_p\, w_{pk}$. Note: the $w_{pk}$ are the weights of the "front" neurons connecting to neuron $i$.
4. Update the weights: $\Delta w_{ki} = \eta\,\delta_i\, x_{ki}$ (for forward layers, $x$ is the $s$ of the layer behind).
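
The algorithm above, written as a small numpy sketch for a 2-2-1 network trained on XOR; the network sizes, learning rate, number of epochs, and names are illustrative choices, not prescribed by the slides.

    import numpy as np

    def sigmoid(y):
        return 1.0 / (1.0 + np.exp(-y))

    def train_xor_mlp(eta=0.5, epochs=5000, seed=0):
        rng = np.random.default_rng(seed)
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        t = np.array([0, 1, 1, 0], dtype=float)
        W_h = rng.normal(scale=0.1, size=(2, 3))     # hidden layer: 2 neurons, 2 inputs + bias
        w_o = rng.normal(scale=0.1, size=3)          # output neuron: 2 hidden inputs + bias
        for _ in range(epochs):
            for i in rng.permutation(4):
                x = np.append(1.0, X[i])                       # input with bias term
                s_h = sigmoid(W_h @ x)                         # forward pass, hidden layer
                h = np.append(1.0, s_h)
                s_o = sigmoid(w_o @ h)                         # forward pass, output neuron
                d_o = s_o * (1 - s_o) * (t[i] - s_o)           # delta of the output neuron
                d_h = s_h * (1 - s_h) * (d_o * w_o[1:])        # backpropagated deltas of hidden neurons
                w_o += eta * d_o * h                           # update output weights
                W_h += eta * np.outer(d_h, x)                  # update hidden weights
        return W_h, w_o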

Multilayer Perceptron: XOR problem, one hidden layer (figure).

Multilayer Perceptron: XOR, error per epoch and resulting classifier (figure).

Multilayer Perceptron: The hidden layer maps the inputs to a linearly separable representation. (Image shows the output of the hidden layer.)

Multilayer Perceptron: Autoassociator. We can use the "re-encoding" of the hidden layer to reduce the dimension of the data. Mitchell's binary encoder: 8 inputs, 3 hidden neurons, 8 output neurons, with the one-hot patterns 10000000 ... 00000001. Training: backpropagation, forcing the output to be the same as the input.

Multilayer Perceptron: Autoassociator (figure).
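
A sketch of the data setup for Mitchell's 8-3-8 encoder described above (variable names are illustrative); the same backpropagation procedure applies, only with the targets equal to the inputs.

    import numpy as np

    # Mitchell's 8-3-8 binary encoder: each one-hot input pattern must be reproduced
    # at the output, so the 3 hidden neurons are forced to learn a compact re-encoding.
    X = np.eye(8)     # inputs:  10000000 ... 00000001
    T = X.copy()      # targets: identical to the inputs (autoassociation)
    # Train an 8-3-8 MLP on (X, T) with backpropagation; the hidden-layer activations
    # of the trained network give a 3-dimensional code for the 8 patterns.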

Multilayer Perceptron: MLP, in practice. Start with random weight values close to zero (the sigmoidal function saturates rapidly away from zero). Since the weight initialization and the order of the examples are random, expect different runs to converge at different epochs. Standardize or normalize the inputs (keep the scaling factors for classification): since there is only one $\eta$ for all dimensions, all values should be on the same scale, and this also avoids saturating the sigmoidal function. The logistic function outputs in $[0, 1]$, so use these as class values; for multiple classes, use multiple output neurons.
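
A small sketch of input standardization that keeps the training scaling factors so new examples can be transformed consistently; the function name is illustrative.

    import numpy as np

    def standardize(X_train, X_new):
        # Scale each input dimension to zero mean and unit variance using the training
        # statistics; reuse them for any data seen later (e.g. at classification time).
        mean = X_train.mean(axis=0)
        std = X_train.std(axis=0)
        std[std == 0] = 1.0                 # guard against constant features
        return (X_train - mean) / std, (X_new - mean) / std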

Multilayer Perceptron: MLP, in practice. Stochastic Gradient Descent: present the training examples in random order; one pass through all the examples is one epoch. Evaluate the error, $(t - s)^2$, on the training or cross-validation data. Repeat until convergence, or stop before overfitting (cross-validation).

Multilayer Perceptron: Regularization with early stopping. E.g. with 5-fold cross-validation, stop at around 40 epochs.
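
One way to implement the early stop, sketched with two assumed callables: train_epoch() runs one training epoch and validation_error() returns the current cross-validation error; both names and the patience criterion are hypothetical choices, not from the slides.

    def train_with_early_stop(train_epoch, validation_error, max_epochs=200, patience=10):
        # Stop when the validation error has not improved for `patience` consecutive epochs
        best_err, best_epoch, waited = float("inf"), 0, 0
        for epoch in range(max_epochs):
            train_epoch()
            err = validation_error()
            if err < best_err:
                best_err, best_epoch, waited = err, epoch, 0
            else:
                waited += 1
                if waited >= patience:
                    break
        return best_epoch, best_err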

Multilayer Perceptron: Regularization by weight decay. Decrease the weight values by a small fraction at each update:

$$\Delta w_j = -\eta\,\frac{\partial E}{\partial w_j} - \lambda\, w_j$$

Useless parameters will tend to zero.
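
A one-line sketch of the weight decay update, assuming grad holds the backpropagated gradient of the error with respect to w; the names and default values are illustrative.

    def weight_decay_update(w, grad, eta=0.3, lam=1e-3):
        # Gradient step plus decay: delta w = -eta * dE/dw - lambda * w,
        # so weights that do not help reduce the error shrink towards zero.
        return w - eta * grad - lam * w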

Aprendizagem Automática: Summary.

Multilayer Perceptron: Summary. Perceptron, with the classical (threshold) and the continuous (sigmoid) output function. Stochastic gradient descent (squared error). One layer vs. several layers (multilayer perceptron). Training with backpropagation. Regularization: early stopping and weight decay. Further reading: Mitchell, Chapter 4; Alpaydin, Chapter 11 (note: linear output); Marsland, Chapter 3; (Bishop, Chapter 5).