PMR5406 Redes Neurais e Lógica Fuzzy, Lecture 3: Single Layer Perceptron


PMR5406 Redes Neurais e Lógica Fuzzy, Lecture 3: Single Layer Perceptron. Based on: Neural Networks, Simon Haykin, Prentice-Hall, 2nd edition. Course slides by Elena Marchiori, Vrije Universiteit.

Architecture. We consider the following architecture: a feedforward NN with one layer. It is sufficient to study single layer perceptrons with just one neuron.

Perceptron: Neuron Model. Uses a non-linear (McCulloch-Pitts) model of a neuron: the inputs $x_1, \ldots, x_m$ are weighted by $w_1, \ldots, w_m$ and combined with the bias $b$ to give the induced local field $v$; the output is $y = \varphi(v)$. Here $\varphi$ is the sign function:
$$\varphi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}$$
i.e. $\varphi(v) = \operatorname{sgn}(v)$.
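As a concrete illustration, here is a minimal Python sketch of this neuron model; the function name `perceptron_output` and the use of NumPy are my own choices, not part of the slides.

```python
import numpy as np

def perceptron_output(w, b, x):
    """Compute the perceptron output y = sgn(w.x + b) for one input vector x."""
    v = np.dot(w, x) + b          # induced local field v
    return 1 if v >= 0 else -1    # sign activation: +1 if v >= 0, -1 otherwise
```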

Perceptron: Applications. The perceptron is used for classification: it assigns each example in a set to one of two classes $C_1$, $C_2$. If the output of the perceptron is +1, the input is assigned to class $C_1$; if the output is -1, the input is assigned to $C_2$.

Perceptron: Classification. The equation
$$\sum_{i=1}^{m} w_i x_i + b = 0$$
describes a hyperplane in the input space. This hyperplane is used to separate the two classes $C_1$ and $C_2$. In two dimensions: the decision region for $C_1$ is $w_1 x_1 + w_2 x_2 + b > 0$, the decision region for $C_2$ is $w_1 x_1 + w_2 x_2 + b \le 0$, and the decision boundary is the line $w_1 x_1 + w_2 x_2 + b = 0$.

Perceptron: Limitations. The perceptron can only model linearly separable functions. It can be used to model the following Boolean functions: AND, OR, COMPLEMENT. But it cannot model XOR. Why?

Perceptron: Limitations. XOR is not linearly separable: it is impossible to separate the classes $C_1$ and $C_2$ with a single line. [Figure: the four XOR points plotted in the $(x_1, x_2)$ plane; no single line separates the $C_1$ points from the $C_2$ points.]

Perceptron: Learning Algorithm. Variables and parameters:
$x(n)$ = input vector $= [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$
$w(n)$ = weight vector $= [b(n), w_1(n), w_2(n), \ldots, w_m(n)]^T$
$b(n)$ = bias
$y(n)$ = actual response
$d(n)$ = desired response
$\eta$ = learning rate parameter

The fixed-increment learning algorithm.
Initialization: set $w(0) = 0$.
Activation: activate the perceptron by applying an input example (vector $x(n)$ and desired response $d(n)$).
Compute the actual response of the perceptron: $y(n) = \operatorname{sgn}[w^T(n)\, x(n)]$.
Adapt the weight vector: if $d(n)$ and $y(n)$ are different, then $w(n+1) = w(n) + \eta\,[d(n) - y(n)]\, x(n)$, where $d(n) = +1$ if $x(n) \in C_1$ and $d(n) = -1$ if $x(n) \in C_2$.
Continuation: increment time step $n$ by 1 and go back to the Activation step.
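A minimal Python sketch of this fixed-increment rule; the function name `train_perceptron`, the optional initial weights `w0`, and the stopping check are my own additions, and the input vectors are assumed to already carry the leading +1 for the bias, as defined above.

```python
import numpy as np

def train_perceptron(examples, eta=1.0, w0=None, max_epochs=100):
    """Fixed-increment perceptron learning.

    examples: list of (x, d) pairs, where x = [+1, x1, ..., xm] and d is +1 or -1.
    Returns the learned weight vector w = [b, w1, ..., wm].
    """
    m = len(examples[0][0])
    w = np.zeros(m) if w0 is None else np.asarray(w0, dtype=float)  # initialization
    for _ in range(max_epochs):
        errors = 0
        for x, d in examples:
            x = np.asarray(x, dtype=float)
            y = 1 if np.dot(w, x) >= 0 else -1       # y(n) = sgn(w^T(n) x(n))
            if y != d:                               # adapt only on misclassified examples
                w = w + eta * (d - y) * x            # w(n+1) = w(n) + eta [d(n) - y(n)] x(n)
                errors += 1
        if errors == 0:                              # stop once every example is classified correctly
            break
    return w
```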

Example. Consider a training set $C_1 \cup C_2$, where $C_1 = \{(1,1), (1,-1), (0,-1)\}$ are the elements of class +1 and $C_2 = \{(-1,-1), (-1,1), (0,1)\}$ are the elements of class -1. Use the perceptron learning algorithm to classify these examples, with $w(0) = [1, 0, 0]^T$ and $\eta = 1$.

Example (result). The algorithm arrives at the decision boundary $2x_1 - x_2 = 0$, which separates the $C_1$ points from the $C_2$ points. [Figure: the six training points plotted in the $(x_1, x_2)$ plane together with the separating line $2x_1 - x_2 = 0$.]
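Using the sketch above, the slides' example could be run as follows. The exact weight trajectory depends on the presentation order of the examples, which the slides do not fully specify, so this is only illustrative.

```python
# Input vectors carry the leading +1 for the bias, following the slides' notation.
C1 = [(1, 1), (1, -1), (0, -1)]     # class +1
C2 = [(-1, -1), (-1, 1), (0, 1)]    # class -1
examples = [([1, x1, x2], +1) for (x1, x2) in C1] + \
           [([1, x1, x2], -1) for (x1, x2) in C2]

w = train_perceptron(examples, eta=1.0, w0=[1, 0, 0])
print(w)   # learned [b, w1, w2]; the slides report the boundary 2*x1 - x2 = 0
```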

Convergence of the learning algorithm. Suppose the datasets $C_1$, $C_2$ are linearly separable. Then the perceptron convergence algorithm converges after $n_0$ iterations, with $n_0 \le n_{\max}$, on the training set $C_1 \cup C_2$.
Proof: suppose $x \in C_1 \Rightarrow$ output $= 1$ and $x \in C_2 \Rightarrow$ output $= -1$. For simplicity assume $w(1) = 0$, $\eta = 1$. Suppose the perceptron incorrectly classifies $x(1), \ldots, x(n) \in C_1$, so that $w^T(k)\, x(k) \le 0$ for $k = 1, \ldots, n$. The error-correction rule is $w(k+1) = w(k) + x(k)$, giving $w(2) = w(1) + x(1)$, $w(3) = w(2) + x(2)$, ..., and hence $w(n+1) = x(1) + \cdots + x(n)$.

Convergence theorem (proof, continued). Let $w_0$ be such that $w_0^T x(n) > 0$ for all $x(n) \in C_1$; such a $w_0$ exists because $C_1$ and $C_2$ are linearly separable. Let $\alpha = \min_{x(n) \in C_1} w_0^T x(n)$. Then $w_0^T w(n+1) = w_0^T x(1) + \cdots + w_0^T x(n) \ge n\alpha$. By the Cauchy-Schwarz inequality, $\|w_0\|^2 \|w(n+1)\|^2 \ge [w_0^T w(n+1)]^2$, so
$$\|w(n+1)\|^2 \ge \frac{n^2 \alpha^2}{\|w_0\|^2}. \qquad (A)$$

Convergence theorem (proof, continued). Now we consider another route. From $w(k+1) = w(k) + x(k)$, taking the squared Euclidean norm gives $\|w(k+1)\|^2 = \|w(k)\|^2 + \|x(k)\|^2 + 2\, w^T(k)\, x(k)$, where the last term is $\le 0$ because $x(k)$ is misclassified. Hence $\|w(k+1)\|^2 \le \|w(k)\|^2 + \|x(k)\|^2$ for $k = 1, \ldots, n$. With $\|w(1)\|^2 = 0$, summing $\|w(2)\|^2 \le \|w(1)\|^2 + \|x(1)\|^2$, $\|w(3)\|^2 \le \|w(2)\|^2 + \|x(2)\|^2$, ..., gives
$$\|w(n+1)\|^2 \le \sum_{k=1}^{n} \|x(k)\|^2.$$

Convergence theorem (proof, concluded). Let $\beta = \max_{x(n) \in C_1} \|x(n)\|^2$. Then
$$\|w(n+1)\|^2 \le n\beta. \qquad (B)$$
For sufficiently large values of $n$, (B) comes into conflict with (A). Therefore $n$ cannot be greater than the value $n_{\max}$ for which (A) and (B) are both satisfied with equality:
$$\frac{n_{\max}^2 \alpha^2}{\|w_0\|^2} = n_{\max}\, \beta \quad \Longrightarrow \quad n_{\max} = \frac{\beta\, \|w_0\|^2}{\alpha^2}.$$
The perceptron convergence algorithm therefore terminates in at most $n_{\max} = \beta \|w_0\|^2 / \alpha^2$ iterations.

Adaline: Adaptive Linear Element. The output $y$ is a linear combination of the inputs $x$:
$$y(n) = \sum_{j=0}^{m} x_j(n)\, w_j(n)$$
[Figure: inputs $x_1, \ldots, x_m$ with weights $w_1, \ldots, w_m$ feeding a single linear output unit $y$.]

Adaline: Adaptive Linear Element. Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm. The idea: try to minimize the squared error, which is a function of the weights:
$$E(w(n)) = \tfrac{1}{2}\, e^2(n), \qquad e(n) = d(n) - \sum_{j=0}^{m} x_j(n)\, w_j(n).$$
We can find the minimum of the error function $E$ by means of the steepest descent method.

Steepest Descent Method: start with an arbitrary point; find the direction in which $E$ decreases most rapidly, which is the direction opposite to the gradient
$$\nabla E(w) = \left[ \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_m} \right];$$
then make a small step in that direction:
$$w(n+1) = w(n) - \eta\, \nabla E(w(n)).$$
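A tiny Python sketch of a single steepest-descent step, under the assumption that the gradient is supplied as a callable; the function name and the quadratic example used below are my own, purely for illustration.

```python
import numpy as np

def steepest_descent_step(w, grad_E, eta):
    """One steepest-descent update: w(n+1) = w(n) - eta * grad E(w(n))."""
    return w - eta * grad_E(w)

# Hypothetical usage on E(w) = 0.5 * ||w||^2, whose gradient is simply w:
w = np.array([2.0, -1.0])
for _ in range(10):
    w = steepest_descent_step(w, lambda w: w, eta=0.1)   # w moves toward the minimum at 0
```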

Least-Mean-Square algorithm (Widrow-Hoff algorithm). Approximation of the gradient of $E$:
$$\frac{\partial E(w(n))}{\partial w(n)} = e(n)\, \frac{\partial e(n)}{\partial w(n)} = -e(n)\, x(n)^T$$
(the minus sign comes from $e(n) = d(n) - w^T(n)\, x(n)$). The update rule for the weights therefore becomes
$$w(n+1) = w(n) + \eta\, x(n)\, e(n).$$

Summary of LMS algorithm.
Training sample: input signal vector $x(n)$, desired response $d(n)$.
User-selected parameter: $\eta > 0$.
Initialization: set $\hat{w}(1) = 0$.
Computation: for $n = 1, 2, \ldots$ compute
$$e(n) = d(n) - \hat{w}^T(n)\, x(n), \qquad \hat{w}(n+1) = \hat{w}(n) + \eta\, x(n)\, e(n).$$
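Putting the pieces together, here is a minimal Python sketch of this LMS summary; the function name `lms_train` and the fixed number of passes over the data are my own choices, not part of the slides.

```python
import numpy as np

def lms_train(samples, eta=0.01, epochs=50):
    """LMS (Widrow-Hoff) training.

    samples: list of (x, d) pairs, where x is the input signal vector and d the desired response.
    Returns the estimated weight vector w_hat.
    """
    m = len(samples[0][0])
    w_hat = np.zeros(m)                          # initialization: w_hat(1) = 0
    for _ in range(epochs):
        for x, d in samples:
            x = np.asarray(x, dtype=float)
            e = d - np.dot(w_hat, x)             # e(n) = d(n) - w_hat^T(n) x(n)
            w_hat = w_hat + eta * x * e          # w_hat(n+1) = w_hat(n) + eta x(n) e(n)
    return w_hat
```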