Single layer NN. Neuron Model (Rossella Cancelliere)

Single layer NN

We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons is easy because the output units are independent of each other and each weight affects only one of the outputs.

Neuron Model

The (McCulloch-Pitts) perceptron is a single-layer NN with a non-linear activation ϕ, the sign function:

  ϕ(v) = +1 if v ≥ 0
         -1 if v < 0

[Figure: a single neuron with inputs x1, ..., xn, weights w1, ..., wn and bias b; the induced local field v = Σ_i wi xi + b is passed through ϕ(v) to give the output y.]
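As a concrete reading of this model, the short Python sketch below (not part of the slides; the function names are illustrative) computes the induced field v = Σ_i wi xi + b and applies the sign activation.

```python
def sign(v):
    """The perceptron's activation: +1 if v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1

def neuron_output(x, w, b):
    """Single McCulloch-Pitts neuron: v = sum_i w_i x_i + b, output y = phi(v)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sign(v)

# Example with arbitrary weights: w = (1, -1), b = 1, evaluated at the point (4, 3).
print(neuron_output(x=(4, 3), w=(1, -1), b=1))   # v = 4 - 3 + 1 = 2, so the output is +1
```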

Training

How can we train a perceptron for a classification task? We try to find suitable values for the weights in such a way that the training examples are correctly classified. Geometrically, we try to find a hyperplane that separates the examples of the two classes.

Geometric View

The equation

  Σ_{i=1..m} wi xi + w0 = 0

describes a (hyper-)plane in the input space of real-valued m-dimensional vectors. The plane splits the input space into two regions, each corresponding to one class. In two dimensions, the decision region for C1 is w1 x1 + w2 x2 + w0 ≥ 0 and the decision boundary is the line w1 x1 + w2 x2 + w0 = 0.

Classification

The perceptron is used for binary classification. Given training examples of classes C1, C2, train the perceptron so that it classifies the training examples correctly: if the output of the perceptron is +1, the input is assigned to class C1; if the output is -1, the input is assigned to C2.

The learning algorithm

A line ax + by + c = 0 can be written as y = -(a/b)x - c/b. Example: the line r: x - y + 1 = 0, i.e. y = x + 1, and the point P(4,3). With x(n) = [-1, 4, 3] and w(n) = [1, 1, -1], the update w(n+1) = x(n) + w(n) = [0, 5, 2] gives the new boundary 5x + 2y = 0, i.e. y = -(5/2)x.

Training rule:

  w(n+1) = w(n)             if w(n)^T x(n) > 0 and x(n) belongs to class C1
  w(n+1) = w(n)             if w(n)^T x(n) < 0 and x(n) belongs to class C2
  w(n+1) = w(n) - η(n) x(n) if w(n)^T x(n) > 0 and x(n) belongs to class C2
  w(n+1) = w(n) + η(n) x(n) if w(n)^T x(n) < 0 and x(n) belongs to class C1
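The four cases of this rule collapse into the single update w(n+1) = w(n) + η d(n) x(n), applied only to misclassified examples, with d(n) = +1 for C1 and d(n) = -1 for C2; this is exactly the pseudocode of the next slide. A minimal Python sketch (the name train_perceptron and the zero initialisation are assumptions made for illustration; examples are expected already augmented with a leading 1 for the bias):

```python
def train_perceptron(examples, eta=1.0):
    """Perceptron learning algorithm: while misclassified examples remain,
    pick one (x(n), d(n)) and update w(n+1) = w(n) + eta * d(n) * x(n).
    `examples` is a list of (x, d) pairs, x augmented with a leading 1."""
    w = [0.0] * len(examples[0][0])          # w(1) = 0 (the slides initialise randomly)
    while True:
        # an example is treated as misclassified when d * w^T x <= 0,
        # matching the update condition used on the implementation slides below
        misclassified = [(x, d) for x, d in examples
                         if d * sum(wi * xi for wi, xi in zip(w, x)) <= 0]
        if not misclassified:
            return w
        x, d = misclassified[0]              # select a misclassified augmented example
        w = [wi + eta * d * xi for wi, xi in zip(w, x)]
```

By the convergence theorem below, this loop terminates whenever the two classes are linearly separable; otherwise it never stops, so a practical version would also cap the number of iterations.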

The learning algorithm

  n = 1;
  initialize w(n) randomly;
  while (there are misclassified training examples)
      select a misclassified augmented example (x(n), d(n))
      w(n+1) = w(n) + η d(n) x(n);
      n = n + 1;
  end-while;

η = learning rate parameter (a real number).

Convergence theorem

Suppose the classes C1, C2 are linearly separable (that is, there exists a hyperplane that separates them). Then the perceptron algorithm applied to C1 ∪ C2 terminates successfully after a finite number of iterations.

Proof: Consider the set C' containing the inputs of C1 ∪ C2 transformed by replacing x with -x for each x with class label -1. For simplicity assume w(1) = 0 and η = 1. Let x(1), ..., x(k) ∈ C' be the sequence of inputs that have been used after k iterations. Since w(k+1) = w(k) + x(k),

  w(2)   = w(1) + x(1)
  w(3)   = w(2) + x(2)
  ...
  w(k+1) = x(1) + ... + x(k)

Convergence theorem (proof, continued)

Since C1 and C2 are linearly separable, there exists w* such that w*^T x > 0 for all x ∈ C'. Let

  α = min_{x ∈ C'} w*^T x

Then w*^T w(k+1) = w*^T x(1) + ... + w*^T x(k) ≥ kα. By the Cauchy-Schwarz inequality,

  ‖w*‖² ‖w(k+1)‖² ≥ [w*^T w(k+1)]² ≥ k²α²

so

  ‖w(k+1)‖² ≥ k²α² / ‖w*‖²        (A)

Now we consider another route. From w(k+1) = w(k) + x(k),

  ‖w(k+1)‖² = ‖w(k)‖² + ‖x(k)‖² + 2 w(k)^T x(k)   (Euclidean norm)

where the last term is ≤ 0 because x(k) is misclassified, hence

  ‖w(k+1)‖² ≤ ‖w(k)‖² + ‖x(k)‖²

Since w(1) = 0:

  ‖w(2)‖² ≤ ‖w(1)‖² + ‖x(1)‖²
  ‖w(3)‖² ≤ ‖w(2)‖² + ‖x(2)‖²
  ...
  ‖w(k+1)‖² ≤ Σ_{i=1..k} ‖x(i)‖²

Convergence theorem (proof, end)

Let β = max_{x(n) ∈ C'} ‖x(n)‖². Then

  ‖w(k+1)‖² ≤ kβ        (B)

For sufficiently large values of k, (B) comes into conflict with (A). Hence k cannot be greater than k_max, the value for which (A) and (B) are both satisfied with equality:

  k_max² α² / ‖w*‖² = k_max β   ⟹   k_max = β ‖w*‖² / α²

The algorithm therefore terminates successfully in at most β ‖w*‖² / α² iterations, i.e. w(k+1) - w(k) = 0 from then on and lim_{k→∞} w(k) = w(k_max).

Example

Consider the 2-dimensional training set C1 ∪ C2 with

  C1 = {(1,1), (1,-1), (0,-1)}    with class label +1
  C2 = {(-1,-1), (-1,1), (0,1)}   with class label -1

Train a perceptron on C1 ∪ C2.

A possible implementation

Consider the augmented training set C1' ∪ C2', with the first entry fixed to 1 (to deal with the bias as an extra weight):

  (1, 1, 1), (1, 1, -1), (1, 0, -1), (1, -1, -1), (1, -1, 1), (1, 0, 1)

Replace x with -x for all x ∈ C2' and use the following update rule:

  w(n+1) = w(n) + η x(n)   if w(n)^T x(n) ≤ 0
  w(n+1) = w(n)            otherwise

Epoch = the application of the update rule to each example of the training set. Terminate the execution of the learning algorithm when the weights do not change after one epoch.

Execution

The execution of the perceptron learning algorithm, epoch by epoch, is illustrated below, with w(1) = (1, 0, 0), η = 1, and transformed inputs (1,1,1), (1,1,-1), (1,0,-1), (-1,1,1), (-1,1,-1), (-1,0,-1).

  Adjusted pattern   Weight applied   w(n)^T x(n)   Update?   New weight
  (1, 1, 1)          (1, 0, 0)         1            No        (1, 0, 0)
  (1, 1, -1)         (1, 0, 0)         1            No        (1, 0, 0)
  (1, 0, -1)         (1, 0, 0)         1            No        (1, 0, 0)
  (-1, 1, 1)         (1, 0, 0)        -1            Yes       (0, 1, 1)
  (-1, 1, -1)        (0, 1, 1)         0            Yes       (-1, 2, 0)
  (-1, 0, -1)        (-1, 2, 0)        1            No        (-1, 2, 0)
  End of epoch 1

Execution (continued)

  Adjusted pattern   Weight applied   w(n)^T x(n)   Update?   New weight
  (1, 1, 1)          (-1, 2, 0)        1            No        (-1, 2, 0)
  (1, 1, -1)         (-1, 2, 0)        1            No        (-1, 2, 0)
  (1, 0, -1)         (-1, 2, 0)       -1            Yes       (0, 2, -1)
  (-1, 1, 1)         (0, 2, -1)        1            No        (0, 2, -1)
  (-1, 1, -1)        (0, 2, -1)        3            No        (0, 2, -1)
  (-1, 0, -1)        (0, 2, -1)        1            No        (0, 2, -1)
  End of epoch 2

At epoch 3 no weight changes occur (check!), so the execution of the algorithm stops. Final weight vector: (0, 2, -1); the decision hyperplane is 2x1 - x2 = 0.

Result

[Figure: the decision boundary 2x1 - x2 = 0 in the (x1, x2) plane, with the C1 examples marked + on one side and the C2 examples marked - on the other.]
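Both the epoch-by-epoch execution above and the k_max bound from the convergence proof can be checked with a short Python script. This is a sketch, not part of the slides; it uses the same reflected augmented inputs, w(1) = (1, 0, 0) and η = 1, and it takes the final weight vector (0, 2, -1) as the separating vector w* for the bound, a choice made only for this check.

```python
# Reflected augmented inputs (class C2 patterns already multiplied by -1), as in the slides.
patterns = [(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

w, eta, epoch = (1, 0, 0), 1, 0
while True:
    epoch += 1
    changed = False
    for x in patterns:
        v = dot(w, x)                                   # w(n)^T x(n)
        update = v <= 0                                 # update rule of the slides
        if update:
            w = tuple(wi + eta * xi for wi, xi in zip(w, x))
            changed = True
        print(x, v, "Yes" if update else "No", w)       # one table row per pattern
    print("End of epoch", epoch)
    if not changed:                                     # weights unchanged for a whole epoch
        break

print("Final weight vector:", w)                        # (0, 2, -1), reached during epoch 2

# Check of the convergence bound k_max = beta * ||w*||^2 / alpha^2,
# taking w* = (0, 2, -1), the separating vector just found.
w_star = (0, 2, -1)
alpha = min(dot(w_star, x) for x in patterns)           # alpha = 1
beta = max(dot(x, x) for x in patterns)                 # beta  = 3
k_max = beta * dot(w_star, w_star) / alpha ** 2         # k_max = 15
print("alpha =", alpha, "beta =", beta, "k_max =", k_max)
# Only 3 weight updates were actually needed, well below the bound of 15.
```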

Limitations

The perceptron can only model linearly separable classes, like (those described by) the Boolean functions AND and OR. It cannot model the XOR!

Adaline: Adaptive Linear Element

When the two classes are not linearly separable, it may be desirable to obtain a linear separator that minimizes the mean squared error. Adaline (Adaptive Linear Element) uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm. For the k-th example (x^(k), d^(k)) the error is

  E^(k)(w) = (1/2) (d^(k) - y^(k))² = (1/2) (d^(k) - Σ_{j=0..m} x_j^(k) w_j)²

Total error:

  E_tot = Σ_{k=1..N} E^(k)
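In code, the per-example error E^(k) and the total error E_tot are just a couple of lines; the weights and data below are arbitrary values chosen only to illustrate the formulas.

```python
# Illustrative data: augmented inputs (leading 1 for the bias weight w0) and targets d.
examples = [((1, 0.0, 1.0), 1.0), ((1, 1.0, 0.0), -1.0), ((1, 1.0, 1.0), 1.0)]
w = [0.5, 1.0, -0.5]                                   # arbitrary weight vector

def error_k(x, d, w):
    """E^(k)(w) = 1/2 (d^(k) - y^(k))^2 with the linear output y^(k) = sum_j x_j w_j."""
    y = sum(xj * wj for xj, wj in zip(x, w))
    return 0.5 * (d - y) ** 2

e_tot = sum(error_k(x, d, w) for x, d in examples)     # E_tot = sum over the N examples
print(e_tot)
```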

Adaline

The total error E_tot is the sum of the squared errors of all the examples. E_tot is a quadratic function of the weights whose derivative exists everywhere, so incremental gradient descent may be used to minimize it. At each iteration the LMS algorithm selects one example and decreases the network error E^(k) of that example.

Incremental Gradient Descent

- Start from an arbitrary point in the weight space.
- The direction in which the error E^(k) of an example (as a function of the weights) decreases most rapidly is the opposite of the gradient of E^(k):

    ∇E^(k)(w) = (∂E^(k)/∂w_1, ..., ∂E^(k)/∂w_m)

- Take a small step (of size η) in that direction:

    w(k+1) = w(k) - η ∇E^(k)(w)

Weights Update Rule

Computation of the gradient:

  ∂E_tot(w)/∂w_j = Σ_{k=1..N} ∂E^(k)(w)/∂w_j

  ∂E^(k)(w)/∂w_j = (∂E^(k)/∂y^(k)) (∂y^(k)/∂w_j) = -(d^(k) - y^(k)) x_j^(k)

Delta rule for weight update:

  w(k+1) = w(k) + η e^(k) x^(k),   with e^(k) = d^(k) - y^(k)

LMS learning algorithm

  k = 1;
  initialize w(k) randomly;
  while (E_tot unsatisfactory and k < max_iterations)
      select an example (x^(k), d^(k))
      e^(k) = d^(k) - w(k)^T x^(k)
      w(k+1) = w(k) + η e^(k) x^(k)
      k = k + 1;
  end-while;

η = learning rate parameter (a real number).
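A minimal Python version of the LMS loop above; the stopping tolerance, the learning rate and the toy dataset are assumptions added for illustration, and the targets are chosen to be an exact linear function of the inputs so that the error can actually reach the tolerance.

```python
import random

def lms_train(examples, eta=0.1, max_iterations=1000, tol=1e-3):
    """LMS / delta rule for an Adaline: e = d - w^T x, then w <- w + eta * e * x."""
    random.seed(0)                                            # fixed seed for reproducibility
    w = [random.uniform(-0.5, 0.5) for _ in examples[0][0]]   # random initialisation
    for k in range(max_iterations):
        x, d = examples[k % len(examples)]                    # select an example
        e = d - sum(wi * xi for wi, xi in zip(w, x))          # e^(k) = d^(k) - w(k)^T x^(k)
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]       # delta rule update
        e_tot = sum((d_ - sum(wi * xi for wi, xi in zip(w, x_))) ** 2
                    for x_, d_ in examples) / 2               # total error E_tot
        if e_tot < tol:                                       # E_tot satisfactory: stop
            break
    return w

# Augmented inputs (leading 1 for the bias); targets follow d = 0.5 + x1 - x2 exactly.
data = [((1, 0, 0), 0.5), ((1, 1, 0), 1.5), ((1, 0, 1), -0.5), ((1, 1, 1), 0.5)]
print(lms_train(data))   # should approach (0.5, 1.0, -1.0)
```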

Comparison: Perceptron and Adaline

                        Perceptron                                   Adaline
  Architecture          Single-layer                                 Single-layer
  Neuron model          Non-linear                                   Linear
  Learning algorithm    Minimize number of misclassified examples    Minimize total error
  Application           Linear classification                        Linear classification, regression