Artificial Neural Networks Lecture Notes Part 3


About this file: This is the printer-friendly version of the file "lecture03.htm". If you have trouble reading the contents of this file, or in case of transcription errors, email gi0062@bcmail.brooklyn.cuny.edu

Acknowledgments: The background image is from http://www.anatomy.usyd.edu.au/online/neuroanatomy/tutorial1/tutorial1.html (edited) at the University of Sydney Neuroanatomy web page. Mathematics symbol images are from metamath.org's GIF images for Math Symbols web page. Other image credits are given where noted; the remainder are native to this file.

Contents

o Threshold Logic Units (Cont'd)
    Recurrent Networks
    The Mealy Model of Finite Automata
o The Perceptron
    On McCulloch-Pitts Units
    The Perceptron
    Limitations of the Perceptron
    Proposition 3.1.1
    Proof By Contradiction
    Training TLUs: The Perceptron Rule
    Adjusting The Weight Vector
    The Perceptron Learning Rule
    The Perceptron Convergence Theorem
    Single Layer Nets
    Non-Linearly Separable Classes

Threshold Logic Units (Cont'd)

o Recurrent Networks

Previously, we spoke of feedforward networks. By contrast, recurrent networks are those whose partial computations are recycled through the network itself. See figure 2.21 in Rojas.

Assumption: Computation of the activation of each unit consumes one time unit, t.

In fig. 2.21, the network takes a binary string as input and turns any sequence '11' into '10'. That is, any two consecutive ones are transformed into the sequence '10'.

For example:

    S = 00110110
    R = 00100100
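As an aside (not from Rojas), the input-output behaviour in this example can be checked with a short Python sketch. It simulates only the transformation itself, not the network of fig. 2.21: the previous output is fed back in the role of the recycled activation. The function name and coding are illustrative.

    def transform(bits):
        """Reproduce the '11' -> '10' behaviour of the example above.

        The single piece of state, prev_out, plays the role of the value
        recycled through the network: the current output is 1 only if the
        current input is 1 and the previous output was 0, so the second of
        two consecutive ones is suppressed.
        """
        out = []
        prev_out = 0
        for b in bits:
            y = 1 if (b == "1" and prev_out == 0) else 0
            out.append(str(y))
            prev_out = y
        return "".join(out)

    print(transform("00110110"))   # prints 00100100, i.e. the string R above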

o The Mealy Model of Finite Automata

Refer to fig. 2.23 in the textbook. The finite automaton may be formally described as a structure

    M_t = <Q, S, R, f, g, q_I>

where:

    Q    is the set of finite states of the FA
    S    is the input alphabet
    R    is the output alphabet
    f    is the next-state function, f: Q x S -> Q
    g    is the output function, g: Q x S -> R
    q_I  is the initial state of the machine (the state it is in before receiving input)

The machine M_t acts as a time-delay machine that outputs at time t+1 the input given to it at time t. We can write x ∈ S*, y ∈ R* for the input and output strings.

The Perceptron

On McCulloch-Pitts Units

o McCulloch-Pitts units are similar to logic gates.
o They have no free parameters that can be adjusted.
o Learning can only be implemented by modifying the network topology and the thresholds, which is more complex than merely adjusting weights.

The Perceptron

Frank Rosenblatt developed the perceptron model in 1958, which features

o numeric weights, and
o a special interconnection pattern.

In the original model,

o units are threshold elements, and
o connectivity is determined stochastically.

In the 1960s, Minsky & Papert refined and perfected this model. See figure 3.1 (Rojas, p. 56), The Classical Perceptron.
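Before moving on, the McCulloch-Pitts units mentioned above can be made concrete with a minimal Python sketch (an illustration, not code from the notes): the inputs are unweighted and binary, inhibition is absolute, and the only design parameter is the fixed threshold, so there is nothing to train, in contrast with the perceptron's adjustable weights.

    def mp_unit(excitatory, inhibitory, threshold):
        """McCulloch-Pitts style unit: fires (returns 1) iff no inhibitory
        input is active and the number of active excitatory inputs reaches
        the fixed threshold."""
        if any(inhibitory):
            return 0
        return 1 if sum(excitatory) >= threshold else 0

    # With two excitatory inputs, threshold 2 gives AND and threshold 1 gives OR.
    print([mp_unit((a, b), (), 2) for a in (0, 1) for b in (0, 1)])   # [0, 0, 0, 1]
    print([mp_unit((a, b), (), 1) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]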

Limitations of the Perceptron

Minsky & Papert studied the limitations of various models of the perceptron.

Example: Diameter-limited perceptrons: the receptive field of each predicate has a limited diameter.

Proposition 3.1.1 (Rojas, p. 58)

No diameter-limited perceptron can decide whether a geometric figure is connected or not.

Proof By Contradiction

The diameters of the receptive fields are limited, hence no single receptive field contains points from both the left and the right ends of the patterns. Assume that we have three different groups of predicates. The threshold element performs the computation

    S = Σ_{P_i ∈ Group 1} w_1i P_i + Σ_{P_i ∈ Group 2} w_2i P_i + Σ_{P_i ∈ Group 3} w_3i P_i

and compares S with 0:

    If S >= 0, the figure is recognized as connected.
    If S < 0, the figure is recognized as disconnected.

o If A is changed to B, the weights in group 2 must increase.
o If A is changed to C, the weights in group 1 must increase.

However, what if A is changed to D? D is indistinguishable from C in group 1, and D is indistinguishable from B in group 2, so D would be categorized as connected, which it is not.

Training TLUs: The Perceptron Rule

o In order for a TLU to perform a given classification, it must have the desired decision surface.
o This is determined by the weight vector and the threshold.

o Hence, it is necessary to adjust the weights and the threshold. This is usually done via an iterative procedure.
o At each presentation of an example, small changes are made to the weights and the threshold.
o This process is known as training the net, and the set of examples is the training set.
o From the network's viewpoint, it undergoes a process of learning, or adapting to, the training set, and the prescription for how to change the weights at each step is the learning rule.

The condition for an output of 1 is

    w·x >= θ,    i.e.    w·x - θ >= 0,    or    w·x + θ(-1) >= 0,

so we may think of the threshold as an extra weight tied to an input that is always set to -1.

o Bias: the negative of the threshold.
o Weight vector: originally of dimension n.
o Augmented weight vector: the (n+1)-dimensional vector (w_1, w_2, w_3, ..., w_n, θ).
o Thus, with augmented vectors w and v,

    w·v >= 0  ==>  y = 1
    w·v <  0  ==>  y = 0

o A two-input TLU that outputs a "1" with input (1,1), and "0" for all other inputs:

    A suitable (non-augmented) weight vector is (1/2, 1/2) with threshold 3/4.
    The decision line goes through the points x_1 = (1/2, 1) and x_2 = (1, 1/2),
    since w·x_1 = w·x_2 = 3/4 = θ.
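A few lines of Python make the augmented-vector convention and the two-input example above concrete. Only the weight values (1/2, 1/2) and the threshold 3/4 come from the notes; the function name and the check of the decision-line points are illustrative.

    import numpy as np

    def tlu(w_aug, x):
        """Augmented TLU: w_aug = (w_1, ..., w_n, theta); the input is
        extended with a constant -1 so the threshold acts as an extra weight."""
        v = np.append(x, -1.0)
        return 1 if np.dot(w_aug, v) >= 0 else 0

    w_aug = np.array([0.5, 0.5, 0.75])             # weights (1/2, 1/2), threshold 3/4
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, tlu(w_aug, np.array(x, float)))   # fires only on (1, 1)

    # The points x_1 = (1/2, 1) and x_2 = (1, 1/2) lie on the decision line,
    # since their (non-augmented) activation equals the threshold 3/4:
    print(np.dot([0.5, 0.5], [0.5, 1.0]), np.dot([0.5, 0.5], [1.0, 0.5]))   # 0.75 0.75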

Adjusting The Weight Vector

o We train a TLU with augmented weight vector w on a training set of pairs (v, t): v is the input and t is the target.
o Suppose an input vector v is presented to the TLU, where the desired response, or target, is t = 1.
o However, the TLU produces the output y = 0.
o The TLU has thus misclassified v, and we must adjust the weights!
o To produce a "0", the activation must have been negative when it should have been positive: the dot product w·v was negative, and the two vectors were pointing away from each other.
o Thus, we need to rotate w so that it points more in the direction of v. How do we do this?
o We add a fraction α of v to w (so as not to upset previous learning), i.e.

    w' = w + α v.                  ( * )

o Suppose now that the target is t = 0, but y = 1.
o This means the activation was positive when it should have been negative.
o Thus, we need to rotate w away from v, which we can accomplish by subtracting a fraction of v from w:

    w' = w - α v.                  ( ** )

o Combining ( * ) and ( ** ) we obtain

    w' = w + α (t - y) v,          ( *** )

  or

    Δw = w' - w = α (t - y) v,

  or, in terms of components,

    Δw_i = α (t - y) v_i,   i = 1 to n+1,

  where w_{n+1} = θ and v_{n+1} = -1 (always).
o This is called the Perceptron Learning Rule because historically it was first used with perceptrons.
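Rule ( *** ) translates directly into a one-step update function. The sketch below assumes augmented vectors (last input component -1, last weight component θ); the learning rate value 0.25 is an arbitrary illustration, not a value given in the notes.

    import numpy as np

    def perceptron_step(w, v, t, alpha=0.25):
        """One application of rule ( *** ): w' = w + alpha * (t - y) * v.

        Because w[-1] = theta and v[-1] = -1, the threshold is adjusted by
        the same rule as the ordinary weights."""
        y = 1 if np.dot(w, v) >= 0 else 0     # current TLU output
        return w + alpha * (t - y) * v, y

Note that when y = t the factor (t - y) is zero and the weights are left unchanged, which corresponds to the "Do nothing" branch of the algorithm in the next section.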

The Perceptron Learning Rule

    Begin
        Repeat
            For each training vector pair (v, t)
                Evaluate the output y when v is input to the TLU
                If y != t Then
                    Form a new weight vector w' according to ( *** )
                Else
                    Do nothing
                End If
            End For
        Until y = t for all vectors
    End

o The algorithm will generate a valid weight vector for the problem at hand, if one exists.

The Perceptron Convergence Theorem

o If two classes of vectors X, Y are linearly separable, then application of the perceptron training algorithm will eventually result in a weight vector w_0 such that w_0 defines a TLU whose decision hyperplane separates X and Y.
o This theorem was first proven by Rosenblatt in 1962.
o Once w_0 has been found, it remains stable and no further changes are made to the weights.

o Returning to our previous example: the initial weights are w_1 = 0 and w_2 = 0.4, the initial threshold is θ = 0.3, and the training uses a fixed learning rate α.
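One possible Python rendering of the algorithm above (a sketch, not the only way to write it). It assumes augmented input vectors whose last component is -1 and, like the pseudocode, loops until every pattern is classified correctly, so it terminates only for linearly separable training sets. The starting weights mirror the example above (w_1 = 0, w_2 = 0.4, θ = 0.3); the learning rate 0.25 and the assumption that the target is 1 only for input (1, 1) are illustrative.

    import numpy as np

    def train_tlu(pairs, w, alpha):
        """Perceptron learning rule for an augmented TLU.

        pairs: list of (v, t) with v an augmented input (last entry -1)
        w:     initial augmented weight vector (last entry is the threshold)
        alpha: learning rate
        """
        w = np.array(w, dtype=float)
        changed = True
        while changed:                           # Repeat ... Until y = t for all vectors
            changed = False
            for v, t in pairs:
                v = np.asarray(v, dtype=float)
                y = 1 if np.dot(w, v) >= 0 else 0
                if y != t:
                    w += alpha * (t - y) * v     # rule ( *** )
                    changed = True
        return w

    # The two-input TLU that should output 1 only for (1, 1):
    pairs = [((0, 0, -1), 0), ((0, 1, -1), 0), ((1, 0, -1), 0), ((1, 1, -1), 1)]
    print(train_tlu(pairs, w=[0.0, 0.4, 0.3], alpha=0.25))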

o HOMEWORK: Write a program to implement the Perceptron Learning Rule and solve this problem.

  [Table: the hand-worked training trace for this example, with columns x_1, x_2, w_1, w_2, θ, a (activation), y, t, α(t - y), Δw_1, Δw_2, Δθ, and one row per presentation of each of the four input patterns (0,0), (0,1), (1,0), (1,1) over repeated passes through the training set.]

Single Layer Nets

o We may use a single perceptron or TLU to classify two linearly separable classes A and B.
o Patterns may have many inputs; the diagram is only a "cartoon" picture, or schematic, of the net.

o It is possible to train multiple nodes on the input space to obtain a set of linearly separable dichotomies.

  Example: classifying handwritten alphabetic characters, where 26 dichotomies are required, each one separating one letter class from the rest of the alphabet, e.g. A's from not-A's, B's from not-B's, etc.

Non-Linearly Separable Classes

o Consider four classes A, B, C and D separated by two planes in pattern space. (Note that this is a two-dimensional schematic, or representation, of an n-dimensional space.)
o Here, we could not use a single-layer net! Class A, for example, is not linearly separable from the others taken together.
o We can, however, "chop" the pattern space into linearly separable regions and look for particular combinations of overlap within these regions.
o The class of A and B combined (AB) is linearly separable from C and D together (CD). So are AD and BC.

    Unit U_1            Unit U_2
    Class   y_1         Class   y_2
    AB      1           AD      1
    CD      0           BC      0

o If a member of class A is input to each of U_1 and U_2, then y_1 = 1, y_2 = 1.
o Thus, y_1 = 1, y_2 = 1 iff the input is in A.
o We obtain a unique code for each of the boxes:

    y_1   y_2   Class
    0     0     C
    0     1     D
    1     0     B
    1     1     A
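The coding scheme in the table above can be written out explicitly. The sketch below is illustrative only: u1 and u2 stand for the two trained units (any functions returning 0 or 1, for instance TLUs trained with the perceptron rule), and the stand-in separators at the end are invented for the demonstration, not weights from the notes.

    # Decode table: (y_1, y_2) -> class, exactly as tabulated above.
    CODE = {(0, 0): "C", (0, 1): "D", (1, 0): "B", (1, 1): "A"}

    def classify(x, u1, u2):
        """Classify pattern x by decoding the joint output of units U_1 and U_2."""
        return CODE[(u1(x), u2(x))]

    # Stand-in units for illustration: pretend U_1 separates AB from CD on the
    # first coordinate and U_2 separates AD from BC on the second coordinate.
    u1 = lambda x: 1 if x[0] >= 0.5 else 0
    u2 = lambda x: 1 if x[1] >= 0.5 else 0
    print(classify((0.9, 0.8), u1, u2))     # -> "A"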

o Note: The less information we have to supply ourselves, the more useful a network will be.
o Here, we required two pieces of information:
    i)  the four classes may be separated by two hyperplanes;
    ii) AB was linearly separable from CD, and AD was linearly separable from BC.
o We will need a new training algorithm to accomplish this.

HOMEWORK: Using the perceptron learning rule, find the weights required to perform the following classification:

o Vectors (1,1,1,1) and (0,1,0,0) are members of the class, and therefore have target value 1.
o Vectors (1,1,1,0) and (1,0,0,1) are not members of the class, and have target value 0.

Use a learning rate of 1 and starting weights of 0. Using each of the training vectors as input, test the responses of the net.