CE213 Artificial Intelligence Lecture 14


CE213 Artificial Intelligence Lecture 14
Neural Networks: Part 2

Learning Rules
- Hebb Rule
- Perceptron Rule
- Delta Rule
Neural Networks Using Linear Units

[Difficulty warning: equations!]

Learning in Neural Networks

Consider a linear separation unit (LSU) or MP neuron first: inputs x_1, ..., x_n with weights w_1, ..., w_n, a threshold θ, and output

y = 1 if Σ_i w_i x_i ≥ θ
y = 0 if Σ_i w_i x_i < θ

Given a labelled sample dataset, how do we determine the values of w_i and θ so that the LSU will classify or fit the sample data correctly?
1) Analytically (usually only when n ≤ 3)
2) Learning
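As a quick illustration (not part of the lecture), here is a minimal Python sketch of the LSU/MP neuron output rule; the example weights and threshold are made up:

```python
# Minimal sketch of a linear separation unit (MP neuron).
# The weights and threshold below are illustrative, not taken from the lecture.

def lsu_output(x, w, theta):
    """Return 1 if the weighted input sum reaches the threshold, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= theta else 0

# Example: a 2-input unit with w = (1, 1) and theta = 1.5 computes logical AND.
print(lsu_output((1, 1), (1, 1), 1.5))  # 1
print(lsu_output((1, 0), (1, 1), 1.5))  # 0
```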

Learning in Neural Networks (2)

The function of an MP neuron can be constructed by appropriate choice of weights and threshold, i.e., w_1 x_1 + w_2 x_2 = θ. Could we instead train a neuron or neural network to achieve the desired weights by presenting it with a sequence of examples/samples?

What information is available from samples and an MP neuron?
The current input: x_1, ..., x_n
The current weights: w_1, ..., w_n
The actual output: y
The desired output: z

Learning in Neural Networks (3)

A general solution to learning in neural networks:

w_i(t+1) = w_i(t) + Δ(x, w, y, z)

where w_i(t) is the weight of the i-th input at time t and Δ is a function that determines the change in weight.

A learning procedure would simply be:

Initialisation of weight values;
REPEAT
Present next example/sample;
Adjust all weights using Δ;
UNTIL Training complete;
(There could be different completion criteria.)

[Figure: initial weights and learnt weights shown as separating lines in the (x_1, x_2) plane.]

https://www.youtube.com/watch?v=vgwemzhplsa
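A sketch of this learning procedure in Python; `delta` stands for whichever weight-change function Δ is chosen later (Hebb, perceptron or delta rule), the threshold is assumed folded into an always-1 bias input (see the next slide), and the fixed number of epochs is just one possible completion criterion:

```python
def train(samples, n_inputs, delta, alpha=0.1, n_epochs=10):
    """Generic training loop: initialise the weights, then repeatedly present
    samples and adjust every weight using the chosen rule `delta`."""
    w = [0.0] * n_inputs                       # initialisation of weight values
    for _ in range(n_epochs):                  # completion criterion: fixed number of passes
        for x, z in samples:                   # present next example/sample
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            w = [wi + delta(alpha, xi, y, z) for wi, xi in zip(w, x)]
    return w
```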

Learning in Neural Networks (4)

What about setting the threshold? Set the threshold to zero and add another input x_0, with weight w_0, where x_0 is always 1.

y = 1 if w_0 x_0 + w_1 x_1 + ... + w_n x_n ≥ 0
y = 0 if w_0 x_0 + w_1 x_1 + ... + w_n x_n < 0

This is equivalent to setting θ = −w_0.
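A small sketch (with made-up numbers) of this bias trick: prepend a constant input x_0 = 1 so the threshold becomes an ordinary trainable weight:

```python
def augment(x):
    """Prepend the fixed input x0 = 1, so that w0 plays the role of -theta."""
    return (1,) + tuple(x)

def output(x_aug, w):
    """The threshold is now zero; w[0] is the bias weight w0."""
    s = sum(wi * xi for wi, xi in zip(w, x_aug))
    return 1 if s >= 0 else 0

# A unit with theta = 1.5 and weights (1, 1) becomes w = (-1.5, 1, 1):
w = (-1.5, 1.0, 1.0)
print(output(augment((1, 1)), w))  # 1
print(output(augment((1, 0)), w))  # 0
```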

Learning in Neural Networks (5)

So the fixed input is essentially an adaptive threshold, because

y = 1 if Σ_{i=0..n} w_i x_i ≥ 0
y = 0 if Σ_{i=0..n} w_i x_i < 0

(−w_0 is equivalent to the threshold θ.)

In this way, we can arrange for the threshold to be modified in the same way as the weights. Now let's focus on how to determine Δ(x, w, y, z).

The Hebb Rule

Hebb (1949) proposed that learning could take the form of strengthening the connections between any pair of neurons that were simultaneously active. This implies a learning rule of the form (x_i may be the output of another neuron)

w_i(t+1) = w_i(t) + α x_i y

where α is a constant that determines the rate of learning.

Later researchers modified this for training MP neurons. During learning, the output of the neuron (y) is forced to the required/desired value (z). Hence the learning rule becomes

w_i(t+1) = w_i(t) + α x_i z

This is now known as the Hebb Rule.

Is this supervised learning?
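A one-function sketch of the (supervised) Hebb rule as stated above, with the output forced to the desired value z; the default learning rate is an arbitrary illustrative choice:

```python
def hebb_update(w, x, z, alpha=0.1):
    """Hebb rule with the output forced to the desired value z:
    w_i(t+1) = w_i(t) + alpha * x_i * z"""
    return [wi + alpha * xi * z for wi, xi in zip(w, x)]
```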

The Perceptron Rule

The Perceptron Rule (late 1950s) introduced an important new idea:

IF IT AIN'T BROKE, DON'T FIX IT.

More specifically: only adjust the weights when the neuron would have given the wrong answer. Thus we have the notion of a rule only used to correct errors.

The Perceptron Weight Adjustment Rule:

w_i(t+1) = w_i(t) + α x_i z, applied only if y ≠ z

which is, of course, the Hebb rule restricted to error correction.
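A sketch of the perceptron rule in the same style; the slide's form updates with α x_i z only when y ≠ z, which (as my own added note, not from the lecture) behaves as intended when the desired outputs are bipolar, z ∈ {−1, +1}:

```python
def perceptron_update(w, x, y, z, alpha=0.1):
    """Perceptron rule: the Hebb update, applied only when the actual output y
    differs from the desired output z (error correction only).
    Assumes bipolar targets z in {-1, +1} so the correction has the right sign."""
    if y == z:
        return list(w)                         # "if it ain't broke, don't fix it"
    return [wi + alpha * xi * z for wi, xi in zip(w, x)]
```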

The Delta Rule

The delta rule generalises the idea of error correction introduced in the perceptron rule. Rather than having a rule that is only applied if an error occurs, this approach includes the error as part of the rule:

w_i(t+1) = w_i(t) + α x_i (z − y)

This is the delta rule (proposed in 1960, also known as the Widrow-Hoff rule).

Why is it called the delta rule? Because the difference or error (z − y) is often written as δ.
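And the corresponding sketch of the delta (Widrow-Hoff) rule, which applies on every presentation because the error δ = z − y scales the update:

```python
def delta_update(w, x, y, z, alpha=0.1):
    """Delta (Widrow-Hoff) rule: w_i(t+1) = w_i(t) + alpha * x_i * (z - y).
    When y == z the error is zero and the weights are left unchanged."""
    error = z - y                              # this difference is the "delta"
    return [wi + alpha * xi * error for wi, xi in zip(w, x)]
```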

The Delta Rule (2)

Function Approximation Using the Delta Rule

Let's consider the task of training a function approximator. If we take away the threshold, we have a linear sum unit:

y = Σ_i w_i x_i

The inputs and outputs could be real numbers, not just 1s and 0s. Such a unit clearly computes a hyperplane in n-dimensional space.

The Delta Rule (3)

How might this unit behave if trained? Suppose the training data can be described by z = f(x_1, ..., x_n). Then the weights will adapt to values such that the unit provides a linear approximation to the function f.

Thus neural units and networks can be used for function approximation as well as for classification. The two are closely related: a function approximation is a surface in n-dimensional space, and such a surface can separate two classes of points in n-dimensional space.

For function approximation, it can be shown that the delta rule is an optimal strategy for changing the weights. (A strict proof is beyond the scope of CE213.)
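A small demonstration (made-up target function, learning rate and data) of the delta rule used for function approximation with a linear sum unit; it recovers the coefficients of the hypothetical target z = 2 x_1 − 3 x_2:

```python
import random

def linear_output(w, x):
    """Linear sum unit: y = sum_i w_i * x_i (no threshold)."""
    return sum(wi * xi for wi, xi in zip(w, x))

random.seed(0)
target = (2.0, -3.0)                   # hypothetical target function z = 2*x1 - 3*x2
w = [0.0, 0.0]
alpha = 0.05

for _ in range(2000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    z = linear_output(target, x)       # desired output from the target function
    y = linear_output(w, x)            # actual output of the unit
    w = [wi + alpha * xi * (z - y) for wi, xi in zip(w, x)]

print(w)                               # close to [2.0, -3.0]
```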

Limitations of the Delta Rule

Two major limitations:
- It can only learn linear functions, or linear separations for classification.
- It is equivalent to linear regression, and standard techniques for calculating regressions are orders of magnitude faster and avoid the need to choose an appropriate learning rate parameter α. (The delta rule can, however, start training neurons as soon as data is available.)

Choosing the learning rate parameter α:
- If α is too small, learning will be very slow.
- If α is too large, the final weight values will cycle around the optimum values.

More Than One Output (Network)

A Single Layer Linear Neural Network (consisting of linear units) maps inputs x_1, x_2, ..., x_n to outputs y_1, y_2, ..., y_m.

The behaviour of this network is most conveniently expressed by

y = W x

where x is the input vector, y is the output vector, and W is the weight matrix (w_ij, i = 1, 2, ..., m; j = 1, 2, ..., n).

More Than One Output (2)

The weight of the j-th input to the i-th neuron is element [i, j] of the weight matrix W. So the delta rule for learning now takes the form

W(t+1) = W(t) + α δ x^T,  where δ = (z − y) and T denotes transpose,

or, element-wise,

w_ij(t+1) = w_ij(t) + α x_j (z_i − y_i)
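A NumPy sketch of the matrix form of the delta rule; the network sizes and learning rate are illustrative:

```python
import numpy as np

def delta_update_matrix(W, x, z, alpha=0.1):
    """Matrix delta rule for a single-layer linear network y = W x:
    W(t+1) = W(t) + alpha * (z - y) x^T"""
    y = W @ x                                  # actual output vector (length m)
    return W + alpha * np.outer(z - y, x)      # (z - y) x^T is an m-by-n matrix

# Illustrative sizes: m = 3 outputs, n = 4 inputs.
rng = np.random.default_rng(0)
W = np.zeros((3, 4))
x = rng.normal(size=4)
z = rng.normal(size=3)
W = delta_update_matrix(W, x, z)
```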

More Than One Output (3)

How does such a multiple output, single layer network behave? Clearly it can be viewed as m single neurons operating in parallel: there are no connections that could allow the behaviour of one neuron to affect another.

A single unit finds an approximation to z = f(x) with a scalar output. The multiple output network simply extends that to the case where the function has vector rather than scalar values: z = f(x) with z a vector.

Multiple Layer Network Using Linear Units

Let's try adding another layer of linear units (not linear separation units): the input vector x feeds a hidden layer h, which in turn feeds the output layer y.

This is known as a feed-forward neural network because signals flow in one direction (forward).

Multiple Layer Network Using Linear Units (2)

Now let us see what it does:

h = W_1 x   (lower layer)
y = W_2 h   (higher layer)

Therefore y = W_2 (W_1 x) = (W_2 W_1) x. But the product of two matrices is another matrix, so let W = W_2 W_1. Hence y = W x.

A multilayer linear neural network can be equivalently implemented by a single layer neural network.
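A quick numerical check (random matrices, illustrative sizes) that stacking two linear layers is the same as a single layer whose weight matrix is the product W = W_2 W_1:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 3))     # lower layer:  h = W1 x
W2 = rng.normal(size=(2, 5))     # higher layer: y = W2 h
x = rng.normal(size=3)

y_two_layers = W2 @ (W1 @ x)     # output of the two-layer linear network
W = W2 @ W1                      # equivalent single-layer weight matrix
y_one_layer = W @ x

print(np.allclose(y_two_layers, y_one_layer))   # True
```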

Multiple Layer Network Using Linear Units (3)

Conclusion: there is no point building a multilayer feed-forward network if the units perform linear transformations of their inputs.

Therefore, the only way to build multilayer neural networks that are more powerful than single layer networks is to use units (neurons) with non-linear output functions.

But the delta rule is designed to find the best weights for linear units only. A new learning rule/algorithm is needed!

Summary
- Learning in Neural Networks Using the Delta Rule
- The Hebb Rule
- The Perceptron Rule
- The Delta Rule
- Limitations of the Delta Rule
- Multilayer Networks Using Linear Units