Neural networks (Chapter 20)

Outline
- Brains
- Neural networks
- Perceptrons
- Multilayer networks
- Applications of neural networks

Brains
10^11 neurons of > 20 types, 10^14 synapses, 1 ms to 10 ms cycle time. Signals are noisy "spike trains" of electrical potential.
[Figure: anatomy of a neuron, labeling the axonal arborization, synapse, axon from another cell, dendrite, axon, nucleus, synapses, and cell body (soma).]

McCulloch-Pitts unit
Output is a "squashed" linear function of the inputs:
    a_i ← g(in_i) = g( Σ_j W_{j,i} a_j )
[Figure: a single unit. A fixed bias input a_0 = -1 carries the bias weight W_{0,i}; the input links carry W_{j,i} a_j; the input function Σ computes in_i; the activation function g produces the output a_i, which is sent along the output links.]
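As a concrete illustration (not from the slides; the weights and inputs below are made up), a single unit of this kind can be sketched in a few lines of Python, assuming the a_0 = -1 bias convention and a sigmoid g:

import math

def unit_output(weights, inputs, g=lambda z: 1.0 / (1.0 + math.exp(-z))):
    # The bias input a_0 = -1 is prepended; weights[0] is the bias weight W_{0,i}
    activations = [-1.0] + list(inputs)
    in_i = sum(w * a for w, a in zip(weights, activations))   # input function
    return g(in_i)                                            # activation function

print(unit_output([0.5, 1.0, -2.0], [0.3, 0.7]))              # a "squashed" linear combination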

Activation functions
(a) is a step function or threshold function; (b) is a sigmoid function 1/(1 + e^-x).
Changing the bias weight W_{0,i} moves the threshold location.
[Figure: plots of g(in_i) against in_i for (a) the step function and (b) the sigmoid.]

Implementing logical functions
McCulloch and Pitts: every Boolean function can be implemented (with large enough network). AND? OR? NOT? MAJORITY?

Implementing logical functions
McCulloch and Pitts: every Boolean function can be implemented (with large enough network). With the a_0 = -1 bias convention, threshold units compute:
    AND: W_0 = 1.5, W_1 = 1, W_2 = 1
    OR:  W_0 = 0.5, W_1 = 1, W_2 = 1
    NOT: W_0 = -0.5, W_1 = -1
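The gate weights above can be checked directly. The following is a minimal sketch (not from the slides; the helper threshold_unit and its argument layout are illustrative) that evaluates each gate on all Boolean inputs under the a_0 = -1 bias convention:

def threshold_unit(weights, inputs):
    # weights[0] is the bias weight W_0, paired with the fixed input a_0 = -1
    in_total = -weights[0] + sum(w * a for w, a in zip(weights[1:], inputs))
    return 1 if in_total > 0 else 0

AND_W, OR_W, NOT_W = [1.5, 1, 1], [0.5, 1, 1], [-0.5, -1]

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "AND:", threshold_unit(AND_W, [a1, a2]),
              "OR:", threshold_unit(OR_W, [a1, a2]))
for a1 in (0, 1):
    print("NOT", a1, "->", threshold_unit(NOT_W, [a1]))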

Network structures
Feed-forward networks (single-layer perceptrons, multi-layer networks) implement functions and have no internal state.
Recurrent networks:
- Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}); g(x) = sign(x), a_i = ±1; holographic associative memory
- Boltzmann machines use stochastic activation functions, ≈ MCMC in BNs
- recurrent neural nets have directed cycles with delays, so they have internal state (like flip-flops), can oscillate etc.

Feed-forward example
[Figure: input units 1 and 2 feed hidden units 3 and 4 via weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}; units 3 and 4 feed output unit 5 via W_{3,5} and W_{4,5}.]
A feed-forward network is a parameterized family of nonlinear functions:
    a_5 = g(W_{3,5} a_3 + W_{4,5} a_4)
        = g(W_{3,5} g(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g(W_{1,4} a_1 + W_{2,4} a_2))
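As an illustration only (the weight values below are made up, not from the slides), here is a small Python sketch that evaluates a_5 for this network by composing the two hidden-unit activations:

import math

def g(x):                          # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weights, named after the slide's W_{j,i} notation
W13, W23, W14, W24, W35, W45 = 0.5, 0.8, -0.3, 0.1, 1.2, -0.7

def a5(a1, a2):
    a3 = g(W13 * a1 + W23 * a2)    # hidden unit 3
    a4 = g(W14 * a1 + W24 * a2)    # hidden unit 4
    return g(W35 * a3 + W45 * a4)  # output unit 5

print(a5(1.0, 0.0))                # one point of the nonlinear function family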

Perceptrons
[Figure: a single-layer perceptron, with input units connected directly to output units by weights W_{j,i}, and a surface plot of the output of a two-input perceptron as a function of its inputs.]

Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960).
Can represent AND, OR, NOT, majority, etc.
Represents a linear separator in input space: Σ_j W_j x_j > 0, or W · x > 0.
[Figure: the (I_1, I_2) plane for (a) I_1 and I_2, (b) I_1 or I_2, (c) I_1 xor I_2; a line separates the 0s from the 1s in (a) and (b), but no such line exists for xor, hence the "?" in (c).]

Perceptron learning
Learn by adjusting weights to reduce error on the training set.
The squared error for an example with input x and true output y is
    E = (1/2) Err^2 ≡ (1/2) (y - h_W(x))^2
Perform optimization search by gradient descent:
    ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j ( y - g(Σ_{j=0}^n W_j x_j) )
            = -Err × g'(in) × x_j
Simple weight update rule:
    W_j ← W_j + α × Err × g'(in) × x_j
E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on -ve inputs.

Perceptron learning (pseudocode)
    W = random initial values
    for iter = 1 to T
        for i = 1 to N (all examples)
            x = input for example i
            y = output for example i
            W_old = W
            Err = y - g(W_old · x)
            for j = 1 to M (all weights)
                W_j = W_j + α × Err × g'(W_old · x) × x_j
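Below is a minimal runnable sketch of this rule in Python/NumPy, assuming a sigmoid unit with the a_0 = -1 bias input; the OR training data, learning rate, and epoch count are illustrative choices, not values from the slides.

import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def g_prime(x):
    return g(x) * (1.0 - g(x))

def train_perceptron(X, y, alpha=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.hstack([-np.ones((X.shape[0], 1)), X])   # prepend bias input a_0 = -1
    W = rng.normal(scale=0.1, size=X.shape[1])      # random initial weights
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            in_i = W @ x_i                          # weighted sum of inputs
            err = y_i - g(in_i)                     # Err = y - h_W(x)
            W = W + alpha * err * g_prime(in_i) * x_i
    return W

# Example: learn the (linearly separable) OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])
W = train_perceptron(X, y)
outputs = g(np.hstack([-np.ones((4, 1)), X]) @ W)
print(np.round(outputs))                            # should be ~ [0, 1, 1, 1]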

Perceptron learning contd.
Derivative of the sigmoid g(x) can be written in simple form:
    g(x) = 1/(1 + e^-x)
    g'(x) = e^-x / (1 + e^-x)^2 = e^-x g(x)^2
Also, g(x) + e^-x g(x) = 1, so e^-x g(x) = 1 - g(x), i.e. e^-x = (1 - g(x))/g(x).
So
    g'(x) = [(1 - g(x))/g(x)] g(x)^2 = (1 - g(x)) g(x)
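A quick numerical sanity check (not part of the slides) comparing the identity g'(x) = (1 - g(x)) g(x) with a finite-difference estimate:

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

h = 1e-6
for x in (-2.0, 0.0, 1.5):
    analytic = (1.0 - g(x)) * g(x)                 # closed-form derivative
    numeric = (g(x + h) - g(x - h)) / (2 * h)      # central difference
    print(x, round(analytic, 6), round(numeric, 6))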

Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set.
[Figure: proportion correct on the test set vs. training set size, perceptron vs. decision tree, on (left) the MAJORITY function on 11 inputs and (right) the RESTAURANT data.]

Multilayer networks
Layers are usually fully connected; numbers of hidden units typically chosen by hand.
[Figure: a feed-forward network with input units a_k, hidden units a_j, and output units a_i, connected by weights W_{k,j} and W_{j,i}.]

Expressiveness of MLPs
All continuous functions w/ 1 hidden layer, all functions w/ 2 hidden layers.
[Figure: two surface plots of h_W(x_1, x_2) over the (x_1, x_2) plane, illustrating functions representable by small MLPs.]

Training a MLP
In general have n output nodes, E ≡ (1/2) Σ_i Err_i^2, where Err_i = (y_i - a_i) and i runs over all nodes in the output layer.
Need to calculate ∂E/∂W_{ij} for any W_{ij}.

Training a MLP contd.
Can approximate derivatives by finite differences:
    f'(x) ≈ (f(x + h) - f(x)) / h
    ∂E/∂W_{ij}(W) ≈ ( E(W + (0, ..., h, ..., 0)) - E(W) ) / h
What would this entail for a network with n weights? One iteration would take O(n^2) time, since each of the n partial derivatives requires an extra O(n) evaluation of E.
Complicated networks have tens of thousands of weights, so O(n^2) time is intractable. Back-propagation is a recursive method of calculating all of these derivatives in O(n) time.
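The sketch below (a made-up 2-2-1 sigmoid network, purely for illustration) shows the finite-difference approach: one perturbed error evaluation per weight, which is where the O(n^2) cost comes from.

import numpy as np

def network_error(W, x, y):
    # Hypothetical 2-2-1 sigmoid network; W packs all 6 weights (no bias terms)
    g = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = g(W[:4].reshape(2, 2) @ x)
    output = g(W[4:] @ hidden)
    return 0.5 * (y - output) ** 2

def finite_difference_gradient(W, x, y, h=1e-5):
    grad = np.zeros_like(W)
    for i in range(len(W)):                 # one extra forward pass per weight
        W_plus = W.copy()
        W_plus[i] += h
        grad[i] = (network_error(W_plus, x, y) - network_error(W, x, y)) / h
    return grad

W = np.array([0.1, -0.2, 0.4, 0.3, 0.5, -0.6])
print(finite_difference_gradient(W, x=np.array([1.0, 0.5]), y=1.0))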

Back-propagation learning
In general have n output nodes, E ≡ (1/2) Σ_i Err_i^2, where Err_i = (y_i - a_i) and i runs over all nodes in the output layer.
Output layer: same as for single-layer perceptron,
    W_{j,i} ← W_{j,i} + α × a_j × Δ_i, where Δ_i = Err_i × g'(in_i)
Hidden layers: back-propagate the error from the output layer:
    Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i
Update rule for weights in hidden layers:
    W_{k,j} ← W_{k,j} + α × a_k × Δ_j

Back-propagation derivation
For a node i in the output layer:
    ∂E/∂W_{j,i} = -(y_i - a_i) ∂a_i/∂W_{j,i}
                = -(y_i - a_i) ∂g(in_i)/∂W_{j,i}
                = -(y_i - a_i) g'(in_i) ∂in_i/∂W_{j,i}
                = -(y_i - a_i) g'(in_i) ∂/∂W_{j,i} ( Σ_k W_{k,i} a_k )
                = -(y_i - a_i) g'(in_i) a_j
                = -a_j Δ_i, where Δ_i = (y_i - a_i) g'(in_i)

Back-propagation derivation: hidden layer
For a node j in a hidden layer: ∂E/∂W_{k,j} = ?

Reminder: chain rule for partial derivatives
For f(x, y), with f differentiable w.r.t. x and y, and x and y differentiable w.r.t. u and v:
    ∂f/∂u = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u)   and   ∂f/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v)

Back-propagation derivation: hidden layer
For a node j in a hidden layer, E depends on the weight W_{k,j} only through the activations of the nodes in j's layer:
    ∂E/∂W_{k,j} = ∂/∂W_{k,j} E(a_{j_1}, a_{j_2}, ..., a_{j_m})
                  where {j_i} are the indices of the nodes in the same layer as node j
                = (∂E/∂a_j)(∂a_j/∂W_{k,j}) + Σ_i (∂E/∂a_i)(∂a_i/∂W_{k,j})
                  where i runs over all other nodes i in the same layer as node j
                = (∂E/∂a_j)(∂a_j/∂W_{k,j})   since ∂a_i/∂W_{k,j} = 0 for i ≠ j
                = (∂E/∂a_j) g'(in_j) a_k
For the remaining factor, E depends on a_j only through the nodes in the next layer:
    ∂E/∂a_j = ∂/∂a_j E(a_{k_1}, a_{k_2}, ..., a_{k_m})
              where {k_i} are the indices of the nodes in the layer after node j
            = Σ_k (∂E/∂a_k)(∂a_k/∂a_j)
              where k runs over all nodes k that node j connects to
            = Σ_k (∂E/∂a_k) g'(in_k) W_{j,k}

Back-propagation derivation: hidden layer
If we define
    Δ_j ≡ g'(in_j) Σ_k W_{j,k} Δ_k
then
    ∂E/∂W_{k,j} = -a_k Δ_j,
so the gradient-descent step is exactly the hidden-layer update rule W_{k,j} ← W_{k,j} + α a_k Δ_j given earlier.

Back-propagation pseudocode
    for iter = 1 to T
        for e = 1 to N (all examples)
            x = input for example e
            y = output for example e
            run x forward through the network, computing all {a_i}, {in_i}
            for all nodes i (in reverse order)
                Δ_i = (y_i - a_i) g'(in_i)           if i is an output node
                Δ_i = g'(in_i) Σ_k W_{i,k} Δ_k       otherwise
            for all weights W_{j,i}
                W_{j,i} = W_{j,i} + α a_j Δ_i
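For concreteness, here is a minimal runnable sketch of this pseudocode in Python/NumPy for a single-hidden-layer sigmoid network trained on XOR; the architecture (4 hidden units), data, learning rate, and epoch count are illustrative assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)
g = lambda z: 1.0 / (1.0 + np.exp(-z))
g_prime = lambda z: g(z) * (1.0 - g(z))

# XOR data, with the fixed bias input a_0 = -1 prepended to each example
X = np.hstack([-np.ones((4, 1)), np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)])
Y = np.array([0.0, 1.0, 1.0, 0.0])

n_hidden = 4
W_hid = rng.normal(scale=1.0, size=(n_hidden, 3))   # weights into the hidden units
W_out = rng.normal(scale=1.0, size=n_hidden + 1)    # hidden units (plus bias) -> output
alpha, epochs = 0.5, 10000

for _ in range(epochs):
    for x, y in zip(X, Y):
        # Forward pass: compute all in_i and a_i
        in_hid = W_hid @ x
        a_hid = np.append(g(in_hid), -1.0)           # hidden activations plus bias input
        in_out = W_out @ a_hid
        a_out = g(in_out)
        # Backward pass: deltas in reverse order
        delta_out = (y - a_out) * g_prime(in_out)
        delta_hid = g_prime(in_hid) * W_out[:n_hidden] * delta_out
        # Weight updates: W <- W + alpha * a_j * Delta_i
        W_out += alpha * a_hid * delta_out
        W_hid += alpha * np.outer(delta_hid, x)

# Outputs should approach [0, 1, 1, 0]; exact values depend on the random init.
print([round(float(g(W_out @ np.append(g(W_hid @ x), -1.0))), 2) for x in X])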

Back-propagation learning contd.
At each epoch, sum gradient updates for all examples and apply.
[Figure: restaurant data, total error on the training set vs. number of epochs (0 to 400).]
Usual problems with slow convergence, local minima.

Back-propagation learning contd.
[Figure: restaurant data, % correct on the test set vs. training set size, multilayer network vs. decision tree.]

Handwritten digit recognition
3-nearest-neighbor = 2.4% error
400-300-10 unit MLP = 1.6% error
LeNet: 768-192-30-10 unit MLP = 0.9% error

Summary
- Most brains have lots of neurons; each neuron ≈ linear-threshold unit (?)
- Perceptrons (one-layer networks) insufficiently expressive
- Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
- Many applications: speech, driving, handwriting, credit cards, etc.