Neural networks. Chapter 20, Section 5


Outline
Brains
Neural networks
Perceptrons
Multilayer perceptrons
Applications of neural networks

Brains
10^11 neurons of > 20 types, 10^14 synapses, 1ms–10ms cycle time
Signals are noisy "spike trains" of electrical potential
[Figure: anatomy of a neuron, labelling the axonal arborization, synapse, axon from another cell, dendrite, axon, nucleus, synapses, and cell body (soma)]

McCulloch–Pitts "unit"
Output is a "squashed" linear function of the inputs:
a_i ← g(in_i) = g(Σ_j W_{j,i} a_j)
[Figure: a single unit with bias input a_0 = −1 and bias weight W_{0,i}; the input links feed the input function in_i = Σ_j W_{j,i} a_j, the activation function g produces the output a_i, which is passed along the output links]
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do

Activation functions
[Figure: g(in_i) plotted against in_i for (a) and (b)]
(a) is a step function or threshold function
(b) is a sigmoid function 1/(1 + e^−x)
Changing the bias weight W_{0,i} moves the threshold location
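A minimal Python sketch of such a unit with both activation functions (the function names and example weights are illustrative, not from the slides; the bias is handled via the fixed input a_0 = −1 as above):

```python
import math

def step(x):
    """Threshold activation: 1 if the weighted input is non-negative, else 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Sigmoid activation 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit(inputs, weights, bias_weight, g=sigmoid):
    """McCulloch-Pitts unit: a_i = g(in_i) with in_i = sum_j W_j,i a_j,
    where the bias weight W_0,i multiplies the fixed input a_0 = -1."""
    in_i = -bias_weight + sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Changing bias_weight shifts where the step (or the sigmoid's steep part) turns on.
print(unit([1.0, 1.0], [1.0, 1.0], bias_weight=1.5, g=step))     # 1.0
print(unit([1.0, 0.0], [1.0, 1.0], bias_weight=1.5, g=sigmoid))  # ~0.38
```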

Implementing logical functions
AND: W_0 = 1.5, W_1 = W_2 = 1
OR: W_0 = 0.5, W_1 = W_2 = 1
NOT: W_0 = −0.5, W_1 = −1
McCulloch and Pitts: every Boolean function can be implemented
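A quick check of these weights with a step unit (a hypothetical helper along the lines of the sketch above, computing step(Σ_j W_j x_j − W_0)):

```python
def step(x):
    return 1 if x >= 0 else 0

def threshold_unit(inputs, weights, w0):
    """Step unit with bias: output = step(sum_j W_j x_j - W_0)."""
    return step(sum(w * x for w, x in zip(weights, inputs)) - w0)

AND = lambda x1, x2: threshold_unit([x1, x2], [1, 1], 1.5)
OR  = lambda x1, x2: threshold_unit([x1, x2], [1, 1], 0.5)
NOT = lambda x1:     threshold_unit([x1],     [-1], -0.5)

for x1 in (0, 1):
    print("NOT", x1, "=", NOT(x1))
    for x2 in (0, 1):
        print(x1, x2, " AND:", AND(x1, x2), " OR:", OR(x1, x2))
```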

Network structures
Feed-forward networks:
  single-layer perceptrons
  multi-layer perceptrons
Feed-forward networks implement functions, have no internal state
Recurrent networks:
  Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}); g(x) = sign(x), a_i = ±1; holographic associative memory
  Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
  recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate etc.

Feed-forward example
[Figure: input units 1 and 2 feed hidden units 3 and 4 via weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}; the hidden units feed output unit 5 via W_{3,5}, W_{4,5}]
Feed-forward network = a parameterized family of nonlinear functions:
a_5 = g(W_{3,5} a_3 + W_{4,5} a_4)
    = g(W_{3,5} g(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g(W_{1,4} a_1 + W_{2,4} a_2))
Adjusting weights changes the function: do learning this way!
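To make the "parameterized family of functions" concrete, here is a small sketch that evaluates a_5 both layer by layer and as the fully expanded nested expression (the weights and inputs are arbitrary examples, and a sigmoid g is assumed):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))    # sigmoid activation assumed

# Arbitrary example weights for the 2-input, 2-hidden, 1-output network above
W13, W14, W23, W24, W35, W45 = 0.5, -0.3, 0.8, 0.2, 1.0, -1.5
a1, a2 = 0.7, -0.4                        # example inputs

# Layer by layer
a3 = g(W13 * a1 + W23 * a2)
a4 = g(W14 * a1 + W24 * a2)
a5 = g(W35 * a3 + W45 * a4)

# The same value as one nested expression
a5_nested = g(W35 * g(W13 * a1 + W23 * a2) + W45 * g(W14 * a1 + W24 * a2))
assert abs(a5 - a5_nested) < 1e-12
print(a5)
```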

Single-layer perceptrons
[Figure: input units connected directly to output units by weights W_{j,i}; a surface plot of a two-input perceptron's output, a steep "cliff" over the (x_1, x_2) plane]
Output units all operate separately, no shared weights
Adjusting weights moves the location, orientation, and steepness of cliff

Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)
Can represent AND, OR, NOT, majority, etc., but not XOR
Represents a linear separator in input space:
Σ_j W_j x_j > 0 or W · x > 0
[Figure: (a) x_1 and x_2 and (b) x_1 or x_2 are linearly separable in the (x_1, x_2) plane; (c) x_1 xor x_2 is not]
Minsky & Papert (1969) pricked the neural network balloon
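The XOR claim can be checked empirically with a coarse brute-force search over weights (only a sanity check on a finite grid, not a proof; the helper names are made up):

```python
import itertools

def step_perceptron(x1, x2, w0, w1, w2):
    return 1 if w1 * x1 + w2 * x2 - w0 >= 0 else 0

def representable(target):
    """Search a coarse grid of weights for a step perceptron computing `target`."""
    grid = [i / 2 for i in range(-6, 7)]                    # -3.0 .. 3.0 in steps of 0.5
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        if all(step_perceptron(x1, x2, w0, w1, w2) == target(x1, x2)
               for x1 in (0, 1) for x2 in (0, 1)):
            return True
    return False

print(representable(lambda a, b: a & b))   # AND: True
print(representable(lambda a, b: a | b))   # OR:  True
print(representable(lambda a, b: a ^ b))   # XOR: False, no separating weights on the grid
```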

Perceptron learning
Learn by adjusting weights to reduce error on training set
The squared error for an example with input x and true output y is
E = (1/2) Err² ≡ (1/2) (y − h_W(x))²
Perform optimization search by gradient descent:
∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y − g(Σ_{j=0}^{n} W_j x_j)) = −Err × g′(in) × x_j
Simple weight update rule:
W_j ← W_j + α × Err × g′(in) × x_j
E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on −ve inputs
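A minimal sketch of this update rule in action, training a single sigmoid unit on the OR function (the data set, learning rate, and epoch count are illustrative choices, not from the slides):

```python
import math

def g(x):                        # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):                  # derivative of the sigmoid
    return g(x) * (1.0 - g(x))

# Training data for OR; x[0] = -1 is the fixed bias input, so W[0] plays the role of W_0
data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = [0.0, 0.0, 0.0]
alpha = 0.5

for epoch in range(1000):
    for x, y in data:
        in_ = sum(Wj * xj for Wj, xj in zip(W, x))
        err = y - g(in_)                        # Err = y - h_W(x)
        for j in range(len(W)):                 # W_j <- W_j + alpha * Err * g'(in) * x_j
            W[j] += alpha * err * g_prime(in_) * x[j]

print([round(g(sum(Wj * xj for Wj, xj in zip(W, x)))) for x, _ in data])   # [0, 1, 1, 1]
```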

Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set
[Figure: proportion correct on test set vs. training set size for perceptron and decision-tree learning, on (left) the MAJORITY function and (right) the RESTAURANT data]
Perceptron learns majority function easily, DTL is hopeless
DTL learns restaurant function easily, perceptron cannot represent it

Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand
[Figure: a layered network with input units a_k, weights W_{k,j} to hidden units a_j, and weights W_{j,i} to output units a_i]

Expressiveness of MLPs
All continuous functions w/ 2 layers, all functions w/ 3 layers
[Figure: surface plots of h_W(x_1, x_2) showing a ridge and a bump built from combined soft thresholds]
Combine two opposite-facing threshold functions to make a ridge
Combine two perpendicular ridges to make a bump
Add bumps of various sizes and locations to fit any surface
Proof requires exponentially many hidden units (cf DTL proof)
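The ridge-and-bump construction can be sketched numerically (hand-picked weights and a sigmoid soft threshold; this only illustrates the shapes and is not taken from the slides):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(x):
    """Two opposite-facing soft thresholds: high roughly for -2 < x < 2."""
    return g(4 * (x + 2)) + g(-4 * (x - 2)) - 1.0

def bump(x1, x2):
    """Two perpendicular ridges, summed and thresholded: high only near the origin."""
    return g(10 * (ridge(x1) + ridge(x2) - 1.5))

for p in [(0, 0), (0, 3), (3, 0), (3, 3)]:
    print(p, round(bump(*p), 3))   # close to 1 at (0, 0), close to 0 elsewhere
```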

Back-propagation learning
Output layer: same as for single-layer perceptron,
W_{j,i} ← W_{j,i} + α × a_j × Δ_i
where Δ_i = Err_i × g′(in_i)
Hidden layer: back-propagate the error from the output layer:
Δ_j = g′(in_j) Σ_i W_{j,i} Δ_i
Update rule for weights in hidden layer:
W_{k,j} ← W_{k,j} + α × a_k × Δ_j
(Most neuroscientists deny that back-propagation occurs in the brain)
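A compact sketch of these two update rules for one training example, on a tiny one-hidden-layer network of sigmoid units (the layer sizes, initial weights, learning rate, and example are all illustrative):

```python
import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))         # g'(in)

random.seed(0)
n_in, n_hid, n_out = 2, 3, 1
Wkj = [[random.uniform(-1, 1) for _ in range(n_in)]  for _ in range(n_hid)]   # input -> hidden
Wji = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]   # hidden -> output
alpha = 0.1
a_k, y = [0.5, -0.2], [1.0]                   # one (input, target) example

# Forward pass
in_j = [sum(Wkj[j][k] * a_k[k] for k in range(n_in)) for j in range(n_hid)]
a_j  = [g(x) for x in in_j]
in_i = [sum(Wji[i][j] * a_j[j] for j in range(n_hid)) for i in range(n_out)]
a_i  = [g(x) for x in in_i]

# Output layer: Delta_i = Err_i * g'(in_i)
delta_i = [(y[i] - a_i[i]) * gp(in_i[i]) for i in range(n_out)]
# Hidden layer: Delta_j = g'(in_j) * sum_i W_j,i Delta_i
delta_j = [gp(in_j[j]) * sum(Wji[i][j] * delta_i[i] for i in range(n_out)) for j in range(n_hid)]

# Weight updates: W_j,i += alpha * a_j * Delta_i  and  W_k,j += alpha * a_k * Delta_j
for i in range(n_out):
    for j in range(n_hid):
        Wji[i][j] += alpha * a_j[j] * delta_i[i]
for j in range(n_hid):
    for k in range(n_in):
        Wkj[j][k] += alpha * a_k[k] * delta_j[j]

print(delta_i, delta_j)
```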

Back-propagation derivation
The squared error on a single example is defined as
E = (1/2) Σ_i (y_i − a_i)²,
where the sum is over the nodes in the output layer.
∂E/∂W_{j,i} = −(y_i − a_i) ∂a_i/∂W_{j,i} = −(y_i − a_i) ∂g(in_i)/∂W_{j,i}
  = −(y_i − a_i) g′(in_i) ∂in_i/∂W_{j,i} = −(y_i − a_i) g′(in_i) ∂/∂W_{j,i} (Σ_j W_{j,i} a_j)
  = −(y_i − a_i) g′(in_i) a_j = −a_j Δ_i

Back-propagation derivation contd.
∂E/∂W_{k,j} = −Σ_i (y_i − a_i) ∂a_i/∂W_{k,j} = −Σ_i (y_i − a_i) ∂g(in_i)/∂W_{k,j}
  = −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_{k,j} = −Σ_i Δ_i ∂/∂W_{k,j} (Σ_j W_{j,i} a_j)
  = −Σ_i Δ_i W_{j,i} ∂a_j/∂W_{k,j} = −Σ_i Δ_i W_{j,i} ∂g(in_j)/∂W_{k,j}
  = −Σ_i Δ_i W_{j,i} g′(in_j) ∂in_j/∂W_{k,j} = −Σ_i Δ_i W_{j,i} g′(in_j) ∂/∂W_{k,j} (Σ_k W_{k,j} a_k)
  = −Σ_i Δ_i W_{j,i} g′(in_j) a_k = −a_k Δ_j

Back-propagation learning contd.
At each epoch, sum gradient updates for all examples and apply
Training curve for 100 restaurant examples: finds exact fit
[Figure: total error on training set vs. number of epochs, falling to zero within about 400 epochs]
Typical problems: slow convergence, local minima
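A sketch of this batch scheme (accumulate the gradient over every example, then apply once per epoch), shown for a single sigmoid unit rather than the restaurant MLP; the majority-of-three data set, learning rate, and epoch count are illustrative:

```python
import math

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))

# Majority-of-three function; x[0] = -1 is the fixed bias input
data = [([-1, b1, b2, b3], int(b1 + b2 + b3 >= 2))
        for b1 in (0, 1) for b2 in (0, 1) for b3 in (0, 1)]
W, alpha = [0.0] * 4, 0.5

for epoch in range(2000):
    grad = [0.0] * len(W)                    # accumulate over all examples ...
    for x, y in data:
        in_ = sum(Wj * xj for Wj, xj in zip(W, x))
        err = y - g(in_)
        for j in range(len(W)):
            grad[j] += err * gp(in_) * x[j]
    for j in range(len(W)):                  # ... then apply once per epoch
        W[j] += alpha * grad[j]

predictions = [round(g(sum(Wj * xj for Wj, xj in zip(W, x)))) for x, _ in data]
print(predictions == [y for _, y in data])   # True once the fit is found
```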

Back-propagation learning contd.
Learning curve for MLP with 4 hidden units:
[Figure: proportion correct on test set vs. training set size for the multilayer network and decision-tree learning on the RESTAURANT data]
MLPs are quite good for complex pattern recognition tasks, but resulting hypotheses cannot be understood easily

Handwritten digit recognition
3-nearest-neighbor = 2.4% error
400–300–10 unit MLP = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error
Current best (kernel machines, vision algorithms) ≈ 0.6% error

Summary
Most brains have lots of neurons; each neuron ≈ linear-threshold unit (?)
Perceptrons (one-layer networks) insufficiently expressive
Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
Many applications: speech, driving, handwriting, fraud detection, etc.
Engineering, cognitive modelling, and neural system modelling subfields have largely diverged