Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA, http://aima.cs.berkeley.edu

Outline
- Brains
- Neural networks
- Perceptrons
- Multilayer perceptrons
- Applications of neural networks

Brains
$10^{11}$ neurons of $> 20$ types, $10^{14}$ synapses, 1ms-10ms cycle time
Signals are noisy spike trains of electrical potential
[Figure: biological neuron, labelled with axonal arborization, axon from another cell, synapse, dendrite, axon, nucleus, synapses, cell body or soma]

McCulloch-Pitts unit
Output is a squashed linear function of the inputs:
$a_i \leftarrow g(in_i) = g\Big(\textstyle\sum_j W_{j,i}\, a_j\Big)$
[Figure: unit diagram showing the bias input $a_0 = -1$ with bias weight $W_{0,i}$, input links carrying $a_j$ with weights $W_{j,i}$, the input function $in_i = \sum_j W_{j,i} a_j$, the activation function $g$, the output $a_i = g(in_i)$, and the output links]
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do

Activation functions
[Figure: two plots of $g(in_i)$ against $in_i$: (a) a hard threshold, (b) a smooth sigmoid]
(a) is a step function or threshold function
(b) is a sigmoid function $1/(1+e^{-x})$
Changing the bias weight $W_{0,i}$ moves the threshold location
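
As a concrete illustration (not from the slides), here is a minimal Python sketch of a single McCulloch-Pitts-style unit with either activation; the function and variable names are illustrative assumptions.

```python
import math

def step(x):
    """Step / threshold activation: fires iff the weighted input is >= 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Sigmoid activation 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g=step):
    """Compute a_i = g(sum_j W_{j,i} a_j); inputs[0] = -1 acts as the bias input."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Illustrative weights: bias weight 0.5 and input weights 1, 1
print(unit_output([0.5, 1.0, 1.0], [-1.0, 0.0, 1.0]))           # step unit
print(unit_output([0.5, 1.0, 1.0], [-1.0, 0.0, 1.0], sigmoid))  # sigmoid unit
```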

Implementing logical functions
AND: $W_0 = 1.5$, $W_1 = W_2 = 1$
OR: $W_0 = 0.5$, $W_1 = W_2 = 1$
NOT: $W_0 = -0.5$, $W_1 = -1$
McCulloch and Pitts: every Boolean function can be implemented
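
A quick self-contained check (illustrative code, not from the slides) that these weight settings implement AND, OR, and NOT, using a threshold unit with bias input $a_0 = -1$:

```python
def step(x):
    return 1 if x >= 0 else 0

def threshold_unit(w0, weights, inputs):
    """Threshold unit with bias input a_0 = -1 and bias weight w0."""
    return step(-w0 + sum(w * a for w, a in zip(weights, inputs)))

# Weight settings from the slide
AND = lambda x1, x2: threshold_unit(1.5, [1, 1], [x1, x2])
OR = lambda x1, x2: threshold_unit(0.5, [1, 1], [x1, x2])
NOT = lambda x1: threshold_unit(-0.5, [-1], [x1])

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
print("NOT:", NOT(0), NOT(1))
```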

Network structures
Feed-forward networks:
- single-layer perceptrons
- multi-layer perceptrons
Feed-forward networks implement functions, have no internal state
Recurrent networks:
- Hopfield networks have symmetric weights ($W_{i,j} = W_{j,i}$), $g(x) = \mathrm{sign}(x)$, $a_i = \pm 1$; holographic associative memory
- Boltzmann machines use stochastic activation functions, $\approx$ MCMC in Bayes nets
- recurrent neural nets have directed cycles with delays $\Rightarrow$ have internal state (like flip-flops), can oscillate etc.

Feed-forward example
[Figure: network with input units 1 and 2, hidden units 3 and 4, output unit 5, and weights $W_{1,3}$, $W_{1,4}$, $W_{2,3}$, $W_{2,4}$, $W_{3,5}$, $W_{4,5}$]
Feed-forward network = a parameterized family of nonlinear functions:
$a_5 = g(W_{3,5}\, a_3 + W_{4,5}\, a_4) = g\big(W_{3,5}\, g(W_{1,3}\, a_1 + W_{2,3}\, a_2) + W_{4,5}\, g(W_{1,4}\, a_1 + W_{2,4}\, a_2)\big)$
Adjusting weights changes the function: do learning this way!
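
A small sketch of this computation in Python; the choice of sigmoid for $g$ and the weight values are illustrative assumptions, since the slide leaves both unspecified:

```python
import math

def g(x):
    """Activation function; a sigmoid is assumed here."""
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(a1, a2, W):
    """Compute a_5 for the 2-input, 2-hidden-unit, 1-output network on the slide.

    W maps (from_node, to_node) pairs to weights."""
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    return g(W[(3, 5)] * a3 + W[(4, 5)] * a4)

# Illustrative weights (not from the slides)
W = {(1, 3): 0.5, (2, 3): -0.3, (1, 4): 0.8, (2, 4): 0.1, (3, 5): 1.0, (4, 5): -0.7}
print(feed_forward(1.0, 0.0, W))
```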

Single-layer perceptrons
[Figure: left, input units wired directly to output units by weights $W_{j,i}$; right, the output of a single perceptron plotted over $(x_1, x_2)$, forming a soft cliff]
Output units all operate separately; no shared weights
Adjusting weights moves the location, orientation, and steepness of cliff

Expressiveness of perceptrons
Consider a perceptron with $g$ = step function (Rosenblatt, 1957, 1960)
Can represent AND, OR, NOT, majority, etc., but not XOR
Represents a linear separator in input space:
$\textstyle\sum_j W_j x_j > 0$ or $\mathbf{W} \cdot \mathbf{x} > 0$
[Figure: (a) $x_1$ and $x_2$, (b) $x_1$ or $x_2$: linearly separable; (c) $x_1$ xor $x_2$: not linearly separable]
Minsky & Papert (1969) pricked the neural network balloon

Perceptron learning
Learn by adjusting weights to reduce error on training set
The squared error for an example with input $\mathbf{x}$ and true output $y$ is
$E = \frac{1}{2}\, Err^2 \equiv \frac{1}{2}\,\big(y - h_{\mathbf{W}}(\mathbf{x})\big)^2$
Perform optimization search by gradient descent:
$\dfrac{\partial E}{\partial W_j} = Err \cdot \dfrac{\partial Err}{\partial W_j} = Err \cdot \dfrac{\partial}{\partial W_j}\Big(y - g\big(\textstyle\sum_{j=0}^{n} W_j x_j\big)\Big) = -Err \cdot g'(in) \cdot x_j$
Simple weight update rule:
$W_j \leftarrow W_j + \alpha \cdot Err \cdot g'(in) \cdot x_j$
E.g., +ve error $\Rightarrow$ increase network output $\Rightarrow$ increase weights on +ve inputs, decrease on -ve inputs
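
A minimal sketch of this learning rule in Python, assuming a sigmoid unit (so that $g'$ is defined) and an illustrative training set; the helper names and data are assumptions, not from the slides:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def train_perceptron(examples, n_inputs, alpha=0.5, epochs=1000):
    """Gradient-descent perceptron rule: W_j <- W_j + alpha * Err * g'(in) * x_j.

    Each example is (x, y), where x includes a leading -1 bias input."""
    W = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for x, y in examples:
            in_ = sum(w * xi for w, xi in zip(W, x))
            err = y - sigmoid(in_)
            for j in range(len(W)):
                W[j] += alpha * err * sigmoid_deriv(in_) * x[j]
    return W

# Illustrative, linearly separable data: the OR function (bias input -1 prepended)
data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = train_perceptron(data, n_inputs=2)
for x, y in data:
    print(x[1:], y, round(sigmoid(sum(w * xi for w, xi in zip(W, x))), 2))
```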

Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set
[Figure: two learning curves, proportion correct on test set vs. training set size, comparing the perceptron with decision-tree learning; left, MAJORITY on 11 inputs; right, RESTAURANT data]
Perceptron learns majority function easily, DTL is hopeless
DTL learns restaurant function easily, perceptron cannot represent it

Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand
[Figure: output units $a_i$, connected by weights $W_{j,i}$ to hidden units $a_j$, connected by weights $W_{k,j}$ to input units $a_k$]

Expressiveness of MLPs
All continuous functions w/ 2 layers, all functions w/ 3 layers
[Figure: two surface plots of $h_{\mathbf{W}}(x_1, x_2)$: a ridge and a bump]
Combine two opposite-facing threshold functions to make a ridge
Combine two perpendicular ridges to make a bump
Add bumps of various sizes and locations to fit any surface
Proof requires exponentially many hidden units (cf DTL proof)
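
The ridge-and-bump construction can be sketched directly with a few sigmoid units; the particular weights and the final squashing are illustrative assumptions, not values from the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(u):
    """Two opposite-facing soft thresholds in u, summed: high only for |u| < 1."""
    return sigmoid(5 * (u + 1)) + sigmoid(-5 * (u - 1)) - 1.0

def bump(x1, x2):
    """Combine two perpendicular ridges and squash: high only near the origin."""
    return sigmoid(5 * (ridge(x1) + ridge(x2) - 1.5))

# Sample the bump on a coarse grid: large near (0, 0), small elsewhere
for x1 in (-3, 0, 3):
    print([round(bump(x1, x2), 2) for x2 in (-3, 0, 3)])
```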

Back-propagation learning
Output layer: same as for single-layer perceptron,
$W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot \Delta_i$ where $\Delta_i = Err_i \cdot g'(in_i)$
Hidden layer: back-propagate the error from the output layer:
$\Delta_j = g'(in_j) \textstyle\sum_i W_{j,i}\, \Delta_i$
Update rule for weights in hidden layer:
$W_{k,j} \leftarrow W_{k,j} + \alpha \cdot a_k \cdot \Delta_j$
(Most neuroscientists deny that back-propagation occurs in the brain)
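
A minimal Python sketch of these update rules for a one-hidden-layer network of sigmoid units (for which $g'(in) = a\,(1-a)$); the network shape, the weight initialization, and the single training example are illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_update(x, y, W_in, W_out, alpha=0.5):
    """One per-example update of a 1-hidden-layer net using the slide's rules.

    x: inputs with a leading -1 bias input; y: list of target outputs.
    W_in[k][j]: weight from input k to hidden unit j.
    W_out[j][i]: weight from hidden unit j to output unit i.
    Returns the outputs computed on the forward pass (before the update)."""
    # Forward pass
    in_hid = [sum(x[k] * W_in[k][j] for k in range(len(x)))
              for j in range(len(W_in[0]))]
    a_hid = [sigmoid(v) for v in in_hid]
    in_out = [sum(a_hid[j] * W_out[j][i] for j in range(len(a_hid)))
              for i in range(len(W_out[0]))]
    a_out = [sigmoid(v) for v in in_out]

    # Delta_i = Err_i * g'(in_i) for output units; sigmoid' = a * (1 - a)
    delta_out = [(y[i] - a_out[i]) * a_out[i] * (1 - a_out[i])
                 for i in range(len(a_out))]
    # Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i for hidden units
    delta_hid = [a_hid[j] * (1 - a_hid[j]) *
                 sum(W_out[j][i] * delta_out[i] for i in range(len(delta_out)))
                 for j in range(len(a_hid))]

    # W_{j,i} += alpha * a_j * Delta_i ; W_{k,j} += alpha * a_k * Delta_j
    for j in range(len(a_hid)):
        for i in range(len(delta_out)):
            W_out[j][i] += alpha * a_hid[j] * delta_out[i]
    for k in range(len(x)):
        for j in range(len(delta_hid)):
            W_in[k][j] += alpha * x[k] * delta_hid[j]
    return a_out

# Illustrative usage: repeatedly fit one example and watch the error shrink
random.seed(0)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
W_out = [[random.uniform(-0.5, 0.5)] for _ in range(2)]
x, y = [-1.0, 1.0, 0.0], [0.8]
for step in range(5):
    out = backprop_update(x, y, W_in, W_out)
    print(step, round(0.5 * (y[0] - out[0]) ** 2, 4))
```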

Back-propagation derivation
The squared error on a single example is defined as
$E = \frac{1}{2} \sum_i (y_i - a_i)^2$,
where the sum is over the nodes in the output layer.
$\dfrac{\partial E}{\partial W_{j,i}} = -(y_i - a_i)\,\dfrac{\partial a_i}{\partial W_{j,i}} = -(y_i - a_i)\,\dfrac{\partial g(in_i)}{\partial W_{j,i}}$
$= -(y_i - a_i)\, g'(in_i)\,\dfrac{\partial in_i}{\partial W_{j,i}} = -(y_i - a_i)\, g'(in_i)\,\dfrac{\partial}{\partial W_{j,i}}\Big(\textstyle\sum_j W_{j,i}\, a_j\Big)$
$= -(y_i - a_i)\, g'(in_i)\, a_j = -a_j\, \Delta_i$

Back-propagation derivation contd.
$\dfrac{\partial E}{\partial W_{k,j}} = -\sum_i (y_i - a_i)\,\dfrac{\partial a_i}{\partial W_{k,j}} = -\sum_i (y_i - a_i)\,\dfrac{\partial g(in_i)}{\partial W_{k,j}}$
$= -\sum_i (y_i - a_i)\, g'(in_i)\,\dfrac{\partial in_i}{\partial W_{k,j}} = -\sum_i \Delta_i\,\dfrac{\partial}{\partial W_{k,j}}\Big(\textstyle\sum_j W_{j,i}\, a_j\Big)$
$= -\sum_i \Delta_i\, W_{j,i}\,\dfrac{\partial a_j}{\partial W_{k,j}} = -\sum_i \Delta_i\, W_{j,i}\,\dfrac{\partial g(in_j)}{\partial W_{k,j}}$
$= -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\,\dfrac{\partial in_j}{\partial W_{k,j}} = -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\,\dfrac{\partial}{\partial W_{k,j}}\Big(\textstyle\sum_k W_{k,j}\, a_k\Big)$
$= -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\, a_k = -a_k\, \Delta_j$

Back-propagation learning contd.
At each epoch, sum gradient updates for all examples and apply
Training curve for 100 restaurant examples: finds exact fit
[Figure: total error on training set vs. number of epochs, falling to zero within roughly 400 epochs]
Typical problems: slow convergence, local minima
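
A sketch of this epoch-level (batch) scheme; the `gradient(W, x, y)` callback, which is assumed to return the per-weight update terms (for example the $a_j \Delta_i$ terms of back-propagation), is hypothetical:

```python
def train_epochs(examples, W, gradient, alpha=0.1, epochs=400):
    """Per epoch: sum the gradient updates over all examples, then apply once."""
    for _ in range(epochs):
        total = [0.0] * len(W)
        for x, y in examples:
            for j, g_j in enumerate(gradient(W, x, y)):
                total[j] += g_j
        for j in range(len(W)):
            W[j] += alpha * total[j]
    return W

# Tiny illustration: batch-fit a single linear weight to examples with y = 2*x
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
grad = lambda W, x, y: [(y - W[0] * x) * x]  # update term for squared error, h(x) = W0*x
print(train_epochs(examples, [0.0], grad, alpha=0.01, epochs=200))  # approx [2.0]
```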

Back-propagation learning contd.
Learning curve for MLP with 4 hidden units:
[Figure: proportion correct on test set vs. training set size on the RESTAURANT data, comparing the decision tree and the multilayer network]
MLPs are quite good for complex pattern recognition tasks, but resulting hypotheses cannot be understood easily

Handwritten digit recognition
3-nearest-neighbor = 2.4% error
400-300-10 unit MLP = 1.6% error
LeNet: 768-192-30-10 unit MLP = 0.9% error
Current best (kernel machines, vision algorithms) $\approx$ 0.6% error

Summary
Most brains have lots of neurons; each neuron $\approx$ linear threshold unit (?)
Perceptrons (one-layer networks) insufficiently expressive
Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
Many applications: speech, driving, handwriting, fraud detection, etc.
Engineering, cognitive modelling, and neural system modelling subfields have largely diverged