Last update: October 26, 2017. Neural networks. CMSC 421: Section 18.7. Dana Nau


Outline
- Applications of neural networks
- Brains
- Neural network units
- Perceptrons
- Multilayer perceptrons

Example Applications
- Pattern classification: virus/explosive detection, financial predictions, etc.
- Image processing: character recognition, manufacturing inspection, etc.
- Control: autonomous vehicles, industrial processes, etc.
- Optimization: VLSI layout, scheduling, etc.
- Bionics: prostheses, brain implants, etc.
- Brain/cognitive models: memory, learning, disorders, etc.

Nature-Inspired Computation
- Natural systems (studied in biology) inspire formal models and theories (interdisciplinary work in computer science, physics, etc.), which lead to applications (engineering)
- Examples: neural networks, genetic programming, swarm intelligence, self-replicating machines, ...
- Inspiration for neural networks?

Brains
- 10^11 neurons of more than 20 types, 10^14 synapses, 1 ms to 10 ms cycle time
- Signals are noisy "spike trains" of electrical potential
- [Figure: a neuron, with labels for the axon, axonal arborization, synapses, dendrites, nucleus, and cell body (soma)]

Brains as inspiration for neural networks

Computers vs. Brains
- information access: global (computers) vs. local (brains)
- control: centralized vs. decentralized
- processing method: sequential vs. massively parallel
- how programmed: by a computer programmer vs. self-organizing
- adaptability: minimally vs. prominently

McCulloch-Pitts unit
- Simple mathematical model inspired by neurons; much simpler than what neurons actually do
- Output of unit i depends on a weighted sum of the inputs:
    a_i = g(in_i) = g(Σ_j W_j,i a_j)
- [Figure: a unit with input links carrying activations a_j weighted by W_j,i, a bias input a_0 = -1 with bias weight W_0,i, an input function Σ computing in_i, an activation function g, and output a_i on the output links]

Activation functions
- [Figure: two plots of g(in_i) versus in_i: (a) a step function, (b) a sigmoid]
- (a) is a step function or threshold function: g(in_i) = 1 if in_i > 0; g(in_i) = 0 otherwise
- (b) is a sigmoid function: 1/(1 + e^(-x))
- Changing the bias weight W_0,i moves the threshold location
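To make the unit model and these two activation functions concrete, here is a minimal Python sketch (not from the slides; the function names are my own) that computes a_i = g(Σ_j W_j,i a_j) with either a step or a sigmoid g:

    import math

    def step(x):
        """Threshold activation: 1 if x > 0, else 0."""
        return 1.0 if x > 0 else 0.0

    def sigmoid(x):
        """Sigmoid activation: 1 / (1 + e^(-x))."""
        return 1.0 / (1.0 + math.exp(-x))

    def unit_output(weights, inputs, g=step):
        """McCulloch-Pitts-style unit: a_i = g(sum_j W_j,i * a_j).
        A bias can be included as a fixed input with its own weight."""
        in_i = sum(w * a for w, a in zip(weights, inputs))
        return g(in_i)

    # Weighted sum is 0.5*1.0 + (-0.3)*0.7 = 0.29, so step gives 1, sigmoid ~0.57
    print(unit_output([0.5, -0.3], [1.0, 0.7], step))
    print(unit_output([0.5, -0.3], [1.0, 0.7], sigmoid))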

Implementing logical functions
- Use the step function as the activation function
- [Figure: three units]
    AND: W_0 = 1.5, W_1 = 1, W_2 = 1
    OR:  W_0 = 0.5, W_1 = 1, W_2 = 1
    NOT: W_0 = -0.5, W_1 = -1
- McCulloch and Pitts: every Boolean function can be implemented as a collection of units
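These constructions are easy to check. The sketch below is my own; it assumes the convention that the bias input is a_0 = -1 (which is what makes the weights above compute AND, OR, and NOT) and prints the resulting truth tables:

    def step(x):
        """Threshold activation: 1 if x > 0, else 0."""
        return 1 if x > 0 else 0

    def threshold_unit(w0, weights, inputs):
        """Step-function unit with bias input a_0 = -1 and bias weight w0."""
        in_i = w0 * (-1) + sum(w * a for w, a in zip(weights, inputs))
        return step(in_i)

    AND = lambda x1, x2: threshold_unit(1.5, [1, 1], [x1, x2])
    OR  = lambda x1, x2: threshold_unit(0.5, [1, 1], [x1, x2])
    NOT = lambda x1:     threshold_unit(-0.5, [-1], [x1])

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
    print("NOT 0:", NOT(0), " NOT 1:", NOT(1))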

Network structures
- Feed-forward networks:
    single-layer perceptrons
    multilayer perceptrons
    Implement functions, have no internal state
- Recurrent networks:
    Hopfield networks have symmetric weights (W_i,j = W_j,i), g(x) = sign(x), a_i = ±1; holographic associative memory
    Boltzmann machines use stochastic activation functions
    Recurrent neural nets have directed cycles with delays; they have internal state (like flip-flops), can oscillate, etc.

Feed-forward example
- [Figure: input units 1 and 2, hidden units 3 and 4, output unit 5, connected by weights W_1,3, W_1,4, W_2,3, W_2,4, W_3,5, W_4,5]
- Feed-forward network = a parameterized family of nonlinear functions:
    a_5 = g(W_3,5 a_3 + W_4,5 a_4)
        = g(W_3,5 g(W_1,3 a_1 + W_2,3 a_2) + W_4,5 g(W_1,4 a_1 + W_2,4 a_2))
- Adjusting weights changes the function: do learning this way
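As a concrete version of the composition above, here is a short sketch (my own code; the weight values are arbitrary placeholders, not from the slides) that evaluates a_5 with a sigmoid g:

    import math

    def g(x):
        """Sigmoid activation."""
        return 1.0 / (1.0 + math.exp(-x))

    def a5(a1, a2, W):
        """Output of the 5-unit feed-forward network:
        a_5 = g(W_3,5 a_3 + W_4,5 a_4), with a_3 and a_4 computed from the inputs."""
        a3 = g(W["13"] * a1 + W["23"] * a2)
        a4 = g(W["14"] * a1 + W["24"] * a2)
        return g(W["35"] * a3 + W["45"] * a4)

    # Arbitrary example weights (placeholders)
    W = {"13": 0.5, "23": -0.4, "14": 0.3, "24": 0.8, "35": 1.2, "45": -0.7}
    print(a5(1.0, 0.0, W))  # changing any weight changes the function being computed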

Perceptrons
- [Figure: a single-layer perceptron (input units connected directly to output units by weights W_j,i), and a 3D plot of the perceptron output as a function of x_1 and x_2, showing a soft-threshold "cliff"]
- Output units all operate separately: no shared weights
- Adjusting weights moves the location, orientation, and steepness of the cliff

Perceptron learning
- [Figure: a single unit with inputs x_0, ..., x_n and weights W_0, ..., W_n; in = Σ_i W_i x_i = W · x; a = g(in)]
- Learn by adjusting weights to reduce error on the training set
- Squared error for an example with input x and true output y:
    E = (1/2) Err^2 = (1/2) (y - a)^2
- Optimization search by gradient descent:
    ∂E/∂W_j = Err · ∂Err/∂W_j = Err · ∂/∂W_j ( y - g(Σ_{j=0}^n W_j x_j) ) = -Err · g'(in) · x_j
- Simple weight update rule:
    W_j ← W_j + α · Err · g'(in) · x_j
- E.g., positive error ⇒ increase network output ⇒ increase weights on positive inputs, decrease on negative inputs
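A minimal sketch of this update rule for a single training example, assuming a sigmoid g so that g'(in) = g(in)(1 - g(in)); the names and example numbers are my own:

    import math

    def g(x):
        return 1.0 / (1.0 + math.exp(-x))

    def g_prime(x):
        return g(x) * (1.0 - g(x))

    def perceptron_update(W, x, y, alpha=0.1):
        """One gradient-descent step: W_j <- W_j + alpha * Err * g'(in) * x_j,
        where in = W . x and Err = y - g(in)."""
        in_ = sum(w * xj for w, xj in zip(W, x))
        err = y - g(in_)
        return [w + alpha * err * g_prime(in_) * xj for w, xj in zip(W, x)]

    # Repeated updates on one example (x_0 = 1 is a bias input; data is a placeholder)
    W = [0.0, 0.0, 0.0]
    for _ in range(100):
        W = perceptron_update(W, [1.0, 2.0, -1.0], 1.0)
    print(W)  # the weights move so that g(W . x) approaches the target y = 1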

Perceptron learning, continued
- Perceptron learning rule converges to a consistent function for any linearly separable data set
- Perceptron easily learns the majority function; decision-tree learning doesn't
    majority is representable, but only as a very large decision tree
    DTL won't learn it without a very large data set
- [Figure: proportion correct on the test set vs. training set size for the perceptron and a decision tree on MAJORITY with 11 inputs; the perceptron does much better]

Perceptron learning, continued
- Perceptron learning rule converges to a consistent function for any linearly separable data set
- Decision-tree learning easily learns the restaurant function; the perceptron can't
    Not linearly separable, so the perceptron can't represent it
- [Figure: proportion correct on the test set vs. training set size for the perceptron and a decision tree on the RESTAURANT data; the decision tree does much better]

Expressiveness of perceptrons
- Rosenblatt's perceptron algorithm, 1957(??): developed a physical device, described as key to the future of mankind
- Minsky and Papert (1969) showed it was impossible
- Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)
- Represents a linear separator in input space: Σ_j W_j x_j > 0, or W · x > 0
- [Figure: three plots in the (x_1, x_2) plane: (a) x_1 and x_2, (b) x_1 or x_2, (c) x_1 xor x_2; the first two are linearly separable, the third is not]
- Can represent AND, OR, NOT, majority, etc., but not XOR
- Dark ages for neural network research: 1970-1985

Multilayer feed-forward networks
- Renaissance for neural network research: 1985-1995
- Multilayer feed-forward network: layers usually fully connected; numbers of hidden units typically chosen by hand
- [Figure: output units a_i, connected by weights W_j,i to hidden units a_j, connected by weights W_k,j to input units a_k]

Expressiveness of MLPs
- [Figure: two 3D plots of h_W(x_1, x_2), one showing a ridge and one showing a bump]
- With 2 layers, can express all continuous functions; with 3 layers, all functions
    Combine two opposite-facing threshold functions to make a ridge
    Combine two perpendicular ridges to make a bump
    Add bumps of various sizes and locations to fit any surface
- Proof requires exponentially many hidden units (like in the proof for decision-tree learning)
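The ridge construction can be checked numerically: adding two opposite-facing sigmoids (and subtracting 1) gives a function that is high between the two thresholds and low outside. A small sketch of my own, in one dimension and with arbitrary threshold positions:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def ridge(x1, steepness=4.0, left=-2.0, right=2.0):
        """Sum of two opposite-facing sigmoids along x1, minus 1: roughly 1
        for left < x1 < right and roughly 0 elsewhere (a ridge in x1)."""
        return (sigmoid(steepness * (x1 - left))
                + sigmoid(-steepness * (x1 - right)) - 1.0)

    for x1 in (-6, -2, 0, 2, 6):
        print(x1, round(ridge(x1), 3))  # about 0, 0.5, 1, 0.5, 0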

Back-propagation learning
- [Figure: units k, j, i with activations a_k, a_j, a_i, connected by weights W_k,j and W_j,i]
- Output layer: same as for single-layer perceptron:
    W_j,i ← W_j,i + α · a_j · Δ_i,   where Δ_i = Err_i · g'(in_i)
- Hidden layer: back-propagate the error from the output layer:
    Δ_j = g'(in_j) Σ_i W_j,i Δ_i
- Update rule for weights in the hidden layer:
    W_k,j ← W_k,j + α · a_k · Δ_j
- (Most neuroscientists deny that back-propagation occurs in the brain)
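A minimal sketch of these rules for one hidden layer, assuming a sigmoid g (so g'(in) = a(1 - a)); the function names, data layout, and example numbers are my own, not from the slides:

    import math

    def g(x):
        return 1.0 / (1.0 + math.exp(-x))

    def backprop_step(x, y, W_in, W_out, alpha=0.1):
        """One back-propagation update for a network with one hidden layer.
        W_in[k][j]  = weight from input unit k to hidden unit j
        W_out[j][i] = weight from hidden unit j to output unit i"""
        # Forward pass
        n_hidden, n_out = len(W_out), len(y)
        in_h = [sum(W_in[k][j] * x[k] for k in range(len(x))) for j in range(n_hidden)]
        a_h = [g(v) for v in in_h]
        in_o = [sum(W_out[j][i] * a_h[j] for j in range(n_hidden)) for i in range(n_out)]
        a_o = [g(v) for v in in_o]

        # Deltas: Delta_i = Err_i * g'(in_i); Delta_j = g'(in_j) * sum_i W_j,i Delta_i
        d_out = [(y[i] - a_o[i]) * a_o[i] * (1 - a_o[i]) for i in range(n_out)]
        d_hid = [a_h[j] * (1 - a_h[j]) * sum(W_out[j][i] * d_out[i] for i in range(n_out))
                 for j in range(n_hidden)]

        # Updates: W_j,i += alpha * a_j * Delta_i ;  W_k,j += alpha * a_k * Delta_j
        W_out = [[W_out[j][i] + alpha * a_h[j] * d_out[i] for i in range(n_out)]
                 for j in range(n_hidden)]
        W_in = [[W_in[k][j] + alpha * x[k] * d_hid[j] for j in range(n_hidden)]
                for k in range(len(x))]
        return W_in, W_out

    # Tiny example: 2 inputs, 2 hidden units, 1 output (placeholder numbers)
    W_in = [[0.1, -0.2], [0.4, 0.3]]
    W_out = [[0.2], [-0.5]]
    W_in, W_out = backprop_step([1.0, 0.5], [1.0], W_in, W_out)
    print(W_in, W_out)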

Back-propagation derivation
- Like perceptron learning, back-propagation is optimization search by gradient descent
- The squared error on a single example is defined as E = (1/2) Σ_i (y_i - a_i)^2, where the sum is over the nodes in the output layer
- For a weight W_j,i into output unit i:
    ∂E/∂W_j,i = -(y_i - a_i) ∂a_i/∂W_j,i
              = -(y_i - a_i) ∂g(in_i)/∂W_j,i
              = -(y_i - a_i) g'(in_i) ∂in_i/∂W_j,i
              = -(y_i - a_i) g'(in_i) ∂/∂W_j,i (Σ_j W_j,i a_j)
              = -(y_i - a_i) g'(in_i) a_j
              = -a_j Δ_i

Back-propagation derivation, continued
- For a weight W_k,j into hidden unit j:
    ∂E/∂W_k,j = -Σ_i (y_i - a_i) ∂a_i/∂W_k,j = -Σ_i (y_i - a_i) ∂g(in_i)/∂W_k,j
              = -Σ_i (y_i - a_i) g'(in_i) ∂in_i/∂W_k,j = -Σ_i Δ_i ∂/∂W_k,j (Σ_j W_j,i a_j)
              = -Σ_i Δ_i W_j,i ∂a_j/∂W_k,j = -Σ_i Δ_i W_j,i ∂g(in_j)/∂W_k,j
              = -Σ_i Δ_i W_j,i g'(in_j) ∂in_j/∂W_k,j = -Σ_i Δ_i W_j,i g'(in_j) ∂/∂W_k,j (Σ_k W_k,j a_k)
              = -Σ_i Δ_i W_j,i g'(in_j) a_k = -a_k Δ_j
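One way to sanity-check a gradient derivation like this is to compare the analytic formula against a central finite-difference estimate, ∂E/∂W ≈ (E(W + ε) - E(W - ε)) / (2ε). The sketch below (my own) does this for the single sigmoid unit from the perceptron-learning slide; the same idea extends to the multilayer case:

    import math

    def g(x):
        return 1.0 / (1.0 + math.exp(-x))

    def error(W, x, y):
        """Squared error E = 1/2 (y - g(W . x))^2 for a single sigmoid unit."""
        a = g(sum(w * xj for w, xj in zip(W, x)))
        return 0.5 * (y - a) ** 2

    def analytic_grad(W, x, y):
        """dE/dW_j = -(y - a) * g'(in) * x_j, with g'(in) = a (1 - a) for a sigmoid."""
        a = g(sum(w * xj for w, xj in zip(W, x)))
        return [-(y - a) * a * (1.0 - a) * xj for xj in x]

    def numeric_grad(W, x, y, eps=1e-6):
        """Central finite-difference estimate of dE/dW_j."""
        out = []
        for j in range(len(W)):
            Wp, Wm = list(W), list(W)
            Wp[j] += eps
            Wm[j] -= eps
            out.append((error(Wp, x, y) - error(Wm, x, y)) / (2 * eps))
        return out

    W, x, y = [0.3, -0.2], [1.0, 2.0], 1.0
    print(analytic_grad(W, x, y))
    print(numeric_grad(W, x, y))  # the two lists should agree to several decimal places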

Back-propagation learning, continued
- At each epoch, sum the gradient updates for all examples and apply them
- [Figure: training curve (total error on the training set vs. number of epochs) for 100 restaurant examples; back-propagation finds an exact fit]
- Typical problems: slow convergence, local minima
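The "sum the gradient updates and apply" step is just batch gradient descent over the training set. A minimal sketch of one such epoch (my own code, shown for a single sigmoid unit rather than a full multilayer network):

    import math

    def g(x):
        return 1.0 / (1.0 + math.exp(-x))

    def grad_for_example(W, x, y):
        """Gradient of E = 1/2 (y - g(W . x))^2 with respect to each weight."""
        a = g(sum(w * xj for w, xj in zip(W, x)))
        return [-(y - a) * a * (1.0 - a) * xj for xj in x]

    def run_epoch(W, examples, alpha=0.5):
        """Batch update: sum the gradients over all examples, then apply once."""
        total = [0.0] * len(W)
        for x, y in examples:
            for j, gj in enumerate(grad_for_example(W, x, y)):
                total[j] += gj
        return [w - alpha * t for w, t in zip(W, total)]

    # Placeholder training set: OR of two inputs, with a bias input x_0 = 1
    examples = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
    W = [0.0, 0.0, 0.0]
    for _ in range(2000):
        W = run_epoch(W, examples)
    print(W)  # g(W . x) should now be close to each target y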

Back-propagation learning, continued
- Learning curve for a multilayer perceptron with 4 hidden units:
    [Figure: proportion correct on the test set vs. training set size for the multilayer network and a decision tree on the RESTAURANT data]
- Multilayer perceptrons are quite good for complex pattern recognition tasks, but the resulting hypotheses cannot be understood easily

Computational Example
- [Figure: a network with input unit 0, hidden unit 1, and output units 2 and 3; weights W_01 = 1, W_12 = 1, W_13 = 2; outputs a_2 (target y_2) and a_3 (target y_3)]
- a_i = output of unit i; W_ij = weight of the link from unit i to unit j
- For output units, Δ_j = g'(in_j)(y_j - a_j). For hidden units, Δ_j = g'(in_j) Σ_i W_ji Δ_i.
- Suppose g(in_i) = in_i for every unit i. Then g'(in_i) = 1. (Note: bad idea, one wouldn't use such a g in practice!)
- Problem 1. For what values of Δ_2 and Δ_3 will Δ_1 = 0?
    0 = Δ_1 = g'(in_1)(W_12 Δ_2 + W_13 Δ_3) = W_12 Δ_2 + W_13 Δ_3 = Δ_2 + 2 Δ_3,
    i.e., Δ_2 = -2 Δ_3.

Computational Example
- [Figure: same network; W_01 = 1, W_12 = 1, W_13 = 2; a_0 = 1, y_2 = 1, y_3 = 1]
- Problem 2. α = 0.1; just one training example: a_0 = 1, y_2 = 1, y_3 = 1.
- First epoch:
    a_1 = g(in_1) = W_01 a_0 = 1 · 1 = 1
    a_2 = g(in_2) = W_12 a_1 = 1 · 1 = 1
    a_3 = g(in_3) = W_13 a_1 = 2 · 1 = 2
    Δ_2 = y_2 - a_2 = 1 - 1 = 0
    Δ_3 = y_3 - a_3 = 1 - 2 = -1
    Δ_1 = Δ_2 + 2 Δ_3 = 0 + 2 · (-1) = -2
    W_12 ← W_12 + α a_1 Δ_2 = 1 + 0.1 · 1 · 0 = 1
    W_13 ← W_13 + α a_1 Δ_3 = 2 + 0.1 · 1 · (-1) = 1.9
    W_01 ← W_01 + α a_0 Δ_1 = 1 + 0.1 · 1 · (-2) = 0.8

Computational Example
- [Figure: same network; W_01 = 0.8, W_12 = 1, W_13 = 1.9; a_0 = 1, y_2 = 1, y_3 = 1]
- Second epoch:
    a_1 = g(in_1) = W_01 a_0 = 0.8 · 1 = 0.8
    a_2 = g(in_2) = W_12 a_1 = 1 · 0.8 = 0.8
    a_3 = g(in_3) = W_13 a_1 = 1.9 · 0.8 = 1.52
    Δ_2 = y_2 - a_2 = 1 - 0.8 = 0.2
    Δ_3 = y_3 - a_3 = 1 - 1.52 = -0.52
    Δ_1 = W_12 Δ_2 + W_13 Δ_3 = 1 · 0.2 + 1.9 · (-0.52) = 0.2 - 0.988 = -0.788
    W_12 ← W_12 + α a_1 Δ_2 = 1 + 0.1 · 0.8 · 0.2 = 1.016
    W_13 ← W_13 + α a_1 Δ_3 = 1.9 + 0.1 · 0.8 · (-0.52) = 1.8584
    W_01 ← W_01 + α a_0 Δ_1 = 0.8 + 0.1 · 1 · (-0.788) = 0.7212
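The two epochs are easy to reproduce mechanically. Here is a short sketch (my own code) using the slide's linear activation g(in) = in and its update rules; it prints the same intermediate values as the slides, up to floating-point rounding:

    # Network from the example: unit 0 -> unit 1 -> units 2 and 3, with g(in) = in.
    alpha = 0.1
    a0, y2, y3 = 1.0, 1.0, 1.0
    W01, W12, W13 = 1.0, 1.0, 2.0

    for epoch in (1, 2):
        # Forward pass (g is the identity, so g'(in) = 1 everywhere)
        a1 = W01 * a0
        a2 = W12 * a1
        a3 = W13 * a1
        # Deltas: output units use y_j - a_j; the hidden unit back-propagates them
        d2 = y2 - a2
        d3 = y3 - a3
        d1 = W12 * d2 + W13 * d3
        # Weight updates: W_ij <- W_ij + alpha * a_i * Delta_j
        W12 += alpha * a1 * d2
        W13 += alpha * a1 * d3
        W01 += alpha * a0 * d1
        values = dict(a1=a1, a2=a2, a3=a3, d2=d2, d3=d3, d1=d1,
                      W12=W12, W13=W13, W01=W01)
        print("epoch", epoch, {k: round(v, 4) for k, v in values.items()})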

Summary
- Most brains have lots of neurons; each neuron ≈ a linear threshold unit (?)
- Perceptrons (one-layer networks) are insufficiently expressive
- Multilayer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
- Many applications: speech, driving, handwriting, fraud detection, etc.
- Engineering, cognitive modeling, and neural system modeling subfields have largely diverged