Sections 18.6 and 18.7 Artificial Neural Networks


Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University

Outline The brain vs. artificial neural networks Univariate regression Linear models Nonlinear models Linear classification Perceptron learning Single-layer perceptrons Multilayer perceptrons (MLPs) Back-propagation learning Applications of neural networks

Understanding the brain Because we do not understand the brain very well we are constantly tempted to use the latest technology as a model for trying to understand it. In my childhood we were always assured that the brain was a telephone switchboard. (What else could it be?) I was amused to see that Sherrington, the great British neuroscientist, thought that the brain worked like a telegraph system. Freud often compared the brain to hydraulic and electro-magnetic systems. Leibniz compared it to a mill, and I am told that some of the ancient Greeks thought the brain functions like a catapult. At present, obviously, the metaphor is the digital computer. John R. Searle (Prof. of Philosophy at UC, Berkeley)

Understanding the brain (cont'd) The brain is a tissue. It is a complicated, intricately woven tissue, like nothing else we know of in the universe, but it is composed of cells, as any tissue is. They are, to be sure, highly specialized cells, but they function according to the laws that govern any other cells. Their electrical and chemical signals can be detected, recorded and interpreted, and their chemicals can be identified; the connections that constitute the brain's woven feltwork can be mapped. In short, the brain can be studied, just as the kidney can. David H. Hubel (1981 Nobel Prize Winner)

The human neuron 10^11 neurons of > 20 types, 1ms–10ms cycle time. Signals are noisy spike trains of electrical potential.

How do neurons work? The fibers of surrounding neurons emit chemicals (neurotransmitters) that move across the synapse and change the electrical potential of the cell body Sometimes the action across the synapse increases the potential, and sometimes it decreases it. If the potential reaches a certain threshold, an electrical pulse, or action potential, will travel down the axon, eventually reaching all the branches, causing them to release their neurotransmitters. And so on...

McCulloch-Pitts unit Output is a "squashed" linear function of the inputs: a_i ← g(in_i) = g(Σ_j W_{j,i} a_j). It is a gross oversimplification of real neurons, but its purpose is to develop an understanding of what networks of simple units can do.
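To make the unit concrete, here is a minimal Python sketch (not from the slides; the weight values and the choice of a sigmoid for g are assumptions for illustration):

import math

def unit_output(weights, inputs):
    # a_i = g(in_i) = g(sum_j W_{j,i} * a_j); here g is a sigmoid,
    # a hard threshold (step function) is the other common choice.
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-in_i))

# Hypothetical example: a unit with a bias input a_0 = 1 and two other inputs.
print(unit_output([-1.5, 1.0, 1.0], [1.0, 1.0, 1.0]))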

Univariate linear regression problem A univariate linear function is a straight line with input x and output y. The problem is to learn a univariate linear function given a set of data points. Given that the formula of the line is y = w_1 x + w_0, what needs to be learned are the weights w_0 and w_1. Each possible line is called a hypothesis: h_w(x) = w_1 x + w_0

Univariate linear regression problem (cont'd) There are an infinite number of lines that fit the data. The task of finding the line that best fits these data is called linear regression. "Best" is defined as minimizing loss or error. A commonly used loss function is the squared-error (L_2) loss: Loss(h_w) = Σ_{j=1}^{N} L_2(y_j, h_w(x_j)) = Σ_{j=1}^{N} (y_j − h_w(x_j))^2 = Σ_{j=1}^{N} (y_j − (w_1 x_j + w_0))^2.

Minimizing loss Try to find w* = argmin_w Loss(h_w). To minimize Σ_{j=1}^{N} (y_j − (w_1 x_j + w_0))^2, take the partial derivatives with respect to w_0 and w_1 and set them to zero: ∂/∂w_0 Σ_{j=1}^{N} (y_j − (w_1 x_j + w_0))^2 = 0 and ∂/∂w_1 Σ_{j=1}^{N} (y_j − (w_1 x_j + w_0))^2 = 0. These equations have a unique solution: w_1 = (N Σ x_j y_j − (Σ x_j)(Σ y_j)) / (N Σ x_j^2 − (Σ x_j)^2) and w_0 = (Σ y_j − w_1 Σ x_j) / N. Univariate linear regression is a solved problem.
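A short Python sketch of this closed-form solution (the function name and the sample data are made up for illustration):

def fit_univariate(xs, ys):
    # Least-squares fit of y = w1*x + w0 using the closed-form solution above.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    w0 = (sy - w1 * sx) / n
    return w0, w1

# Made-up data lying near y = 2x + 1.
print(fit_univariate([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8]))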

Beyond linear models The equations for minimum loss no longer have a closed-form solution. Use a hill-climbing algorithm, gradient descent. The idea is to always move to a neighbor that is better. The algorithm is:
  w ← any point in the parameter space
  loop until convergence do
    for each w_i in w do
      w_i ← w_i − α ∂/∂w_i Loss(w)
α is called the step size or the learning rate.
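A minimal Python sketch of this loop (assuming the caller supplies a grad function returning the partial derivatives, and using a fixed number of steps in place of a convergence test):

def gradient_descent(w, grad, alpha=0.01, steps=1000):
    # w: list of parameters; grad(w): list of partial derivatives of the loss.
    for _ in range(steps):  # stand-in for "loop until convergence"
        g = grad(w)
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w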

Solving for the linear case ∂/∂w_i Loss(w) = ∂/∂w_i (y − h_w(x))^2 = 2(y − h_w(x)) · ∂/∂w_i (y − h_w(x)) = 2(y − h_w(x)) · ∂/∂w_i (y − (w_1 x + w_0)). For w_0 and w_1 we get: ∂/∂w_0 Loss(w) = −2(y − h_w(x)) and ∂/∂w_1 Loss(w) = −2(y − h_w(x)) x. The learning rule becomes: w_0 ← w_0 + α (y − h_w(x)) and w_1 ← w_1 + α (y − h_w(x)) x

Batch gradient descent For N training examples, minimize the sum of the individual losses for each example: w_0 ← w_0 + α Σ_j (y_j − h_w(x_j)) and w_1 ← w_1 + α Σ_j (y_j − h_w(x_j)) x_j. Convergence to the unique global minimum is guaranteed as long as a small enough α is picked. The summations require going through all the training data at every step, and there may be many steps. In stochastic gradient descent, only a single training point is considered at a time, but convergence is not guaranteed for a fixed learning rate α.
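A sketch of both update rules for the univariate case (the function names and the use of a randomly chosen example for the stochastic step are illustrative assumptions):

import random

def batch_step(w0, w1, xs, ys, alpha):
    # One batch step: sum the errors over all N examples before updating.
    err = [y - (w1 * x + w0) for x, y in zip(xs, ys)]
    return w0 + alpha * sum(err), w1 + alpha * sum(e * x for e, x in zip(err, xs))

def stochastic_step(w0, w1, xs, ys, alpha):
    # One stochastic step: update from a single randomly chosen example.
    j = random.randrange(len(xs))
    e = ys[j] - (w1 * xs[j] + w0)
    return w0 + alpha * e, w1 + alpha * e * xs[j]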

Linear classifiers with a hard threshold The plots show two seismic data parameters, body wave magnitude x_1 and surface wave magnitude x_2. Nuclear explosions are shown as black circles. Earthquakes (not nuclear explosions) are shown as white circles. In graph (a), the line separates the positive and negative examples. The equation of the line is: x_2 = 1.7 x_1 − 4.9, or −4.9 + 1.7 x_1 − x_2 = 0

Classification hypothesis The classification hypothesis is: h_w(x) = 1 if w · x ≥ 0, and 0 otherwise. It can be thought of as passing the linear function w · x through a threshold function. Minimizing the loss depends on taking the gradient of the threshold function, but the gradient of the step function is zero almost everywhere and undefined elsewhere!

Perceptron learning Output is a squashed linear function of the inputs: a_i ← g(in_i) = g(Σ_j W_{j,i} a_j). A simple weight update rule that is guaranteed to converge for linearly separable data: w_i ← w_i + α (y − h_w(x)) x_i, where y is the true value and h_w(x) is the hypothesis output.

Perceptron learning rule w_i ← w_i + α (y − h_w(x)) x_i. If the output is correct, i.e., y = h_w(x), then the weights are not changed. If the output is lower than it should be, i.e., y is 1 but h_w(x) is 0, then w_i is increased when the corresponding input x_i is positive and decreased when the corresponding input x_i is negative. If the output is higher than it should be, i.e., y is 0 but h_w(x) is 1, then w_i is decreased when the corresponding input x_i is positive and increased when the corresponding input x_i is negative.

Perceptron learning procedure Start with a random assignment to the weights. Feed in an input and let the perceptron compute the answer. If the answer is correct, do nothing. If the answer is not correct, update the weights by adding or subtracting the input vector (scaled down by α). Iterate over all the input vectors, repeating as necessary, until the perceptron classifies every example correctly.
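A minimal Python sketch of this procedure (the OR data set, the learning rate, and the epoch limit are illustrative assumptions):

import random

def perceptron_train(examples, alpha=0.1, max_epochs=100):
    # examples: list of (x, y) pairs; x is a list of inputs whose first
    # element is a constant 1 acting as the bias input, y is 0 or 1.
    n = len(examples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(max_epochs):
        errors = 0
        for x, y in examples:
            h = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            if h != y:
                errors += 1
                # Add or subtract the scaled input: w_i <- w_i + alpha*(y - h)*x_i
                w = [wi + alpha * (y - h) * xi for wi, xi in zip(w, x)]
        if errors == 0:  # every example classified correctly
            break
    return w

# Hypothetical example: learning OR (the leading 1 in each x is the bias term).
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
print(perceptron_train(data))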

Expressiveness of perceptrons Consider a perceptron where g is the step function (Rosenblatt, 1957, 1960). It can represent AND, OR, NOT, but not XOR (Minsky & Papert, 1969). A perceptron represents a linear separator in input space: Σ_j W_j x_j > 0, or W · x > 0

Multilayer perceptrons (MLPs) Remember that a single perceptron will not converge if the inputs are not linearly separable. In that case, use a multilayer perceptron. The number of hidden units is typically chosen by hand.

Activation functions (a) is a step function or threshold function. (b) is a sigmoid function 1/(1 + e^{−x}).
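Both functions in Python (a small sketch; the function names are arbitrary):

import math

def step(x):
    # Hard threshold (a): 1 if the input reaches the threshold 0, else 0.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Sigmoid (b): 1 / (1 + e^(-x)); differentiable, unlike the step function.
    return 1.0 / (1.0 + math.exp(-x))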

Feed-forward example Feed-forward network: a parameterized family of nonlinear functions. Output of unit 5 is a_5 = g(W_{3,5} a_3 + W_{4,5} a_4) = g(W_{3,5} g(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g(W_{1,4} a_1 + W_{2,4} a_2)). Adjusting the weights changes the function: do learning this way!
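A sketch of this forward pass in Python (the weight values are hypothetical; g is taken to be the sigmoid):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))  # sigmoid activation

def a5(a1, a2, W):
    # W is a dict of weights keyed by (from_unit, to_unit).
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    return g(W[(3, 5)] * a3 + W[(4, 5)] * a4)

# Hypothetical weights, just to show the computation.
W = {(1, 3): 0.5, (2, 3): -0.4, (1, 4): 0.9, (2, 4): 0.2, (3, 5): 1.0, (4, 5): -1.2}
print(a5(1.0, 0.0, W))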

Single-layer perceptrons Output units all operate separately: no shared weights. Adjusting the weights moves the location, orientation, and steepness of the cliff.

Expressiveness of MLPs All continuous functions can be represented with 2 layers, all functions with 3 layers. Ridge: combine two opposite-facing threshold functions. Bump: combine two perpendicular ridges. Add bumps of various sizes and locations to fit any surface. The proof requires exponentially many hidden units.

Back-propagation learning Output layer: similar to a single-layer perceptron, w_{i,j} ← w_{i,j} + α a_i Δ_j, where Δ_j = Err_j · g′(in_j). Hidden layer: back-propagate the error from the output layer: Δ_i = g′(in_i) Σ_j w_{i,j} Δ_j. The update rule for weights in the hidden layer is the same: w_{i,j} ← w_{i,j} + α a_i Δ_j. (Most neuroscientists deny that back-propagation occurs in the brain.)
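A compact Python sketch of one back-propagation update for a network with one hidden layer and a single sigmoid output (the layout of the weight lists, the omission of bias inputs, and the use of the error signal y − a as Err are assumptions of this sketch):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))  # sigmoid; g'(in) = a * (1 - a)

def backprop_step(x, y, W_in, W_out, alpha=0.1):
    # W_in[j][i]: weight from input i to hidden unit j; W_out[j]: weight
    # from hidden unit j to the single output unit.
    # Forward pass.
    a_hidden = [g(sum(w * xi for w, xi in zip(W_in[j], x))) for j in range(len(W_in))]
    a_out = g(sum(w * a for w, a in zip(W_out, a_hidden)))

    # Output layer: Delta_out = Err * g'(in) = (y - a_out) * a_out * (1 - a_out).
    delta_out = (y - a_out) * a_out * (1.0 - a_out)

    # Hidden layer: Delta_j = g'(in_j) * W_out[j] * Delta_out.
    delta_hidden = [a_hidden[j] * (1.0 - a_hidden[j]) * W_out[j] * delta_out
                    for j in range(len(a_hidden))]

    # Weight updates: w <- w + alpha * a_i * Delta_j.
    W_out = [W_out[j] + alpha * a_hidden[j] * delta_out for j in range(len(W_out))]
    W_in = [[W_in[j][i] + alpha * x[i] * delta_hidden[j] for i in range(len(x))]
            for j in range(len(W_in))]
    return W_in, W_out

# One update on a hypothetical example with 2 inputs and 2 hidden units.
W_in, W_out = backprop_step([1.0, 0.0], 1, [[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5])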

Handwritten digit recognition 3-nearest-neighbor classifier (stored images) = 2.4% error; shape matching based on computer vision = 0.63% error; 400-300-10 unit MLP = 1.6% error; LeNet 768-192-30-10 unit MLP = 0.9% error; boosted neural network = 0.7% error; support vector machine = 1.1% error; current best: virtual support vector machine = 0.56% error; humans = 0.2% error

MLP learners MLPs are quite good for complex pattern recognition tasks, but the resulting hypotheses cannot be understood easily. Typical problems: many parameters to decide, slow convergence, local minima.

Summary Brains have lots of neurons; each neuron ≈ a perceptron (?). None of the neural network models distinguish humans from dogs from dolphins from flatworms. Whatever distinguishes higher cognitive capacities (language, reasoning) may not be apparent at this level of analysis. Actually, real neurons fire all the time; what changes is the rate of firing, from a few to a few hundred impulses a second. This is neurally inspired computing rather than brain science. Perceptrons (one-layer networks) are used for linearly separable data. Multi-layer networks are sufficiently expressive; they can be trained by gradient descent, i.e., error back-propagation. Many applications: speech, driving, handwriting, fraud detection, etc. The engineering, cognitive modelling, and neural system modelling subfields have largely diverged.

Sources for the slides AIMA textbook (3rd edition). AIMA slides: http://aima.cs.berkeley.edu/ Neuron cell: http://www.enchantedlearning.com/subjects/anatomy/brain/neuron.shtml (Accessed December 10, 2011). Robert Wilensky's CS188 slides: http://www.cs.berkeley.edu/~wilensky/cs188 (Accessed prior to 2009)