Neural Networks: Basics. Darrell Whitley Colorado State University

In the Beginning: The Perceptron

(Figure: inputs X1 and X2 connect to two output units through weights W_{1,1}, W_{1,2}, W_{2,1}, W_{2,2}; the subscript convention is W_{source,destination}.)

In the Beginning: The Perceptron

The Perceptron Learning Rule

    In  Out  Target  Weight  Threshold
    0   0    1       n.a.    T-
    1   0    1       W+      T-
    0   1    0       n.a.    T+
    1   1    0       W-      T+

The Perceptron Learning Rule Some things are linear and easy to learn.

The Perceptron Learning Rule In general, IF a perceptron can learn something, it will. IF a perceptron cannot learn something... A perceptron easily implements And, Or, and Not, so the building blocks are logically complete if we build multi-layered networks.
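
A minimal sketch of the learning rule from the table above, applied to a 2-input And; the function name, learning rate, and epoch count are illustrative choices, not values from the slides.

    # Perceptron learning rule from the table above: on an error, adjust the
    # weights of active inputs (W+ / W-) and move the threshold (T- / T+).
    def train_perceptron(patterns, epochs=20, lr=0.1):
        w = [0.0, 0.0]          # one weight per input
        threshold = 0.0
        for _ in range(epochs):
            for x, target in patterns:
                out = 1 if x[0] * w[0] + x[1] * w[1] > threshold else 0
                if out == target:
                    continue
                for i in range(2):
                    if x[i] == 1:                            # only active inputs change
                        w[i] += lr if target == 1 else -lr   # W+ / W-
                threshold += -lr if target == 1 else lr      # T- / T+
        return w, threshold

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    print(train_perceptron(AND))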

A Simple XOR Network

(Figure: X1 and X2 feed a hidden And unit (threshold = 1.5) and a hidden Or unit (threshold = 0.5), which feed an XOR output unit (threshold = 0.5); the connection weights shown are 1.0.)

A Simple XOR Network with Bias Nodes

(Figure: the same network with each unit's threshold replaced by a weight from a bias node; the values shown are 1.0, 1.5, 0.5, and 0.0.)

A Simple XOR Network

Note that the hidden layer is a transformed representation that is now linearly separable.

    X1  X2  H1  H2  OUT
    0   0   0   0   0
    1   0   0   1   1
    0   1   0   1   1
    1   1   1   1   0
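
A minimal sketch that reproduces this table with hard-threshold units; the negative weight from the And hidden unit to the output is an assumption, since the signs in the figure did not survive transcription.

    # Hidden units: H1 = And (threshold 1.5), H2 = Or (threshold 0.5).
    # Output unit: weight -1.0 from H1 (assumed sign) and +1.0 from H2, threshold 0.5.
    def step(x, threshold):
        return 1 if x > threshold else 0

    for x1 in (0, 1):
        for x2 in (0, 1):
            h1 = step(1.0 * x1 + 1.0 * x2, 1.5)    # And
            h2 = step(1.0 * x1 + 1.0 * x2, 0.5)    # Or
            out = step(-1.0 * h1 + 1.0 * h2, 0.5)  # Or and not And = XOR
            print(x1, x2, h1, h2, out)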

Another Solution to XOR

Weights

With the convention W_{source,destination}, the layer computes the product of the input row vector and the weight matrix:

    [X_1  X_2] [ W_{1,1}  W_{1,2} ]
               [ W_{2,1}  W_{2,2} ]

Weights

Add a second weight layer V (V_{1,1}, V_{1,2}, V_{2,1}, V_{2,2}). With no nonlinearity, (XW)V = X(WV) = XM: the two linear layers collapse into a single linear map M. With a sigmoid between the layers, S(XW)V ≠ XM in general.
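
A small numpy sketch of this point, with illustrative shapes and random values (the variable names are not from the slides): two linear layers give the same map as the single matrix M = WV, while a sigmoid between the layers breaks the equality.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 2))   # 4 patterns, 2 inputs
    W = rng.normal(size=(2, 3))   # first-layer weights
    V = rng.normal(size=(3, 2))   # second-layer weights

    M = W @ V
    print(np.allclose((X @ W) @ V, X @ M))         # True: linear layers collapse

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    print(np.allclose(sigmoid(X @ W) @ V, X @ M))  # False: the nonlinearity prevents collapse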

Linear Separation

The decision boundary is X_1 W_1 + X_2 W_2 = Threshold. Let W_0 = -Threshold:

    X_1 W_1 + X_2 W_2 + W_0 = 0
    X_2 W_2 = -X_1 W_1 - W_0
    X_2 = -(W_1 / W_2) X_1 - (W_0 / W_2)

(Figure: the boundary line drawn in the unit square with axes X1 and X2.)
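
As a worked example (taking the And unit from the earlier XOR figure, with W_1 = W_2 = 1.0 and Threshold = 1.5), W_0 = -1.5 and the boundary line becomes

    X_2 = -(1.0 / 1.0) X_1 - (-1.5 / 1.0) = -X_1 + 1.5

which puts the corner (1,1) on one side and the other three corners of the unit square on the other.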

Linear Separation

How Neurons Communicate Warning: real neurons are complex; neural networks are simple.

Neural Spike Trains

How do Artificial Neural Networks Learn? Like Perceptrons, learning is (largely) accomplished by weight adjustments. Recall that we have also converted the neuron thresholds into weights. But we need a different kind of activation function. Activation function = Transfer Function

The Activation Model

Sigmoid

    Sigmoid(Out) = (1 + e^{-Out/Temp})^{-1}

Sigmoid, Temperature and Gain

    Sigmoid(Out) = (1 + e^{-Out/Temp})^{-1}

The Gain can also be changed by rescaling all of the weights.
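
A minimal sketch of the sigmoid with temperature, showing that lowering Temp (raising the gain) has the same effect as rescaling the net input, and hence the weights; the function name and values are illustrative.

    import math

    def sigmoid(out, temp=1.0):
        return 1.0 / (1.0 + math.exp(-out / temp))

    out = 2.0
    temp = 0.5
    print(sigmoid(out, temp))                 # temperature 0.5 ...
    print(sigmoid(out * (1.0 / temp), 1.0))   # ... equals a 2x rescaling of the net input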

Logistic Sigmoid Derivative

    S(x) = 1 / (1 + e^{-x}) = (1 + e^{-x})^{-1}

    S'(x) = -(1 + e^{-x})^{-2} (-e^{-x})
          = e^{-x} / (1 + e^{-x})^2
          = [1 / (1 + e^{-x})] [e^{-x} / (1 + e^{-x})]
          = S(x) [(1 + e^{-x} - 1) / (1 + e^{-x})]
          = S(x) [1 - 1 / (1 + e^{-x})]
          = S(x) (1 - S(x))

Logistic Sigmoid and its derivative

    input     S(x)      S(x)(1 - S(x))
    0.000000  0.500000  0.250000
    0.500000  0.622459  0.235004
    1.000000  0.731059  0.196612
    1.500000  0.817574  0.149146
    2.000000  0.880797  0.104994
    2.500000  0.924142  0.070104
    3.000000  0.952574  0.045177
    3.500000  0.970688  0.028453
    4.000000  0.982014  0.017663
    4.500000  0.989013  0.010866
    5.000000  0.993307  0.006648
    5.500000  0.995930  0.004054
    6.000000  0.997527  0.002467
    6.500000  0.998499  0.001499
    7.000000  0.999089  0.000910
    7.500000  0.999447  0.000552
    8.000000  0.999665  0.000335
    8.500000  0.999797  0.000203
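
A short sketch that regenerates the rows of the table above from the logistic function and its derivative.

    import math

    def S(x):
        return 1.0 / (1.0 + math.exp(-x))

    x = 0.0
    while x <= 8.5:
        print(f"{x:.6f} {S(x):.6f} {S(x) * (1.0 - S(x)):.6f}")
        x += 0.5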

Sigmoid Derivative

    S(Out)(1 - S(Out))

When the derivative is zero, there is no learning.

Sigmoid Derivative Instead of targets of 0 and 1, or between 0 and 1, use targets of 0.1 and 0.9, or between 0.1 and 0.9. This can help to prevent network paralysis.

Other Sigmoids Elliott's function, hyperbolic tangent. These activate between -1 and 1. Some spread out the derivative.
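
A small sketch of the two alternatives mentioned here; the x / (1 + |x|) form is the commonly cited version of Elliott's function, so treat it as an assumption rather than the slide's exact definition.

    import math

    def tanh_act(x):
        return math.tanh(x)          # hyperbolic tangent, output in (-1, 1)

    def elliott(x):
        return x / (1.0 + abs(x))    # Elliott-style activation, output in (-1, 1)

    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, tanh_act(x), elliott(x))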

The Delta Rule

Let E_p be the error for a particular input pattern. We will just look at one pattern, and drop the index. Let T_j be the desired Target pattern for node j. The output of a simple linear net is given by:

    O_j = Σ_i X_i W_{i,j}

    E = 1/2 (T_j - O_j)^2

This is a composite function: (Error(Out(W_{i,j})))

The Delta Rule

From this composite function, (Error(Out(W_{i,j}))), for one layer we can apply the Chain Rule:

    δE/δW_{i,j} = (δE/δO_j)(δO_j/δW_{i,j})

    δE/δO_j = -(T_j - O_j)
    δO_j/δW_{i,j} = X_i

    δE/δW_{i,j} = -(T_j - O_j) X_i

The Delta Rule

For networks with sigmoid units using the logistic function:

    S_j = 1 / (1 + e^{-O_j/t})
    O_j = Σ_i X_i W_{i,j}

Again, a composite function: (Error(Sig(Out(W_{i,j}))))

    δE/δW_{i,j} = (δE/δS_j)(δS_j/δO_j)(δO_j/δW_{i,j})

    δE/δS_j = -(T_j - S_j)
    δS_j/δO_j = S_j (1 - S_j)
    δO_j/δW_{i,j} = X_i

    δE/δW_{i,j} = -(T_j - S_j) S_j (1 - S_j) X_i
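
A minimal sketch of one delta-rule update for a single sigmoid unit, following δE/δW_{i,j} = -(T_j - S_j) S_j (1 - S_j) X_i; the function name, inputs, and step size are illustrative.

    import math

    def S(x):
        return 1.0 / (1.0 + math.exp(-x))

    def delta_rule_step(X, W, T, alpha=0.5):
        O = sum(x * w for x, w in zip(X, W))                 # net input O_j
        Sj = S(O)                                            # sigmoid output S_j
        grad = [-(T - Sj) * Sj * (1.0 - Sj) * x for x in X]  # dE/dW for each weight
        return [w + alpha * (-g) for w, g in zip(W, grad)]   # step against the gradient

    print(delta_rule_step(X=[1.0, 0.0], W=[0.1, -0.2], T=0.9))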

The Delta Rule: Back Propagation

Now consider a 2-layer network with weights W_{i,q} from input i to hidden node q and W_{q,j} from hidden node q to output node j:

(Error(Sig_j(Out_j(Sig_q(Out_q(W_{i,q}))))))

    δE/δW_{i,q} = (δE/δS_j)(δS_j/δO_j)(δO_j/δS_q)(δS_q/δO_q)(δO_q/δW_{i,q})

The Delta Rule: Back Propagation

    δE/δW_{i,q} = (δE/δS_j)(δS_j/δO_j)(δO_j/δS_q)(δS_q/δO_q)(δO_q/δW_{i,q})

    (δE/δS_j)(δS_j/δO_j) = -(T_j - S_j) S_j (1 - S_j)
    δO_j/δS_q = W_{q,j}
    δS_q/δO_q = S_q (1 - S_q)
    δO_q/δW_{i,q} = X_i

Summing over the output nodes j:

    δE/δW_{i,q} = -{ Σ_j (T_j - S_j) S_j (1 - S_j) W_{q,j} } S_q (1 - S_q) X_i

Updating the weights

    δE/δW_{i,q} = -{ Σ_j (T_j - S_j) S_j (1 - S_j) W_{q,j} } S_q (1 - S_q) X_i

    ΔW_{i,q} = -δE/δW_{i,q}

    W_{i,q} = W_{i,q} + α ΔW_{i,q}

where α is the step size.
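
A minimal sketch of these updates for a small 2-layer network trained on XOR; the 2-4-1 architecture, random seed, step size α = 0.5, iteration count, and the 0.1/0.9 targets (from the earlier slide) are illustrative choices, not the lecture's own settings.

    # 2 inputs (+bias) -> 4 hidden sigmoid units (+bias) -> 1 sigmoid output.
    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0.1], [0.9], [0.9], [0.1]])      # targets pulled in to 0.1 / 0.9

    W1 = rng.normal(scale=0.5, size=(3, 4))         # input (+bias) -> hidden weights
    W2 = rng.normal(scale=0.5, size=(5, 1))         # hidden (+bias) -> output weights
    alpha = 0.5                                     # step size

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    Xb = np.hstack([X, np.ones((4, 1))])            # append the bias input
    for _ in range(10000):
        Sq = sigmoid(Xb @ W1)                       # hidden activations S_q
        Sqb = np.hstack([Sq, np.ones((4, 1))])
        Sj = sigmoid(Sqb @ W2)                      # output activations S_j

        delta_j = -(T - Sj) * Sj * (1.0 - Sj)                # -(T_j - S_j) S_j (1 - S_j)
        delta_q = (delta_j @ W2[:-1].T) * Sq * (1.0 - Sq)    # back through W_{q,j} and the hidden sigmoid

        W2 -= alpha * (Sqb.T @ delta_j)             # batch weight updates against the gradient
        W1 -= alpha * (Xb.T @ delta_q)

    print(np.round(sigmoid(np.hstack([sigmoid(Xb @ W1), np.ones((4, 1))]) @ W2), 2))
    # outputs should move toward 0.1, 0.9, 0.9, 0.1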

Momentum (one variation)

Assume δE/δW_{i,q} is the current back prop error. Consider:

    ΔW_{i,q}(t) = β ΔW_{i,q}(t-1) + (1 - β)(-δE/δW_{i,q})

Again we update:

    W_{i,q} = W_{i,q} + α ΔW_{i,q}(t)

(1) If β = 0, the update uses only the current back prop error.
(2) If β = 1, the update uses only the previous back prop error.

For 0 < β < 0.5:
(1) If two successive steps are both increasing, the step size increases.
(2) If two successive steps are both decreasing, the step size decreases.
(3) If one step decreases and the next increases, momentum smooths the update.
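
A minimal sketch of this momentum variation; β, α, and the gradient sequence are made-up values.

    # Each step blends the previous step with the current negative gradient.
    def momentum_step(w, grad, prev_step, alpha=0.5, beta=0.3):
        step = beta * prev_step + (1.0 - beta) * (-grad)
        return w + alpha * step, step

    w, prev_step = 0.0, 0.0
    for grad in (0.4, 0.3, 0.2, -0.1):        # illustrative gradient sequence
        w, prev_step = momentum_step(w, grad, prev_step)
        print(round(w, 4), round(prev_step, 4))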

We are learning weights at different levels of the network.

Hyperplanes and Separation

(Figure: scattered data points illustrating separating hyperplanes.)

Margins and Support Vectors

(Figure: scattered data points illustrating margins and support vectors.)

Incremental Learning versus Batch Learning

Consider XOR again:

    X1  X2  OUT
    0   0   0
    1   0   1
    0   1   1
    1   1   0

You could update the weights after each pattern is presented (incremental or stochastic). You could present all of the patterns, then accumulate the errors and update the weights (batch).
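
A minimal sketch of the two schedules; grad(w, x, t) is a placeholder for the back prop gradient of one pattern, not a fixed API.

    def incremental_epoch(w, patterns, grad, alpha):
        for x, t in patterns:
            w = w - alpha * grad(w, x, t)      # update after every pattern
        return w

    def batch_epoch(w, patterns, grad, alpha):
        total = sum(grad(w, x, t) for x, t in patterns)
        return w - alpha * total               # one accumulated update per epoch

    # toy usage: fit a single weight so that w * x matches t
    patterns = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    grad = lambda w, x, t: -(t - w * x) * x
    print(incremental_epoch(0.0, patterns, grad, 0.1))
    print(batch_epoch(0.0, patterns, grad, 0.1))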

Convergence to local optima (almost) All of the Back Propagation solutions are local optima (almost). And it does not seem to matter (much).

Convergence to local optima (almost) If we use some kind of (Cross) Validation to stop training early, we are not reaching the lowest possible error. If the neural networks have a high degree of symmetry there can be many identical local optima.

Many local optima are the same With 20 hidden units there are approximately 20! symmetries in the associated search space.

Feedforward networks are input-output boxes This is not at all biological. There is no memory. There are also associative memory neural networks. And there are recurrent neural networks.