Part 8: Neural Networks


METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications, Part 8: Neural Networks

INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL

Biological Neural Networks

A Neuron: a nerve cell, part of the nervous system and the brain. (Figure: http://hubpages.com/hub/self-affirmations)

Biological Neural Networks
- There are 10 billion neurons in the human brain.
- A huge number of connections.
- All tasks such as thinking, reasoning, learning and recognition are performed by the information storage and transfer between neurons.
- Each neuron fires when a sufficient amount of electric impulse is received from other neurons.
- The information is transferred through successive firings of many neurons through the network of neurons.

Artificial Neural Networks

An artificial NN, or ANN (a connectionist model, a neuromorphic system), is meant to be:
- A simple, computational model of the biological NN.
- A simulation of the above model for solving problems in pattern recognition, optimization, etc.
- A parallel architecture of simple processing elements connected densely.

[Figure: an artificial neural net; a neuron with inputs X1, X2, weights on the connections, and outputs Y1, Y2.]

Any application that involves
- Classification
- Optimization
- Clustering
- Scheduling
- Feature Extraction
may use an ANN! Most of the time it is integrated with other methods such as
- Expert systems
- Markov models

WHY ANN?
- Easy to implement.
- Self-learning ability.
- When parallel architectures are used, very fast.
- Performance at least as good as other approaches; in principle they provide nonlinear discriminants, so they can solve any P.R. problem.
- Many boards and software packages available.

APPLICATION AREAS:
- Character Recognition
- Speech Recognition
- Texture Segmentation
- Biomedical Problems (Diagnosis)
- Signal and Image Processing (Compression)
- Business (Accounting, Marketing, Financial Analysis)

Background: Pioneering Work

1940s - McCulloch-Pitts (earliest NN models)
1950s - Hebb (learning rule, 1949); Rosenblatt (Perceptrons, 1958-62)
1960s - Widrow, Hoff (Adaline, 1960)
1970s-1980s - Hopfield, Tank (Hopfield model, 1985); Rumelhart, McClelland (Backpropagation); Kohonen (Kohonen's nets, 1986); Grossberg, Carpenter (ART)
1990s - Higher-order NNs, time-delay NNs, recurrent NNs, radial basis function NNs; applications in finance, engineering, etc.; a well-accepted method of classification and optimization.
2000s - Becoming a bit outdated.

ANN Models: Can be examined in terms of 1 - single neuron model, 2 - topology, 3 - learning.

1 - Single Neuron Model:

[Figure: a single neuron with inputs $X_1, \ldots, X_N$, weights $w_1, \ldots, w_N$, bias $w_0$, summation $\alpha$ and activation $f(\alpha)$; typical activation functions $f(\alpha)$: linear, step (bipolar), sigmoid.]

General model:

$Y = f(\alpha) = f\!\left(\sum_{i=1}^{N} w_i x_i + w_0\right) = f(\text{net})$
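As an illustration of the single-neuron model above, here is a minimal Python sketch (function and variable names are my own): it computes $\alpha = \sum_i w_i x_i + w_0$ and applies a chosen activation (linear, bipolar step, or sigmoid).

```python
import math

def neuron_output(x, w, w0, activation="sigmoid"):
    """Single-neuron model: alpha = sum_i w_i * x_i + w0, then y = f(alpha)."""
    alpha = sum(wi * xi for wi, xi in zip(w, x)) + w0
    if activation == "linear":          # f(alpha) = alpha
        return alpha
    if activation == "step":            # bipolar step: +1 or -1
        return 1.0 if alpha >= 0 else -1.0
    if activation == "sigmoid":         # f(alpha) = 1 / (1 + exp(-alpha))
        return 1.0 / (1.0 + math.exp(-alpha))
    raise ValueError("unknown activation")

# Example: two inputs, two weights and a bias
print(neuron_output([1.0, 0.0], [0.5, -0.3], w0=0.1, activation="step"))
```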

$f(\alpha)$ - Activation function: binary threshold / bipolar / hard-limiter, or sigmoid.

Sigmoid: $f(\alpha) = \dfrac{1}{1 + \exp(-\alpha d)}$; when $d = 1$, $\dfrac{df}{d\alpha} = f(1 - f)$.

McCulloch-Pitts Neuron:
- Binary activation.
- All weights of positive (excitatory, E) activations are the same, and all weights of negative (inhibitory, I) activations are the same.
- Fires only if $E \ge T$ and $I = 0$, where E = sum of excitatory inputs, I = sum of inhibitory inputs, T = threshold.

Higher-Order Neurons: The input to the threshold unit is not a linear but a multiplicative function of the inputs and weights. For example, a second-order neuron has a threshold logic with

$\alpha = \sum_{i=1}^{N} w_i x_i + \sum_{i,j} w_{ij} x_i x_j + w_0$

with binary inputs. More powerful than the traditional model.

2. NN Topologies: 2 basic types:
- Feedforward
- Recurrent (loops allowed)
Both can be single layer or many successive layers.
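A minimal sketch of the second-order input term above, assuming binary inputs; the function name and weight layout are my own:

```python
from itertools import combinations

def second_order_input(x, w, w_pair, w0):
    """alpha = sum_i w_i x_i + sum_{i<j} w_ij x_i x_j + w0 (second-order neuron)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    quadratic = sum(w_pair[(i, j)] * x[i] * x[j]
                    for i, j in combinations(range(len(x)), 2))
    return linear + quadratic + w0

# Example with binary inputs x1, x2 and one pairwise weight
alpha = second_order_input([1, 1], [0.5, 0.5], {(0, 1): -1.0}, w0=-0.7)
print(alpha)
```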

[Figure: a feed-forward net and a recurrent net. X = input vector, Y = output vector, T = target output vector.]

3. Learning: Means finding the weights, using the input samples, so that the input-output pairs behave as desired.
- Supervised: samples are labeled (of known category); P(X, T) is an input / target-output pair.
- Unsupervised: samples are not labeled.
Learning in general is attained by iteratively modifying the weights. It can be done in one step or a large number of steps.

Hebb's rule: If two interconnected neurons are both on at the same time, the weight between them should be increased (by the product of the neuron values). Single pass over the training data:

$w(\text{new}) = w(\text{old}) + xy$

Fixed-Increment Rule (Perceptron): More general than Hebb's rule; iterative (see the sketch below):

$w(\text{new}) = w(\text{old}) + \sigma x t$ (change only if an error occurs)

where $t$ is the target value, assumed to be 1 (if desired) or 0 (if not desired), and $\sigma$ is the learning rate.
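A minimal sketch of the fixed-increment (perceptron) rule; here I assume the common bipolar-target variant $t \in \{+1, -1\}$, so the update $w \leftarrow w + \sigma t x$ is applied only when a sample is misclassified (names are my own):

```python
def perceptron_train(samples, sigma=0.1, epochs=100):
    """Fixed-increment rule: update w only when a sample is misclassified.

    samples: list of (x, t) pairs with x a feature tuple and t in {+1, -1}
    (bipolar targets assumed here; the lecture's 1/0 convention works similarly).
    """
    n = len(samples[0][0])
    w = [0.0] * n
    w0 = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in samples:
            alpha = sum(wi * xi for wi, xi in zip(w, x)) + w0
            y = 1 if alpha >= 0 else -1
            if y != t:                       # change only if an error occurs
                w = [wi + sigma * t * xi for wi, xi in zip(w, x)]
                w0 += sigma * t
                errors += 1
        if errors == 0:                      # all samples correctly classified
            break
    return w, w0

# Example: learn the AND function with bipolar targets
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(perceptron_train(data))
```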

Delta Rule: Used in multilayer perceptrons. Iterative:

$w(\text{new}) = w(\text{old}) + \sigma (t - y) x$

where $t$ is the target value and $y$ is the obtained value ($t$ is assumed to be continuous). Assumes that the activation function is the identity, so $\Delta w = \sigma (t - y) x$.

Extended Delta Rule: Modified for a differentiable activation function:

$\Delta w = \sigma (t - y) x f'(\alpha)$
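A minimal sketch of one extended-delta-rule step for a single sigmoid unit (the plain delta rule is the special case $f'(\alpha) = 1$); names are my own:

```python
import math

def extended_delta_update(w, w0, x, t, sigma=0.1):
    """One extended delta rule step: w <- w + sigma * (t - y) * f'(alpha) * x."""
    alpha = sum(wi * xi for wi, xi in zip(w, x)) + w0
    y = 1.0 / (1.0 + math.exp(-alpha))        # sigmoid activation
    fprime = y * (1.0 - y)                    # sigmoid derivative f'(alpha)
    delta = sigma * (t - y) * fprime
    w = [wi + delta * xi for wi, xi in zip(w, x)]
    w0 += delta
    return w, w0

# Example: nudge a unit toward target t = 1 for input (1, 0)
print(extended_delta_update([0.2, -0.4], 0.0, (1, 0), t=1.0))
```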

PATTERN RECOGNITION USING NEURAL NETS

A neural network (connectionist system) imitates the neurons in the human brain. In the human brain there are about $10^{10}$ neurons.

[Figure: a neural net model with 3 inputs and 2 outputs.]

Each processing element either fires or it does not fire. $w_i$: weights between neurons and inputs to the neurons.

The model for each neuron:

[Figure: inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$ and bias $w_0$ feeding a single unit with output $Y$.]

$Y = f\!\left(\sum_{i=1}^{n} w_i x_i + w_0\right) = f\!\left(\sum_{i=0}^{n} w_i x_i\right)$ (with $x_0 = 1$)

$f$: activation function, normally nonlinear.

[Figure: hard-limiter activation, stepping from $-1$ to $+1$ at $\alpha = 0$.]

[Figure: sigmoid activation.] Sigmoid: $f(\alpha) = \dfrac{1}{1 + e^{-\alpha}}$

TOPOLOGY: How the neurons are connected to each other. Once the topology is determined, the weights are to be found using learning samples. The process of finding the weights is called the learning algorithm. Negative weights: inhibitory. Positive weights: excitatory.

How can a NN be used for pattern classification?
- Inputs are feature vectors.
- Each output represents one category.
- For a given input, one of the outputs fires (the output that gives the highest value), so the input sample is classified into that category.

Many topologies are used for P.R.:
- Hopfield net
- Hamming net
- Multilayer perceptron
- Kohonen's feature map
- Boltzmann machines

MULTILAYER PERCEPTRON

[Figure: a single-layer net with inputs $x_1, \ldots, x_n$ and outputs $y_1, \ldots, y_m$.]

A single layer gives linear discriminants: it cannot solve problems with nonlinear decision boundaries.

[Figure: the XOR problem in the $(x_1, x_2)$ plane; no linear solution exists.]

Multilayer Perceptron

[Figure: a fully connected multilayer perceptron with inputs $x_1, \ldots, x_n$, hidden layer 1, hidden layer 2, and outputs $y_1, \ldots, y_m$.]

It was shown that an MLP with 2 hidden layers can realize any decision boundary.
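To make the layered structure concrete, here is a minimal sketch (in Python, with illustrative layer sizes and random weights of my own, not values from the lecture) of a forward pass through a fully connected MLP with sigmoid units:

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer_forward(x, weights, biases):
    """One fully connected layer: y_j = f(sum_i w_ji * x_i + w_j0)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers: list of (weights, biases); the output of one layer feeds the next."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Example: 2 inputs -> 3 hidden -> 2 hidden -> 2 outputs, random weights
random.seed(0)
def rand_layer(n_in, n_out):
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

net = [rand_layer(2, 3), rand_layer(3, 2), rand_layer(2, 2)]
print(mlp_forward([1.0, 0.0], net))
```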

Learning in MLP: Found in the mid 80's. Back-Propagation Learning Algorithm:
1 - Start with arbitrary weights.
2 - Present the learning samples one by one to the inputs of the network. If the network outputs are not as desired (1 for the corresponding output and 0 for the others), adjust the weights starting from the top level by trying to reduce the differences.
3 - Propagate the adjustments downwards until you reach the bottom layer.
4 - Continue repeating 2 & 3 for all samples again and again until all samples are correctly classified.

Example: XOR

[Figure: a two-layer net that realizes XOR from an OR gate and a NOT AND gate feeding an AND gate; weights shown include $-0.7$, $-0.4$, $0.5$ and $-0.5$.]

Output of neurons: $1$ or $-1$. The net outputs $1$ for $X_1, X_2 = 1, -1$ or $-1, 1$, and $-1$ for the other combinations.

$X_1 \;\text{XOR}\; X_2 = (X_1 + X_2)\,\overline{(X_1 X_2)} = (X_1 + X_2)(\overline{X_1} + \overline{X_2})$

[Figure: the hidden unit taking the NOT AND, with its step activation function; weights shown include $-0.7$ and $0.4$.]
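Because the exact weights in the figure are hard to recover from the transcription, here is a hedged sketch of one standard choice of threshold-unit weights realizing XOR as (OR) AND (NOT AND); the values are my own, not necessarily those of the lecture:

```python
def step(a):
    """Hard-limiter threshold unit: 1 if a >= 0, else 0."""
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    """XOR = (x1 OR x2) AND NOT(x1 AND x2), built from three threshold units."""
    h_or   = step(1.0 * x1 + 1.0 * x2 - 0.5)      # OR gate
    h_nand = step(-1.0 * x1 - 1.0 * x2 + 1.5)     # NOT AND gate
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)  # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```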

Expressive Power of Multilayer Networks

If we assume there are 3 layers as above (input, output, one hidden layer): for classification, if there are $c$ categories, $d$ features and $m$ hidden nodes, each output is our familiar discriminant function:

$g_k(X) = f\!\left(\sum_{j=1}^{m} w_{kj}\, f\!\left(\sum_{i=1}^{d} w_{ji} x_i + w_{j0}\right) + w_{k0}\right)$

By allowing $f$ to be continuous and varying, this is the formula for the discriminant function of a 3-layer (one input, one hidden, one output) network ($m$ = number of nodes in the hidden layer).

Any continuous function can be implemented with a 3-layer network as above, with a sufficient number of hidden units (Kolmogorov, 1957):

$g(X) = \sum_{j=1}^{2n+1} \Xi_j\!\left(\sum_{i=1}^{d} \psi_{ij}(x_i)\right)$

That means any decision boundary can be implemented. Then the optimum Bayes rule ($g(x)$ = a posteriori probabilities) can be implemented with such a network!

In practice:
- How many nodes?
- How do we make it learn the weights with learning samples?
- Activation function?

Back-Propagation Algorithm: a learning algorithm for multilayer feed-forward networks, found in the 80's by Rumelhart et al.

We have a labeled sample set $\{X_1, X_2, \ldots, X_n\}$.

[Figure: a network with inputs $x_1, \ldots, x_d$, actual outputs $z_1, \ldots, z_c$ and target outputs $t_1, \ldots, t_c$.]

Target output: $t_i$ is high and the other targets are low when $X \in C_i$; $z_1, \ldots, z_c$ are the actual outputs.

We want: find $W$ (the weight vector) so that the difference between the target output and the actual output is minimized, i.e. the criterion function

$J(W) = \frac{1}{2} \sum_{k} (t_k - z_k)^2 = \frac{1}{2}\,\lVert T - Z \rVert^2$

is minimized for the given learning set.

The Back-Propagation Algorithm works basically as follows:
1 - Arbitrary initial weights are assigned to all connections.
2 - A learning sample is presented at the input, which will cause arbitrary outputs to be produced. Sigmoid nonlinearities are used to find the output of each node.
3 - The topmost layer's weights are changed to force the outputs to the desired values.
4 - Moving down the layers, each layer's weights are updated to force the desired outputs.
5 - Iteration continues, using all the training samples many times, until a set of weights is found that results in correct outputs for all learning samples (or until $J(W) < \theta$).

The weights are changed according to the following criteria. If $j$ is any node and $i$ is one of the nodes a layer below (connected to node $j$), update $w_{ji}$ as follows (the generalized delta rule; see the sketch below):

$w_{ji}(t+1) = w_{ji}(t) + \eta\, \delta_j\, x_i$

where $x_i$ is either the output of node $i$ or is an input, and

$\delta_j = z_j (1 - z_j)(t_j - z_j)$

in case $j$ is an output node; $z_j$ is the output and $t_j$ is the desired output at node $j$.
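As a concrete reading of the generalized delta rule for an output node, here is a minimal sketch (variable names are mine, not from the lecture):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def output_node_update(w_j, x, t_j, eta=0.5):
    """Generalized delta rule for an output node j:
    delta_j = z_j * (1 - z_j) * (t_j - z_j);  w_ji <- w_ji + eta * delta_j * x_i
    """
    z_j = sigmoid(sum(w * xi for w, xi in zip(w_j, x)))
    delta_j = z_j * (1.0 - z_j) * (t_j - z_j)
    w_new = [w + eta * delta_j * xi for w, xi in zip(w_j, x)]
    return w_new, delta_j

# Example: hidden-layer outputs x feed output node j with target t_j = 1
print(output_node_update([0.1, -0.2, 0.3], [0.9, 0.4, 0.7], t_j=1.0))
```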

If $j$ is an intermediate (hidden) node,

$\delta_j = x_j (1 - x_j) \sum_{k=1}^{m} \delta_k\, w_{kj}$

where $x_j$ is the output of node $j$ and the sum is over the nodes $k$ in the layer above (for an output node $k$ the output is $z_k$).

How do we get these updates? Apply the gradient descent algorithm to the network with

$J(W) = \tfrac{1}{2}\,\lVert T - Z \rVert^2$

Gradient Descent: move in the direction of the negative of the gradient of $J(W)$:

$\Delta W = -\eta\, \dfrac{\partial J}{\partial W}$, where $\eta$ is the learning rate.

For each component:

$w_{ji}(t+1) = w_{ji}(t) - \eta\, \dfrac{\partial J}{\partial w_{ji}}$

Now we have to evaluate $\dfrac{\partial J}{\partial w_{ji}}$ for output and hidden nodes. But we can write this as follows:

$\dfrac{\partial J}{\partial w_{ji}} = \dfrac{\partial J}{\partial \alpha_j}\, \dfrac{\partial \alpha_j}{\partial w_{ji}}$, where $\alpha_j = \sum_i w_{ji} x_i$ and the node output is $f(\alpha_j)$.

But $\dfrac{\partial \alpha_j}{\partial w_{ji}} = x_i$. Now call

$\delta_j = -\dfrac{\partial J}{\partial \alpha_j}$

Then $\Delta w_{ji} = \eta\, \delta_j\, x_i$.

Now $\delta_j$ will vary depending on $j$ being an output node or a hidden node.

Output node: use the chain rule,

$\delta_j = -\dfrac{\partial J}{\partial \text{net}_j} = -\dfrac{\partial J}{\partial z_j}\,\dfrac{\partial z_j}{\partial \text{net}_j} = (t_j - z_j)\, z_j (1 - z_j)$

using the derivative of the activation function (sigmoid). So

$w_{ji}(t+1) = w_{ji}(t) + \eta\, z_j (1 - z_j)(t_j - z_j)\, x_i$

Hidden Node: use the chain rule again,

$\delta_j = -\dfrac{\partial J}{\partial \text{net}_j} = -\dfrac{\partial J}{\partial y_j}\,\dfrac{\partial y_j}{\partial \text{net}_j}$

Now evaluate $\dfrac{\partial J}{\partial y_j}$ with $J = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2$:

$\dfrac{\partial J}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k)\,\dfrac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k)\,\dfrac{\partial z_k}{\partial \text{net}_k}\,\dfrac{\partial \text{net}_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k)\, z_k (1 - z_k)\, w_{kj}$

So

$\delta_j = y_j (1 - y_j) \sum_{k=1}^{c} \delta_k\, w_{kj}$

and

$w_{ji}(t+1) = w_{ji}(t) + \eta\, \delta_j\, x_i$
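Putting the output-node and hidden-node formulas together, here is a hedged sketch of back-propagation training for a network with one hidden layer of sigmoid units; the structure, names and hyperparameters are my own choices, with biases folded in as an extra input fixed at 1:

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_epoch(samples, W_hidden, W_out, eta=0.5):
    """One pass of back-propagation over all samples.

    W_hidden[j][i]: weight from input i to hidden node j (last column = bias)
    W_out[k][j]:    weight from hidden node j to output node k (last column = bias)
    samples:        list of (x, t) with x a feature list and t a target list
    """
    for x, t in samples:
        x_b = x + [1.0]                                   # append bias input
        y = [sigmoid(sum(w * xi for w, xi in zip(row, x_b))) for row in W_hidden]
        y_b = y + [1.0]
        z = [sigmoid(sum(w * yi for w, yi in zip(row, y_b))) for row in W_out]

        # Output deltas: delta_k = z_k (1 - z_k)(t_k - z_k)
        delta_out = [zk * (1 - zk) * (tk - zk) for zk, tk in zip(z, t)]
        # Hidden deltas: delta_j = y_j (1 - y_j) * sum_k delta_k * w_kj
        delta_hid = [yj * (1 - yj) * sum(dk * W_out[k][j]
                                         for k, dk in enumerate(delta_out))
                     for j, yj in enumerate(y)]

        # Weight updates: w <- w + eta * delta * (input to that weight)
        for k, dk in enumerate(delta_out):
            W_out[k] = [w + eta * dk * yi for w, yi in zip(W_out[k], y_b)]
        for j, dj in enumerate(delta_hid):
            W_hidden[j] = [w + eta * dj * xi for w, xi in zip(W_hidden[j], x_b)]
    return W_hidden, W_out

# Example: train on XOR (2 inputs, 2 hidden nodes, 1 output); convergence
# depends on the random initialization, as the lecture's iteration scheme implies.
random.seed(1)
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W_o = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(5000):
    W_h, W_o = backprop_epoch(data, W_h, W_o, eta=0.5)
```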