Artificial Neural Networks: The Introduction


Artificial Neural Networks: The Introduction. Jan Drchal, drchajan@fel.cvut.cz, Computational Intelligence Group, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague

Outline: What are Artificial Neural Networks (ANNs)? Inspiration in biology. Neuron models and learning algorithms: MCP neuron, Rosenblatt's perceptron, MP-Perceptron, ADALINE/MADALINE. Linear separability.

Information Resources https://cw.felk.cvut.cz/doku.php/courses/a4m33bia/start M. Šnorek: Neuronové sítě a neuropočítače. Vydavatelství ČVUT, Praha, 2002. Oxford Machine Learning: https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/ S. Haykin: Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice Hall, 1998. R. Rojas: Neural Networks - A Systematic Introduction, Springer-Verlag, Berlin, New York, 1996, http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/pmwiki/pmwiki.php?n=books.neuralnetworksbook J. Šíma, R. Neruda: Theoretical Issues of Neural Networks, MATFYZPRESS, Prague, 1996, http://www2.cs.cas.cz/~sima/kniha.html

What Are Artificial Neural Networks (ANNs)? Artificial information systems which imitate the functions of the neural systems of living organisms. A non-algorithmic approach to computation: learning, generalization. Massively parallel processing of data using a large number of simple computational units (neurons). Note: we are still very far from the complexity found in nature! But there have been some advances lately, as we will see.

ANN: the Black Box. The function of an ANN can be understood as a transformation T of an input vector X to an output vector Y: Y = T(X). [Figure: input vector (x1, ..., xn) entering the Artificial Neural Network, output vector (y1, ..., ym) leaving it.] What transformations T can be realized by neural networks? This has been the central scientific question since the beginning of the discipline.

What Can Be Solved by ANNs? Function approximation - regression. Classification (pattern/sequence recognition). Time series prediction. Fitness prediction (for optimization). Control (e.g., robotics). Association. Filtering, clustering, compression.

ANN Learning & Recall. ANNs most frequently work in two phases: Learning phase: adaptation of the ANN's internal parameters. Evaluation phase (recall): use what was learned.

Learning Paradigms. Supervised: teaching by examples; given a set of example pairs Pi = (xi, yi), find a transformation T which approximates yi = T(xi) for all i. Unsupervised: self-organization, no teacher. Reinforcement: teaching examples are not available; they are generated by interaction with the environment (mostly control tasks).

When to Stop the Learning Process? When a given criterion is met for all patterns of the training set: the ANN's error is less than ..., or the error stagnates for a given number of epochs. Training set: a subset of the example pattern set dedicated to the learning phase. Epoch: one application of all training set patterns.

Inspiration: Biological Neurons

Biological Neurons II. The neural system of a 1-year-old child contains approximately 10^11 to 10^12 neurons (losing about 200,000 a day). The diameter of the soma nucleus ranges from 3 to 18 μm. Dendrite length is 2-3 mm; there are 10 to 100,000 dendrites per neuron. An axon can be longer than 1 m.

Synapses. Synapses transmit signals between neurons. They are of two types: excitatory and inhibitory. From another point of view: chemical and electrical.

Neural Activity - a Reaction to Stimuli. After a neuron fires through its axon, it is unable to fire again for about 10 ms. The speed of signal propagation ranges from 5 to 125 m/s. [Figure: the neuron's potential over time, showing activity when the threshold is reached and no activity when the threshold fails to be reached.]

Hebbian Learning Postulate. Donald Hebb, 1949: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." In short: cells that fire together, wire together.

Neuron Model: History. Warren McCulloch (1898-1969) and Walter Pitts (1923-1969). McCulloch, W. S. and Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133.

McCulloch-Pitts (MCP) Neuron. [Figure: neuron inputs x1 ... xN with synaptic weights w1 ... wN are summed and passed through a nonlinear transfer function S with threshold Θ to produce the neuron output y.] y = S(Σ_{i=1..N} w_i x_i + Θ), or equivalently y = S(Σ_{i=0..N} w_i x_i), where w_0 = Θ, x_0 = 1, and S is the Heaviside (step) function. Here y is the neuron's output (activity), x_i the neuron's i-th input out of N in total, w_i the i-th synaptic weight, S the (nonlinear) transfer (activation) function, and Θ the threshold/bias (it performs a shift); the expression inside S is the inner potential. McCulloch and Pitts worked with binary inputs only, and there was no learning algorithm.
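As an illustration (not from the original slides), here is a minimal Python sketch of an MCP neuron using the bias-absorbed convention above; the weights and bias for the AND example are chosen by hand, since the MCP model has no learning algorithm.

```python
import numpy as np

def heaviside(s):
    """Step transfer function: 1 if the inner potential is non-negative, else 0."""
    return 1 if s >= 0 else 0

def mcp_neuron(x, w, theta):
    """McCulloch-Pitts neuron: y = S(sum_i w_i * x_i + theta)."""
    s = np.dot(w, x) + theta   # inner potential
    return heaviside(s)

# Binary inputs only, as in the original model; the weights and bias implementing
# logical AND are chosen by hand (assumed values for illustration).
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", mcp_neuron(x, w=[1, 1], theta=-1.5))
```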

Other Transfer Functions. [Figure: six transfer function plots: a) linear, b) two-step function, c) sigmoid, d) Heaviside (step) function, e) semilinear (saturating at ±γ), f) hyperbolic tangent (saturating at ±1).]
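The following Python sketch gives plausible definitions of the listed transfer functions; the exact parameterizations (beta, theta, delta, gamma and the two-step levels) are assumptions for illustration, not taken from the figure.

```python
import numpy as np

def linear(x, beta=1.0):
    return beta * x

def two_step(x, theta=0.0, delta=1.0):
    # assumption: three output levels, switching at theta and theta + delta
    return np.where(x < theta, 0.0, np.where(x < theta + delta, 0.5, 1.0))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def heaviside(x, theta=0.0):
    return np.where(x >= theta, 1.0, 0.0)

def semilinear(x, gamma=1.0):
    # assumption: linear ramp saturating at -gamma and +gamma
    return np.clip(x, -gamma, gamma)

def hyperbolic_tangent(x):
    return np.tanh(x)  # saturates at -1 and +1
```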

MCP Neuron Significance. The basic type of neuron. Most ANNs are based on perceptron-type neurons. Note: there are other approaches (we will see later...)

Rosenblatt's Perceptron (1957). Frank Rosenblatt, creator of the first neurocomputer: MARK I (1960). A learning algorithm for MCP neurons. Application: classification of letters.

Rosenblatt's Perceptron Network. [Figure: an input pattern feeds a distribution layer, which is connected to feature-detection units (daemons) by sparse connections with initially randomized, constant weights; the daemons are in turn fully connected, with learned weights, to output perceptrons that indicate pattern A or pattern B.] Rosenblatt = McCulloch-Pitts + learning algorithm.

Rosenblatt - Learning Algorithm. Randomize the weights. If the output is correct, no weight is changed. If the desired output was 1 but the actual output is 0, increment the weights of the active inputs. If the desired output was 0 but the actual output is 1, decrement the weights of the active inputs. Three possibilities for the amplitude of the weight change: fixed increments/decrements only; increments/decrements based on the error size (it is advantageous to make them larger for larger errors, though this may lead to instability).

Geometric Interpretation of the Perceptron. Let's look at the inner potential (in 2D). The decision boundary is w1 x1 + w2 x2 + Θ = 0, i.e. x2 = -(w1/w2) x1 - Θ/w2: a line with slope -w1/w2 and shift -Θ/w2, perpendicular to the weight vector (w1, w2). The dividing (hyper)plane is linear: a decision boundary. What will the transfer function change?

MP-Perceptron. MP = Marvin Minsky and Seymour Papert, MIT Research Laboratory of Electronics. In 1969 they published Perceptrons.

MP-Perceptron. [Figure: input vector x1 ... xn and a constant +1 input with weights w1 ... wn and threshold weight w0; the weighted sum s is passed through a step function giving a binary output y in {+1, -1}; the perceptron learning algorithm uses the error, the difference between the target value d and the output y, to adjust the weights.]

MP-Perceptron: Delta Rule Learning. For each example pattern, modify the perceptron weights: w_i(t+1) = w_i(t) + η e(t) x_i(t), where e(t) = d(t) - y(t) is the error, w_i is the weight of the i-th input, η the learning rate from <0;1), x_i the value of the i-th input, y the neuron output, and d the desired output.

Demonstration: Learning AND. Inputs are encoded bipolarly: 0 as -1, 1 as +1 (in the figure, a white square denotes 0 and a black one 1). Truth table: -1 AND -1 = false; -1 AND +1 = false; +1 AND -1 = false; +1 AND +1 = true.
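Below is a minimal sketch (not from the slides) of the delta-rule learning from the previous slide applied to this AND demonstration, assuming a bipolar {-1, +1} output and a bias treated as the weight of a constant +1 input.

```python
import numpy as np

# The four AND patterns in bipolar encoding and their desired outputs.
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]])
d = np.array([-1, -1, -1, +1])

w = np.zeros(2)   # input weights
w0 = 0.0          # bias, treated as the weight of a constant +1 input
eta = 0.1         # learning rate

for epoch in range(20):                               # a few epochs suffice here
    for x, target in zip(X, d):
        y = 1 if (np.dot(w, x) + w0) >= 0 else -1     # bipolar perceptron output
        e = target - y                                # e(t) = d(t) - y(t)
        w += eta * e * x                              # w_i(t+1) = w_i(t) + eta * e(t) * x_i(t)
        w0 += eta * e

for x, target in zip(X, d):
    y = 1 if (np.dot(w, x) + w0) >= 0 else -1
    print(x, "->", y, "(desired:", target, ")")
```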

Minsky-Papert's Blunder 1/2. They rigorously proved that this ANN (i.e., the single-layer perceptron) cannot implement even a logic function as simple as XOR: -1 XOR -1 = false; -1 XOR +1 = true; +1 XOR -1 = true; +1 XOR +1 = false. It is impossible to separate the two classes by a single line!

Minsky-Papert's Blunder 2/2. They incorrectly generalized this correct conclusion to all ANNs. History showed that they were responsible for more than two decades of delay in ANN research. The issue: linear non-separability!

XOR Solution - Add a Layer. [Figure: a network of three threshold units (1 and 2 in a hidden layer, 3 as the output, each with a +1 bias input) computing XOR.] See http://home.agh.edu.pl/~vlsi/ai/xor_t/en/main.htm
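As a hand-constructed illustration of the idea (the specific weights are an assumption, not taken from the figure or the linked demo), XOR can be computed by two hidden threshold units (OR and NAND) feeding an output AND unit:

```python
# Three threshold units: two hidden units (OR and NAND) and an output AND unit.
def step(s):
    return 1 if s >= 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden unit 1: OR
    h2 = step(-x1 - x2 + 1.5)    # hidden unit 2: NAND
    return step(h1 + h2 - 1.5)   # output unit: AND of the hidden outputs

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```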

ADALINE. Bernard Widrow, Stanford University, 1960. A single perceptron. Bipolar output: +1, -1. Binary, bipolar, or real-valued input. Learning algorithm: LMS/delta rule. Application: echo cancellation in long-distance communication circuits (still used).

ADALINE - ADAptive LInear NEuron. [Figure: input vector x1 ... xn and a constant +1 input with weights w1 ... wn and bias weight w0; the inner potential s is passed through a nonlinear function to give a binary output y in {+1, -1}; the LMS algorithm uses the linear error, the difference between the target value d and the inner potential s, to adjust the weights.] http://www.learnartificialneuralnetworks.com/perceptronadaline.html

ADALINE Learning Algorithm: LMS/Delta Rule. w_i(t+1) = w_i(t) + η e(t) x_i(t), where e(t) = d(t) - s(t), w_i is the weight of the i-th input, η the learning rate from <0;1), x_i the value of the i-th input, s the neuron's inner potential, and d the desired output. Least Mean Square is also called the Widrow-Hoff delta rule.
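A minimal Python sketch of the LMS/delta rule above; unlike the perceptron rule, the error is computed from the inner potential s rather than the thresholded output. The bipolar AND task is reused here purely as an assumed example.

```python
import numpy as np

def train_adaline(X, d, eta=0.05, epochs=50):
    """Train an ADALINE with the LMS rule; returns weights (bias weight first)."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend constant +1 for the bias weight
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            s = np.dot(w, x)                   # inner potential
            e = target - s                     # e(t) = d(t) - s(t)
            w += eta * e * x                   # w_i(t+1) = w_i(t) + eta * e(t) * x_i(t)
    return w

# Bipolar AND again; the thresholded output is the sign of the inner potential.
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]])
d = np.array([-1, -1, -1, +1])
w = train_adaline(X, d)
print(np.sign(np.hstack([np.ones((4, 1)), X]) @ w))   # expected: [-1 -1 -1 +1]
```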

MP-Perceptron vs. ADALINE. Perceptron: Δw_i = η(d - y)x_i. ADALINE: Δw_i = η(d - s)x_i. The difference is y vs. s (output vs. inner potential). ADALINE can asymptotically approach the minimum of the error even for linearly non-separable problems; the perceptron cannot. The perceptron always converges to an error-free classifier on a linearly separable problem; the delta rule might not succeed!

LMS Algorithm? Goal: minimize the error E, where E = Σ_j e_j(t)^2 and j runs over the input patterns. E is a function of the weights only. It is minimized by the gradient descent method.

Graphically: Gradient Descent. [Figure: the error surface over the weight space, with one gradient-descent step from (w1, w2) to (w1 + Δw1, w2 + Δw2).]

Online vs. Batch Learning. Online learning: gradient descent over individual training examples. Batch learning: gradient descent over the entire training data set. Mini-batch learning: gradient descent over subsets of the training data.
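The three regimes differ only in how many patterns contribute to each gradient step. A small sketch, assuming the squared-error objective E = Σ_j e_j^2 from the LMS slide:

```python
import numpy as np

def grad(w, X, d):
    """Gradient of E = sum_j (d_j - w . x_j)^2 with respect to w."""
    e = d - X @ w
    return -2.0 * X.T @ e

def batch_step(w, X, d, eta):
    """One gradient-descent step over the entire training set."""
    return w - eta * grad(w, X, d)

def online_step(w, x, target, eta):
    """One gradient-descent step for a single training example."""
    return w - eta * grad(w, x[None, :], np.array([target]))

def minibatch_step(w, X, d, eta, idx):
    """One gradient-descent step over the subset of patterns selected by idx."""
    return w - eta * grad(w, X[idx], d[idx])
```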

Multi-Class Classification? 1-of-N Encoding. A perceptron can only handle two-class outputs. What can be done? Answer: for an N-class problem, learn N perceptrons: perceptron 1 learns Output=1 vs. Output≠1, perceptron 2 learns Output=2 vs. Output≠2, ..., perceptron N learns Output=N vs. Output≠N. To classify a new input, predict with each perceptron and choose the one that puts the prediction furthest into the positive region (see the sketch below).
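A sketch of the 1-of-N decision rule; here `weights` is assumed to be a list of N weight vectors (bias weight first), one per separately trained perceptron, and the predicted class is the one with the largest inner potential:

```python
import numpy as np

def predict_multiclass(weights, x):
    """weights: list of N weight vectors (bias weight first), one perceptron per class.
    Returns the class whose perceptron has the largest inner potential for x."""
    x = np.hstack([1.0, x])                     # prepend the constant +1 bias input
    scores = [np.dot(w, x) for w in weights]    # inner potential of each perceptron
    return int(np.argmax(scores))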

MADALINE (Many ADALINEs). Processes binary signals. Bipolar encoding. One hidden layer, one output layer. Supervised learning. Again by B. Widrow.

MADALINE: Architecture. [Figure: an input layer, a hidden layer of ADALINEs, and an output layer computing a majority function over the ADALINE outputs.] Note: only the weights at the ADALINE inputs are learned.

MADALINE: Linearly Non-Separable Problems. MADALINE can learn linearly non-separable problems. [Figure: a 2D grid of patterns from two classes that cannot be separated by a single line.]

MADALINE Limitations. MADALINE is unusable for complex tasks. The ADALINEs learn independently, so they are unable to partition the input space to form a complex decision boundary.

Next Lecture MultiLayer Perceptron (MLP). Radial Basis Function (RBF). Group Method of Data Handling (GMDH).