Multilayer Neural Networks


Pattern Recognition, Lecture 4: Multilayer Neural Networks
Prof. Daniel Yeung
School of Computer Science and Engineering, South China University of Technology

Outline
- Introduction
- Artificial Neural Network
- Multiple Layer Perceptron NN
- Back-Propagation Algorithm (6.3)
- Regularization
- Relating NN and Bayes Theory (6.6)
- Practical Techniques (6.8)

Introduction

Recall linear discriminant functions: the inputs feed weighted sums Σ that directly produce the outputs g1(x), g2(x) (an input layer and an output layer only). They have limited generalization capability and cannot handle non-linearly separable problems.

Solution 1: a mapping function φ. Pass the inputs through nonlinear functions φ(x), then apply the linear discriminants to the mapped features.
- Pro: simple structure, still using an LDF
- Con: the selection of φ and its parameters (already discussed in Lecture 03)

Solution 2: a multi-layer neural network, with one or more hidden layers between the input layer and the output layer. There is no need to choose the nonlinear mapping φ, and no need for any prior knowledge relevant to the classification problem.

Multi-Layer Neural Network (Multilayer Perceptron)
- Has one or more hidden layers
- The hidden layers serve as a mapping function
- Will be introduced in this lecture

Artificial Neural Network (ANN)

A very simplified model of the brain: like the human brain, it maps inputs to outputs. It is basically a function approximator, transforming inputs into outputs to the best of its ability.

An ANN is composed of neurons which cooperate together. The inputs I1, I2, ..., Id reach each neuron through weighted connections (synapses, with weights w1, ..., wd), and the neuron produces an output through an activation function f with threshold θ.

How does a neuron work? The output of a neuron is a function of the weighted sum of the inputs plus a bias (an optional unit which always emits a value of 1 or -1, acting as a threshold):

output = f(w1 I1 + w2 I2 + ... + wd Id + bias)

where f is the activation function.
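
As a minimal sketch of this computation (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def neuron(inputs, weights, bias, f):
    """Output of one neuron: activation of the weighted input sum plus a bias."""
    return f(np.dot(weights, inputs) + bias)

# Example with a sign activation:
sign = lambda net: 1 if net > 0 else -1
print(neuron(np.array([1.0, -1.0]), np.array([0.5, 0.3]), 0.1, sign))  # -> 1
```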

Activation Function

The function f is the activation function. Examples:
- Linear function: the output is the same as the input, f(x) = x. Differentiable.
- Sign function: used for decision making; f(x) = 1 for x > 0 and f(x) = -1 for x < 0. Not differentiable.
- Sigmoid function: smooth, continuous, and monotonically increasing (differentiable); f(x) = 1 / (1 + e^-x).
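
A minimal sketch of these three activation functions (NumPy-based; the function names are mine):

```python
import numpy as np

def linear(x):
    return x                          # output equals input; differentiable

def sign(x):
    return np.where(x > 0, 1, -1)     # hard decision; not differentiable

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # smooth, monotonic, differentiable
```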

XOR Example

A three-layer network with sign activations can implement XOR. The hidden units compute y1 = sgn(x1 + x2 + 0.5) and y2 = sgn(x1 + x2 - 1.5); the output unit computes z = sgn(0.7 y1 - 0.4 y2 - 1).

For x1 = 1, x2 = 1:
- y1 = sgn(1 + 1 + 0.5) = sgn(2.5) = 1
- y2 = sgn(1 + 1 - 1.5) = sgn(0.5) = 1
- z = sgn(0.7 - 0.4 - 1) = sgn(-0.7) = -1

For x1 = -1, x2 = -1:
- y1 = sgn(-1 - 1 + 0.5) = sgn(-1.5) = -1
- y2 = sgn(-1 - 1 - 1.5) = sgn(-3.5) = -1
- z = sgn(-0.7 + 0.4 - 1) = sgn(-1.3) = -1

The first hidden unit (y1) implements an OR gate, the second hidden unit (y2) implements an AND gate, and the final output unit implements an AND NOT gate:

z = y1 AND NOT y2 = (x1 OR x2) AND NOT (x1 AND x2) = x1 XOR x2
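
The worked example above can be checked with a short script (a sketch; sgn here maps non-positive values to -1):

```python
sgn = lambda v: 1 if v > 0 else -1

def xor_net(x1, x2):
    y1 = sgn(x1 + x2 + 0.5)                # OR gate for +/-1 inputs
    y2 = sgn(x1 + x2 - 1.5)                # AND gate for +/-1 inputs
    return sgn(0.7 * y1 - 0.4 * y2 - 1.0)  # y1 AND NOT y2 -> XOR

for x in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x, '->', xor_net(*x))            # -1, 1, 1, -1
```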

Structure of an ANN

A simple three-layer neural network:
- Input layer: the input units (here x1, x2)
- Hidden layer: 3 hidden units
- Output layer: the output units (here g1, g2)

Illustrative example (salmon/sea bass):
- x1 = length of the salmon/sea bass, x2 = lightness of the salmon/sea bass
- The weights w_i assign an importance to each input of a neuron
- Top hidden neuron: a length discriminant function
- Middle hidden neuron: a combined length-and-lightness discriminant function
- Bottom hidden neuron: a lightness discriminant function
- g1, g2: the final outputs

Forward operation. Each hidden unit j computes a net activation from the inputs and emits y_j; each output unit k computes a net activation from the hidden outputs and emits g_k:

net_j = Σ_{m=1..d} w_jm x_m,      y_j = f(net_j)
net_k = Σ_{j=1..n_H} w_kj y_j,    g_k = f(net_k)

A two-layer network classifier can only implement a linear decision boundary. Three-, four- and higher-layer networks can implement arbitrary decision boundaries; the decision regions need not be convex, nor simply connected.
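
A minimal sketch of this forward pass (biases omitted for brevity; the shapes and names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_ih, W_ho, f=sigmoid):
    """Three-layer forward pass. Shapes: x (d,), W_ih (n_H, d), W_ho (c, n_H)."""
    y = f(W_ih @ x)   # hidden outputs  y_j = f(net_j)
    g = f(W_ho @ y)   # final outputs   g_k = f(net_k)
    return g
```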

Multiple Layer Perceptron NN (MLPNN)
- The most common NN
- More than one layer
- The sigmoid is used as the activation function
- A general function approximator, not limited to linear problems

Training: Weight Determination

Weights can be determined by training: reduce the error between the desired outputs and the NN outputs over the training samples. The back-propagation algorithm is the most widely used method for determining the weights; it is a natural extension of the LMS algorithm.
- Pros: a simple and general method
- Cons: slow, and can be trapped at local minima

Back-Propagation (BP) Algorithm

The calculation of the derivatives flows backwards through the network; hence it is called back-propagation. These derivatives point in the direction of the maximum increase of the error function: find out where the largest errors are being made, and go back to try to decrease them. A small step (the learning rate) in the opposite direction will result in the maximum decrease of the local error function:

w' = w - α ∂E/∂w

where α is the learning rate and E the error function.

The most common measure of error is the mean square error:

J = (target - output)² / 2

The update rule for a weight w is:

Δw = -η ∂J/∂w

where η is the learning rate, which controls the size of each step.

The next slides show BP for a 3-layer NN. There are two types of weights:
- hidden-to-output weights w_kj
- input-to-hidden weights w_ji

BP for a 3-layer NN. The learning rule for the hidden-to-output weights follows from the chain rule:

∂J/∂w_kj = (∂J/∂output_k) (∂output_k/∂net_k) (∂net_k/∂w_kj)

with

∂J/∂output_k = -(target_k - output_k)
∂output_k/∂net_k = f'(net_k),   where net_k = Σ_{j=1..n_H} w_kj y_j
∂net_k/∂w_kj = y_j

The learning rule for the input-to-hidden weights:

∂J/∂w_ji = (∂J/∂y_j) (∂y_j/∂net_j) (∂net_j/∂w_ji)

with

∂J/∂y_j = -Σ_{k=1..c} (target_k - output_k) f'(net_k) w_kj
∂y_j/∂net_j = f'(net_j),   where net_j = Σ_{m=1..d} w_jm x_m
∂net_j/∂w_ji = x_i
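
A minimal sketch of these two update rules for sigmoid units (biases omitted; the names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, target, W_ih, W_ho, eta=0.1):
    """One BP update. Shapes: x (d,), target (c,), W_ih (n_H, d), W_ho (c, n_H)."""
    # Forward pass
    y = sigmoid(W_ih @ x)      # hidden outputs
    out = sigmoid(W_ho @ y)    # network outputs

    # Sensitivities; for the sigmoid, f'(net) = out * (1 - out)
    delta_k = (target - out) * out * (1.0 - out)   # hidden-to-output
    delta_j = y * (1.0 - y) * (W_ho.T @ delta_k)   # input-to-hidden

    # Weight updates:  Δw = η * delta * (input to that layer)
    W_ho += eta * np.outer(delta_k, y)
    W_ih += eta * np.outer(delta_j, x)
    return W_ih, W_ho
```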

Summary of the BP learning rules for a 3-layer NN:
- Hidden-to-output weights: Δw_kj = η (target_k - output_k) f'(net_k) y_j
- Input-to-hidden weights: Δw_ji = η [Σ_{k=1..c} (target_k - output_k) f'(net_k) w_kj] f'(net_j) x_i

Training Algorithm. For the same training set, the weights of the NN can be updated differently by presenting the training samples in different sequences. There are two popular methods:
- Stochastic training
- Batch training

Stochastic training: patterns are chosen randomly from the training set, and the network weights are updated after each presentation.

Batch training: all patterns are presented to the network before a weight update takes place.

Is a classifier with a smaller training error better? In most cases, NO! We have discussed this issue in an earlier lecture. For example: stop training when the error on unseen test samples reaches a minimum.

Regularization

In Lecture 03 we mentioned that in most cases the solution (discriminant function) is not unique: an ill-posed problem. Which one is the best?
- Is it enough to minimize the training error?
- Is the classifier too complex?
- Does it have good generalization ability?

Minimizing only R_emp (the empirical risk, i.e. the training error) leads to the overfitting problem.

Regularization is one of the methods to handle this problem:
- Add a regularization term ψ(f) to the objective function, measuring the smoothness of the decision plane
- A tradeoff parameter λ controls the relative importance of the training accuracy and the regularization term
- Seek a smooth classifier with good performance on the training set; training error may be sacrificed for the simplicity of the classifier if necessary

Minimize: R_emp + λ ψ(f)
(training error + tradeoff × regularization term)

where λ is the regularization parameter and ψ the regularization function:
- λ = 0: identical to the traditional training objective function; the regularization term has no effect
- ∞ > λ > 0: if we can find a suitable λ, we may find an f with good generalization ability
- λ → ∞: dominated by the regularization term; the smoothest classifier is found

Weight Decay

Weight decay is a well-known regularization example. The regularization term measures the size of the weights:

ψ(f) = Σ w²

Smaller weights give a smoother classifier. The objective function becomes:

Minimize: R_emp + λ Σ w²
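
A minimal sketch of one gradient step on this objective (the parameter names are illustrative; grad_Remp stands for the gradient of the training error):

```python
import numpy as np

def weight_decay_step(w, grad_Remp, eta=0.1, lam=1e-3):
    """Gradient descent on R_emp + lam * sum(w**2); the extra
    2*lam*w term decays each weight toward zero."""
    return w - eta * (grad_Remp + 2.0 * lam * w)
```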

NN and Bayes Theory

Recall Bayes formula:

P(ω_i | x) = p(x | ω_i) P(ω_i) / p(x),   where p(x) = Σ_{i=1..c} p(x | ω_i) P(ω_i)

Suppose a network is trained using the following target output setting:

target_k(x) = 1 if x ∈ ω_k, and 0 otherwise.

When the number of training samples tends to infinity (see p.304 in the text), minimizing the mean square error J is equivalent to minimizing:

lim_{n→∞} (1/n) J(w)
  = ∫ [g_k(x; w) - 1]² p(x, ω_k) dx + ∫ [g_k(x; w)]² p(x, ω_{i≠k}) dx
  = ∫ [g_k(x; w)]² p(x) dx - 2 ∫ g_k(x; w) p(x, ω_k) dx + ∫ p(x, ω_k) dx
  = ∫ [g_k(x; w) - P(ω_k | x)]² p(x) dx + ∫ P(ω_k | x) P(ω_{i≠k} | x) p(x) dx

The second term is independent of w, so minimizing J minimizes the first term, and the trained network approximates the posterior probability:

g_k(x; w) ≈ P(ω_k | x)

Thus when MLPNNs are trained via back-propagation on a sum-squared error criterion, they provide a least-squares fit to the Bayes discriminant function.

Practical Techniques

How do we design an MLPNN to handle a given classification problem, given a training set (x_1, y_1), ..., (x_n, y_n)? The following issues must be considered:
- Scaling input
- Target values
- Number of hidden layers
- Number of hidden units
- Initializing weights
- Learning rates
- Momentum
- Weight decay
- Stochastic and batch training
- Stopped training

Scaling Input

Features of different natures have different properties (e.g. range, mean). For example, fish mass (grams) and length (meters): normally the value of the mass will be orders of magnitude larger than that of the length. During training, the network will adjust the weights from the mass input unit far more than those from the length input, and the error will hardly depend upon the tiny length values. The situation is reversed when mass is in kilograms and length in millimeters.

How to reduce this influence? Normalization/standardization: transform the training samples so that the features have
- the same range (e.g. 0 to 1, or -1 to 1)
- the same variance (e.g. 1)
- the same average (e.g. 0)
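
A minimal sketch of the standardization step (zero mean, unit variance per feature; the data values are made up):

```python
import numpy as np

def standardize(X):
    """Column-wise zero mean and unit variance; X has shape (n_samples, n_features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# e.g. fish data: mass in grams, length in meters
X = np.array([[900.0, 0.30], [1200.0, 0.40], [700.0, 0.25]])
print(standardize(X))
```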

Target Values

Usually a one-of-c representation is used for the target vector. For a four-class problem, four outputs are used (see the encoding sketch after the next subsection):
- ω1 = (1, -1, -1, -1) or (1, 0, 0, 0)
- ω2 = (-1, 1, -1, -1) or (0, 1, 0, 0)
- ω3 = (-1, -1, 1, -1) or (0, 0, 1, 0)
- ω4 = (-1, -1, -1, 1) or (0, 0, 0, 1)

Number of Hidden Layers

The BP algorithm works well for NNs with many hidden layers, as long as the units are differentiable. How many hidden layers are enough?
- NNs with more hidden layers learn translations more easily, and some functions can be implemented more efficiently
- However, they have more undesirable local minima and are more complex
- Since any arbitrary function can be approximated by an MLP with one hidden layer, a 3-layer NN is usually recommended. Special problem conditions or requirements may justify the use of more than 3 layers.
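
A minimal sketch of the one-of-c target encoding described above (0/1 variant; the names are mine):

```python
import numpy as np

def one_of_c(label, c):
    """One-of-c target vector: class k of c -> 1 in position k, 0 elsewhere."""
    t = np.zeros(c)
    t[label - 1] = 1.0
    return t

print(one_of_c(3, 4))   # class omega_3 -> [0. 0. 1. 0.]
```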

Number of Hidden Units

The number of hidden units governs the expressive power of the NN (for facial recognition, say, neurons for the mouth, nose, ear, eye, face shape, etc.):
- Well separated or linearly separable samples: few hidden units
- Complicated problems: more hidden units

One study shows that the minimum error occurs for NNs in the range of 4-5 hidden units (figure: error plotted against the number of weights n and the number of hidden units n_H).

How to determine the number of hidden units n_H?
- n_H determines the total number of weights in the net, so we should not have more weights than the total number of training points n
- Without further information, n_H cannot be determined before training
- Experimentally, choose n_H such that the total number of weights in the net is roughly n/10 (see the sketch below)
- Adjust the complexity of the network in response to the training data, for example: start with a large value of n_H, then prune or eliminate weights
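
A minimal sketch of the n/10 rule of thumb, counting bias weights as well (the rearranged formula is my own reading of the heuristic):

```python
def hidden_units_rule_of_thumb(n_samples, d_inputs, c_outputs):
    """Choose n_H so that (d+1)*n_H + (n_H+1)*c is roughly n/10."""
    target_weights = n_samples / 10.0
    n_h = (target_weights - c_outputs) / (d_inputs + 1 + c_outputs)
    return max(1, round(n_h))

print(hidden_units_rule_of_thumb(n_samples=2000, d_inputs=10, c_outputs=2))  # -> 15
```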

Initializing Weights

In setting the weights in a given layer, we choose the weights randomly from a single distribution, to help ensure uniform learning. If all weights are set to 0 initially, learning can never start. Since we standardize the data, we choose both positive and negative weights.
- If w is initially too small, the net activation of a hidden unit will be small and the linear model will be implemented (the sigmoid is nearly linear around 0)
- If w is initially too large, the hidden unit may saturate (the sigmoid output stays at 0 or 1) even before learning begins

Recall net_j = Σ_{m=1..d} w_jm x_m and y_j = f(net_j), with the sigmoid f(x) = 1 / (1 + e^-x): linear near 0, saturated at the extremes.

We therefore set w such that the net activation at a hidden unit is in the range -1 < net < +1, since net = ±1 marks the limits of the sigmoid's linear range:
- Input-to-hidden weights (d inputs): -1/√d < w < +1/√d
- Hidden-to-output weights (the fan-in is n_H): -1/√n_H < w < +1/√n_H
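
A minimal sketch of this fan-in-based initialization (the names are illustrative):

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=np.random.default_rng()):
    """Uniform weights in (-1/sqrt(fan_in), +1/sqrt(fan_in)), so net
    activations start inside the sigmoid's linear range."""
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

W_ih = init_weights(fan_in=10, fan_out=5)   # d = 10 inputs, n_H = 5
W_ho = init_weights(fan_in=5, fan_out=2)    # n_H = 5, c = 2 outputs
```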

Learning Rates

In principle, a small learning rate ensures convergence; its value then determines only the learning speed, not the final weight values themselves. However, in practice, because networks are rarely fully trained to a training-error minimum, the learning rate can affect the quality of the final network.

The optimal learning rate is the one which leads to the local error minimum in one learning step; for a locally quadratic error J, the optimal rate is:

η_opt = (∂²J/∂w²)^(-1)

Behavior for different learning rates (figure):
- η < η_opt: slower convergence
- η = η_opt: converges in one step
- η_opt < η < 2η_opt: oscillates but slowly converges
- η > 2η_opt: diverges

Momentum

What is momentum? In physics it means that moving objects tend to keep moving unless acted upon by outside forces; for example, two different balls can carry the same momentum. In the BP algorithm, the approach is to alter the learning rule to include some fraction α of the previous weight update:

w(m+1) = w(m) + (1 - α) Δw_bp(m) + α Δw(m-1)

where Δw_bp(m) is the current BP delta and Δw(m-1) the previous delta.

Using momentum (figure: gradient-descent trajectories with and without momentum):
- reduces the variation in the overall gradient directions
- increases the speed of learning
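
A minimal sketch of the momentum update above (the names are illustrative):

```python
def momentum_step(w, grad_J, prev_delta, eta=0.1, alpha=0.9):
    """Blend the current BP step with the previous weight change."""
    delta = (1.0 - alpha) * (-eta * grad_J) + alpha * prev_delta
    return w + delta, delta   # new weights, and the delta to reuse next step
```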

Stochastic and Batch Training

Each training method has strengths and drawbacks:
- Batch learning is typically slower than stochastic learning
- Stochastic training is preferred for large, redundant training sets

Stopped Training

Stopping the training before gradient descent is complete can help avoid overfitting. A far more effective method is to stop training when the error on a separate validation set reaches a minimum.

Algorithm:
1. Separate the original training set into two sets: a new training set and a validation set
2. Use the new training set to train the classifier
3. Evaluate the classifier on the validation set at the end of each epoch, and stop at the minimum of the validation error

(figure: validation error and training error over epochs; the gap between them illustrates the generalization error)
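
A minimal sketch of this stopped-training loop (train_one_epoch and error are hypothetical methods, assumed to exist on the classifier object):

```python
import copy

def train_with_early_stopping(net, train_set, val_set, max_epochs=1000):
    """Keep the weights from the epoch with the lowest validation error."""
    best_err, best_net = float('inf'), copy.deepcopy(net)
    for _ in range(max_epochs):
        net.train_one_epoch(train_set)   # hypothetical method
        err = net.error(val_set)         # hypothetical method
        if err < best_err:
            best_err, best_net = err, copy.deepcopy(net)
    return best_net
```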

Multilayer Neural Networks

Multilayer Neural Networks Pattern Recognition Multilaer Neural Networs Lecture 4 Prof. Daniel Yeung School of Computer Science and Engineering South China Universit of Technolog Outline Introduction (6.) Artificial Neural Networ

More information

Multilayer Feedforward Networks. Berlin Chen, 2002

Multilayer Feedforward Networks. Berlin Chen, 2002 Multilayer Feedforard Netors Berlin Chen, 00 Introduction The single-layer perceptron classifiers discussed previously can only deal ith linearly separable sets of patterns The multilayer netors to be

More information

Part 8: Neural Networks

Part 8: Neural Networks METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as

More information

Networks of McCulloch-Pitts Neurons

Networks of McCulloch-Pitts Neurons s Lecture 4 Netorks of McCulloch-Pitts Neurons The McCulloch and Pitts (M_P) Neuron x x sgn x n Netorks of M-P Neurons One neuron can t do much on its on, but a net of these neurons x i x i i sgn i ij

More information

Pattern Classification

Pattern Classification Pattern Classification All materials in these slides were taen from Pattern Classification (2nd ed) by R. O. Duda,, P. E. Hart and D. G. Stor, John Wiley & Sons, 2000 with the permission of the authors

More information

ECE 471/571 - Lecture 17. Types of NN. History. Back Propagation. Recurrent (feedback during operation) Feedforward

ECE 471/571 - Lecture 17. Types of NN. History. Back Propagation. Recurrent (feedback during operation) Feedforward ECE 47/57 - Lecture 7 Back Propagation Types of NN Recurrent (feedback during operation) n Hopfield n Kohonen n Associative memory Feedforward n No feedback during operation or testing (only during determination

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Neural networks and support vector machines

Neural networks and support vector machines Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith

More information

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.

More information

Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

More information

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA   1/ 21 Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural

More information

Multilayer Perceptron = FeedForward Neural Network

Multilayer Perceptron = FeedForward Neural Network Multilayer Perceptron = FeedForward Neural Networ History Definition Classification = feedforward operation Learning = bacpropagation = local optimization in the space of weights Pattern Classification

More information

Introduction to Neural Networks

Introduction to Neural Networks CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character

More information

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks. Nicholas Ruozzi University of Texas at Dallas Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify

More information

Neural networks. Chapter 19, Sections 1 5 1

Neural networks. Chapter 19, Sections 1 5 1 Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

More information

Linear Discriminant Functions

Linear Discriminant Functions Linear Discriminant Functions Linear discriminant functions and decision surfaces Definition It is a function that is a linear combination of the components of g() = t + 0 () here is the eight vector and

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Artificial Neural Networks. Part 2

Artificial Neural Networks. Part 2 Artificial Neural Netorks Part Artificial Neuron Model Folloing simplified model of real neurons is also knon as a Threshold Logic Unit x McCullouch-Pitts neuron (943) x x n n Body of neuron f out Biological

More information

Unit III. A Survey of Neural Network Model

Unit III. A Survey of Neural Network Model Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of

More information

Artifical Neural Networks

Artifical Neural Networks Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

Neural networks. Chapter 20, Section 5 1

Neural networks. Chapter 20, Section 5 1 Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of

More information

Multilayer Perceptron

Multilayer Perceptron Aprendizagem Automática Multilayer Perceptron Ludwig Krippahl Aprendizagem Automática Summary Perceptron and linear discrimination Multilayer Perceptron, nonlinear discrimination Backpropagation and training

More information

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient

More information

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7. Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also

More information

Neural networks. Chapter 20. Chapter 20 1

Neural networks. Chapter 20. Chapter 20 1 Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Simple Neural Nets For Pattern Classification

Simple Neural Nets For Pattern Classification CHAPTER 2 Simple Neural Nets For Pattern Classification Neural Networks General Discussion One of the simplest tasks that neural nets can be trained to perform is pattern classification. In pattern classification

More information

Linear discriminant functions

Linear discriminant functions Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative

More information

Multilayer Perceptrons and Backpropagation

Multilayer Perceptrons and Backpropagation Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:

More information

Neural Networks (Part 1) Goals for the lecture

Neural Networks (Part 1) Goals for the lecture Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed

More information

Enhancing Generalization Capability of SVM Classifiers with Feature Weight Adjustment

Enhancing Generalization Capability of SVM Classifiers with Feature Weight Adjustment Enhancing Generalization Capability of SVM Classifiers ith Feature Weight Adjustment Xizhao Wang and Qiang He College of Mathematics and Computer Science, Hebei University, Baoding 07002, Hebei, China

More information

Neural Networks DWML, /25

Neural Networks DWML, /25 DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning

More information

Neural Networks: Basics. Darrell Whitley Colorado State University

Neural Networks: Basics. Darrell Whitley Colorado State University Neural Networks: Basics Darrell Whitley Colorado State University In the Beginning: The Perceptron X1 W W 1,1 1,2 X2 W W 2,1 2,2 W source, destination In the Beginning: The Perceptron The Perceptron Learning

More information

Artificial Neuron (Perceptron)

Artificial Neuron (Perceptron) 9/6/208 Gradient Descent (GD) Hantao Zhang Deep Learning with Python Reading: https://en.wikipedia.org/wiki/gradient_descent Artificial Neuron (Perceptron) = w T = w 0 0 + + w 2 2 + + w d d where

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Machine Learning

Machine Learning Machine Learning 10-315 Maria Florina Balcan Machine Learning Department Carnegie Mellon University 03/29/2019 Today: Artificial neural networks Backpropagation Reading: Mitchell: Chapter 4 Bishop: Chapter

More information

Artificial neural networks

Artificial neural networks Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural

More information

An artificial neural networks (ANNs) model is a functional abstraction of the

An artificial neural networks (ANNs) model is a functional abstraction of the CHAPER 3 3. Introduction An artificial neural networs (ANNs) model is a functional abstraction of the biological neural structures of the central nervous system. hey are composed of many simple and highly

More information

Neural Networks biological neuron artificial neuron 1

Neural Networks biological neuron artificial neuron 1 Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input

More information

Neural Networks. Learning and Computer Vision Prof. Olga Veksler CS9840. Lecture 10

Neural Networks. Learning and Computer Vision Prof. Olga Veksler CS9840. Lecture 10 CS9840 Learning and Computer Vision Prof. Olga Veksler Lecture 0 Neural Networks Many slides are from Andrew NG, Yann LeCun, Geoffry Hinton, Abin - Roozgard Outline Short Intro Perceptron ( layer NN) Multilayer

More information

Neural Networks Lecture 3:Multi-Layer Perceptron

Neural Networks Lecture 3:Multi-Layer Perceptron Neural Networks Lecture 3:Multi-Layer Perceptron H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural

More information

Single layer NN. Neuron Model

Single layer NN. Neuron Model Single layer NN We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons as illustrated below is easy because: M M The output units are independent

More information

Machine Learning

Machine Learning Machine Learning 10-601 Maria Florina Balcan Machine Learning Department Carnegie Mellon University 02/10/2016 Today: Artificial neural networks Backpropagation Reading: Mitchell: Chapter 4 Bishop: Chapter

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Neural Networks Varun Chandola x x 5 Input Outline Contents February 2, 207 Extending Perceptrons 2 Multi Layered Perceptrons 2 2. Generalizing to Multiple Labels.................

More information

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs) Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications

More information

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs) Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron

More information

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design

More information

Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1

More information

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs. artifical neural

More information

Multi-layer Neural Networks

Multi-layer Neural Networks Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

Lecture 16: Introduction to Neural Networks

Lecture 16: Introduction to Neural Networks Lecture 16: Introduction to Neural Networs Instructor: Aditya Bhasara Scribe: Philippe David CS 5966/6966: Theory of Machine Learning March 20 th, 2017 Abstract In this lecture, we consider Bacpropagation,

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Neural Networks Task Sheet 2. Due date: May

Neural Networks Task Sheet 2. Due date: May Neural Networks 2007 Task Sheet 2 1/6 University of Zurich Prof. Dr. Rolf Pfeifer, pfeifer@ifi.unizh.ch Department of Informatics, AI Lab Matej Hoffmann, hoffmann@ifi.unizh.ch Andreasstrasse 15 Marc Ziegler,

More information

COMP 551 Applied Machine Learning Lecture 14: Neural Networks

COMP 551 Applied Machine Learning Lecture 14: Neural Networks COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,

More information

CS4442/9542b Artificial Intelligence II prof. Olga Veksler. Lecture 5 Machine Learning. Neural Networks. Many presentation Ideas are due to Andrew NG

CS4442/9542b Artificial Intelligence II prof. Olga Veksler. Lecture 5 Machine Learning. Neural Networks. Many presentation Ideas are due to Andrew NG CS4442/9542b Artificial Intelligence II prof. Olga Vesler Lecture 5 Machine Learning Neural Networs Many presentation Ideas are due to Andrew NG Outline Motivation Non linear discriminant functions Introduction

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Threshold units Gradient descent Multilayer networks Backpropagation Hidden layer representations Example: Face Recognition Advanced topics 1 Connectionist Models Consider humans:

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Sections 18.6 and 18.7 Analysis of Artificial Neural Networks

Sections 18.6 and 18.7 Analysis of Artificial Neural Networks Sections 18.6 and 18.7 Analysis of Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline Univariate regression

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Advanced statistical methods for data analysis Lecture 2

Advanced statistical methods for data analysis Lecture 2 Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Artificial Neural Network : Training

Artificial Neural Network : Training Artificial Neural Networ : Training Debasis Samanta IIT Kharagpur debasis.samanta.iitgp@gmail.com 06.04.2018 Debasis Samanta (IIT Kharagpur) Soft Computing Applications 06.04.2018 1 / 49 Learning of neural

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

More information

Introduction to Artificial Neural Network - theory, application and practice using WEKA- Anto Satriyo Nugroho, Dr.Eng

Introduction to Artificial Neural Network - theory, application and practice using WEKA- Anto Satriyo Nugroho, Dr.Eng Introduction to Artificial Neural Netor - theory, application and practice using WEKA- Anto Satriyo Nugroho, Dr.Eng Center for Information & Communication Technology, Agency for the Assessment & Application

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs artifical neural networks

More information

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler + Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions

More information

Neural Networks. Associative memory 12/30/2015. Associative memories. Associative memories

Neural Networks. Associative memory 12/30/2015. Associative memories. Associative memories //5 Neural Netors Associative memory Lecture Associative memories Associative memories The massively parallel models of associative or content associative memory have been developed. Some of these models

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Artificial Neural Networks. Historical description

Artificial Neural Networks. Historical description Artificial Neural Networks Historical description Victor G. Lopez 1 / 23 Artificial Neural Networks (ANN) An artificial neural network is a computational model that attempts to emulate the functions of

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer. University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x

More information

epochs epochs

epochs epochs Neural Network Experiments To illustrate practical techniques, I chose to use the glass dataset. This dataset has 214 examples and 6 classes. Here are 4 examples from the original dataset. The last values

More information

Machine Learning. Neural Networks. Le Song. CSE6740/CS7641/ISYE6740, Fall Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU

Machine Learning. Neural Networks. Le Song. CSE6740/CS7641/ISYE6740, Fall Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Neural Networks Le Song Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU Reading: Chap. 5 CB Learning highly non-linear functions f:

More information

COMP9444 Neural Networks and Deep Learning 2. Perceptrons. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 2. Perceptrons. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 2. Perceptrons COMP9444 17s2 Perceptrons 1 Outline Neurons Biological and Artificial Perceptron Learning Linear Separability Multi-Layer Networks COMP9444 17s2

More information

y(x n, w) t n 2. (1)

y(x n, w) t n 2. (1) Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,

More information

Supervised Learning in Neural Networks

Supervised Learning in Neural Networks The Norwegian University of Science and Technology (NTNU Trondheim, Norway keithd@idi.ntnu.no March 7, 2011 Supervised Learning Constant feedback from an instructor, indicating not only right/wrong, but

More information