Multi-layer Neural Networks


Steve Renals
Informatics 2B: Learning and Data, Lecture 13
8 March 2011

Overview

- Multi-layer neural networks
- Multi-layer perceptrons (MLPs)
- The credit assignment problem for hidden units
- Back-propagation of error (backprop) training

Limitations of single-layer neural networks

Single-layer neural networks have many advantages:
- Easy to set up and train
- Explicit link to statistical models: shared-covariance Gaussian density functions
- Sigmoid output functions allow a link to posterior probabilities
- Outputs are a weighted sum of the inputs: an interpretable representation

But they have some big limitations:
- Can only represent a limited set of functions
- Decision boundaries must be hyperplanes
- Can only perfectly separate linearly separable data

Generalised Linear Network

Generalises linear discriminants by adding another, non-adaptive layer:

    y_k(x) = \sum_{j=0}^{M} w_{kj} \phi_j(x)

The input vector x is transformed using a set of M predefined non-linear functions, \phi_j(x), called basis functions.
- This allows a much larger class of discriminant functions (in fact, such a network can approximate any continuous function to arbitrary accuracy)
- Multi-layer neural networks instead employ adaptive basis functions, whose parameters (weights) may be estimated from the training data
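To make the fixed-basis-function idea concrete, here is a minimal sketch (not from the lecture) using Gaussian basis functions; the centres, width, function names, and array shapes are illustrative assumptions.

    import numpy as np

    # Hypothetical generalised linear network: M fixed Gaussian basis functions
    # phi_j(x) = exp(-||x - c_j||^2 / (2 s^2)), plus phi_0(x) = 1 for the bias.
    # Only the output weights W (shape K x (M+1)) would be adapted in training.

    def phi(x, centres, s=1.0):
        # Evaluate the M predefined, non-adaptive basis functions at input x.
        return np.exp(-np.sum((centres - x) ** 2, axis=1) / (2.0 * s ** 2))

    def y(x, W, centres):
        # y_k(x) = sum_{j=0}^{M} w_kj phi_j(x)
        features = np.concatenate(([1.0], phi(x, centres)))
        return W @ features

    # Example with d = 2 inputs, M = 3 basis functions, K = 2 outputs.
    centres = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
    W = np.full((2, 4), 0.25)
    print(y(np.array([0.5, 0.5]), W, centres))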

Multi-layer neural networks

- Construct more general networks by considering layers of processing units
- Unlike generalised linear discriminants, each layer has a set of adaptive weights
- Layers that are neither input nor output are referred to as hidden
- Such networks are often called multi-layer perceptrons (MLPs)

Multi-layer Perceptron

[Figure: a two-layer MLP. Inputs x_0, x_1, ..., x_d (x_0 is the bias) feed hidden units z_0, z_1, ..., z_M (z_0 is the bias) through first-layer weights w^{(1)}_{10}, ..., w^{(1)}_{Md}; the hidden units feed outputs y_1, ..., y_K through second-layer weights w^{(2)}_{10}, ..., w^{(2)}_{KM}.]

Building up the MLP (1)

First we take M linear combinations of the d-dimensional inputs:

    b_j = \sum_{i=0}^{d} w^{(1)}_{ji} x_i

where the b_j are the activations and the w^{(1)}_{ji} form the first layer of weights.

The activations are transformed by a non-linear activation function h (e.g. a sigmoid):

    z_j = h(b_j) = \frac{1}{1 + \exp(-b_j)}

where the z_j are the hidden unit outputs.
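As a concrete numerical sketch of this first-layer computation in NumPy (illustrative, not from the lecture), with the bias folded in as x_0 = 1 and made-up sizes d = 3, M = 2:

    import numpy as np

    def h(b):
        # Logistic sigmoid activation: h(b) = 1 / (1 + exp(-b))
        return 1.0 / (1.0 + np.exp(-b))

    x = np.array([1.0, 0.5, -1.2, 0.3])   # input vector, with x[0] = 1 as the bias
    W1 = np.full((2, 4), 0.1)             # first-layer weights w(1)_ji (M = 2, d = 3)

    b = W1 @ x    # activations b_j = sum_i w(1)_ji x_i
    z = h(b)      # hidden unit outputs z_j = h(b_j)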

Building up the MLP (2)

The outputs of the hidden units are linearly combined to give the activations of the output units:

    a_k = \sum_{j=0}^{M} w^{(2)}_{kj} z_j

The output activations are transformed using an activation function g (e.g. a sigmoid):

    y_k = g(a_k) = \frac{1}{1 + \exp(-a_k)}

For multiclass problems, a softmax may be used.

Combining the layers gives the overall forward propagation equation for the network:

    y_k = g\left( \sum_{j=0}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=0}^{d} w^{(1)}_{ji} x_i \right) \right)
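A minimal sketch of this forward propagation in NumPy, assuming sigmoid activations at both layers and bias units folded in as x_0 = 1 and z_0 = 1 (the function and variable names are my own, not Netlab's):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, W1, W2):
        # x: length d+1 input with x[0] = 1 (bias)
        # W1: (M, d+1) first-layer weights; W2: (K, M+1) second-layer weights
        b = W1 @ x                                # first-layer activations b_j
        z = np.concatenate(([1.0], sigmoid(b)))   # hidden outputs, z_0 = 1 is the bias
        a = W2 @ z                                # output activations a_k
        return sigmoid(a)                         # outputs y_k = g(a_k)

For a multiclass network, the final sigmoid would be replaced by a softmax over a.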


Training MLPs: Credit assignment

- Hidden units make training the weights more complicated.
- The gradients for a single-layer neural network have a simple form:

    \frac{\partial E^n}{\partial w_{ki}} = \delta_k x_i

- For a multi-layer neural network: what is the error of a hidden unit? How important is input-to-hidden weight w^{(1)}_{ji} to output unit k?
- Solution: we need to define the derivatives of the error with respect to each weight.
- Algorithm: back-propagation of error (backprop).
- Backprop gives a way to compute the derivatives. These derivatives are used by an optimisation algorithm (e.g. gradient descent) to train the weights.

Training MLPs: Error function and required gradients

Sum-of-squares error function, obtained by summing over a training set of N examples:

    E = \sum_{n=1}^{N} E^n \qquad E^n = \frac{1}{2} \sum_{k=1}^{K} (y_k^n - t_k^n)^2

To obtain the overall error gradients, we sum over the training examples:

    \frac{\partial E}{\partial w_{kj}} = \sum_{n=1}^{N} \frac{\partial E^n}{\partial w_{kj}} \qquad \frac{\partial E}{\partial w_{ji}} = \sum_{n=1}^{N} \frac{\partial E^n}{\partial w_{ji}}
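A one-line sketch of this error function, assuming the outputs and targets are stacked into N x K arrays Y and T (the names are mine):

    import numpy as np

    def sum_of_squares_error(Y, T):
        # E = sum_n E^n with E^n = 0.5 * sum_k (y_k^n - t_k^n)^2
        return 0.5 * np.sum((Y - T) ** 2)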

Training MLPs: Hidden-to-output weights

Write E^n in terms of the hidden-to-output weights:

    E^n = \frac{1}{2} \sum_{k=1}^{K} (g(a_k^n) - t_k^n)^2 = \frac{1}{2} \sum_{k=1}^{K} \left( g\left( \sum_{j=0}^{M} w_{kj} z_j^n \right) - t_k^n \right)^2

Break down the error derivatives using the chain rule:

    \frac{\partial E^n}{\partial w_{kj}} = \frac{\partial E^n}{\partial a_k^n} \frac{\partial a_k^n}{\partial w_{kj}}

\partial E^n / \partial a_k^n is often referred to as the error signal, \delta_k^n:

    \delta_k^n = \frac{\partial E^n}{\partial a_k^n} = \frac{\partial E^n}{\partial y_k^n} \frac{\partial y_k^n}{\partial a_k^n} = (y_k^n - t_k^n) \, g'(a_k^n)

Since \partial a_k^n / \partial w_{kj} = z_j^n, we obtain:

    \frac{\partial E^n}{\partial w_{kj}} = \delta_k^n z_j^n

This is similar to the gradient for a single-layer neural network with a nonlinear activation function.
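In code, the output error signals and hidden-to-output gradients for one training example might look like this (a hedged sketch with made-up values; g is the sigmoid, so g'(a_k) = y_k (1 - y_k)):

    import numpy as np

    y = np.array([0.7, 0.2])        # network outputs y_k (K = 2)
    t = np.array([1.0, 0.0])        # targets t_k
    z = np.array([1.0, 0.6, 0.3])   # hidden outputs, z[0] = 1 is the bias (M = 2)

    delta_out = (y - t) * y * (1.0 - y)   # delta_k = (y_k - t_k) g'(a_k)
    grad_W2 = np.outer(delta_out, z)      # dE^n/dw_kj = delta_k z_j, shape (K, M+1)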

Training MLPs: Input-to-hidden weights

To compute the error gradients for the input-to-hidden weights, we must take into account all the ways in which hidden unit j (and hence weight w_{ji}) can influence the error. Consider \delta_j^n, the error signal for hidden unit j:

    \delta_j^n = \frac{\partial E^n}{\partial b_j^n} = \sum_{k=1}^{K} \frac{\partial E^n}{\partial a_k^n} \frac{\partial a_k^n}{\partial b_j^n} = \sum_{k=1}^{K} \delta_k^n \frac{\partial a_k^n}{\partial b_j^n}

The sum runs over all the output units' contributions to \delta_j^n, where:

    \frac{\partial a_k^n}{\partial b_j^n} = \frac{\partial a_k^n}{\partial z_j^n} \frac{\partial z_j^n}{\partial b_j^n} = w_{kj} \, h'(b_j^n)

Substituting in, we obtain:

    \delta_j^n = h'(b_j^n) \sum_{k=1}^{K} \delta_k^n w_{kj}

This is the famous back-propagation of error (backprop) equation.
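Continuing the single-example sketch from above (z and delta_out as before; W2 holds the hidden-to-output weights, with its first column multiplying the bias; for a sigmoid, h'(b_j) = z_j (1 - z_j)):

    # W2: hidden-to-output weights, shape (K, M+1); column 0 multiplies the bias z_0.
    W2 = np.array([[0.1,  0.4, -0.2],
                   [0.3, -0.1,  0.5]])

    h_prime = z[1:] * (1.0 - z[1:])                      # h'(b_j) for the non-bias hidden units
    delta_hidden = h_prime * (W2[:, 1:].T @ delta_out)   # delta_j = h'(b_j) sum_k delta_k w_kj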

Back-propagation of error: hidden unit error signal

[Figure: hidden unit z_j, fed by weight w^{(1)}_{ji} from input x_i, connects through weights w^{(2)}_{1j}, ..., w^{(2)}_{lj}, ..., w^{(2)}_{Kj} to output units y_1, ..., y_l, ..., y_K, which carry error signals \delta_1, ..., \delta_l, ..., \delta_K. The hidden unit's error signal is assembled from them:]

    \delta_j = h'(b_j) \sum_{l} \delta_l w_{lj}

Back-propagation of error

The derivatives with respect to the input-to-hidden weights can thus be evaluated using:

    \frac{\partial E^n}{\partial w_{ji}} = \frac{\partial E^n}{\partial b_j^n} \frac{\partial b_j^n}{\partial w_{ji}} = \delta_j^n x_i^n

The back-propagation of error algorithm is summarised as follows (see the sketch after this list):
1. Apply the N input vectors from the training set, x^n, to the network and forward propagate to obtain the set of output vectors y^n.
2. Using the target vectors t^n, compute the error E.
3. Evaluate the error signals \delta_k^n for each output unit.
4. Evaluate the error signals \delta_j^n for each hidden unit using back-propagation of error.
5. Evaluate the derivatives for each training pattern, summing to obtain the overall derivatives.
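Here is one way the five steps might be assembled, as a hedged NumPy sketch (sigmoid activations at both layers, biases folded into x and z; the function name and array-shape conventions are my own):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_gradients(X, T, W1, W2):
        # X: (N, d+1) inputs, each row with a leading 1 for the bias
        # T: (N, K) targets; W1: (M, d+1); W2: (K, M+1)
        gW1 = np.zeros_like(W1)
        gW2 = np.zeros_like(W2)
        E = 0.0
        for x, t in zip(X, T):
            # 1. forward propagate
            z = np.concatenate(([1.0], sigmoid(W1 @ x)))
            y = sigmoid(W2 @ z)
            # 2. accumulate the error
            E += 0.5 * np.sum((y - t) ** 2)
            # 3. output error signals
            delta_out = (y - t) * y * (1.0 - y)
            # 4. hidden error signals (back-propagation of error)
            delta_hidden = z[1:] * (1.0 - z[1:]) * (W2[:, 1:].T @ delta_out)
            # 5. sum the per-pattern derivatives
            gW2 += np.outer(delta_out, z)
            gW1 += np.outer(delta_hidden, x)
        return E, gW1, gW2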

Gradient descent training

Operation of gradient descent:
1. Start with a guess for the weight matrix W (small random numbers).
2. Update the weights by adjusting the weight matrix in the direction of -\nabla_W E.
3. Recompute the error, and iterate.

The update for weight w_{ki} at iteration \tau + 1 is:

    w_{ki}^{\tau+1} = w_{ki}^{\tau} - \eta \frac{\partial E}{\partial w_{ki}}

The parameter \eta is the learning rate.
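Putting gradient descent together with the backprop sketch above (eta, the iteration count, and the initialisation scale are illustrative choices; M, d, K, X, and T are assumed to be defined already):

    rng = np.random.default_rng(0)
    W1 = 0.1 * rng.standard_normal((M, d + 1))   # small random initial weights
    W2 = 0.1 * rng.standard_normal((K, M + 1))

    eta = 0.1                                    # learning rate
    for tau in range(1000):
        E, gW1, gW2 = backprop_gradients(X, T, W1, W2)
        W1 -= eta * gW1                          # w(tau+1) = w(tau) - eta dE/dw
        W2 -= eta * gW2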

MLP Example

Netlab demmlp2: an MLP trained as a classifier on data drawn from a known distribution.
- Train an MLP with 6 hidden units and 2 output units
- Compare with a single-layer network (SLN)
- Classify test data
- Compare confusion matrices: optimal % correct, MLP % correct, SLN % correct

[Figure: the training data and the pdf from which it was generated.]

[Figure: the class-conditional densities p(x | red) and p(x | yellow), and the posterior probabilities p(red | x) and p(yellow | x).]

[Figure: MLP network output against the Bayes (true posterior) output, shown with the training data.]

[Figure: single-layer network (SLN) output against the Bayes output, shown with the training data.]

[Figure: test data with the Bayes decision boundary and the SLN decision boundary.]

Summary

- Multi-layer perceptrons
- Multi-layer neural networks
- Back-propagation of error training
- Example
