# Neural Networks and the Back-propagation Algorithm


Francisco S. Melo

In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely follow the presentation in [1]. We refer to [1, 2, 3] for further details.

Throughout these notes, random variables are represented with upper-case letters, such as $X$ or $Z$. A sample of a random variable is represented by the corresponding lower-case letter, such as $x$ or $z$. When random variables are vector-valued, we use subscripts to indicate specific components, as in $X_k$ or $Z_k$. The corresponding samples are represented using bold-face letters, such as $\mathbf{x}$ or $\mathbf{z}$, and individual components as $x_k$ or $z_k$, respectively. When considering an indexed family of vector-valued data-points, we use indexed bold-face symbols to denote the elements in the family, as in $\mathbf{x}_n$ or $\mathbf{z}_n$.

## 1 The Perceptron

Artificial neural networks (ANNs) arose as an attempt to model mathematically the process by which information is handled by the brain. Learning methods based on neural networks are general and relatively simple to implement, making them a widely used class of methods when complex real-world data must be interpreted. Examples include recognition of handwritten digits, spoken words or faces.

ANNs correspond to networks of densely connected nodes, known as neurons, each of which is a small processing unit. The simplest model of such a network consists of a single unit, known as the perceptron, represented in the diagram of Fig. 1.

*Figure 1: Representation of the perceptron, with inputs $x_0, x_1, \ldots, x_p$, weights $w_0, w_1, \ldots, w_p$, activation $a$, threshold function and output $\hat{y}$.*

The perceptron takes as input a vector $\mathbf{x} = [x_1, \ldots, x_p]^\top$ of $p$ real-valued inputs, from which it computes the activation, $a$, which is a linear combination of these inputs,

$$a = w_0 + \sum_{i=1}^{p} w_i x_i = \mathbf{w}^\top \mathbf{x}.$$

Note that we included one additional weight, $w_0$, that is independent of the input and is known as the bias. However, to provide a uniform treatment of the weights in the perceptron, it is customary to consider one additional input, $x_0$, that is constant and equal to 1, i.e., $x_0 \equiv 1$. We included this fictitious input in the representation of Fig. 1.

The output of the perceptron, $\hat{y}$, is computed as the image of the activation $a$ under a threshold function $\sigma$,

$$\hat{y}(\mathbf{x}) = \sigma(a) = \begin{cases} 1 & \text{if } a > 0, \\ -1 & \text{otherwise.} \end{cases}$$

The perceptron can then be used for binary classification tasks, where the inputs $\mathbf{x}$ for which $\hat{y}(\mathbf{x}) = 1$ correspond to the positive instances and those for which $\hat{y}(\mathbf{x}) = -1$ correspond to the negative instances. Geometrically, the data-points classified by the perceptron as belonging to the positive class correspond to those data-points $\mathbf{x}$ whose inner product with the weight vector $\mathbf{w}$ is positive (see Fig. 2).

*Figure 2: Decision boundary for the perceptron, given the weight vector $\mathbf{w}$: the half-plane where $\mathbf{w}^\top \mathbf{x} > 0$ corresponds to the positive class, and the complementary half-plane to the negative class.*
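
To make the perceptron's computation concrete, the following is a minimal sketch in NumPy; the function name `perceptron_output` and the convention that `w[0]` stores the bias weight $w_0$ are illustrative choices rather than anything prescribed in these notes.

```python
import numpy as np

def perceptron_output(w, x):
    """Output of a perceptron with weight vector w for input x.

    w : array of length p + 1, where w[0] is the bias weight w_0
    x : array of length p with the real-valued inputs
    """
    x_ext = np.concatenate(([1.0], x))  # prepend the fictitious input x_0 = 1
    a = w @ x_ext                       # activation a = w^T x
    return 1 if a > 0 else -1           # threshold function sigma
```

For example, with $\mathbf{w} = [-1, 2, 1]^\top$ and $\mathbf{x} = [1, 0.5]^\top$, the activation is $-1 + 2 \cdot 1 + 1 \cdot 0.5 = 1.5 > 0$, so the output is 1.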

### 1.1 Perceptron Learning Rule

To determine the process by which the perceptron is trained, it is necessary to define an error function with respect to which the performance of the perceptron is to be measured (remember that this is one of the fundamental elements necessary to define a learning task). While the number of misclassified data-points is a natural candidate for error function, it is not amenable to an easy analytical treatment. Instead, we introduce the so-called perceptron criterion.

Note, first of all, that a data-point $\mathbf{x}$ in class $y$ (with $y \in \{-1, 1\}$) is properly classified by the perceptron if $\mathbf{w}^\top \mathbf{x}\, y > 0$. Given a training data-set $\mathcal{D}$, let $\mathcal{M}$ denote the set of misclassified data-points. The perceptron criterion tries to minimize the error

$$E(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^\top \mathbf{x}_n\, y_n.$$

To minimize the error $E$, we adopt a general gradient descent approach, whereby the minimum of a general real-valued function $F(\mathbf{z})$ is gradually approximated by the sequence $\{\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots\}$ defined recursively by

$$\mathbf{z}^{(\tau+1)} = \mathbf{z}^{(\tau)} - \eta \nabla_{\mathbf{z}} F(\mathbf{z}^{(\tau)}),$$

where $\eta$ is a positive step-size. Specifically, in the case of the perceptron, the weight vector $\mathbf{w}$ is adjusted as

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E(\mathbf{w}) = \mathbf{w} + \eta \sum_{n \in \mathcal{M}} \mathbf{x}_n y_n. \tag{1}$$

Interestingly, two modifications are generally considered to the learning rule in (1). The first is to consider incremental updates, where the weight vector is updated one data-point at a time. The second arises from noting that the output of the perceptron remains unchanged if $\mathbf{w}$ is multiplied by a positive constant, which allows us to consider a step-size $\eta = 1$.

The training process for the perceptron can thus be summarized as follows. Given the data-set $\mathcal{D} = \{(\mathbf{x}_n, y_n),\ n = 1, \ldots, N\}$:

1. For each pair $(\mathbf{x}_n, y_n) \in \mathcal{D}$, if $\mathbf{w}^\top \mathbf{x}_n\, y_n > 0$, move to the next pair.
2. Otherwise, adjust $\mathbf{w}$ according to the learning rule

$$\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x}_n y_n. \tag{2}$$

While the training rule for perceptrons is straightforward to implement (a minimal sketch is given below), perceptrons are restricted to linear decision boundaries, which means that they are unable to learn classifiers for data that is not linearly separable.
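
Below is a minimal sketch of this training procedure in NumPy, cycling through the data-set and applying rule (2) to misclassified points; the function name `train_perceptron` and the `n_epochs` cap on the number of passes are illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, y, n_epochs=100):
    """Train a perceptron with the incremental learning rule (2).

    X : array of shape (N, p) with the training inputs
    y : array of shape (N,) with labels in {-1, +1}
    Returns a weight vector w of length p + 1, with w[0] the bias weight.
    """
    N, p = X.shape
    X_ext = np.hstack([np.ones((N, 1)), X])  # prepend the fictitious input x_0 = 1
    w = np.zeros(p + 1)
    for _ in range(n_epochs):
        n_errors = 0
        for x_n, y_n in zip(X_ext, y):
            if (w @ x_n) * y_n <= 0:         # misclassified: w^T x_n y_n is not positive
                w = w + x_n * y_n            # learning rule (2)
                n_errors += 1
        if n_errors == 0:                    # every point correctly classified
            break
    return w
```

For linearly separable data the loop stops as soon as a full pass produces no updates; otherwise it simply halts after `n_epochs` passes.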

## 2 Multilayer Perceptron

A multilayer perceptron (MLP), also known as a feed-forward neural network or multilayer feedforward network, is a network of densely connected units similar to the perceptron discussed in Section 1 and having a non-linear threshold function. The units in an MLP are arranged in layers: units connected directly to the inputs of the network constitute the input layer of the network, while those whose output corresponds to the output of the network constitute the output layer. All other, intermediate layers are referred to as hidden layers. An example of a multilayer perceptron is depicted in Fig. 3.

*Figure 3: Example of a multilayer perceptron with two input units, four hidden units and one output unit.*

Multilayer perceptrons are able to represent a much richer set of functions than those representable using a single perceptron. In fact, the universal approximation theorem states that a multilayer perceptron with a single hidden layer containing a finite number of hidden neurons, and with a suitable (non-constant, bounded and continuous) activation function, can approximate any continuous function defined over any compact subset of $\mathbb{R}^p$ with arbitrarily small error.

Each unit in an MLP is similar to the perceptron surveyed in Section 1 and depicted in Fig. 1. However, for purposes of training, it is convenient that the output(s) of the network are differentiable functions of the inputs, for which reason the neurons in an MLP are usually defined with differentiable threshold functions. A common threshold function is the logistic sigmoid function,

$$\sigma(a) = \frac{1}{1 + \exp(-a)},$$

depicted in Fig. 4.

*Figure 4: Sigmoid threshold function $\sigma(a) = 1/(1 + \exp(-a))$.*
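
As a small illustration, here is a sketch of the logistic sigmoid and its derivative in NumPy; the identity $\sigma'(a) = \sigma(a)\,(1 - \sigma(a))$ encoded in `sigmoid_prime` is the one exploited by the back-propagation rules derived in Section 2.1.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid threshold function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    """Derivative of the logistic sigmoid: sigma'(a) = sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)
```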

*Figure 5: Artificial neural network with one hidden layer, with input units $i_1, i_2$, hidden units $j_1, j_2$ and output unit $k$; $w_{ij}$ denotes the weight on the connection from unit $i$ to unit $j$.*

The output of the network can be computed by propagating the input information throughout the network, in a process known as forward propagation. To illustrate this process, consider the ANN model depicted in detail in Fig. 5. We denote by $\mathbf{w}_i$ the weights associated with unit $i$ and by $w_{ij}$ the weight associated with the connection between the output of unit $i$ and unit $j$. Given the input vector $\mathbf{x}$ to the network, the output of input units $i_1$ and $i_2$ is given by

$$z_{i_1} = \sigma(\mathbf{w}_{i_1}^\top \mathbf{x}), \qquad z_{i_2} = \sigma(\mathbf{w}_{i_2}^\top \mathbf{x}),$$

which corresponds to the forward propagation of the input $\mathbf{x}$ through the first layer in the network. The two outputs $z_{i_1}$ and $z_{i_2}$ now act as inputs for the second layer in the network. Letting $\mathbf{z}_i = [z_{i_1}, z_{i_2}]^\top$, it follows that

$$z_{j_1} = \sigma(\mathbf{w}_{j_1}^\top \mathbf{z}_i), \qquad z_{j_2} = \sigma(\mathbf{w}_{j_2}^\top \mathbf{z}_i),$$

which corresponds to the propagation of $\mathbf{z}_i$ through the second layer in the network. Finally, the output of the network is given by

$$\hat{y}(\mathbf{x}) = z_k = \sigma(\mathbf{w}_k^\top \mathbf{z}_j),$$

where, as before, we defined $\mathbf{z}_j = [z_{j_1}, z_{j_2}]^\top$. A minimal sketch of this forward propagation is given below.
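
The sketch below implements this layer-by-layer computation for a generic feed-forward network of sigmoid units; the helper name `forward`, the representation of each layer as a weight matrix whose rows are the unit weight vectors, and the absence of separate per-layer bias weights (beyond whatever constant input is included in $\mathbf{x}$) are illustrative assumptions rather than anything prescribed in the notes.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights):
    """Forward-propagate input x through a feed-forward network of sigmoid units.

    x       : input vector to the network (including the constant input x_0 = 1,
              if the network uses one, as in Fig. 1)
    weights : list of weight matrices, one per layer; row r of the matrix for
              layer l holds the weight vector of the r-th unit in that layer
    Returns the list of activation vectors per layer and the network output.
    """
    activations = []
    z = np.asarray(x, dtype=float)
    for W in weights:
        a = W @ z                # activations: each unit computes w^T z
        activations.append(a)
        z = sigmoid(a)           # unit outputs become the inputs of the next layer
    return activations, z        # for a single output unit, z = [ŷ(x)]
```

For a network shaped like the one in Fig. 5, `weights` would contain three matrices, e.g. of shapes (2, 2), (2, 2) and (1, 2).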

### 2.1 The Back-propagation Algorithm

As before, to determine the process by which an MLP can be trained, it is necessary to define an error function with respect to which the performance of the MLP is to be measured. Given a data-set $\mathcal{D} = \{(\mathbf{x}_n, y_n),\ n = 1, \ldots, N\}$, we adopt the error function

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(\hat{y}(\mathbf{x}_n) - y_n\big)^2.$$

We have, for simplicity, considered the case where the network has a single output $\hat{y}$, but the reasoning can be trivially replicated to accommodate vector outputs.

*Figure 6: Unit $j$ in the network, receiving the outputs $z_i, \ldots, z_l$ of the preceding units through the weights $w_{ij}, \ldots, w_{lj}$, and feeding the subsequent units $k$ through the weights $w_{jk}$.*

To minimize the error $E$, we again adopt a gradient descent approach, where the weights in the network, $\mathbf{w}$, are adjusted according to the rule

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E(\mathbf{w}),$$

where $\nabla_{\mathbf{w}} E(\mathbf{w})$ denotes the gradient of the error function with respect to the weights in the network. The back-propagation algorithm provides a simple and efficient way of propagating the error information backwards through the network, allowing for successive updates of the weights from the output to the input.

To derive the back-propagation learning rule, we start by writing the error function as

$$E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w}), \qquad \text{with} \quad E_n(\mathbf{w}) = \frac{1}{2}\big(\hat{y}(\mathbf{x}_n) - y_n\big)^2.$$

As with the perceptron, the updates to the weights can be done incrementally, using one data-point at a time, with the update rule

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E_n(\mathbf{w}).$$

It remains to determine the gradient $\nabla_{\mathbf{w}} E_n(\mathbf{w})$. Let us then focus on one particular unit in the network, say, unit $j$, and determine the components of $\nabla_{\mathbf{w}} E_n(\mathbf{w})$ comprising the derivatives of $E_n(\mathbf{w})$ with respect to the weights $w_{ij}$ in unit $j$ (see Fig. 6). Since $E_n$ depends on $w_{ij}$ only through the activation $a_j$, the chain rule yields

$$\frac{\partial E_n}{\partial w_{ij}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ij}}.$$

To simplify the notation, we henceforth write

$$\delta_j = \frac{\partial E_n}{\partial a_j}.$$

Moreover, it follows from the definition of the activation that

$$\frac{\partial a_j}{\partial w_{ij}} = z_i.$$

Combining the two, we get

$$\frac{\partial E_n}{\partial w_{ij}} = \delta_j z_i. \tag{3}$$

Let us now compute the term $\delta_j$. If $j$ is the output unit, then

$$E_n = \frac{1}{2}\big(\sigma(a_j) - y_n\big)^2,$$

and we immediately get

$$\delta_j = \sigma'(a_j)\big(\sigma(a_j) - y_n\big), \tag{4}$$

which, in the case of the logistic sigmoid function, yields

$$\delta_j = \sigma(a_j)\big(1 - \sigma(a_j)\big)\big(\sigma(a_j) - y_n\big).$$

On the other hand, if $j$ is not an output unit, $E_n$ depends on $a_j$ through all units $k$ to which unit $j$ is connected (see Fig. 6). In other words,

$$\delta_j = \frac{\partial E_n}{\partial a_j} = \sum_{k=1}^{K} \frac{\partial E_n}{\partial a_k}\frac{\partial a_k}{\partial a_j} = \sum_{k=1}^{K} \delta_k \frac{\partial a_k}{\partial a_j}.$$

Finally, we have that

$$\frac{\partial a_k}{\partial a_j} = w_{jk}\, \sigma'(a_j),$$

and thus

$$\delta_j = \sigma'(a_j) \sum_{k=1}^{K} w_{jk}\, \delta_k. \tag{5}$$

Note that, as evidenced in (5), the derivative $\delta_j$ for unit $j$ can be computed by propagating the derivatives $\delta_k$ of the subsequent nodes backwards through the network.

In conclusion, the back-propagation algorithm can be summarized as follows (a minimal sketch in code follows this list). Given the data-set $\mathcal{D} = \{(\mathbf{x}_n, y_n),\ n = 1, \ldots, N\}$:

1. For each pair $(\mathbf{x}_n, y_n) \in \mathcal{D}$, forward propagate the input $\mathbf{x}_n$ through the network to compute $\hat{y}(\mathbf{x}_n)$. In this process, compute the activations $a_j$ for all hidden and output units.
2. Evaluate $\delta_j$ for the output units using (4).
3. Back-propagate the $\delta$'s using (5), determining $\delta_j$ for all hidden units in the network.
4. For all nodes in the network, compute the derivatives $\partial E_n / \partial w_{ij}$ using (3).
5. Update each weight $w_{ij}$ using the rule

$$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E_n}{\partial w_{ij}}. \tag{6}$$
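
The following is a minimal end-to-end sketch of steps 1–5 for a single data-point, under the same assumptions as the `forward` sketch above (sigmoid units in every layer, one weight matrix per layer); the names `backprop_update` and `eta` are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    s = sigmoid(a)
    return s * (1.0 - s)

def backprop_update(weights, x, y, eta=0.1):
    """One incremental back-propagation update for the pair (x, y).

    weights : list of weight matrices, one per layer (modified in place)
    x, y    : input vector and target
    """
    # Step 1: forward propagation, storing layer inputs z and activations a.
    zs = [np.asarray(x, dtype=float)]   # zs[l] is the input to layer l
    activations = []
    for W in weights:
        a = W @ zs[-1]
        activations.append(a)
        zs.append(sigmoid(a))
    y_hat = zs[-1]                      # network output ŷ(x)

    # Step 2: delta for the output layer, eq. (4).
    delta = sigmoid_prime(activations[-1]) * (y_hat - y)

    # Steps 3-5: back-propagate deltas, compute derivatives (3), update weights (6).
    for l in range(len(weights) - 1, -1, -1):
        grad = np.outer(delta, zs[l])   # dE_n/dw_ij = delta_j * z_i, eq. (3)
        if l > 0:
            # delta for the preceding layer, eq. (5): sigma'(a_j) * sum_k w_jk delta_k
            delta = sigmoid_prime(activations[l - 1]) * (weights[l].T @ delta)
        weights[l] -= eta * grad        # update rule (6)
    return y_hat
```

Looping this update over all pairs in $\mathcal{D}$, possibly for several passes over the data, yields the incremental training scheme described above.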

## References

[1] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[2] Simon Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 2nd edition, 1999.

[3] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.
