Neural Networks. Lecturer: J. Matas. Authors: J. Matas, B. Flach, O. Drbohlav


Neural Networks, 30.11.2015. Lecturer: J. Matas. Authors: J. Matas, B. Flach, O. Drbohlav

Talk Outline
- Perceptron
- Combining neurons into a network
- Neural network: processing an input to an output
- Learning: cost function, optimization of NN parameters
- Back-propagation (a gradient descent method)
- Present and future

A Formal Neuron / Perceptron (1)
Binary-valued threshold neuron (McCulloch and Pitts, 1943):
y = sign(w^T x + b),  y ∈ {−1, +1},
where x = (x_1, ..., x_n) is the input, w = (w_1, ..., w_n) are the weights, and b is the bias.
Given the weights and the bias, the neuron produces an output y ∈ {−1, +1} for any input x.
Note: this is a linear classifier; it can be learned by the Perceptron algorithm or by SVM methods.
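Below is a minimal NumPy sketch of such a threshold neuron (not part of the original slides; the AND-like weights, the bias, and the convention sign(0) = +1 are illustrative assumptions):

```python
# Illustrative sketch of a binary-valued threshold neuron (perceptron).
# Assumption: a net activation of exactly 0 maps to +1.
import numpy as np

def neuron_output(x, w, b):
    """Return +1 or -1 depending on the sign of the net activation w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hand-picked weights implementing a logical AND on inputs in {0, 1}
w = np.array([1.0, 1.0])
b = -1.5
print(neuron_output(np.array([1, 1]), w, b))  # +1
print(neuron_output(np.array([0, 1]), w, b))  # -1
```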

A Formal Neuron / Perceptron (2)
As usual, put the bias term into the weights:
z = w^T x̃  (net activation),  y = f(z)  (activation),
where x̃ = (1, x_1, ..., x_n)^T is the augmented input, w = (b, w_1, ..., w_n)^T are the weights, f is the modified sign function, and y ∈ {−1, +1} is the output.
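The same neuron with the bias absorbed into the weight vector can be sketched as follows (again an illustrative snippet, not from the slides):

```python
# Sketch: bias absorbed into the weights by prepending a constant 1 to the input.
import numpy as np

def neuron_output_augmented(x, w):
    x_aug = np.concatenate(([1.0], x))   # x~ = (1, x1, ..., xn)
    z = np.dot(w, x_aug)                 # net activation
    return 1 if z >= 0 else -1           # modified sign function

w = np.array([-1.5, 1.0, 1.0])           # (b, w1, w2), same AND neuron as before
print(neuron_output_augmented(np.array([1, 1]), w))  # +1
```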

Neuron: Why That Name?
A single neuron combines several inputs into one output.
Neurons are layered: outputs of neurons are used as inputs of other neurons.
(Figure: a simple neuron model with several inputs and one output.)

Historical perspective
The Perceptron (Rosenblatt, 1957), with its simple learning algorithm, generated a lot of excitement.
Minsky and Papert (1969) showed that even the simple XOR function cannot be learned by a single perceptron; this led to skepticism.
The problem was solved by layering perceptrons into a network (Multi-Layer Perceptron, MLP).

Graded Activation Function f
Historically, the most commonly used activation function f(z) is the sigmoid (cf. logistic regression):
f(z) = 1 / (1 + e^{−z}).
Its crucial properties are:
- Non-linearity: if the activation function were linear, the multi-layer network could be rewritten as (and would work the same as) a single-layer one.
- Differentiability: needed for fitting the coefficients of the NN by gradient optimization.
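A small sketch of the sigmoid and its derivative (the closed-form derivative f'(z) = f(z)(1 − f(z)) is a standard fact, not stated on the slide):

```python
# Sigmoid activation and its derivative, the ingredients needed for
# gradient-based fitting of the network weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # f'(z) = f(z) * (1 - f(z))

z = np.linspace(-5.0, 5.0, 5)
print(sigmoid(z))
print(sigmoid_prime(z))
```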

Three-Layer Neural Network (1/2)
Layer 1 (input), Layer 2 (hidden), Layer 3 (output).
Each neuron computes a linear combination of its inputs (including the bias term), followed by a non-linear transformation.
(Figure: a 2-4-1 net; W^{(1,2)} denotes the weights between Layer 1 and Layer 2.)

Three-Layer Neural Network (2/2)
Generalization: multidimensional output. Notation: a^{(1)} = (1, x^T)^T.
Everything just works. Given the input x:
z^{(2)} = W^{(1,2)} a^{(1)},  a^{(2)} = (1, f(z^{(2)})^T)^T,
z^{(3)} = W^{(2,3)} a^{(2)},  y = f(z^{(3)})  (output).
Note: f is applied element-wise.

Multilayer Perceptron (MLP): Feed-forward computation
Init: a^{(1)} = (1, x^T)^T.
Loop: for k = 1 : K−1
  z^{(k+1)} = W^{(k,k+1)} a^{(k)}
  a^{(k+1)} = (1, f(z^{(k+1)})^T)^T
End: y = (a^{(K)})¯, i.e. a^{(K)} with its first (bias) component removed.
Operator (·)¯ : R^{m+1} → R^{m}, (v_0, v_1, ..., v_m) ↦ (v_1, ..., v_m).
(Figure: a K-layer neural network: Layer 1 input, Layer 2 hidden, ..., Layer K output.)
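A minimal sketch of this feed-forward loop in NumPy, following the slide notation (the 2-4-1 layer sizes, the random weights, and the sigmoid f are illustrative assumptions):

```python
# Feed-forward pass of a K-layer MLP:
#   a^(1) = (1, x),  z^(k+1) = W^(k,k+1) a^(k),  a^(k+1) = (1, f(z^(k+1))),
#   output y = a^(K) with the leading bias entry removed.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, f=sigmoid):
    """weights[k] is W^(k+1,k+2); each matrix includes the bias column."""
    a = np.concatenate(([1.0], x))            # a^(1)
    for W in weights:
        z = W @ a                             # z^(k+1) = W^(k,k+1) a^(k)
        a = np.concatenate(([1.0], f(z)))     # a^(k+1) = (1, f(z^(k+1)))
    return a[1:]                              # y = a^(K) without the bias entry

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)),       # a 2-4-1 net: Layer 1 -> 2
           rng.standard_normal((1, 5))]       # Layer 2 -> 3
print(feed_forward(np.array([0.5, -1.2]), weights))
```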

Function approximation by an MLP
Consider the simple case of a K-layer NN with a single output neuron. Such a NN partitions the input space into two subsets R_1 and R_2.
A 2-layer NN yields a linear boundary between R_1 and R_2; a K-layer NN can approximate increasingly more complex functions with increasing K.
(Images taken from Duda, Hart, Stork: Pattern Classification.)
Note: Remember the AdaBoost example with weak linear classifiers? The strong classifier was constructed as a linear combination of these. This is similar to what happens inside a 3-layer NN.

Regression, Classification, Learning (1)
NNs can be employed for function approximation. Approximation from sample (training) points is the regression problem. Classification can be approached as a special case of regression.
So far, the weight matrices W have been assumed to be already known. Learning the weight matrices is formulated as an optimization problem.
Given the training set T = {(x_i, y_i), i = 1..N}, we optimize
J_total({W}) = Σ_{i=1}^{N} J(y_i, y(W, x_i)),
where y(W, x_i) is the output of the NN for x_i and J(·, ·) is the cost function.

Regression, Classification, Learning (2)
For 2-class classification, the last layer has one neuron, and the output y(W, x) is thus 1-dimensional.
For K-class classification, a common choice is to encode the class by a K-dimensional vector:
y = (0, 0, ..., 1, ..., 0)^T, with 1 at the k-th coordinate if x belongs to the k-th class.
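A tiny sketch of this one-hot class encoding (classes assumed to be indexed 1..K as in the slide):

```python
# One-hot class encoding: 1 at the k-th coordinate, 0 elsewhere.
import numpy as np

def one_hot(k, K):
    y = np.zeros(K)
    y[k - 1] = 1.0
    return y

print(one_hot(3, 5))   # [0. 0. 1. 0. 0.]
```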

Regression, Classification, Learning (3)
A frequent choice for J(·, ·) is the quadratic loss:
J(y, y(W, x)) = ½ ‖y(W, x) − y‖².
Other possibilities: cross-entropy, etc.
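For concreteness, the quadratic loss in code (an illustrative snippet, not from the slides):

```python
# Quadratic loss J(y, y_hat) = 0.5 * ||y_hat - y||^2
import numpy as np

def quadratic_loss(y_true, y_pred):
    return 0.5 * np.sum((y_pred - y_true) ** 2)

print(quadratic_loss(np.array([0.0, 1.0]), np.array([0.2, 0.7])))  # 0.065
```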

Graded Activation Function f
Ready to optimize J_total?
J_total({W}) = Σ_{i=1}^{N} J(y_i, y(W, x_i))
J(·, ·) is the quadratic loss (no problem).
y(W, x) is a composition of two types of functions:
- linear combinations (no problem),
- activation functions f(·), which must be differentiable (the modified sign function is not).

Learning: Minimize J
{W*} = argmin_{W} J_total({W}) = argmin_{W} Σ_{i=1}^{N} J(y_i, y(W, x_i))
Apply gradient descent. Compute the gradient, i.e. the partial derivatives w.r.t. all weights:
∂J_total / ∂w_{pq}^{(k,k+1)} = Σ_{i=1}^{N} ∂J(x_i) / ∂w_{pq}^{(k,k+1)},
where J(x_i) is shorthand for the per-sample loss J(y_i, y(W, x_i)).

Gradient of J (1/4)
Example for a NN with K = 3 layers, output dimensionality D, and the quadratic loss function:
∂J(x)/∂w_{pq}^{(k,k+1)} = Σ_{j=1}^{D} [y(W,x) − y]_j ∂[y(W,x)]_j/∂w_{pq}^{(k,k+1)} = Σ_{j=1}^{D} D_j ∂a_j^{(3)}(x)/∂w_{pq}^{(k,k+1)}
Note: D_j = [y(W,x) − y]_j is the j-th component of the output discrepancy, and ∂a_j^{(3)}(x)/∂w_{pq}^{(k,k+1)} is the dependence of the j-th output neuron on that weight.

Gradient of J (2/4)
Let us look at the gradient patterns, based on some examples (note: f' is the derivative of f, ⊙ is element-wise multiplication). Recall:
∂J(x)/∂w_{pq}^{(k,k+1)} = Σ_{j=1}^{D} D_j ∂a_j^{(3)}(x)/∂w_{pq}^{(k,k+1)}
For example,
∂a_j^{(3)}/∂w_{14}^{(2,3)} = (∂a_j^{(3)}/∂z_j^{(3)}) (∂z_j^{(3)}/∂w_{14}^{(2,3)}) = f'(z_1^{(3)}) a_4^{(2)} if j = 1, and 0 otherwise.
Thus, for W^{(2,3)}:
∂J(x)/∂w_{pq}^{(2,3)} = D_p f'(z_p^{(3)}) a_q^{(2)}
In vector notation:
∂J(x)/∂W^{(2,3)} = (D ⊙ f'(z^{(3)})) a^{(2)T}

Gradient of J (3/4)
Similarly,
∂a_j^{(3)}/∂w_{30}^{(1,2)} = (∂a_j^{(3)}/∂z_j^{(3)}) (∂z_j^{(3)}/∂a_3^{(2)}) (∂a_3^{(2)}/∂z_3^{(2)}) (∂z_3^{(2)}/∂w_{30}^{(1,2)}) = f'(z_j^{(3)}) w_{j3}^{(2,3)} f'(z_3^{(2)}) a_0^{(1)}
So, we have that:
∂J(x)/∂w_{pq}^{(1,2)} = Σ_{j=1}^{D} D_j f'(z_j^{(3)}) w_{jp}^{(2,3)} f'(z_p^{(2)}) a_q^{(1)}
In vector notation:
∂J(x)/∂W^{(1,2)} = [(W^{(2,3)T} (D ⊙ f'(z^{(3)})))¯ ⊙ f'(z^{(2)})] a^{(1)T}
(the leading bias component of W^{(2,3)T}(·) is dropped, cf. the operator (·)¯, so that dimensions match);
cf. ∂J(x)/∂W^{(2,3)} = (D ⊙ f'(z^{(3)})) a^{(2)T}.

Gradient of J (4/4)
Define G^{(k,k+1)} = ∂J/∂W^{(k,k+1)}.
Compute the deltas, using a^{(k)}, z^{(k)} from the feed-forward pass and the desired output y:
δ^{(4)} = ((a^{(4)})¯ − y) ⊙ f'(z^{(4)})      ((a^{(4)})¯ = output from feed-forward, y = desired output)
δ^{(3)} = (W^{(3,4)T} δ^{(4)})¯ ⊙ f'(z^{(3)})
δ^{(2)} = (W^{(2,3)T} δ^{(3)})¯ ⊙ f'(z^{(2)})
Compute the gradient of J:
G^{(3,4)} = δ^{(4)} a^{(3)T},  G^{(2,3)} = δ^{(3)} a^{(2)T},  G^{(1,2)} = δ^{(2)} a^{(1)T}
Notes: K = 4 is used as an example; T denotes transposition; ⊙ denotes element-wise multiplication; (·)¯ removes the first vector component.
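A sketch of this delta recursion for a quadratic loss and a sigmoid activation, following the slide notation (the 2-3-3-1 layer sizes and random weights are illustrative assumptions; the leading bias component of W^T δ is dropped before the element-wise product, as defined by the (·)¯ operator above):

```python
# Forward pass storing a^(k), z^(k), then backward pass computing
# delta^(k) and the gradients G^(k,k+1) = delta^(k+1) a^(k)^T.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_backward(x, y, weights):
    a = [np.concatenate(([1.0], x))]                 # a^(1)
    zs = []
    for W in weights:                                # forward pass
        z = W @ a[-1]
        zs.append(z)
        a.append(np.concatenate(([1.0], sigmoid(z))))
    y_pred = a[-1][1:]                               # network output
    delta = (y_pred - y) * sigmoid_prime(zs[-1])     # delta^(K)
    grads = [None] * len(weights)
    for k in reversed(range(len(weights))):          # backward pass
        grads[k] = np.outer(delta, a[k])             # G^(k,k+1)
        if k > 0:                                    # back-propagate delta
            delta = (weights[k].T @ delta)[1:] * sigmoid_prime(zs[k - 1])
    return y_pred, grads

rng = np.random.default_rng(1)
weights = [rng.standard_normal((3, 3)),              # K = 4 layers: a 2-3-3-1 net
           rng.standard_normal((3, 4)),
           rng.standard_normal((1, 4))]
y_pred, grads = forward_backward(np.array([0.3, -0.7]), np.array([1.0]), weights)
print([g.shape for g in grads])                      # [(3, 3), (3, 4), (1, 4)]
```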

Back-propagation algorithm (1/2)
Given (x, y) ∈ T:
- Do forward propagation: compute the predicted output for x.
- Compute the gradient (as on the previous slide).
- Update the weights: W^{(k,k+1)} ← W^{(k,k+1)} − η G^{(k,k+1)}, where η is the learning rate.
Repeat until convergence.
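A sketch of the per-sample update loop described on this slide, reusing the forward_backward() sketch from the previous slide (the learning rate, epoch count, and the toy XOR-style data in the usage example are illustrative assumptions):

```python
# Per-sample gradient-descent weight updates: W <- W - eta * G.
import numpy as np

def train(samples, weights, forward_backward, eta=0.1, epochs=1000):
    for _ in range(epochs):
        for x, y in samples:
            _, grads = forward_backward(x, y, weights)
            for k in range(len(weights)):
                weights[k] -= eta * grads[k]
    return weights

# Example usage, with forward_backward and weights as in the previous sketch:
# samples = [(np.array([0.0, 0.0]), np.array([0.0])),
#            (np.array([0.0, 1.0]), np.array([1.0])),
#            (np.array([1.0, 0.0]), np.array([1.0])),
#            (np.array([1.0, 1.0]), np.array([0.0]))]
# weights = train(samples, weights, forward_backward)
```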

Back-propagation algorithm (2/2)
- The update computation was shown for one training sample only, for the sake of clarity. This variant of weight updates can be used (loop over the training set, as in the Perceptron algorithm).
- Back-propagation is a gradient-based minimization method. Variants: construct the weight update using the entire batch of training data, or use mini-batches as a compromise between exact gradient computation and computational expense.
- The step size (learning rate) can be found by a line-search algorithm, as in standard gradient-based optimization.
- There are many variants of the cost function: logistic regression-type losses, regularization terms, etc. These lead to different update rules.

NN learned by back-propagation: properties
Advantages:
- Handles multi-class problems well.
- Can do both classification and regression.
- After normalization, the output can be treated as an a posteriori probability.
Disadvantages:
- No guarantee of reaching the global minimum.
Notes:
- How should the network structure be chosen?
- We assumed the activation functions to be identical throughout the NN; this is not a requirement, though.

Deep NNs
- Deep learning is a hot topic: unsupervised discovery of features, a renaissance of NNs.
- What is different from the past? Massive amounts of data, regularization, sparsity enforcement, dropout.
- Used in computer vision, speech recognition, and general classification problems.

Deep NNs
A common alternative to the sigmoid: the ReLU (rectified linear unit), f(z) = max(0, z).
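The ReLU and its derivative, for comparison with the sigmoid sketch above (the convention f'(0) = 0 is an assumption):

```python
# ReLU activation f(z) = max(0, z) and its derivative.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # [0.  0.  0.  0.5 2. ]
print(relu_prime(z))  # [0. 0. 0. 1. 1.]
```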