What Do Neural Networks Do?

Multi-layer networks. Steve Renals, Machine Learning Practical, MLP Lecture 3, 7 October 2015.

What Do Single Layer Neural Networks Do?

Single layer network: a single-layer network with 1 output, 2 inputs and a bias (figure: network diagram with inputs x_0 (bias), x_1, x_2 feeding one output unit).

Geometric interpretation (figure: the decision boundary y(w; x) = 0 of the single-layer network drawn in the (x_1, x_2) plane, with weight vector w and bias w_0; Bishop, sec. 3.1).
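A short explicit restatement of this picture, in the slide's notation (the spelled-out form below is an addition for clarity, not text from the slide):

```latex
% Decision boundary of the single-layer unit, written out explicitly:
\[
  y(\mathbf{w};\mathbf{x}) \;=\; w_1 x_1 + w_2 x_2 + w_0 \;=\; 0 .
\]
% The weight vector (w_1, w_2) is normal to this line, the perpendicular
% distance of the line from the origin is -w_0 / \sqrt{w_1^2 + w_2^2},
% and points with y > 0 lie on one side of the boundary, y < 0 on the other.
```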

Example data (three classes) (figure: scatter plot of two-dimensional data points drawn from three classes).

Classification regions with single-layer network (figure: plot of the decision regions for the three-class data). Single-layer networks are limited to linear classification boundaries.

Single layer network trained on MNIST digits (figure: 784 inputs (28x28 pixels) plus a bias connected to 10 outputs for the classes 0-9, giving a 785x10 weight matrix). The output weights define a template for each class.

Hinton Diagrams: visualise the weights for class k (figure: weights drawn as a 20x20 array for 400 inputs).

Hinton diagram for single layer network trained on MNIST (figure: Hinton diagrams of the class weights). The weights for each class act as a discriminative template; the inner product of the class weights and the input measures closeness to each template; classify to the closest template (maximum-value output).
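A rough NumPy sketch of what this single-layer classifier computes; the array names, the random values, and the split into a 784x10 weight matrix plus a separate 10-element bias are illustrative assumptions, not taken from the lecture code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a trained model and one MNIST image (values are random here).
W = rng.normal(scale=0.01, size=(784, 10))  # column k is the weight "template" for digit k
b = np.zeros(10)                            # one bias per class
x = rng.random(784)                         # a flattened 28x28 input image

def softmax(a):
    a = a - a.max()                         # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

scores = x @ W + b                          # inner product of the input with each class template
y = softmax(scores)                         # output probabilities for the 10 classes
predicted_class = int(np.argmax(y))         # classify to the closest template (maximum output)
print(predicted_class)
```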

Multi-Layer Networks

From templates to features. Good classification needs to cope with the variability of real data: scale, skew, rotation, translation, ... This is very difficult to do with a single template per class. We could use multiple templates per class; this would work, but we can do better: use features rather than templates. (Images from: Michael Nielsen, Neural Networks and Deep Learning, http://neuralnetworksanddeeplearning.com/chap1.html)

Incorporating features in neural network architecture. Layered processing: inputs, then features, then classification. How to obtain features: learning! (Figure, shown built up over three slides: the input layer feeds a layer of learned feature units, which feeds the output units for classes 0-9, plus biases.)

Multi-layer network (figure: inputs x_i feed a sigmoid hidden layer h^(1), which feeds a softmax output layer y):

$$y_k = \mathrm{softmax}\Big(\sum_{r=0}^{H} w^{(2)}_{kr}\, h^{(1)}_r\Big), \qquad h^{(1)}_j = \mathrm{sigmoid}\Big(\sum_{s=0}^{d} w^{(1)}_{js}\, x_s\Big)$$
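A minimal NumPy sketch of this forward pass; layer sizes, names, and initialisation are illustrative assumptions rather than the course code, and explicit bias vectors are used instead of the slide's x_0 = h_0 = 1 convention:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    a = a - a.max()                 # numerical stability
    e = np.exp(a)
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    """Forward pass: inputs -> sigmoid hidden layer -> softmax output layer."""
    a1 = W1 @ x + b1                # hidden pre-activations a^(1)
    h1 = sigmoid(a1)                # hidden activations     h^(1)
    a2 = W2 @ h1 + b2               # output pre-activations a^(2)
    y = softmax(a2)                 # output class probabilities y
    return y, h1

# Illustrative sizes: d = 784 inputs, H = 100 hidden units, K = 10 classes.
rng = np.random.default_rng(0)
d, H, K = 784, 100, 10
W1, b1 = rng.normal(scale=0.1, size=(H, d)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(K, H)), np.zeros(K)
y, h1 = forward(rng.random(d), W1, b1, W2, b2)
```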

Multi-layer network for MNIST (image from: Michael Nielsen, Neural Networks and Deep Learning, http://neuralnetworksanddeeplearning.com/chap1.html)

Training MLPs: Credit assignment. Hidden units make training the weights more complicated, since the hidden units affect the error function indirectly via all of the outputs. The credit assignment problem: what is the error of a hidden unit? How important is an input-to-hidden weight w^(1)_ji to output unit k? Solution: gradient descent requires the derivatives of the error with respect to each weight. Algorithm: back-propagation of error (backprop). Backprop gives a way to compute the derivatives; these derivatives are then used by an optimisation algorithm (e.g. gradient descent) to train the weights.

Training MLPs: Error function and required gradients. Cross-entropy error function:

$$E^n = -\sum_{k=1}^{C} t^n_k \ln y^n_k$$

Required gradients: \(\partial E^n / \partial w^{(2)}_{kj}\) and \(\partial E^n / \partial w^{(1)}_{ji}\). The gradient for the hidden-to-output weights is similar to the single-layer network:

$$\frac{\partial E^n}{\partial w^{(2)}_{kj}}
  = \frac{\partial E^n}{\partial a^{(2)}_k}\,\frac{\partial a^{(2)}_k}{\partial w^{(2)}_{kj}}
  = \Big(\sum_{c=1}^{C}\frac{\partial E^n}{\partial y_c}\,\frac{\partial y_c}{\partial a^{(2)}_k}\Big)\, h^{(1)}_j
  = \underbrace{(y_k - t_k)}_{\delta^{(2)}_k}\, h^{(1)}_j$$
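The step from the chain-rule sum to (y_k - t_k) uses the softmax derivative and one-hot targets; a brief expansion of that step (a standard result not spelled out on the slide; here δ_ck denotes the Kronecker delta, not an error signal):

```latex
\frac{\partial E^n}{\partial a^{(2)}_k}
  = \sum_{c=1}^{C} \frac{\partial E^n}{\partial y_c}\,\frac{\partial y_c}{\partial a^{(2)}_k}
  = -\sum_{c=1}^{C} \frac{t_c}{y_c}\, y_c \,(\delta_{ck} - y_k)
  = -t_k + y_k \sum_{c=1}^{C} t_c
  = y_k - t_k
```

since the targets are one-hot, so the t_c sum to 1.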

Training MLPs: Input-to-hidden weights.

$$\frac{\partial E^n}{\partial w^{(1)}_{ji}}
  = \underbrace{\frac{\partial E^n}{\partial a^{(1)}_j}}_{\delta^{(1)}_j}\,
    \underbrace{\frac{\partial a^{(1)}_j}{\partial w^{(1)}_{ji}}}_{x_i}$$

To compute \(\delta^{(1)}_j = \partial E^n / \partial a^{(1)}_j\), the error signal for hidden unit j, we must sum the contributions of all the output units to \(\delta^{(1)}_j\):

$$\delta^{(1)}_j
  = \sum_{c=1}^{K} \frac{\partial E^n}{\partial a^{(2)}_c}\,\frac{\partial a^{(2)}_c}{\partial a^{(1)}_j}
  = \sum_{c=1}^{K} \delta^{(2)}_c\,\frac{\partial a^{(2)}_c}{\partial h^{(1)}_j}\,\frac{\partial h^{(1)}_j}{\partial a^{(1)}_j}
  = \Big(\sum_{c=1}^{K} \delta^{(2)}_c\, w^{(2)}_{cj}\Big)\, h^{(1)}_j\,(1 - h^{(1)}_j)$$

Training MLPs: Gradients.

$$\frac{\partial E^n}{\partial w^{(2)}_{kj}} = \underbrace{(y_k - t_k)}_{\delta^{(2)}_k}\, h^{(1)}_j$$

$$\frac{\partial E^n}{\partial w^{(1)}_{ji}} = \underbrace{\Big(\sum_{c=1}^{K} \delta^{(2)}_c\, w^{(2)}_{cj}\Big)\, h^{(1)}_j\,(1 - h^{(1)}_j)}_{\delta^{(1)}_j}\, x_i$$
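These two gradients map directly onto a few lines of NumPy; a sketch for a single training example, where the array shapes follow the illustrative forward pass above and are assumptions, not the course code:

```python
import numpy as np

def backward(x, t, y, h1, W2):
    """Gradients of the cross-entropy error for one (x, t) pair.

    y and h1 come from the forward pass; t is a one-hot target vector.
    """
    delta2 = y - t                              # delta^(2)_k = y_k - t_k
    grad_W2 = np.outer(delta2, h1)              # dE/dW2[k, j] = delta2[k] * h1[j]
    grad_b2 = delta2

    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)  # hidden error signal, back-propagated
    grad_W1 = np.outer(delta1, x)               # dE/dW1[j, i] = delta1[j] * x[i]
    grad_b1 = delta1
    return grad_W1, grad_b1, grad_W2, grad_b2
```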

Back-propagation of error: hidden unit error signal (figure, built up over three slides: each output y_1, ..., y_K sends its error signal δ^(2)_1, ..., δ^(2)_K back through the corresponding weight w^(2)_1j, ..., w^(2)_Kj to hidden unit h_j; these contributions combine to give δ^(1)_j = (Σ_ℓ δ^(2)_ℓ w^(2)_ℓj) h_j (1 - h_j), which then pairs with input x_i to give the gradient for w^(1)_ji).

Back-propagation of error. The back-propagation of error algorithm is summarised as follows:
1. Apply an input vector from the training set, x, to the network and forward propagate to obtain the output vector y.
2. Using the target vector t, compute the error E^n.
3. Evaluate the error signal δ^(2)_k for each output unit.
4. Evaluate the error signal δ^(1)_j for each hidden unit using back-propagation of error.
5. Evaluate the derivatives for each training pattern, summing them to obtain the overall derivatives.
Back-propagation can be extended to multiple hidden layers, in each case computing the δ^(l)s for the current layer as a weighted sum of the δ^(l+1)s of the next layer.
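Putting the five steps together, a minimal end-to-end sketch of training by gradient descent; all data, layer sizes, and the learning rate are made up for illustration, and this is not the coursework framework:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, H, K, N = 20, 15, 3, 50                     # toy layer sizes and dataset size
X = rng.normal(size=(N, d))                    # random "training set"
T = np.eye(K)[rng.integers(0, K, size=N)]      # random one-hot targets

W1, b1 = rng.normal(scale=0.1, size=(H, d)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(K, H)), np.zeros(K)
lr = 0.1

for epoch in range(100):
    # Steps 1-2: forward propagate and compute the cross-entropy error
    Hid = sigmoid(X @ W1.T + b1)               # (N, H) hidden activations
    Y = softmax(Hid @ W2.T + b2)               # (N, K) output probabilities
    E = -np.sum(T * np.log(Y)) / N

    # Steps 3-4: error signals for the output and hidden units
    D2 = (Y - T) / N                           # delta^(2) for every example
    D1 = (D2 @ W2) * Hid * (1.0 - Hid)         # delta^(1) via back-propagation

    # Step 5: derivatives summed over the examples, then a gradient-descent update
    W2 -= lr * (D2.T @ Hid); b2 -= lr * D2.sum(axis=0)
    W1 -= lr * (D1.T @ X);   b1 -= lr * D1.sum(axis=0)

print(f"cross-entropy after training: {E:.3f}")
```

With random targets the error will not drop far; the point is only to show the forward pass, the two error signals, and the summed derivatives being used in a gradient-descent update.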

Training with multiple hidden layers (figure: a network with inputs x_i, two hidden layers h^(1) and h^(2), and outputs y_1, ..., y_K; each layer has its own weights w^(1), w^(2), w^(3), and each layer of units has an associated error signal δ^(1), δ^(2), δ^(3)).

Summary:
- Understanding what single-layer networks compute
- How multi-layer networks allow feature computation
- Training multi-layer networks using back-propagation of error

Reading: Michael Nielsen, chapter 1 of Neural Networks and Deep Learning, http://neuralnetworksanddeeplearning.com/chap1.html; Chris Bishop, Sections 3.1, 3.2, and Chapter 4 of Neural Networks for Pattern Recognition.