Inf2b Learning and Data

Inf2b Learning and Data: Multi-layer neural networks
(Credit: Hiroshi Shimodaira, Iain Murray and Steve Renals)
Centre for Speech Technology Research (CSTR), School of Informatics, University of Edinburgh
http://www.inf.ed.ac.uk/teaching/courses/inf2b/
https://piazza.com/ed.ac.uk/spring2018/infr08009learning
Office hours: Wednesdays at 14:00-15:00 in IF-3.4
Jan-Mar 2018

Today's Schedule

1. Single-layer network with a single output node (recap)
2. Single-layer network with multiple output nodes
3. Activation functions
4. Multi-layer neural network
5. Overfitting and generalisation
6. Deep Neural Networks

Single-layer network with a single output node (recap)

Activation function: $y = g(a) = g\left(\sum_{i=1}^{D} w_i x_i\right)$, where $g(a) = \frac{1}{1 + \exp(-a)}$

Training set: $D = \{(x_n, t_n)\}_{n=1}^{N}$, where $t_n \in \{0, 1\}$

Error function: $E(w) = \frac{1}{2} \sum_{n=1}^{N} (y_n - t_n)^2$

Optimisation problem (training): $\min_{w} E(w)$
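For concreteness, here is a minimal NumPy sketch of this recap (the helper names and the toy data are my own, not from the lecture): the logistic sigmoid $g(a)$, the network output $y_n = g(\sum_i w_i x_{ni})$, and the sum-of-squares error $E(w)$.

```python
import numpy as np

def sigmoid(a):
    # logistic sigmoid g(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def forward(w, X):
    # single-layer, single-output network: y_n = g(sum_i w_i * x_ni)
    return sigmoid(X @ w)

def error(w, X, t):
    # sum-of-squares error E(w) = 0.5 * sum_n (y_n - t_n)^2
    return 0.5 * np.sum((forward(w, X) - t) ** 2)

# toy training set: N = 4 patterns, D = 2 inputs, binary targets
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.2, 0.1]])
t = np.array([0.0, 0.0, 1.0, 0.0])
w = np.zeros(2)
print(error(w, X, t))
```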

Training of the single-layer neural network (recap)

Optimisation problem: $\min_w E(w)$
No analytic solution (no closed form).
Employ an iterative method (requires initial values), e.g. gradient descent (steepest descent), Newton's method, conjugate gradient methods.

Gradient descent (scalar rep.): $w_i^{(\mathrm{new})} = w_i - \eta \frac{\partial E(w)}{\partial w_i}$, $(\eta > 0)$
(vector rep.): $w^{(\mathrm{new})} = w - \eta \nabla_{w} E(w)$, $(\eta > 0)$

Online/stochastic gradient descent (cf. batch training): update the weights one pattern at a time. (See Note.)

Training of the single-layer neural network (cont.)

$E(w) = \frac{1}{2}\sum_{n=1}^{N} (y_n - t_n)^2 = \frac{1}{2}\sum_{n=1}^{N} (g(a_n) - t_n)^2$

$\frac{\partial E(w)}{\partial w_i} = \sum_{n=1}^{N} \frac{\partial E(w)}{\partial y_n}\frac{\partial y_n}{\partial a_n}\frac{\partial a_n}{\partial w_i} = \sum_{n=1}^{N} (y_n - t_n)\, g'(a_n)\, x_{ni}$

where $y_n = g(a_n)$, $a_n = \sum_{i=1}^{D} w_i x_{ni}$, and $\frac{\partial a_n}{\partial w_i} = x_{ni}$.
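Continuing the sketch from the recap slide (it reuses the hypothetical sigmoid, error, X and t defined there), the gradient derived here, $\frac{\partial E}{\partial w_i} = \sum_n (y_n - t_n)\, g'(a_n)\, x_{ni}$ with $g'(a) = g(a)(1 - g(a))$, and a plain batch gradient-descent loop; an online/stochastic variant would apply the same update one pattern at a time.

```python
def grad_E(w, X, t):
    # dE/dw_i = sum_n (y_n - t_n) * g'(a_n) * x_ni, where g'(a) = g(a) * (1 - g(a))
    y = sigmoid(X @ w)
    return X.T @ ((y - t) * y * (1.0 - y))

def train_batch(w, X, t, eta=0.5, n_iters=2000):
    # batch gradient descent: w <- w - eta * dE/dw  (eta > 0 is the learning rate)
    for _ in range(n_iters):
        w = w - eta * grad_E(w, X, t)
    return w

w = train_batch(np.zeros(X.shape[1]), X, t)
print(error(w, X, t))   # training error after gradient descent
```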

Single-layer network with multiple output nodes

[Figure: single-layer network with inputs $x_1, \ldots, x_D$, outputs $y_1, \ldots, y_K$ and weights $w_{k1}, \ldots, w_{kD}, \ldots, w_{K1}, \ldots, w_{KD}$.]

$K$ output nodes: $y_1, \ldots, y_K$. For $x_n = (x_{n1}, \ldots, x_{nD})^T$,

$y_{nk} = g\left(\sum_{i=1}^{D} w_{ki} x_{ni}\right) = g(a_{nk}), \qquad a_{nk} = \sum_{i=1}^{D} w_{ki} x_{ni}$

Single-layer network with multiple output nodes (cont.)

Training set: $D = \{(x_1, t_1), \ldots, (x_N, t_N)\}$, where $t_n = (t_{n1}, \ldots, t_{nK})$ and $t_{nk} \in \{0, 1\}$

Error function:
$E(w) = \frac{1}{2}\sum_{n=1}^{N} \lVert y_n - t_n \rVert^2 = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K} (y_{nk} - t_{nk})^2 = \sum_{n=1}^{N} E_n$, where $E_n = \frac{1}{2}\sum_{k=1}^{K} (y_{nk} - t_{nk})^2$

Training by gradient descent: $w_{ki} \leftarrow w_{ki} - \eta \frac{\partial E}{\partial w_{ki}}$, $(\eta > 0)$

The derivatives of the error function (single-layer)

$E_n = \frac{1}{2}\sum_{k=1}^{K} (y_{nk} - t_{nk})^2, \qquad y_{nk} = g(a_{nk}), \qquad a_{nk} = \sum_{i=1}^{D} w_{ki} x_{ni}$

$\frac{\partial E_n}{\partial w_{ki}} = \frac{\partial E_n}{\partial y_{nk}}\frac{\partial y_{nk}}{\partial a_{nk}}\frac{\partial a_{nk}}{\partial w_{ki}} = (y_{nk} - t_{nk})\, g'(a_{nk})\, x_{ni}$

[Figure: single-layer network with inputs $x_1, \ldots, x_i, \ldots, x_D$, outputs $y_1, \ldots, y_k, \ldots, y_K$ and weights $w_{11}, \ldots, w_{KD}$.]
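A vectorised sketch of this per-pattern gradient for the K-output case (the variable names and the toy numbers are mine): with W holding the weights $w_{ki}$ as a K x D matrix, $\frac{\partial E_n}{\partial w_{ki}} = (y_{nk} - t_{nk})\, g'(a_{nk})\, x_{ni}$ becomes an outer product.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_En(W, x_n, t_n):
    # W: (K, D) weight matrix, x_n: (D,) input pattern, t_n: (K,) target vector
    a_n = W @ x_n                                # a_nk = sum_i w_ki * x_ni
    y_n = sigmoid(a_n)                           # y_nk = g(a_nk)
    delta = (y_n - t_n) * y_n * (1.0 - y_n)      # (y_nk - t_nk) * g'(a_nk)
    return np.outer(delta, x_n)                  # dE_n/dw_ki, shape (K, D)

# example: D = 3 inputs, K = 2 output nodes
W = np.zeros((2, 3))
print(grad_En(W, np.array([1.0, 0.5, -0.5]), np.array([1.0, 0.0])))
```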

Notes on activation functions

[Figure: single-layer network with inputs $x_1, \ldots, x_D$, outputs $y_1, \ldots, y_K$ and weights $w_{ki}$.]

Interpretation of the output values
Normalisation of the output values

Output of the logistic sigmoid activation function

Consider a single-layer network with a single output node and a logistic sigmoid activation function:

$y = g(a) = \frac{1}{1 + \exp(-a)} = g\left(\sum_{i=1}^{D} w_i x_i\right) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{D} w_i x_i\right)}$

Consider a two-class problem, with classes $C_1$ and $C_2$. The posterior probability of $C_1$:

$P(C_1 \mid x) = \frac{p(x \mid C_1)\,P(C_1)}{p(x)} = \frac{p(x \mid C_1)\,P(C_1)}{p(x \mid C_1)\,P(C_1) + p(x \mid C_2)\,P(C_2)} = \frac{1}{1 + \frac{p(x \mid C_2)\,P(C_2)}{p(x \mid C_1)\,P(C_1)}} = \frac{1}{1 + \exp\left(-\ln \frac{p(x \mid C_1)\,P(C_1)}{p(x \mid C_2)\,P(C_2)}\right)}$
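A small numerical check of this identity (the one-dimensional Gaussian class-conditional densities and the priors are illustrative assumptions, not from the slide): the Bayes posterior $P(C_1 \mid x)$ computed directly agrees with the logistic sigmoid applied to the log odds $\ln\frac{p(x \mid C_1)\,P(C_1)}{p(x \mid C_2)\,P(C_2)}$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gauss(x, mu, sigma):
    # univariate Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# hypothetical class-conditional densities and priors
P1, P2 = 0.5, 0.5
p1 = lambda x: gauss(x, -1.0, 1.0)   # p(x | C1)
p2 = lambda x: gauss(x, +1.0, 1.0)   # p(x | C2)

x = 0.3
posterior_direct = p1(x) * P1 / (p1(x) * P1 + p2(x) * P2)
posterior_sigmoid = sigmoid(np.log((p1(x) * P1) / (p2(x) * P2)))
print(posterior_direct, posterior_sigmoid)   # the two values agree
```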

Approximation of posterior probabilities

[Figure: the logistic sigmoid function $g(a) = \frac{1}{1 + \exp(-a)}$, and the posterior probabilities of two classes S and T with Gaussian distributions and equal priors $P(S) = P(T) = 0.5$.]

Normalisation of output nodes

Outputs with sigmoid activation function: $y_k = g(a_k) = \frac{1}{1 + \exp(-a_k)}$, $a_k = \sum_{i=1}^{D} w_{ki} x_i$ (in general $\sum_{k=1}^{K} y_k \neq 1$)

Softmax activation function for $y_k$: $y_k = \frac{\exp(a_k)}{\sum_{l=1}^{K} \exp(a_l)}$

Properties of the softmax function:
(i) $0 \le y_k \le 1$
(ii) $\sum_{k=1}^{K} y_k = 1$
(iii) differentiable
(iv) $y_k \approx P(C_k \mid x) = \frac{p(x \mid C_k)\,P(C_k)}{\sum_{l=1}^{K} p(x \mid C_l)\,P(C_l)}$
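A short sketch of the softmax activation (the max-shift for numerical stability is my own addition, not on the slide), with a quick check of properties (i) and (ii):

```python
import numpy as np

def softmax(a):
    # y_k = exp(a_k) / sum_l exp(a_l); subtracting max(a) avoids overflow
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a = np.array([2.0, -1.0, 0.5])
y = softmax(a)
print(y)            # property (i): each y_k lies in [0, 1]
print(y.sum())      # property (ii): the outputs sum to 1
```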

Some questions on activation functions

Q: Is the logistic sigmoid function necessary for a single-layer, single-output-node network?
A: No, in terms of classification. We can replace it with $g(a) = a$. However, the decision boundaries can be different. (NB: a linear decision boundary ($y(a) = 0.5$) is formed in either case.)

Q: What benefits are there in using the logistic sigmoid function in the case above?
A: The output can be regarded as a posterior probability. Compared with a linear output node ($g(a) = a$), logistic regression normally forms a decision boundary that is more robust against noise.

Q: What benefits are there in using nonlinear activation functions in multi-layer neural networks?

Logistic sigmoid vs a linear output node

Binary classification problem with the least squares error (LSE): $g(a) = \frac{1}{1 + \exp(-a)}$ vs $g(a) = a$

[Figure: comparison of the resulting decision boundaries (after Fig. 4.4b in PRML, C. M. Bishop (2006)).]

Multi-layer neural networks

Multi-layer perceptron (MLP)

Hidden-to-output weights: $w^{(2)}_{kj} \leftarrow w^{(2)}_{kj} - \eta \frac{\partial E}{\partial w^{(2)}_{kj}}$

Input-to-hidden weights: $w^{(1)}_{ji} \leftarrow w^{(1)}_{ji} - \eta \frac{\partial E}{\partial w^{(1)}_{ji}}$

[Figure: two-layer network with inputs $x_1, \ldots, x_D$, hidden units $z_1, \ldots, z_M$ (activation $h$), outputs $y_1, \ldots, y_K$, input-to-hidden weights $w^{(1)}$ and hidden-to-output weights $w^{(2)}$.]

Training of MLP

1940s  Warren McCulloch and Walter Pitts: threshold logic; Donald Hebb: Hebbian learning
1957  Frank Rosenblatt: Perceptron
1969  Marvin Minsky and Seymour Papert: limitations of neural networks
1980  Kunihiko Fukushima: Neocognitron
1986  D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors" (1974, Paul Werbos)

The derivatives of the error function (two layers) (NEW)

$E_n = \frac{1}{2}\sum_{k=1}^{K} (y_{nk} - t_{nk})^2$

$y_{nk} = g(a_{nk}), \quad a_{nk} = \sum_{j=1}^{M} w^{(2)}_{kj} z_{nj}, \qquad z_{nj} = h(b_{nj}), \quad b_{nj} = \sum_{i=1}^{D} w^{(1)}_{ji} x_{ni}$

$\frac{\partial E_n}{\partial w^{(2)}_{kj}} = \frac{\partial E_n}{\partial y_{nk}}\frac{\partial y_{nk}}{\partial a_{nk}}\frac{\partial a_{nk}}{\partial w^{(2)}_{kj}} = (y_{nk} - t_{nk})\, g'(a_{nk})\, z_{nj}$

$\frac{\partial E_n}{\partial w^{(1)}_{ji}} = \frac{\partial E_n}{\partial z_{nj}}\frac{\partial z_{nj}}{\partial b_{nj}}\frac{\partial b_{nj}}{\partial w^{(1)}_{ji}} = \left(\sum_{k=1}^{K} (y_{nk} - t_{nk})\, g'(a_{nk})\, w^{(2)}_{kj}\right) h'(b_{nj})\, x_{ni}$

using $\frac{\partial E_n}{\partial z_{nj}} = \sum_{k=1}^{K} (y_{nk} - t_{nk}) \frac{\partial y_{nk}}{\partial z_{nj}} = \sum_{k=1}^{K} (y_{nk} - t_{nk})\, g'(a_{nk})\, w^{(2)}_{kj}$

Error back propagation (NEW)

$\frac{\partial E_n}{\partial w^{(2)}_{kj}} = \frac{\partial E_n}{\partial y_{nk}}\frac{\partial y_{nk}}{\partial a_{nk}}\frac{\partial a_{nk}}{\partial w^{(2)}_{kj}} = (y_{nk} - t_{nk})\, g'(a_{nk})\, z_{nj} = \delta^{(2)}_{nk} z_{nj}, \qquad \delta^{(2)}_{nk} = (y_{nk} - t_{nk})\, g'(a_{nk})$

$\frac{\partial E_n}{\partial w^{(1)}_{ji}} = \frac{\partial E_n}{\partial z_{nj}}\frac{\partial z_{nj}}{\partial b_{nj}}\frac{\partial b_{nj}}{\partial w^{(1)}_{ji}} = \left(\sum_{k=1}^{K} (y_{nk} - t_{nk})\, g'(a_{nk})\, w^{(2)}_{kj}\right) h'(b_{nj})\, x_{ni} = \left(\sum_{k=1}^{K} \delta^{(2)}_{nk} w^{(2)}_{kj}\right) h'(b_{nj})\, x_{ni}$

[Figure: two-layer network with inputs $x_i$, hidden units $z_j$, outputs $y_k$, and weights $w^{(1)}$, $w^{(2)}$, showing the error signals propagated back from the outputs.]
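A compact per-pattern sketch of these back-propagation equations (all names are hypothetical; both $g$ and $h$ are taken to be logistic sigmoids, so $g'(a) = g(a)(1 - g(a))$): forward pass, output deltas $\delta^{(2)}_{nk}$, back-propagated hidden deltas, and the two gradient matrices.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_one_pattern(W1, W2, x_n, t_n):
    # W1: (M, D) input-to-hidden weights w_ji^(1), W2: (K, M) hidden-to-output weights w_kj^(2)
    b_n = W1 @ x_n                                   # b_nj = sum_i w_ji^(1) x_ni
    z_n = sigmoid(b_n)                               # z_nj = h(b_nj)
    a_n = W2 @ z_n                                   # a_nk = sum_j w_kj^(2) z_nj
    y_n = sigmoid(a_n)                               # y_nk = g(a_nk)
    delta2 = (y_n - t_n) * y_n * (1.0 - y_n)         # delta_nk^(2) = (y_nk - t_nk) g'(a_nk)
    delta1 = (W2.T @ delta2) * z_n * (1.0 - z_n)     # (sum_k delta_nk^(2) w_kj^(2)) h'(b_nj)
    dE_dW2 = np.outer(delta2, z_n)                   # dE_n/dw_kj^(2) = delta_nk^(2) z_nj
    dE_dW1 = np.outer(delta1, x_n)                   # dE_n/dw_ji^(1)
    return dE_dW1, dE_dW2

# tiny example: D = 3 inputs, M = 4 hidden units, K = 2 outputs
rng = np.random.default_rng(0)
W1, W2 = 0.1 * rng.standard_normal((4, 3)), 0.1 * rng.standard_normal((2, 4))
g1, g2 = backprop_one_pattern(W1, W2, np.array([1.0, 0.0, -1.0]), np.array([1.0, 0.0]))
print(g1.shape, g2.shape)   # (4, 3) and (2, 4)
```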

Problems with multi-layer neural networks

Still difficult to train:
  computationally very expensive (e.g. weeks of training)
  slow convergence ("vanishing gradients")
  difficult to find the optimal network topology

Poor generalisation (under some conditions):
  very good performance on the training set
  poor performance on the test set

Overfitting and generalisation

Example of curve fitting by a polynomial function:

$y(x, w) = w_0 + w_1 x + w_2 x^2 + \ldots + w_M x^M = \sum_{k=0}^{M} w_k x^k$

[Figure: polynomial fits of increasing order M (including M = 3 and M = 9) to the same noisy training points (after Fig. 1.4 in PRML, C. M. Bishop (2006)).]

cf. memorising the training data
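A minimal illustration of this effect (the noisy-sine training data and the particular orders M are my own choices, loosely echoing Bishop's example): as the polynomial order grows, the training error keeps falling, while a high-order fit such as M = 9 essentially memorises the training points and typically does worse on held-out data.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(x_train.size)

x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test)

for M in (1, 3, 9):
    coeffs = np.polyfit(x_train, t_train, deg=M)          # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_train) - t_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - t_test) ** 2)
    print(M, round(train_err, 4), round(test_err, 4))
```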

Breakthrough

1957  Frank Rosenblatt: Perceptron
1986  D. Rumelhart, G. Hinton, and R. Williams: Backpropagation
2006  G. Hinton et al. (U. Toronto), "Reducing the dimensionality of data with neural networks", Science.
2009  J. Schmidhuber (Swiss AI Lab IDSIA): winner at the ICDAR 2009 handwriting recognition competition
2010-  many papers from U. Toronto, Microsoft, IBM, Google, ...

What are the ideas?
Pretraining: a single layer of feature detectors, stacked to form several hidden layers
Fine-tuning, dropout
GPUs
Convolutional networks (CNN), long short-term memory (LSTM)
Rectified linear units (ReLU)

Breakthrough (cont.)

[Figure: phone error rate (%) for speaker-independent phonetic recognition on TIMIT, plotted by year.]

Summary

Training of a single-layer network
Activation functions
  Approximation of posterior probabilities
  Sigmoid function (for a single output node)
  Softmax function (for multiple output nodes)
Training of a multi-layer network with error back propagation

A very good reference: http://neuralnetworksanddeeplearning.com/