Introduction to Neural Networks


Introduction to Neural Networks
Steve Renals
Automatic Speech Recognition (ASR) Lecture 10
24 February 2014

Neural networks for speech recognition
Introduction to Neural Networks
Training feed-forward networks
Hybrid neural network / HMM acoustic models
Neural network features: Tandem, posteriorgrams
Deep neural network acoustic models
Neural network language models

Neural network acoustic models
[Figure: stack of layers from the input layer, through hidden layers 1 to H, to the output layer]
Input layer takes several consecutive frames of acoustic features
Output layer corresponds to classes (e.g. phones, HMM states)
Multiple non-linear hidden layers between input and output
Neural networks are also called multi-layer perceptrons
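A minimal numpy sketch (not from the lecture) of how several consecutive acoustic frames can be spliced into one network input vector; the context width of +/-4 frames and the 39-dimensional features are illustrative assumptions.

```python
import numpy as np

def splice_frames(features, context=4):
    """Stack each frame with its +/-context neighbours to form the network input.
    Edge frames are padded by repeating the first/last frame.

    features: (num_frames, feat_dim) array of acoustic features (e.g. MFCCs)
    returns:  (num_frames, (2*context + 1) * feat_dim) array
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    spliced = [padded[t:t + 2 * context + 1].reshape(-1)
               for t in range(features.shape[0])]
    return np.stack(spliced)

# Example: 100 frames of 39-dimensional features -> 100 inputs of dimension 351
feats = np.random.randn(100, 39)
print(splice_frames(feats).shape)   # (100, 351)
```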

NN vs GMM
[Figure: the same layer-stack diagram, from input layer through hidden layers 1 to H to the output layer]
Potential deep structure through multiple hidden layers (rather than a single layer of GMMs)
Operates on multiple frames of input (rather than a single frame)
One big network for everything (rather than one HMM per phone)

Layered neural networks: structure (1)
[Figure: neural network with one hidden layer. Inputs x_0, x_1, ..., x_d (x_0 is the bias); hidden units z_0, z_1, ..., z_M (z_0 is the bias); outputs y_1, ..., y_K. First-layer weights w^{(1)}_{ji} (e.g. w^{(1)}_{10}, w^{(1)}_{Md}) connect inputs to hidden units; second-layer weights w^{(2)}_{kj} (e.g. w^{(2)}_{10}, w^{(2)}_{KM}) connect hidden units to outputs.]

Layered neural networks: structure (2)
d input units, M hidden units, K output units
Hidden layer: each of the M units takes a linear combination of the inputs x_i:
b_j = \sum_{i=0}^{d} w^{(1)}_{ji} x_i
b_j: activations; w^{(1)}_{ji}: first layer of weights
Activations are transformed by a nonlinear activation function h (e.g. a sigmoid):
z_j = h(b_j) = \frac{1}{1 + \exp(-b_j)}
z_j: hidden unit outputs
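A small numpy sketch of the hidden-layer computation above; the bias is handled by fixing x_0 = 1, and the layer sizes are illustrative assumptions rather than values from the lecture.

```python
import numpy as np

def sigmoid(b):
    """Logistic sigmoid h(b) = 1 / (1 + exp(-b))."""
    return 1.0 / (1.0 + np.exp(-b))

d, M = 351, 1000                                  # illustrative layer sizes
x = np.concatenate(([1.0], np.random.randn(d)))   # x_0 = 1 carries the bias
W1 = 0.01 * np.random.randn(M, d + 1)             # first-layer weights w^(1)_ji

b = W1 @ x        # activations b_j = sum_i w^(1)_ji x_i
z = sigmoid(b)    # hidden unit outputs z_j = h(b_j)
print(z.shape)    # (1000,)
```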

Logistic sigmoid activation function
g(a) = \frac{1}{1 + \exp(-a)}
[Figure: plot of g(a) for a from -5 to 5; g(a) rises smoothly from 0 towards 1, passing through 0.5 at a = 0.]

Layered neural networks: structure (3)
Outputs of the hidden units are linearly combined to give the activations of the output units:
a_k = \sum_{j=0}^{M} w^{(2)}_{kj} z_j
Output units can be sigmoids, but it is better if they use the softmax activation function:
y_k = g(a_k) = \frac{\exp(a_k)}{\sum_{l=1}^{K} \exp(a_l)}
Softmax enforces sum-to-one across the outputs
If output unit k corresponds to class C_k, then the outputs of a trained network can be interpreted as posterior probability estimates: P(C_k | x) = y_k
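Continuing the numpy sketch with the output layer; the layer sizes are again illustrative assumptions. Subtracting max(a) before exponentiating leaves the softmax unchanged but avoids overflow.

```python
import numpy as np

def softmax(a):
    """Softmax y_k = exp(a_k) / sum_l exp(a_l), computed in a numerically stable way."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

M, K = 1000, 40                              # illustrative: hidden units, output classes
z = np.random.rand(M)                        # hidden unit outputs from the previous step
z = np.concatenate(([1.0], z))               # z_0 = 1 carries the output-layer bias
W2 = 0.01 * np.random.randn(K, M + 1)        # second-layer weights w^(2)_kj

a = W2 @ z          # output activations a_k
y = softmax(a)      # posterior estimates y_k = P(C_k | x)
print(y.sum())      # 1.0: softmax enforces sum-to-one
```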

Layered neural networks: training
The weights w_ji are the trainable parameters
Train the weights by adjusting them to minimise a cost function which measures the error between the network outputs y^n_k (for the nth frame) and the target outputs t^n_k
Sum-squared error:
E^n = \frac{1}{2} \sum_{k=1}^{K} (t^n_k - y^n_k)^2
For multiclass classification, the natural cost function is the (negative) log probability of the correct class:
E^n = -\sum_{k=1}^{K} t^n_k \log y^n_k
E = \sum_n E^n
Optimise the cost function using gradient descent (back-propagation of error, "backprop")
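A small numpy illustration of the two per-frame cost functions above, assuming 1-of-K target vectors; the four-class example values are made up.

```python
import numpy as np

def cross_entropy(y, t, eps=1e-12):
    """Negative log probability of the correct class: E^n = -sum_k t_k log y_k."""
    return -np.sum(t * np.log(y + eps))

def sum_squared_error(y, t):
    """Sum-squared error: E^n = 0.5 * sum_k (t_k - y_k)^2."""
    return 0.5 * np.sum((t - y) ** 2)

# Example frame: 4 classes, correct class is the third
y = np.array([0.1, 0.2, 0.6, 0.1])    # network outputs
t = np.array([0.0, 0.0, 1.0, 0.0])    # 1-of-K target
print(cross_entropy(y, t))            # -log(0.6) ~= 0.51
print(sum_squared_error(y, t))        # 0.11
```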

Gradient descent
Gradient descent can be used whenever it is possible to compute the derivatives of the error function E with respect to the parameters to be optimised, W
Basic idea: adjust the weights to move downhill in weight space
Weight space: the space defined by all the trainable parameters (weights)
Operation of gradient descent:
1. Start with a guess for the weight matrix W (small random numbers)
2. Update the weights by adjusting the weight matrix in the direction of the negative gradient, -\nabla_W E
3. Recompute the error, and iterate
The update for weight w_{ki} at iteration \tau + 1 is:
w^{(\tau+1)}_{ki} = w^{(\tau)}_{ki} - \eta \frac{\partial E}{\partial w_{ki}}
The parameter \eta is the learning rate
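A schematic gradient-descent loop following the update rule above; compute_error_and_gradient stands in for the back-propagation computation, and the toy objective at the end is only there to make the sketch runnable.

```python
import numpy as np

def gradient_descent(compute_error_and_gradient, W0, eta=0.1, num_iters=100):
    """Generic gradient descent: W <- W - eta * dE/dW at each iteration.

    compute_error_and_gradient: callable returning (E, dE/dW) for weights W
    W0:  initial weights (small random numbers)
    eta: learning rate
    """
    W = W0.copy()
    for _ in range(num_iters):
        E, grad = compute_error_and_gradient(W)
        W = W - eta * grad              # move downhill in weight space
    return W

# Toy objective E(W) = 0.5 * ||W||^2, whose gradient is simply W
W0 = 0.01 * np.random.randn(3, 3)
W_final = gradient_descent(lambda W: (0.5 * np.sum(W ** 2), W), W0)
print(np.abs(W_final).max())            # close to zero after 100 iterations
```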

Using neural networks for acoustic modelling
Hybrid NN/HMM systems
Basic idea: in an HMM, replace the GMMs used to estimate output pdfs with the outputs of neural networks
Transform NN posterior probability estimates into scaled likelihoods by dividing by the relative frequencies of each class in the training data:
p(x_t | C_k) \propto \frac{P(C_k | x_t)}{P_{train}(C_k)} = \frac{y_k}{P_{train}(C_k)}
NN outputs correspond to phone classes or HMM states
Tandem features
Use NN probability estimates as an additional input feature stream in an HMM/GMM system (posteriorgrams)
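A minimal numpy sketch of the posterior-to-scaled-likelihood conversion above; the prior floor and the three-class example are illustrative assumptions.

```python
import numpy as np

def scaled_likelihoods(posteriors, class_priors, floor=1e-8):
    """Convert NN posteriors P(C_k | x_t) into scaled likelihoods
    proportional to P(C_k | x_t) / P_train(C_k), for use as HMM output scores.

    posteriors:   (num_frames, K) softmax outputs
    class_priors: (K,) relative frequencies of each class in the training data
    """
    return posteriors / np.maximum(class_priors, floor)

# Example: 2 frames, 3 classes
post = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.1, 0.8]])
priors = np.array([0.5, 0.3, 0.2])
print(scaled_likelihoods(post, priors))
```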

End of part 1
Next two lectures:
Hybrid NN/HMM systems
Tandem features
Deep neural networks
Neural network language models
Reading:
N Morgan and H Bourlard (May 1995). "Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach", IEEE Signal Processing Magazine, 12(3), 24-42.
Christopher M Bishop (2006). Pattern Recognition and Machine Learning, Springer. Chapters 4 and 5.