Speech and Language Processing

Transcription:

Speech and Language Processing, Lecture 5: Neural network based acoustic and language models. Information and Communications Engineering Course, Takahiro Shinozaki, 08//6

Lecture Plan (Shinozaki's part): I give the first 6 lectures about speech recognition. Through these lectures, the backbone of the latest speech recognition techniques is explained.
1. 0/9 (remote) Speech recognition based on GMM, HMM, and N-gram
2. 0/6 (remote) Maximum likelihood estimation and EM algorithm
3. /5 (@TAIST) Bayesian network and Bayesian inference
4. /5 (@TAIST) Variational inference and sampling
5. /6 (@TAIST) Neural network based acoustic and language models
6. /6 (@TAIST) Weighted finite state transducer (WFST) and speech decoding

Today's Topic: answers for the previous exercises, then neural network based acoustic and language models.

Answers for the Previous Exercises

Exercise 4.1: When p(x) and y = f(x) are given as follows, obtain the distribution q(y).
p(x) = 1 for 0 ≤ x ≤ 1 (0 otherwise), y = −log x.
Then x = exp(−y), |dx/dy| = exp(−y), so q(y) = p(x) |dx/dy| = exp(−y) for y ≥ 0.
[Figure: histogram of samples of x and histogram of the transformed samples y]

Exercise 4.2: When p(x) and y = f(x) are given as follows, obtain the distribution q(y).
p(x) = N(x|0, 1) = (1/√(2π)) exp(−x²/2), y = 2x + 4.
Then x = (y − 4)/2, |dx/dy| = 1/2, so q(y) = p(x) |dx/dy| = (1/√(2π·4)) exp(−(y − 4)²/(2·4)) = N(y|4, 4).
[Figure: histogram of samples of x and histogram of the transformed samples y]

Exercise 4.3: Show that N(A|B, v) = N(B|A, v), where N(x|m, v) is the Gaussian distribution with mean m and variance v.
N(A|B, v) = (1/√(2πv)) exp(−(A − B)²/(2v)) = (1/√(2πv)) exp(−(B − A)²/(2v)) = N(B|A, v), since (A − B)² = (B − A)².

Neural Network

Multi-Layer Perceptron (MLP). Unit of MLP: y = h(Σ_{i=1}^{m} w_i x_i + b), where h is the activation function, w_i the weights, and b the bias. An MLP consists of multiple layers of these units: an input layer, one or more hidden layers, and an output layer.

Activation Functions: linear function h(x) = x; unit step function h(x) = 1 if x ≥ 0, 0 otherwise; hinge function (ReLU) h(x) = max(0, x); sigmoid function h(x) = 1/(1 + exp(−x)).
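A minimal NumPy sketch of these activation functions (the function names are mine, not from the slides):

import numpy as np

def linear(x):
    return x                              # identity

def unit_step(x):
    return np.where(x >= 0, 1.0, 0.0)     # 1 if x >= 0, 0 otherwise

def hinge(x):
    return np.maximum(0.0, x)             # max(0, x), also known as ReLU

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(linear(x), unit_step(x), hinge(x), sigmoid(x))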

Softmax Function. For N variables x_1, ..., x_N, the softmax function is h(x_i) = exp(x_i) / Σ_j exp(x_j). Properties of softmax: every output is positive (0 < h(x_i) < 1) and the outputs sum to one (Σ_{i=1}^{N} h(x_i) = 1), so it expresses a probability distribution. Example: Z_1 = (−1, 2, 1) gives (h(x_1), h(x_2), h(x_3)) = (0.0351, 0.7054, 0.2595); Z_2 = (16, 8, 12) gives (0.9817, 0.0003, 0.0180).
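A small NumPy sketch of the softmax and a check of its properties on the two example vectors above (subtracting the maximum is a standard numerical-stability trick, not from the slide):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))     # subtracting the max does not change the result
    return e / e.sum()

for z in (np.array([-1.0, 2.0, 1.0]), np.array([16.0, 8.0, 12.0])):
    h = softmax(z)
    print(np.round(h, 4), "sum =", h.sum())   # all entries positive, sum is 1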

Exercise 5.1: Let h be a softmax function having inputs x_1, x_2, ..., x_N: h(x_i) = exp(x_i) / Σ_j exp(x_j). Prove that Σ_{i=1}^{N} h(x_i) = 1.
Answer: Σ_{i=1}^{N} h(x_i) = Σ_{i=1}^{N} exp(x_i) / Σ_j exp(x_j) = (Σ_i exp(x_i)) / (Σ_j exp(x_j)) = 1.

Forward Propagation: Compute the output of the MLP step by step from the input layer to the output layer, e.g. input vector → sigmoid layer → sigmoid layer → softmax layer.
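A minimal NumPy sketch of forward propagation through two sigmoid layers and a softmax output layer; it evaluates the unit formula y = h(Σ w_i x_i + b) one layer at a time. The layer sizes and random weights are placeholders, not values from the lecture:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # input -> hidden 1
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)   # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden 2 -> output

x = rng.normal(size=4)            # input vector
z1 = sigmoid(W1 @ x + b1)         # first sigmoid layer
z2 = sigmoid(W2 @ z1 + b2)        # second sigmoid layer
y = softmax(W3 @ z2 + b3)         # softmax output layer
print(y, y.sum())                 # a probability distribution over 3 classes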

Parameters of a Neural Network: The weights and bias of each unit need to be trained before the network is used: y = h(Σ_{i=1}^{N} w_i x_i + b) = h(w · x), where h is the activation function, w = (w_1, w_2, ..., w_N, b) is the weight vector, and x = (x_1, x_2, ..., x_N, 1) is the input vector. The bias b can be regarded as one of the weights, whose input takes a constant value 1.0.
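A short sketch of this bias-as-weight trick: appending a constant 1 to the input lets a single dot product cover both the weights and the bias (the numeric values are arbitrary examples):

import numpy as np

w = np.array([0.5, -0.2, 0.8])     # weights
b = 0.1                            # bias
x = np.array([1.0, 2.0, -1.0])     # input

w_aug = np.append(w, b)            # w = (w1, ..., wN, b)
x_aug = np.append(x, 1.0)          # x = (x1, ..., xN, 1)
print(w @ x + b, w_aug @ x_aug)    # both give the same pre-activation value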

Principle of NN Training: Given a training set of input vectors and reference output vectors, adjust the parameters of the MLP so as to minimize the error between the output produced by the MLP and the reference output.

Definitions of Errors.
Sum-of-squares error, used when the output layer uses linear functions: E(W) = Σ_n ||y(X_n, W) − t_n||².
Cross entropy, used when the output layer is a softmax: E(W) = −Σ_n Σ_k t_nk ln y_k(X_n, W).
Here W is the set of weights in the MLP, X_n is a training sample (input vector), t_n is the corresponding reference output vector, n indexes training samples, k indexes output units, and t_nk takes 1 if unit k corresponds to the correct output and 0 otherwise.
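A minimal NumPy sketch of the two error measures over a batch of training samples (array shapes and values are illustrative only):

import numpy as np

def sum_of_squares_error(Y, T):
    # Y, T: (num_samples, output_dim) network outputs and reference outputs
    return np.sum((Y - T) ** 2)

def cross_entropy_error(Y, T, eps=1e-12):
    # Y: softmax outputs, T: one-hot reference outputs
    return -np.sum(T * np.log(Y + eps))

Y = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
T = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(sum_of_squares_error(Y, T), cross_entropy_error(Y, T))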

Gradient Descent: an iterative optimization method. Starting from an initial value x_0, repeatedly update x_{t+1} = x_t − η ∇f(x_t), where η is the learning rate (a small positive value).
[Figure: f(x) decreasing toward a minimum as the iterates x_0, x_1, ..., x_t move downhill]
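A small sketch of gradient descent on a one-dimensional function, assuming f(x) = (x − 3)² purely as an example:

def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

eta = 0.1        # learning rate (small positive value)
x = 0.0          # initial value x_0
for t in range(50):
    x = x - eta * grad_f(x)    # x_{t+1} = x_t - eta * grad f(x_t)
print(x, f(x))   # x approaches the minimizer 3.0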

MLP Training by Gradient Descent: Define an error measure E(W) over the training samples, e.g. E(W) = Σ_n ||y(X_n, W) − t_n||². Initialize the parameters W = {w_1, w_2, ..., w_M}, then repeatedly update each parameter by gradient descent: w_i^{(t+1)} = w_i^{(t)} − η ∂E(W)/∂w_i, evaluated at w_i = w_i^{(t)}.

Chain Rule of Differentiation: Let z = f(y) and y = g(x). When x, y, z are scalars, dz/dx = (dz/dy)(dy/dx). When x, y, z are vectors, the same rule holds using Jacobian matrices: ∂z/∂x = (∂z/∂y)(∂y/∂x), where ∂y/∂x is the Jacobian matrix with entries ∂y_i/∂x_j.

When There Are Branches: If z = f(y_1, y_2) with y_1 = g_1(x) and y_2 = g_2(x), then dz/dx = (∂f/∂y_1)(dy_1/dx) + (∂f/∂y_2)(dy_2/dx). Variation: when one branch is a constant C (independent of x), its term vanishes and only the other branch contributes.

Back Propagation (BP): Consider a chain of layers y_1 = f_1(x, w_1), y_2 = f_2(y_1, w_2), y_3 = f_3(y_2, w_3), y_4 = f_4(y_3, w_4), with error Err = E(y_4, r) against the reference output r. E.g. f_4 is a softmax layer with weights w_4 and f_1 is a sigmoid layer with weights w_1. First obtain the value of each node by forward propagation, then obtain the derivatives by backward propagation:
∂Err/∂y_4 = ∂E/∂y_4, ∂Err/∂y_3 = (∂Err/∂y_4)(∂f_4/∂y_3), ∂Err/∂y_2 = (∂Err/∂y_3)(∂f_3/∂y_2), ∂Err/∂y_1 = (∂Err/∂y_2)(∂f_2/∂y_1);
∂Err/∂w_4 = (∂Err/∂y_4)(∂f_4/∂w_4), ∂Err/∂w_3 = (∂Err/∂y_3)(∂f_3/∂w_3), ∂Err/∂w_2 = (∂Err/∂y_2)(∂f_2/∂w_2), ∂Err/∂w_1 = (∂Err/∂y_1)(∂f_1/∂w_1).
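A compact NumPy sketch of backpropagation for a tiny sigmoid-plus-softmax network with cross-entropy error (layer sizes, random weights, and sample values are placeholders; the softmax/cross-entropy gradient y − t is a standard result, not taken from the slide):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.5, size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(2)
x = np.array([0.5, -1.0, 2.0])       # input vector
t = np.array([1.0, 0.0])             # one-hot reference output

# Forward propagation: obtain the value of each node
z = sigmoid(W1 @ x + b1)             # hidden sigmoid layer
y = softmax(W2 @ z + b2)             # softmax output layer
err = -np.sum(t * np.log(y))         # cross-entropy error

# Backward propagation: apply the chain rule from the output toward the input
d_a2 = y - t                         # dErr/d(pre-activation) for softmax + cross entropy
dW2, db2 = np.outer(d_a2, z), d_a2
d_z = W2.T @ d_a2                    # propagate the error to the hidden layer
d_a1 = d_z * z * (1.0 - z)           # sigmoid derivative: z(1 - z)
dW1, db1 = np.outer(d_a1, x), d_a1

eta = 0.1                            # one gradient-descent update of the weights
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1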

Feed-Forward Neural Network: When the network structure is a DAG (directed acyclic graph), it is called a feed-forward network. The nodes can be ordered in a line so that all connections have the same direction, and forward/backward propagation can be applied efficiently.

Exercise 5.2: When h(x) = 1/(1 + exp(−x)) and y = h(ax + b) are given as follows, obtain ∂y/∂a and ∂y/∂b.
Writing u = ax + b, dh/du = exp(−u)/(1 + exp(−u))² = h(u)(1 − h(u)) = y(1 − y), so ∂y/∂a = x y (1 − y) and ∂y/∂b = y (1 − y).
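A quick numerical check of the sigmoid-derivative identity used above, dh/du = h(u)(1 − h(u)), via central finite differences (the test points are arbitrary):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numerical = (sigmoid(u + eps) - sigmoid(u - eps)) / (2 * eps)   # finite-difference derivative
analytic = sigmoid(u) * (1.0 - sigmoid(u))                      # h(u) * (1 - h(u))
print(np.allclose(numerical, analytic, atol=1e-8))              # True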

Recurrent Neural Network (RNN): a neural network having a feedback connection. It is expected to have more powerful modeling performance than a feed-forward MLP, but the training is more difficult. Structure: input layer → hidden layers → output layer, with a delay feeding the hidden-layer output back as an additional input.
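A minimal NumPy sketch of a simple recurrent layer: the hidden state is fed back through a delay and combined with the new input at every time step (sizes and random weights are placeholders):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(2)
W_in = rng.normal(scale=0.5, size=(4, 3))    # input -> hidden
W_rec = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (feedback through the delay)
W_out = rng.normal(scale=0.5, size=(2, 4))   # hidden -> output

h = np.zeros(4)                              # initial hidden state
for x in rng.normal(size=(5, 3)):            # a sequence of 5 input vectors
    h = sigmoid(W_in @ x + W_rec @ h)        # new state depends on the input and the previous state
    y = W_out @ h                            # output at this time step
    print(y)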

Unfolding of an RNN along the Time Axis: By unfolding the recurrent connection (the delay D) through time, the RNN becomes a deep feed-forward network that maps the input feature sequence to the reference vector sequence, one layer per time step.

Training of an RNN by BP Through Time (BPTT): Apply back propagation to the unfolded network, regarding the whole input sequence (x_1, x_2, x_3, x_4, ...) as a single input and the whole output sequence (y_1, y_2, y_3, y_4, ...) as a single output, and back-propagate the errors through the unfolded layers.

Long Short-Term Memory (LSTM): a type of RNN addressing the vanishing-gradient problem. At each time step t, the input x_t and the previous output y_{t−1} (via a delay) pass through an input gate, a forget gate, and an output gate (sigmoid layers with affine transforms) and a tanh layer with an affine transform; the cell state c_t is obtained from c_{t−1} (via a delay) by pointwise multiplication with the forget gate plus the input-gated new content, and the output is y_t = (output gate) ⊙ tanh(c_t).
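A compact NumPy sketch of one LSTM step following the standard gate equations (weight shapes and random values are placeholders; the slide's exact parameterization may differ slightly):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
n_in, n_hid = 3, 4
# One affine transform per gate, acting on the concatenation [x_t, y_{t-1}]
Wf, Wi, Wo, Wc = (rng.normal(scale=0.5, size=(n_hid, n_in + n_hid)) for _ in range(4))
bf, bi, bo, bc = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x_t, y_prev, c_prev):
    v = np.concatenate([x_t, y_prev])
    f = sigmoid(Wf @ v + bf)          # forget gate
    i = sigmoid(Wi @ v + bi)          # input gate
    o = sigmoid(Wo @ v + bo)          # output gate
    c_tilde = np.tanh(Wc @ v + bc)    # candidate cell content (tanh layer)
    c_t = f * c_prev + i * c_tilde    # pointwise multiplications and sum
    y_t = o * np.tanh(c_t)            # gated output
    return y_t, c_t

y, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # a short input sequence
    y, c = lstm_step(x, y, c)
print(y)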

Convolutional Neural Network (CNN): a type of feed-forward neural network with parameter sharing and connection constraints. In a convolution layer, each filter (Filter 1, ..., Filter N) is shifted and applied at different positions of the input, producing one activation map per filter (Activation map 1, ..., Activation map N). A pooling layer then summarizes local regions of each activation map before the next convolution layer, etc.
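A minimal NumPy sketch of a 1-D convolution (one filter shifted across the input, sharing the same weights at every position) followed by max pooling (the filter and input values are arbitrary examples):

import numpy as np

x = np.array([1.0, 3.0, 2.0, 0.0, 4.0, 1.0, 5.0, 2.0])   # input
filt = np.array([1.0, -1.0, 0.5])                        # shared filter weights

# Convolution layer: apply the same filter at every position
activation_map = np.array([filt @ x[i:i + len(filt)]
                           for i in range(len(x) - len(filt) + 1)])

# Pooling layer: maximum over non-overlapping windows of size 2
pool = 2
n = (len(activation_map) // pool) * pool
pooled = activation_map[:n].reshape(-1, pool).max(axis=1)

print(activation_map, pooled)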

Deep Neural Network: (just a) neural network with many hidden layers (5 < # of layers). Training was difficult until recently. Improvements in training algorithms: pre-training, Dropout. Improvements in computer hardware: GPGPU. Large performance gains have been reported for large-vocabulary speech recognition: Deep Learning fever! Cf. G. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, http://www.cs.toronto.edu/~hinton/absps/guidetr.pdf

Neural network based acoustic model

Frame-Level Vowel Recognition Using an MLP: the input is a speech feature vector (e.g. MFCC), two sigmoid layers are followed by a softmax output layer, and the outputs are the posterior probabilities p(あ), p(い), p(う), p(え), p(お) of the five Japanese vowels.

Exercise 5.3: Obtain the recognition result (yes or no). You may use a calculator.
[Figure: a small network with given weights and inputs, sigmoid hidden units, and a softmax output layer giving P(yes) and P(no)]

Combination of HMM and MLP: In a GMM-HMM, the output probability p(X|s) of each state s (s_0, s_1, ..., s_4) is modeled by a GMM. In an MLP-HMM (hybrid), the softmax layer of the MLP gives the state posterior p(s|X), which is converted by Bayes' rule: p(X|s) = p(s|X) p(X) / p(s) ∝ p(s|X) / p(s).
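A small sketch of this hybrid conversion: dividing the MLP state posteriors by the state priors gives likelihoods scaled by the common factor p(X), which can be ignored during decoding (the posterior and prior values are arbitrary examples):

import numpy as np

posteriors = np.array([0.70, 0.20, 0.10])   # p(s|X) from the MLP softmax layer
priors = np.array([0.50, 0.30, 0.20])       # p(s), e.g. estimated from state frequencies

scaled_likelihoods = posteriors / priors    # proportional to p(X|s)
print(scaled_likelihoods)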

MLP-HMM based Phone Recognizer: phone HMMs (/a/, /i/, ..., /N/) are connected between Start and End, and the output probability of each state is obtained from an MLP with sigmoid hidden layers and a softmax output layer applied to the input speech feature.

Neural network based language model

Word Vector: one-of-K representation of a word for a fixed vocabulary.
word / ID / one-of-K vector
Apple 1 <1,0,0,0,0,0,0>
Banana 2 <0,1,0,0,0,0,0>
Cherry 3 <0,0,1,0,0,0,0>
Durian 4 <0,0,0,1,0,0,0>
Orange 5 <0,0,0,0,1,0,0>
Pineapple 6 <0,0,0,0,0,1,0>
Strawberry 7 <0,0,0,0,0,0,1>
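A tiny sketch of building one-of-K word vectors for this vocabulary (the dictionary simply mirrors the table above; IDs here start from 0):

import numpy as np

vocab = ["Apple", "Banana", "Cherry", "Durian", "Orange", "Pineapple", "Strawberry"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab), dtype=int)
    v[word_to_id[word]] = 1        # 1 at the word's ID, 0 elsewhere
    return v

print(one_hot("Cherry"))           # [0 0 1 0 0 0 0]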

Word Prediction Using an RNN: the one-of-K vector of word t (e.g. <0, 0, 0, 1, 0, 0, 0>) is fed into the RNN, whose hidden state is fed back through a delay D, and the output is the predicted probability distribution over the next word, word t+1.

RNN Language Model (Unfolded): the sentence probability P(<s>, Delicious, Big, Red, Apple, </s>) is computed by feeding the words <s>, Delicious, Big, Red, Apple one by one into the unfolded RNN and multiplying the predicted probabilities of Delicious, Big, Red, Apple, </s> at the corresponding steps.
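A minimal sketch of how an RNN language model scores this sentence: feed each word, predict a distribution over the next word, and multiply the probabilities. The vocabulary and the (untrained, random) weights are placeholders, so the resulting probability only illustrates the computation:

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

vocab = ["<s>", "Delicious", "Big", "Red", "Apple", "</s>"]
V, H = len(vocab), 8
rng = np.random.default_rng(4)
W_in, W_rec, W_out = (rng.normal(scale=0.3, size=s) for s in [(H, V), (H, H), (V, H)])

sentence = ["<s>", "Delicious", "Big", "Red", "Apple", "</s>"]
h, log_prob = np.zeros(H), 0.0
for w_cur, w_next in zip(sentence[:-1], sentence[1:]):
    x = np.zeros(V)
    x[vocab.index(w_cur)] = 1.0                  # one-of-K vector of the current word
    h = np.tanh(W_in @ x + W_rec @ h)            # update the hidden state
    p = softmax(W_out @ h)                       # distribution over the next word
    log_prob += np.log(p[vocab.index(w_next)])   # accumulate log P(next word | history)
print("P(sentence) =", np.exp(log_prob))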

Dialogue System Using a Seq2Seq Network: the encoder network reads the input word sequence "What is your name", and the decoder network generates the output "My name is TS 800 </s>" by sampling from the posterior over the next word, starting from <s>.
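A minimal sketch of the encoder-decoder (Seq2Seq) computation with untrained placeholder weights and a toy vocabulary, so the sampled output is random; it only illustrates how the encoder state is handed to the decoder and how words are sampled from the posterior:

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

vocab = ["<s>", "</s>", "What", "is", "your", "name", "My", "TS", "800"]
V, H = len(vocab), 8
rng = np.random.default_rng(5)
We_in, We_rec = rng.normal(scale=0.3, size=(H, V)), rng.normal(scale=0.3, size=(H, H))
Wd_in, Wd_rec = rng.normal(scale=0.3, size=(H, V)), rng.normal(scale=0.3, size=(H, H))
Wd_out = rng.normal(scale=0.3, size=(V, H))

def one_hot(w):
    x = np.zeros(V)
    x[vocab.index(w)] = 1.0
    return x

# Encoder network: read the input sentence into a hidden state
h = np.zeros(H)
for w in ["What", "is", "your", "name"]:
    h = np.tanh(We_in @ one_hot(w) + We_rec @ h)

# Decoder network: generate words by sampling from the posterior until </s> (or a length limit)
w, output = "<s>", []
for _ in range(10):
    h = np.tanh(Wd_in @ one_hot(w) + Wd_rec @ h)
    p = softmax(Wd_out @ h)              # posterior over the next word
    w = vocab[rng.choice(V, p=p)]        # sampling from the posterior
    if w == "</s>":
        break
    output.append(w)
print(output)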

Evolution of Compute Hardware: 2002, Earth Simulator, 40.96 TFLOPS; 2017, GeForce GTX 1080 Ti, 10.609 TFLOPS, 699 USD. (Pictures from Wikipedia and Nvidia.com.)