End-to-end Automatic Speech Recognition
1 End-to-end Automatic Speech Recognition
Markus Nussbaum-Thom
IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
February 22, 2017
2 Contents
1. Introduction
2. Connectionist Temporal Classification (CTC)
3. Attention Model
4. References
3 Terminology
Features: $x$, $x_t$, $x_1^T := x_1, \ldots, x_T$.
Words: $w, u, v, w_m$, $w_1^M := w_1, \ldots, w_M$.
Word sequences: $W, W_n, V$.
States: $s, s_t$, $s_1^T := s_1, \ldots, s_T$.
Class-conditional posterior probability: $p(s_t \mid x_t)$, $p(w, s_1^T \mid x_1^T)$.
4 Bayes Decision Rule
5 Towards End-to-End Automatic Speech Recognition
6 Towards End-to-End Automatic Speech Recognition
7 End-to-End Automatic Speech Recognition
8 End-to-End Approach
End-to-end: Training all modules to optimize a global performance criterion. (LeCun et al., 98)
Easy: Classes have no sub-structure, e.g. image classification.
Difficult: Classes have a sub-structure (sequences, graphs), e.g. automatic speech recognition, automatic handwriting recognition, machine translation.
Segmentation problem: Which part of the input relates to which part of the sub-structure?
9 Towards End-to-end Automatic Speech Recognition
End-to-end acoustic model: Using characters instead of phonemes. Connectionist temporal classification using recurrent or convolutional neural networks. Purely neural attention models.
End-to-end feature extraction: Feature extraction integrated into the acoustic model. Using the raw time signal. Learning a specific type of filter.
Towards real end-to-end modeling: Using words as targets instead of characters or phonemes, and a massive amount of data.
10 Basic Problem
Input: $X = x_1^T = (x_1, \ldots, x_T)$.
Neural network: $p(\cdot \mid x_1), \ldots, p(\cdot \mid x_T)$.
Target: $W = w_1^M = (w_1, \ldots, w_M)$, but $M \ll T$.
How do we solve this?
Connectionist Temporal Classification (CTC). [Graves et al., 2006, Graves et al., 2009]
Attention models. [Bahdanau et al., 2016, Chorowski et al., 2015]
Inverted hidden Markov models. [Doetsch et al., 2016, Inverted HMM - a Proof of Concept]
11 Overview
CTC: Concept. Training. Recognition.
12 Connectionist Temporal Classification (CTC)
Given $X = (x_1, \ldots, x_5)$ and $W = (a, b, c)$.
Introduce a blank state $\varnothing$ and allow word repetitions: state sequences such as $(a, \varnothing, b, c, \varnothing)$, $(a, a, b, c, \varnothing)$ or $(a, b, b, c, c)$ all map to $(a, b, c)$.
Blank and repetition removal $B$: $B(a, \varnothing, b, c, \varnothing) = (a, b, c)$.
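To make $B$ concrete, here is a minimal Python sketch; the function name ctc_collapse and the blank symbol "<b>" are illustrative choices, not from the slides:

```python
def ctc_collapse(path, blank="<b>"):
    """Map a frame-level state sequence to a label sequence:
    merge consecutive repetitions, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:   # new symbol that is not a blank
            out.append(s)
        prev = s
    return tuple(out)

assert ctc_collapse(("a", "<b>", "b", "c", "<b>")) == ("a", "b", "c")
assert ctc_collapse(("a", "a", "b", "c", "c")) == ("a", "b", "c")
```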
13 Connectionist Temporal Classification (CTC)
Posterior for sentence $W = w_1^M$ and features $X = x_1^T$:
$$p(W \mid X) = \sum_{s_1^T \in B^{-1}(W)} p(s_1^T \mid X) := \sum_{s_1^T \in B^{-1}(W)} \prod_{t=1}^{T} p(s_t \mid x_t)$$
Training criterion for training samples $(X_n, W_n)$, $n = 1, \ldots, N$:
$$F_{\mathrm{CTC}}(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} \log p_\Lambda(W_n \mid X_n)$$
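For tiny $T$ the sum over $B^{-1}(W)$ can be evaluated by brute force, which is useful as a reference for the dynamic-programming recursions derived below. A sketch reusing ctc_collapse from above; the helper name is again illustrative:

```python
import itertools

import numpy as np

def ctc_posterior_bruteforce(probs, labels, vocab, blank="<b>"):
    """probs: (T, C) array of p(s_t | x_t); vocab: tuple of C symbols
    (including the blank). Sums path products over B^{-1}(labels)."""
    T = probs.shape[0]
    total = 0.0
    for path in itertools.product(range(len(vocab)), repeat=T):
        if ctc_collapse(tuple(vocab[i] for i in path), blank) == tuple(labels):
            total += np.prod([probs[t, i] for t, i in enumerate(path)])
    return total
```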
14 Overview
CTC: Concept. Training. Recognition.
15 Forward-Backward Decomposition (CTC)
$\alpha(t, m, v)$: Sum over $s_1^t \in B^{-1}(w_1^m)$ for given $x_1^t$, ending in $v$.
$\beta(t, m, v)$: Sum over $s_t^T \in B^{-1}(w_m^M)$ for given $x_t^T$, starting in $v$.
Choose any $t \in \{1, \ldots, T\}$:
$$p(w_1^M \mid x_1^T) = \sum_{s_1^T \in B^{-1}(w_1^M)} p(s_1^T \mid x_1^T) = \ldots = \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \frac{\alpha(t, m, v)\, \beta(t, m, v)}{p(v \mid x_t)}$$
16–28 Forward Algorithm (CTC)
(Figure animation: starting from $\alpha(1, m, v) = p(v \mid x_1)$, compute $\alpha(t, m, v)$ step by step for $t = 1, \ldots, 12$.)
29–41 Backward Algorithm (CTC)
(Figure animation: starting from $\beta(12, M, v) = p(v \mid x_{12})$, compute $\beta(t, m, v)$ backwards for $t = 12, \ldots, 1$.)
42 Posterior Decomposition (CTC)
Choose any $t \in \{1, \ldots, T\}$:
$$p(w_1^M \mid X) = \sum_{s_1^T \in B^{-1}(w_1^M)} p(s_1^T \mid X) \qquad \text{(definition)}$$
$$= \sum_{s_1^T \in B^{-1}(w_1^M)} \prod_{\tau=1}^{T} p(s_\tau \mid x_\tau) \qquad \text{(model assumption)}$$
$$= \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \; \sum_{\substack{s_1^T \in B^{-1}(w_1^M) \\ s_t = v}} \prod_{\tau=1}^{T} p(s_\tau \mid x_\tau) \qquad \text{(decomposition around } t\text{)}$$
43 Posterior Decomposition (CTC)
$$= \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \; \sum_{\substack{s_1^T \in B^{-1}(w_1^M) \\ s_t = v}} \left[ \prod_{\tau=1}^{t-1} p(s_\tau \mid x_\tau) \right] p(v \mid x_t) \left[ \prod_{\rho=t+1}^{T} p(s_\rho \mid x_\rho) \right]$$
$$= \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \; \sum_{\substack{s_1^T \in B^{-1}(w_1^M) \\ s_t = v}} \frac{\left[ \prod_{\tau=1}^{t} p(s_\tau \mid x_\tau) \right] \left[ \prod_{\rho=t}^{T} p(s_\rho \mid x_\rho) \right]}{p(v \mid x_t)}$$
44 Posterior Decomposition (CTC)
The sums over prefix and suffix paths factorize:
$$= \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \left[ \sum_{\substack{s_1^t \in B^{-1}(w_1^m) \\ s_t = v}} \prod_{\tau=1}^{t} p(s_\tau \mid x_\tau) \right] \frac{1}{p(v \mid x_t)} \left[ \sum_{\substack{s_t^T \in B^{-1}(w_m^M) \\ s_t = v}} \prod_{\rho=t}^{T} p(s_\rho \mid x_\rho) \right] = \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \frac{\alpha(t, m, v)\, \beta(t, m, v)}{p(v \mid x_t)}$$
$\alpha(t, m, v)$: Sum over $s_1^t \in B^{-1}(w_1^m)$ for given $x_1^t$, ending in $v$.
$\beta(t, m, v)$: Sum over $s_t^T \in B^{-1}(w_m^M)$ for given $x_t^T$, starting in $v$.
45 Forward Path Decomposition (CTC)
Consider a path $s_1^t \in B^{-1}(w_1^m)$ with $s_t = v$:
Case $s_t = w_m$: the prefix $s_1^{t-1}$ either collapses to $w_1^{m-1}$ with $s_{t-1} \in \{w_{m-1}, \varnothing\}$ (the label $w_m$ is newly entered), or collapses to $w_1^m$ with $s_{t-1} = w_m$ (the label $w_m$ is repeated).
Case $s_t = \varnothing$: the prefix $s_1^{t-1}$ collapses to $w_1^m$ with $s_{t-1} \in \{w_m, \varnothing\}$.
46 Forward Probabilities (CTC)
$$\alpha(t, m, v) = \sum_{\substack{s_1^t \in B^{-1}(w_1^m) \\ s_t = v}} \prod_{\tau=1}^{t} p(s_\tau \mid x_\tau) = p(v \mid x_t) \cdot \begin{cases} \displaystyle \sum_{u \in \{w_{m-1}, \varnothing\}} \; \sum_{\substack{s_1^{t-1} \in B^{-1}(w_1^{m-1}) \\ s_{t-1} = u}} \prod_{\tau=1}^{t-1} p(s_\tau \mid x_\tau) \; + \sum_{\substack{s_1^{t-1} \in B^{-1}(w_1^m) \\ s_{t-1} = w_m}} \prod_{\tau=1}^{t-1} p(s_\tau \mid x_\tau), & v = w_m \\[2ex] \displaystyle \sum_{u \in \{w_m, \varnothing\}} \; \sum_{\substack{s_1^{t-1} \in B^{-1}(w_1^m) \\ s_{t-1} = u}} \prod_{\tau=1}^{t-1} p(s_\tau \mid x_\tau), & v = \varnothing \end{cases}$$
47 Forward Probabilities (CTC)
$$\alpha(t, m, v) = p(v \mid x_t) \cdot \begin{cases} \displaystyle \sum_{u \in \{w_{m-1}, \varnothing\}} \alpha(t-1, m-1, u) + \alpha(t-1, m, w_m), & v = w_m \\[1ex] \displaystyle \sum_{u \in \{w_m, \varnothing\}} \alpha(t-1, m, u), & v = \varnothing \end{cases}$$
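Below is a minimal NumPy sketch of this forward pass in log space. It follows the standard textbook formulation over the blank-extended label sequence $(\varnothing, w_1, \varnothing, \ldots, w_M, \varnothing)$ rather than the $(m, v)$ indexing of the slides, and it includes the usual CTC restriction that the blank-skipping transition is only allowed between distinct labels; treat it as an assumption-laden reference, not the speaker's exact variant:

```python
import numpy as np

def ctc_forward_logspace(log_probs, labels, blank=0):
    """log_probs: (T, C) frame-wise log p(s_t | x_t); labels: non-empty
    sequence of target ids. Returns log p(W | X) via the forward recursion."""
    T = log_probs.shape[0]
    ext = [blank] + [x for l in labels for x in (l, blank)]  # blank-extended
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]        # start in the blank ...
    alpha[0, 1] = log_probs[0, ext[1]]        # ... or in the first label
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]                     # stay (repetition)
            if s > 0:
                terms.append(alpha[t - 1, s - 1])         # advance one state
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])         # skip the blank
            alpha[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]
    # valid paths end in the last label or the final blank
    return np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```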
48 Backward Probabilities (CTC)
$\beta(t, m, v)$: Sum over all paths $s_t^T \in B^{-1}(w_m^M)$ for given $x_t^T$ starting in $v$:
$$\beta(t, m, v) = \sum_{\substack{s_t^T \in B^{-1}(w_m^M) \\ s_t = v}} \prod_{\tau=t}^{T} p(s_\tau \mid x_\tau) = p(v \mid x_t) \cdot \begin{cases} \displaystyle \sum_{u \in \{w_{m+1}, \varnothing\}} \beta(t+1, m+1, u) + \beta(t+1, m, w_m), & v = w_m \\[1ex] \displaystyle \sum_{u \in \{w_m, \varnothing\}} \beta(t+1, m, u), & v = \varnothing \end{cases}$$
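The backward pass mirrors the forward sketch above, again over the blank-extended sequence, with beta[t, s] including the emission at frame t as in the slides' convention:

```python
import numpy as np

def ctc_backward_logspace(log_probs, labels, blank=0):
    """Backward counterpart of ctc_forward_logspace above."""
    T = log_probs.shape[0]
    ext = [blank] + [x for l in labels for x in (l, blank)]
    S = len(ext)
    beta = np.full((T, S), -np.inf)
    beta[T - 1, S - 1] = log_probs[T - 1, ext[S - 1]]  # end in final blank ...
    beta[T - 1, S - 2] = log_probs[T - 1, ext[S - 2]]  # ... or in last label
    for t in range(T - 2, -1, -1):
        for s in range(S):
            terms = [beta[t + 1, s]]                      # stay
            if s + 1 < S:
                terms.append(beta[t + 1, s + 1])          # advance
            if s + 2 < S and ext[s + 2] != blank and ext[s + 2] != ext[s]:
                terms.append(beta[t + 1, s + 2])          # skip the blank
            beta[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]
    return beta
```

As a consistency check, for any frame t, the log-sum-exp over s of alpha[t, s] + beta[t, s] - log_probs[t, ext[s]] reproduces log p(W | X), which is exactly the αβ/p decomposition of slide 44.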
49 Derivatives (CTC)
Derivative of the posterior:
$$\frac{\partial\, p(W \mid X)}{\partial\, p(s \mid x_t)} = \frac{\partial}{\partial\, p(s \mid x_t)} \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \frac{\alpha(t, m, v)\, \beta(t, m, v)}{p(v \mid x_t)} = \sum_{m=1}^{M} \sum_{v \in \{w_m, \varnothing\}} \delta(v, s)\, \frac{\alpha(t, m, v)\, \beta(t, m, v)}{p^2(s \mid x_t)} = \sum_{m=1}^{M} \frac{\alpha(t, m, s)\, \beta(t, m, s)}{p^2(s \mid x_t)}$$
Derivative of the log posterior:
$$\frac{\partial \log p(W \mid X)}{\partial\, p(s \mid x_t)} = \frac{1}{p(W \mid X)} \, \frac{\partial\, p(W \mid X)}{\partial\, p(s \mid x_t)}$$
Derivative of the training criterion:
$$\frac{\partial F_{\mathrm{CTC}}(\Lambda)}{\partial\, p(s \mid x_t)} = \frac{1}{N} \sum_{n=1}^{N} \frac{\partial \log p(W_n \mid X_n)}{\partial\, p(s \mid x_t)}$$
50 CTC Architectures
What kind of encoders? DNNs, (bidirectional) LSTMs, CNNs.
Subsampling: Reducing the frame rate through the network.
51 Subsampling
Join input frames.
Reshape the input to the next layer: return every 2nd frame to the next layer (see the sketch below).
CNNs: Use strides.
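Both variants are one-liners on a (T, D) feature matrix; a minimal sketch, with illustrative function names:

```python
import numpy as np

def subsample(x, rate=2):
    """Keep every `rate`-th frame: (T, D) -> (ceil(T / rate), D)."""
    return x[::rate]

def stack_frames(x, stack=2):
    """Join `stack` neighboring frames into one wider frame:
    (T, D) -> (T // stack, stack * D), halving the frame rate for stack=2."""
    T = (x.shape[0] // stack) * stack         # drop trailing odd frames
    return x[:T].reshape(T // stack, stack * x.shape[1])
```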
52 Peaking Behavior
(Figure from [Graves and Jaitly, 2014]: frame-level CTC label posteriors form narrow spikes, with the blank dominating elsewhere.)
53 Overview
CTC: Concept. Training. Recognition.
54 Hybrid Recognition (CTC)
Hybrid model (pseudo likelihood):
$$p(x \mid s) = \frac{p(s \mid x)\, p(x)}{p(s)} \propto \frac{p(s \mid x)}{p(s)}$$
Decoding:
$$\hat{w}_1^N = \operatorname*{arg\,max}_{w_1^N} \left\{ p(w_1^N) \, \max_{s_1^T} \prod_{t=1}^{T} p(x_t \mid s_t) \right\}$$
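Without a language model, the simplest recognition strategy is greedy best-path decoding: take the arg max state per frame and collapse with B. This approximates the max over paths above and is not the full hybrid search; a sketch reusing ctc_collapse from earlier:

```python
import numpy as np

def ctc_greedy_decode(log_probs, vocab, blank="<b>"):
    """log_probs: (T, C); vocab: tuple of C symbols including the blank.
    Best state per frame, then blank/repetition removal."""
    best = np.argmax(log_probs, axis=1)                    # (T,)
    return ctc_collapse(tuple(vocab[i] for i in best), blank)
```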
55 Weighted Finite State Transducer Recognition (WFST)
Token transducer $T$, language model $G$, lexicon $L$.
Search space: $S = T \circ \min(\det(L \circ G))$
56 Resources for CTC
Keras:
- Tensorflow backend: master/keras/backend/tensorflow_backend.py
- Theano backend: master/keras/backend/theano_backend.py
- Example: master/examples/image_ocr.py
Baidu: https://github.com/baidu-research/ba-dls-deepspeech
Eesen:
Kaldi:
57 Further Literature on CTC
[Miao et al., 2015, EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding]
[Collobert et al., 2016, Wav2Letter: an End-to-End ConvNet-based Speech Recognition System]
[Zhang et al., 2017, Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks]
[Senior et al., 2015, Acoustic modelling with CD-CTC-SMBR LSTM RNNs]
[Soltau et al., 2016, Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition]
58 Attention Model
Encoder-decoder architecture:
Encoder: performs a feature extraction/encoding based on the input.
Decoder: produces the output label sequence from the encoded features.
59 Attention Encoder-Decoder Architecture
What kind of encoders? DNNs, (bidirectional) LSTMs, CNNs.
(Figure: encoder → glimpse → MLP → decoder → output.)
60 Attention Encoder-Decoder Architecture
Encoder-decoder:
Input: $x_1^T = x_1, \ldots, x_T$
Encoder: $h_1^{T/4} = h_1, \ldots, h_{T/4} = \mathrm{Encoder}(x_1^T)$
Decoder, for $m = 1, \ldots, M$:
Attention: $\alpha_m = \mathrm{Attend}(h_1^{T/4}, s_{m-1}, \alpha_{m-1})$
Glimpse: $g_m = \sum_{t=1}^{T/4} \alpha_{m,t} h_t$
Generator: $y_m = \mathrm{Generator}(g_m, s_{m-1})$ with $c_m = \mathrm{RNN}(c_{m-1}, g_m, s_{m-1})$ and $y_m = \mathrm{Softmax}(c_m)$
Transition: $s_m = \mathrm{RNN}(s_{m-1}, y_m, g_m)$
61 Attention Mechanism
Content based (weights $E, W, V$ and bias $b$):
$$\epsilon_{m,t} = E^\top \tanh(W s_{m-1} + V h_t + b)$$
Location based (weights $E, W, V, U$ and bias $b$):
$$f = F * \alpha_{m-1}, \qquad \epsilon_{m,t} = E^\top \tanh(W s_{m-1} + V h_t + U f_{m,t} + b)$$
Renormalization (sharpening $\gamma$):
$$\alpha_{m,t} = \frac{\exp(\gamma\, \epsilon_{m,t})}{\sum_{\tau=1}^{T/4} \exp(\gamma\, \epsilon_{m,\tau})}$$
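A sketch of one content-based attention step with sharpening; the shapes are assumptions (h: (T, H) encoder states, s_prev: (S,) decoder state, E: (A,), W: (A, S), V: (A, H), b: (A,)):

```python
import numpy as np

def content_attention(h, s_prev, E, W, V, b, gamma=1.0):
    """One content-based attention step: scores, sharpened softmax, glimpse."""
    scores = np.tanh(s_prev @ W.T + h @ V.T + b) @ E   # eps_{m,t}, shape (T,)
    weights = np.exp(gamma * (scores - scores.max()))  # sharpened, stabilized
    alpha = weights / weights.sum()                    # normalized attention
    glimpse = alpha @ h                                # g_m = sum_t alpha_t h_t
    return alpha, glimpse
```

For the location-based variant one would additionally convolve the previous alignment $\alpha_{m-1}$ with a learned filter $F$ and add $U f_{m,t}$ inside the tanh.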
62 Window Around Median
Compute the median of the previous attention weights:
$$\tau_m = \operatorname*{arg\,min}_{k = 1, \ldots, T/4} \left| \sum_{\rho=1}^{k} \alpha_{m-1,\rho} - \sum_{\theta=k+1}^{T/4} \alpha_{m-1,\theta} \right|$$
Compute the attention inside a window around the median, $T_m = \{\tau_m - \omega_{\mathrm{left}}, \ldots, \tau_m + \omega_{\mathrm{right}}\}$:
$$\alpha_{m,t} = \begin{cases} \dfrac{\exp(\gamma\, \epsilon_{m,t})}{\sum_{\tau \in T_m} \exp(\gamma\, \epsilon_{m,\tau})}, & t \in T_m \\[1ex] 0, & \text{otherwise} \end{cases}$$
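A sketch of this windowing, assuming raw scores ε for the current step and the previous attention vector; the absolute-difference form of the median matches the reconstruction above:

```python
import numpy as np

def windowed_attention(scores, alpha_prev, w_left, w_right, gamma=1.0):
    """scores: (T,) raw scores eps_{m,t}; alpha_prev: (T,) previous weights.
    Renormalizes only inside a window around the median of alpha_prev."""
    cum = np.cumsum(alpha_prev)
    tau = int(np.argmin(np.abs(cum - (alpha_prev.sum() - cum))))  # median frame
    lo, hi = max(0, tau - w_left), min(len(scores), tau + w_right + 1)
    alpha = np.zeros_like(scores)
    w = np.exp(gamma * (scores[lo:hi] - scores[lo:hi].max()))     # stabilized
    alpha[lo:hi] = w / w.sum()
    return alpha
```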
63 Other Techniques (Attention)
Monotonic regularization (see the sketch below):
$$r_m = \max\!\left( 0, \; \sum_{\tau=1}^{T/4} \left( \sum_{i=1}^{\tau} \alpha_{m,i} - \sum_{i=1}^{\tau} \alpha_{m-1,i} \right) \right)$$
Curriculum learning: Starting with shorter sequences and gradually increasing the sequence length.
Flatstart: Initial positions are chosen according to speaker speed.
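A one-line sketch of the regularizer as reconstructed above (the placement of the max is my reading of the garbled slide); the penalty grows when the current attention mass sits earlier than the previous step's, i.e. when the alignment moves backwards:

```python
import numpy as np

def monotonic_penalty(alpha_m, alpha_prev):
    """r_m = max(0, sum_tau (cumsum(alpha_m) - cumsum(alpha_prev))[tau])."""
    return max(0.0, float(np.sum(np.cumsum(alpha_m) - np.cumsum(alpha_prev))))
```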
64 Resources for Attention
Theano+Bricks+Blocks:
Tensorflow:
Keras:
65 Further Literature on Attention
[Bahdanau et al., 2016, End-to-end attention-based large vocabulary speech recognition]
[Chorowski et al., 2015, Attention-Based Models for Speech Recognition]
[Chorowski et al., 2015, End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results]
[Kim et al., 2016, Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning]
66 Evaluation Framework
Development and evaluation sets are different from the training set.
Levenshtein distance: minimum number of insertions, deletions and substitutions.
(Example alignment: spoken reference S P E A K E R vs. recognized hypothesis T E A C H E R, with each position labeled correct, substitution, insertion, or deletion.)
67 Evaluation Framework
Levenshtein distance: minimum number of insertions, deletions and substitutions over all alignments $(s, t)$ of length $\lambda$:
$$L(w_1^N, v_1^M) = \min_{s,t} \left\{ \sum_{i=1}^{\lambda} \left( 1 - \delta(w_{s(i)}, v_{t(i)}) \right) \right\}$$
with the Kronecker delta $\delta(w, v) = \begin{cases} 1, & v = w \\ 0, & v \neq w \end{cases}$
Word Error Rate (WER):
$$\mathrm{WER}(\mathrm{Spoken}_1^R, \mathrm{Recognized}_1^R) = \frac{\sum_{r=1}^{R} L(\mathrm{Spoken}_r, \mathrm{Recognized}_r)}{\sum_{r=1}^{R} |\mathrm{Spoken}_r|}$$
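A standard dynamic-programming sketch of the Levenshtein distance and corpus-level WER (the usual textbook recursion, not taken from the slides):

```python
import numpy as np

def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions
    turning `hyp` into `ref` (both are sequences of words)."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)        # deletions only
    d[0, :] = np.arange(len(hyp) + 1)        # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return int(d[len(ref), len(hyp)])

def word_error_rate(references, hypotheses):
    """Corpus-level WER: summed edit distance over summed reference length."""
    errors = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)
```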
68 Experimental Results
Model | CER | WER
Bahdanau et al. (2015):
  Attention | |
  Attention + bigram LM | |
  Attention + trigram LM | |
  Attention + extended trigram LM | |
Graves and Jaitly (2014):
  CTC | |
Hannun et al. (2014):
  CTC + bigram LM | n/a | 14.1
Miao et al. (2015):
  CTC + bigram LM | n/a | 26.9
  CTC for phonemes + lexicon | n/a | 26.9
  CTC for phonemes + trigram LM | n/a | 7.3
  CTC + trigram LM | n/a | 9.0
  Hybrid BGRU (15 h) | n/a | 2.0
69 Attention Modeling Example from Handwriting
(Figure slide.)
70 Inverted Hidden Markov Model (HMM)
Traditional HMM:
$$p(w_1^N, x_1^T) = p(w_1^N)\, p(x_1^T \mid w_1^N) = p(w_1^N) \sum_{s_1^T} p(s_1^T, x_1^T \mid w_1^N) = \prod_{n=1}^{N} p(w_n \mid w_1^{n-1}) \; \sum_{s_1^T} \prod_{t=1}^{T} p(s_t, x_t \mid s_1^{t-1}, x_1^{t-1}, w_1^N)$$
Inverted HMM:
$$p(w_1^N \mid x_1^T) = \sum_{t_1^N} p(w_1^N, t_1^N \mid x_1^T) = \sum_{t_1^N} \prod_{n=1}^{N} p(w_n, t_n \mid w_1^{n-1}, t_1^{n-1}, x_1^T)$$
[Doetsch et al., 2016, Inverted HMM - a Proof of Concept]
71 Unsolved Problems for End-to-End ASR
Error rates: Still higher than traditional HMM-based systems (with one exception).
Global search: Still a transducer-based or HMM-based search.
Acoustic model: Word- and character-based end-to-end learning.
Language model: No integration with the language model in training yet.
72 References
Bahdanau, D., Chorowski, J., Serdyuk, D., Brakel, P., and Bengio, Y. (2016). End-to-end attention-based large vocabulary speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), Shanghai, China, March 20-25, 2016.
Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., and Bengio, Y. (2015). Attention-based models for speech recognition. In Advances in Neural Information Processing Systems 28 (NIPS 2015), December 7-12, 2015, Montreal, Quebec, Canada.
Collobert, R., Puhrsch, C., and Synnaeve, G. (2016). Wav2Letter: an end-to-end ConvNet-based speech recognition system. CoRR.
Doetsch, P., Hegselmann, S., Schlüter, R., and Ney, H. (2016). Inverted HMM - a proof of concept. In Neural Information Processing Systems Workshop, Barcelona, Spain.
Graves, A., Fernández, S., Gomez, F., and Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), New York, NY, USA. ACM.
Graves, A. and Jaitly, N. (2014). Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China, June 2014.
Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., and Schmidhuber, J. (2009). A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell., 31(5).
Kim, S., Hori, T., and Watanabe, S. (2016). Joint CTC-attention based end-to-end speech recognition using multi-task learning. CoRR.
Miao, Y., Gowayyed, M., and Metze, F. (2015). EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2015), Scottsdale, AZ, USA, December 13-17, 2015.
Senior, A. W., Sak, H., de Chaumont Quitry, F., Sainath, T. N., and Rao, K. (2015). Acoustic modelling with CD-CTC-SMBR LSTM RNNs. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2015), Scottsdale, AZ, USA, December 13-17, 2015.
Soltau, H., Liao, H., and Sak, H. (2016). Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition. CoRR.
Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Laurent, C., Bengio, Y., and Courville, A. C. (2017). Towards end-to-end speech recognition with deep convolutional neural networks. CoRR.