Autoregressive Neural Models for Statistical Parametric Speech Synthesis
1 Autoregressive Neural Models for Statistical Parametric Speech Synthesis
Xin WANG (シンワン)
We welcome critical comments, suggestions, and discussion.
3 CONTENTS
- Introduction
- Models and methods
- Summary
4 INTRODUCTION
Text-to-speech (TTS) pipeline [1,2]: Text -> Front-end -> Linguistic features -> Back-end -> Speech.
Statistical parametric speech synthesis (SPSS) [3,4]: acoustic models predict acoustic features (spectral features and the fundamental frequency, F0), which a vocoder converts to speech.
[1] Taylor, P. (2009). Text-to-Speech Synthesis.
[2] Dutoit, T. (1997). An Introduction to Text-to-Speech Synthesis.
[3] Tokuda, K., et al. (2013). Speech synthesis based on hidden Markov models. Proceedings of the IEEE, 101(5).
[4] Zen, H., et al. (2009). Statistical parametric speech synthesis. Speech Communication, 51.
5 INTRODUCTION — Topic of this talk
The acoustic model maps linguistic features x_{1:T} = {x_1, ..., x_T} to acoustic features o_{1:T} = {o_1, ..., o_T}; that is, it models p(o_{1:T} | x_{1:T}; Θ), where x_t ∈ R^{D_x}, o_t ∈ R^{D_o}, and T is the number of frames.
6 INTRODUCTION — Roadmap
From naïve models toward better time-dependency modeling: feedforward network (FNN) -> recurrent network (RNN) -> autoregressive (AR) neural models -> (ideally) a perfect model.
7 CONTENTS
- Introduction
- Models and methods
  - Baseline models
- Summary
8 BASELINE MODELS — FNN
Computation flow: each output frame depends only on the input frame at the same time step:
ô_t = H^(FNN)(x_t), with x_t ∈ R^{D_x}, o_t ∈ R^{D_o}.
9 BASELINE MODELS — FNN as a probabilistic model [6]
Mixture density network (MDN) [6]: the network outputs distribution parameters M_t for every frame:
p(o_{1:T} | x_{1:T}; Θ) = ∏_{t=1}^{T} p(o_t; M_t) = ∏_{t=1}^{T} N(o_t; μ_t, I)
where M_t = {μ_t} with μ_t = H^(FNN)(x_t), and mean-based generation uses ô_t = μ_t.
[6] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, New York, 2006.
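As a minimal sketch of this factorized Gaussian likelihood (function and variable names are illustrative, not from the authors' toolkit), assuming identity covariance:

```python
import numpy as np

def mdn_neg_log_likelihood(o, mu):
    """Per-sequence negative log-likelihood of the frame-wise Gaussian MDN:
    p(o_{1:T} | x_{1:T}) = prod_t N(o_t; mu_t, I).
    o, mu: arrays of shape (T, D)."""
    T, D = o.shape
    # log N(o_t; mu_t, I) = -D/2 * log(2*pi) - 1/2 * ||o_t - mu_t||^2
    log_probs = -0.5 * D * np.log(2 * np.pi) - 0.5 * np.sum((o - mu) ** 2, axis=1)
    return -np.sum(log_probs)

rng = np.random.default_rng(0)
mu = rng.standard_normal((5, 3))      # toy network outputs, T=5, D=3
# Mean-based generation (o_hat = mu) minimizes the residual term.
nll_at_mean = mdn_neg_log_likelihood(mu, mu)
```

Training the FNN/MDN amounts to minimizing this quantity with respect to the network parameters that produce μ_t.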
10 BASELINE MODELS — RNN as a recurrent MDN (RMDN) [7]
p(o_{1:T} | x_{1:T}; Θ) = ∏_{t=1}^{T} p(o_t; M_t) = ∏_{t=1}^{T} N(o_t; μ_t, I)
where M_t = {μ_t} with μ_t = H^(RNN)(x_{1:T}, t), and mean-based generation uses ô_t = μ_t.
[7] M. Schuster. Better generative models for sequential data problems: Bidirectional recurrent mixture density networks. In Proc. NIPS, 1999.
11 BASELINE MODELS — Model assumption
(Conditional) independence of o_{1:T}:
p(o_{1:T} | x_{1:T}; Θ) = p(o_{1:T}; M_{1:T}) = ∏_{t=1}^{T} p(o_t; M_t)
To verify the assumption [8], compare mean-based generation (ô_t = μ_t) against random sampling (ô_t ~ p(o_t; M_t)).
[8] M. Shannon. Probabilistic acoustic modelling for parametric speech synthesis. PhD thesis, 2014.
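The diagnostic can be reproduced on synthetic data; the smooth sine below is only a stand-in for a real mean trajectory. Under frame-wise independence, samples are white noise around the mean, which is easy to detect from frame-to-frame roughness:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
mu = np.sin(np.linspace(0, 6 * np.pi, T))   # toy smooth "mean trajectory"

o_mean = mu                                  # mean-based generation
o_samp = mu + rng.standard_normal(T)         # sampling from N(mu_t, 1)

# Frame-to-frame differences: the sample is far rougher than the mean,
# because the frames are conditionally independent given x_{1:T}.
rough_mean = np.std(np.diff(o_mean))
rough_samp = np.std(np.diff(o_samp))
```

A natural F0 contour is smooth, so a sampled trajectory this rough signals that the independence assumption is wrong.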
12 BASELINE MODELS — Model assumption: F0 example
[Figure: (top) p(o_{1:T}; M_{1:T}) from the RMDN: mean trajectory with ±1 std range vs. frame index; (bottom) natural F0 (NAT) vs. a random RMDN sample, F0 in Hz (utterance BC2011 nancy APDC). The sampled F0 fluctuates frame by frame instead of following a natural contour.]
13 CONTENTS
- Introduction
- Models and methods
  - Baseline models
  - AR models
- Summary
14 AR MODELS — Improve the baseline models
Directed graphical models — AR models [8]:
p(o_{1:T}) = ∏_{t=1}^{T} p(o_t | o_{1:t-1})
(FNN and RNN are the special case with no dependency on past o.)
Undirected graphical models — trajectory model [9] (static + dynamic features (o, Δo, Δ²o) with MLPG [10]):
p(o_{1:T}) = (1/Z) ∏_k f_k(o_{1:T})
[8] M. Shannon. Probabilistic acoustic modelling for parametric speech synthesis. PhD thesis, 2014.
[9] H. Zen, K. Tokuda, and T. Kitamura. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech & Language, 21(1), 2007.
[10] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. In Proc. ICASSP, 2000.
15 AR MODELS — Improve the baseline models
AR models [8]: p(o_{1:T}) = ∏_{t=1}^{T} p(o_t | o_{1:t-1})
Three variants covered in this talk:
1. Shallow AR model (SAR)
2. AR flow
3. Deep AR model (DAR)
16 SHALLOW AR MODEL — Definition
Each frame depends on the previous K frames:
p(o_{1:T}; M_{1:T}) = ∏_{t=1}^{T} p(o_t | o_{t-K:t-1}; M_t)
17 SHALLOW AR MODEL — Implementation (example: K = 2)
p(o_{1:T}; M_{1:T}) = ∏_{t=1}^{T} p(o_t | o_{t-K:t-1}; M_t) = ∏_{t=1}^{T} N(o_t; μ_t + F(o_{t-K:t-1}), Σ_t)
F(o_{t-K:t-1}) = Σ_{k=1}^{K} a_k o_{t-k} + b
The AR function F is time-invariant; K is a hyper-parameter; {a_k, b} are trainable parameters. This is the shallow AR (SAR) model.
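A minimal sketch of mean-based SAR generation for one feature dimension (the coefficient values here are hypothetical; in the real model {a_k, b} are learned jointly with the network):

```python
import numpy as np

def sar_generate(mu, a, b=0.0):
    """Mean-based SAR generation for one dimension:
    o_t = mu_t + sum_{k=1}^{K} a[k-1] * o_{t-k} + b, with o_t = 0 for t < 1."""
    K = len(a)
    o = []
    for t, m in enumerate(mu):
        shift = b
        for k in range(1, K + 1):
            if t - k >= 0:                 # zero history before the sequence
                shift += a[k - 1] * o[t - k]
        o.append(m + shift)
    return np.array(o)

# With K = 1, a_1 = 0.9 and a constant mu, the trajectory converges to
# mu / (1 - a_1): the AR term acts as a low-pass (smoothing) filter.
traj = sar_generate(np.full(200, 1.0), a=[0.9])
```

The smoothing behaviour is exactly why SAR trajectories look less over-averaged than frame-independent RMDN samples.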
18 SHALLOW AR MODEL — Theory
SAR versus an RMDN with a recurrent output layer [11]. Assume o_t ∈ R, Σ_t = 1, and linear activation functions.
[Diagram: the RMDN feeds back the previous mean μ_{t-1} through weight w_μ; SAR feeds back the previous observation o_{t-1} through weight a.]
[11] H. Zen and H. Sak. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In Proc. ICASSP, 2015.
19 SHALLOW AR MODEL — Theory
RMDN (recurrent output layer):
p(o_{1:2}) = N(o_1; μ_1, 1) N(o_2; μ_2 + w_μ μ_1, 1)
with μ_1 = w^T h_1 + b and μ_2 = w^T h_2 + b.
SAR:
p(o_{1:2}) = N(o_1; μ_1, 1) N(o_2; μ_2 + a o_1, 1)
with μ_1 = w^T h_1 + b and μ_2 = w^T h_2 + b.
20 SHALLOW AR MODEL — Theory
Writing both joint distributions as one bivariate Gaussian
p(o_{1:2}) = (1 / (2π |Σ|^{1/2})) exp(-(1/2)(o - μ)^T Σ^{-1} (o - μ)), with o = [o_1, o_2]^T:
RMDN: μ = [μ_1, μ_2 + w_μ μ_1]^T, Σ = [[1, 0], [0, 1]]
SAR:  μ = [μ_1, μ_2 + a μ_1]^T,  Σ = [[1, a], [a, 1 + a^2]]
Key question: should the dependency be between the means μ_t or between the observations o_t? Only SAR induces correlation between o_1 and o_2.
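The SAR covariance above can be checked numerically: writing the AR dependency as a linear transform c = A o (so c_2 = o_2 - a o_1) and drawing c with identity covariance, the covariance of o = A^{-1} c is A^{-1} A^{-T}:

```python
import numpy as np

a = 0.7
A = np.array([[1.0, 0.0],
              [-a,  1.0]])        # c = A o, i.e. c_2 = o_2 - a * o_1

# If c ~ N(mu_c, I), then o = A^{-1} c has covariance A^{-1} A^{-T}.
A_inv = np.linalg.inv(A)
Sigma_o = A_inv @ A_inv.T

# The closed form from the slide.
expected = np.array([[1.0, a],
                     [a,   1.0 + a * a]])
```

The unit diagonal of A also means |det A| = 1, so the transform costs nothing in the likelihood.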
21 SHALLOW AR MODEL — Experiments
Data: Blizzard Challenge 2011 corpus; x_t similar to HTS English [12]; o_t: MGC, interpolated F0, U/V, and BAP.
Networks:
- RNN: feedforward -> feedforward -> Bi-LSTM -> Bi-LSTM
- RMDN: the same architecture with an MDN output layer
- SAR: K = 1, 2, 0 for MGC, F0, and BAP, respectively
- RNN+MLPG: RNN predicting static + dynamic features (o, Δo, Δ²o) with MLPG
[12] K. Tokuda, H. Zen, and A. W. Black. An HMM-based speech synthesis system applied to English. In Proc. SSW, 2002.
22 SHALLOW AR MODEL — Experiments: results (1/12/18)
[Figure: generated trajectories of the 1st and 15th MGC dimensions vs. frame index for NAT, RNN, RNN+MLPG, RMDN, and SAR (utterance BC2011_nancy_APDC).]
23 SHALLOW AR MODEL — Experiments: results
[Figure: generated trajectories of the 30th MGC dimension and of F0 in Hz (after U/V classification) vs. frame index for NAT, RNN, RNN+MLPG, RMDN, and SAR (utterance BC2011_nancy_APDC).]
24 SHALLOW AR MODEL — Experiments: results
[Figure: global variance (GV) of the generated MGC per coefficient order for NAT, RNN, RNN+MLPG, RMDN, and SAR.]
25 SHALLOW AR MODEL — Experiments: samples
Audio samples: RNN, RNN+MLPG, RMDN, SAR, and natural speech, with and without formant enhancement.
26 SHALLOW AR MODEL — Interpretation: feature transformation
SAR is equivalent to modeling a transformed variable c = A o. For the two-frame case:
c = [c_1, c_2]^T = [o_1, o_2 - a o_1]^T = [[1, 0], [-a, 1]] [o_1, o_2]^T = A o
RMDN on c: p(o) = p(c) = N(c; μ_c, Σ_c), with μ_c = [μ_1, μ_2]^T, Σ_c = [[1, 0], [0, 1]]
Equivalent SAR on o: p(o) = N(o; μ_o, Σ_o), with μ_o = [μ_1, μ_2 + a μ_1]^T, Σ_o = [[1, a], [a, 1 + a^2]]
27 SHALLOW AR MODEL — Interpretation: feature transformation
For o_{1:T} ∈ R^{D×T}, each dimension d gets its own transform: c_{1:T,d} = A^(d) o_{1:T,d}.
SAR is therefore equivalent to:
- Training: transform o_{1:T} to c_{1:T} via A^(1), ..., A^(D), then fit ∏_{t=1}^{T} p(c_t; M_t) given x_{1:T}.
- Generation: draw ĉ_{1:T} from ∏_{t=1}^{T} p(c_t; M_t), then invert: ô_{1:T,d} = (A^(d))^{-1} ĉ_{1:T,d}.
28 SHALLOW AR MODEL — Interpretation: signals and filters
Equivalently, each A^(d) is an FIR analysis filter A_d(z), and its inverse is the IIR synthesis filter 1/A_d(z):
- Training: filter each dimension of o_{1:T} with A_d(z) to obtain c_{1:T}, then model ∏_{t=1}^{T} p(c_t; M_t) given x_{1:T}.
- Generation: sample ĉ_{1:T} and pass it through 1/A_d(z) to obtain ô_{1:T}.
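The analysis/synthesis pair can be sketched directly in NumPy for one dimension (coefficient values are hypothetical). Filtering with A(z) = 1 - Σ_k a_k z^{-k} and then with 1/A(z) reconstructs the signal exactly:

```python
import numpy as np

def analysis_filter(o, a):
    """FIR filter A(z): c_t = o_t - sum_k a[k-1] * o_{t-k}."""
    K = len(a)
    c = o.copy()
    for k in range(1, K + 1):
        c[k:] -= a[k - 1] * o[:-k]
    return c

def synthesis_filter(c, a):
    """IIR filter 1/A(z): o_t = c_t + sum_k a[k-1] * o_{t-k},
    computed recursively on the reconstructed output."""
    K = len(a)
    o = np.zeros_like(c)
    for t in range(len(c)):
        o[t] = c[t]
        for k in range(1, K + 1):
            if t - k >= 0:
                o[t] += a[k - 1] * o[t - k]
    return o

rng = np.random.default_rng(2)
a = [1.2, -0.5]                      # example K = 2 coefficients
o = rng.standard_normal(100)
o_rec = synthesis_filter(analysis_filter(o, a), a)
```

The round trip being lossless is the point of the interpretation: SAR only changes which signal (c instead of o) the RMDN has to model.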
29 SHALLOW AR MODEL — Secret of SAR
[Figure: magnitude responses (dB) of the learned analysis filter A_1(z) and synthesis filter 1/A_1(z), compared with H_1(z), vs. frequency bin (π/1024).]
Is the gain only due to the synthesis filter 1/A_d(z)? No: it comes from the analysis/synthesis pair {A_d(z), 1/A_d(z)}, which reduces the mismatch between c_{1:T} and what the RMDN can model.
30 CONTENTS
- Introduction
- Models and methods
  - Baseline models
  - Shallow AR model
  - AR flow
- Summary
31 AR FLOW — Is SAR good enough?
[Figure: randomly sampled F0 (Hz) vs. frame index: natural F0, RMDN sample, and SAR sample (utterance BC2011 nancy APDC). The SAR sample is still far from natural.]
Reason: the AR filter with coefficients a_k is linear and time-invariant.
32 AR FLOW — Theory: change of random variable
Training: transform o_{1:T} with c = A o, then model ∏_{t=1}^{T} p(c_t; M_t) given x_{1:T}.
Generation: sample ĉ_{1:T} from ∏_{t=1}^{T} p(c_t; M_t), then invert: ô = A^{-1} ĉ.
33 AR FLOW — Theory: change of random variable
Generalize the linear transform to c_{1:T} = f(o_{1:T}), with generation ô_{1:T} = f^{-1}(ĉ_{1:T}). Requirements: f must be invertible, and its Jacobian must be simple to compute:
p_o(o_{1:T} | x_{1:T}) = p_c(c_{1:T} | x_{1:T}) |det(∂c_{1:T}/∂o_{1:T})|
This is the idea of normalizing flows [13] and inverse autoregressive flow [14].
[13] D. Rezende and S. Mohamed. Variational inference with normalizing flows. In Proc. ICML, 2015.
[14] D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling. Improved variational inference with inverse autoregressive flow. In Proc. NIPS, 2016.
34 AR FLOW — Basic idea
p_o(o_{1:T} | x_{1:T}) = p_c(c_{1:T} | x_{1:T}) |det(∂c_{1:T}/∂o_{1:T})|
SAR transform:     c_t = o_t - Σ_{k=1}^{K} a_k o_{t-k};           de-transform: ô_t = ĉ_t + Σ_{k=1}^{K} a_k ô_{t-k}
AR flow transform: c_t = o_t - Σ_{k=1}^{K} f^(k)(o_{1:t-k}) o_{t-k}; de-transform: ô_t = ĉ_t + Σ_{k=1}^{K} f^(k)(ô_{1:t-k}) ô_{t-k}
A very simple instance: the Jacobian determinant equals 1, with μ_t = RNN(o_{1:t-1}).
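A toy version of the AR-flow transform with K = 1, where a simple tanh stands in for the RNN-computed, data-dependent coefficient (the function `f_coef` is hypothetical, not the talk's architecture). Because the transform is autoregressive with a unit-diagonal Jacobian, |det J| = 1 and it can be inverted sequentially:

```python
import numpy as np

def f_coef(history):
    """Toy data-dependent coefficient standing in for the RNN
    (the real model computes it from o_{1:t-1})."""
    return 0.5 * np.tanh(history[-1]) if len(history) else 0.0

def flow_transform(o):
    """c_t = o_t - f(o_{1:t-1}) * o_{t-1}  (K = 1 AR flow)."""
    c = np.empty_like(o)
    for t in range(len(o)):
        c[t] = o[t] - f_coef(o[:t]) * (o[t - 1] if t > 0 else 0.0)
    return c

def flow_inverse(c):
    """o_t = c_t + f(o_{1:t-1}) * o_{t-1}, filled in sequentially:
    at step t, the needed history o_{1:t-1} is already reconstructed."""
    o = np.empty_like(c)
    for t in range(len(c)):
        o[t] = c[t] + f_coef(o[:t]) * (o[t - 1] if t > 0 else 0.0)
    return o

rng = np.random.default_rng(3)
o = rng.standard_normal(50)
o_rec = flow_inverse(flow_transform(o))
```

Unlike SAR's fixed a_k, the coefficient here varies with the data, which is what makes the flow non-linear while keeping it invertible.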
35 AR FLOW — Implementation: training
Compute μ_t = RNN(o_{1:t-1}) (with the first μ set to 0) and transform c_t = o_t + μ_t; the c_t are then modeled by the output distributions M_t given x_{1:T}.
36 AR FLOW — Implementation: generation
Sample ĉ_t from M_t, compute μ_t = RNN(ô_{1:t-1}), and de-transform: ô_t = ĉ_t - μ_t.
37 AR FLOW — Experiments on MGC
Data: Japanese corpus; neither MLPG nor formant enhancement is used.
Networks: RNN, SAR, and AR flow (RNN + AR flow: uni-directional LSTM + feedforward).
38 AR FLOW — Experiments on MGC: results
[Figure: generated trajectories of the 1st and 30th MGC dimensions.]
39 AR FLOW — Experiments on MGC: results
[Figure: GV of the generated MGC and modulation spectrum of the 30th MGC dimension for natural speech, RNN, AR flow, and SAR.]
40 AR FLOW — Experiments on MGC: samples
Audio samples: RNN, SAR, AR flow, and natural speech. All systems use natural duration and F0 generated by the RNN. Experiments on F0 remain to be done.
41 CONTENTS
- Introduction
- Models and methods
  - Baseline models
  - Shallow AR model
  - AR flow
  - Deep AR model
- Summary
42 DEEP AR MODEL — Random sampling from the AR flow?
Random sampling from the AR flow (ô_{1:T} = f^{-1}(ĉ_{1:T}) with ĉ_{1:T} ~ ∏_{t=1}^{T} p(c_t; M_t)) still does not work well.
One way forward: build the AR dependency directly into the network that maps x_{1:T} to o_{1:T}.
43 DEEP AR MODEL — Definition
[Diagram: the network input at step t includes the previous output o_{t-1}, so each distribution M_t depends on all previous outputs.]
44 DEEP AR MODEL — Definition
p(o_{1:T}; M_{1:T}) = ∏_{t=1}^{T} p(o_t | o_{1:t-1}; M_t)
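A toy sketch of generation from a deep AR model: the sampled output is fed back as a network input, so every step conditions on past outputs. The recurrence below is a hypothetical stand-in, not the actual DAR architecture:

```python
import numpy as np

def toy_dar_step(x_t, o_prev, h_prev):
    """One recurrent step of a toy deep AR model: the previous output
    o_{t-1} is a network input, so the predicted distribution
    p(o_t | o_{1:t-1}; M_t) depends on past outputs."""
    h_t = np.tanh(0.8 * h_prev + 0.5 * o_prev + x_t)   # toy recurrence
    mu_t = 2.0 * h_t                                    # M_t = {mu_t}
    return mu_t, h_t

def dar_generate(x, rng=None):
    """Sample a trajectory (or mean-generate if rng is None)."""
    o_prev, h_prev, out = 0.0, 0.0, []
    for x_t in x:
        mu_t, h_prev = toy_dar_step(x_t, o_prev, h_prev)
        o_t = mu_t + (rng.standard_normal() if rng is not None else 0.0)
        out.append(o_t)
        o_prev = o_t          # feedback: the sample conditions the future
    return np.array(out)

x = np.zeros(20)
mean_traj = dar_generate(x)
samp_traj = dar_generate(x, np.random.default_rng(4))
```

Because the noise injected at each step is propagated through the recurrence, sampled trajectories stay temporally coherent instead of being white around the mean.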
45 DEEP AR MODEL — Experiments
On MGC: no better results yet.
On F0: audio samples for RNN, RMDN, SAR, DAR, and natural speech. The network models only F0, given natural MGC (and duration).
46 DEEP AR MODEL — Experiments on F0
[Figure: MOS scores for NAT, DAR, SAR, RMDN, and RNN; NAT scores highest, followed by DAR.]
[Table: p-values of pairwise MOS differences; NAT differs from every other system with p < 1.0e-30, and DAR differs significantly from SAR, RMDN, and RNN.]
[Figure: utterance-level GV of F0 (Hz) for NAT, RNN, RMDN, SAR, and DAR.]
47 DEEP AR MODEL — Random sampling on F0: Japanese data
[Figure: (top) natural F0 vs. RMDN and SAR samples; (bottom) natural F0 vs. a DAR sample, F0 in Hz vs. frame index (utterance ATR Ximera F009 NIKKEIR T01). The DAR sample shows natural-looking F0 movements.]
48 DEEP AR MODEL — Random sampling on F0: Japanese data
[Figure: three different DAR samples plotted against natural F0 for the same utterance (ATR Ximera F009 NIKKEIR T01).]
49 DEEP AR MODEL — Random sampling on F0: English data
[Figure: three DAR samples plotted against natural F0 on English data.]
50 CONTENTS
- Introduction
- Models and methods
  - Baseline models
  - Shallow AR model
  - AR flow
  - Deep AR model
- Summary
51 SUMMARY
- FNN, RNN: no dependency across output frames
- SAR: linear and invertible transform, c_{1:T} = A o_{1:T}
- AR flow: non-linear and invertible transform, c_{1:T} = f(o_{1:T})
- DAR: non-linear and non-invertible AR dependency in the network
Random sampling is a useful diagnostic tool for checking model assumptions.
52 SUMMARY — Message
Should the dependency be between the means μ_t (RMDN with a recurrent output layer) or between the observations o_t (SAR)? Modeling the dependency between observations is what matters.
53 Thank you for your attention. Q & A
Toolkit, scripts, slides: tonywangx.github.io
Masked Autoregressive Flow for Density Estimation George Papamakarios University of Edinburgh g.papamakarios@ed.ac.uk Theo Pavlakou University of Edinburgh theo.pavlakou@ed.ac.uk Iain Murray University
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationRecurrent and Recursive Networks
Neural Networks with Applications to Vision and Language Recurrent and Recursive Networks Marco Kuhlmann Introduction Applications of sequence modelling Map unsegmented connected handwriting to strings.
More informationCS230: Lecture 10 Sequence models II
CS23: Lecture 1 Sequence models II Today s outline We will learn how to: - Automatically score an NLP model I. BLEU score - Improve Machine II. Beam Search Translation results with Beam search III. Speech
More informationConditional Random Fields: An Introduction
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 2-24-2004 Conditional Random Fields: An Introduction Hanna M. Wallach University of Pennsylvania
More informationDeep Learning & Neural Networks Lecture 4
Deep Learning & Neural Networks Lecture 4 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 23, 2014 2/20 3/20 Advanced Topics in Optimization Today we ll briefly
More informationMULTI-FRAME FACTORISATION FOR LONG-SPAN ACOUSTIC MODELLING. Liang Lu and Steve Renals
MULTI-FRAME FACTORISATION FOR LONG-SPAN ACOUSTIC MODELLING Liang Lu and Steve Renals Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK {liang.lu, s.renals}@ed.ac.uk ABSTRACT
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework
More informationTowards Maximum Geometric Margin Minimum Error Classification
THE SCIENCE AND ENGINEERING REVIEW OF DOSHISHA UNIVERSITY, VOL. 50, NO. 3 October 2009 Towards Maximum Geometric Margin Minimum Error Classification Kouta YAMADA*, Shigeru KATAGIRI*, Erik MCDERMOTT**,
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationComparing linear and non-linear transformation of speech
Comparing linear and non-linear transformation of speech Larbi Mesbahi, Vincent Barreaud and Olivier Boeffard IRISA / ENSSAT - University of Rennes 1 6, rue de Kerampont, Lannion, France {lmesbahi, vincent.barreaud,
More informationTask-Oriented Dialogue System (Young, 2000)
2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see
More informationRandom Field Models for Applications in Computer Vision
Random Field Models for Applications in Computer Vision Nazre Batool Post-doctorate Fellow, Team AYIN, INRIA Sophia Antipolis Outline Graphical Models Generative vs. Discriminative Classifiers Markov Random
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationNecessary Corrections in Intransitive Likelihood-Ratio Classifiers
Necessary Corrections in Intransitive Likelihood-Ratio Classifiers Gang Ji and Jeff Bilmes SSLI-Lab, Department of Electrical Engineering University of Washington Seattle, WA 9895-500 {gang,bilmes}@ee.washington.edu
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More informationSequence Transduction with Recurrent Neural Networks
Alex Graves graves@cs.toronto.edu Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Abstract Many machine learning tasks can be expressed as the transformation or transduction
More informationAttention Based Joint Model with Negative Sampling for New Slot Values Recognition. By: Mulan Hou
Attention Based Joint Model with Negative Sampling for New Slot Values Recognition By: Mulan Hou houmulan@bupt.edu.cn CONTE NTS 1 2 3 4 5 6 Introduction Related work Motivation Proposed model Experiments
More informationImproved Learning through Augmenting the Loss
Improved Learning through Augmenting the Loss Hakan Inan inanh@stanford.edu Khashayar Khosravi khosravi@stanford.edu Abstract We present two improvements to the well-known Recurrent Neural Network Language
More informationResidual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition Jaeyoung Kim 1, Mostafa El-Khamy 1, Jungwon Lee 1 1 Samsung Semiconductor,
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationWhy DNN Works for Acoustic Modeling in Speech Recognition?
Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,
More informationHIGH PERFORMANCE CTC TRAINING FOR END-TO-END SPEECH RECOGNITION ON GPU
April 4-7, 2016 Silicon Valley HIGH PERFORMANCE CTC TRAINING FOR END-TO-END SPEECH RECOGNITION ON GPU Minmin Sun, NVIDIA minmins@nvidia.com April 5th Brief Introduction of CTC AGENDA Alpha/Beta Matrix
More informationarxiv: v1 [cs.ne] 14 Nov 2012
Alex Graves Department of Computer Science, University of Toronto, Canada graves@cs.toronto.edu arxiv:1211.3711v1 [cs.ne] 14 Nov 2012 Abstract Many machine learning tasks can be expressed as the transformation
More information