Vocoding approaches for statistical parametric speech synthesis
1 Vocoding approaches for statistical parametric speech synthesis. Ranniery Maia, Toshiba Research Europe Limited, Cambridge Research Laboratory. Speech Synthesis Seminar Series, CUED, University of Cambridge, UK, March 2nd, 2011.
2 2 Topics of this presentation 1. Existing methods to generate the speech waveform in statistical parametric speech synthesis 2. An idea for closing the gap between acoustic modeling and waveform generation
3 3 Notation and acronyms in this presentation

Notation
- x(n): a discrete-time signal
- X(z): x(n) in the z-transform domain
- X(e^jω): Discrete-Time Fourier Transform of x(n) (frequency-domain representation of x(n))
- |X(e^jω)|: magnitude response of x(n)
- ∠X(e^jω): phase response of x(n)
- |X(e^jω)|²: power spectrum of x(n)
- x: a vector
- X: a matrix

Acronyms
- OLA: OverLap and Add
- MELP: Mixed Excitation Linear Prediction
- STRAIGHT: Speech Transformation and Representation using Adaptive Interpolation of weighted spectrum
- FFT: Fast Fourier Transform
- IFFT: Inverse Fast Fourier Transform
- LF: Liljencrants-Fant model
- LP: Linear Prediction
- PCA: Principal Component Analysis
- LSP: Line Spectral Pairs
4 4 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
6 6 Statistical parametric speech synthesis Speech synthesis methods 1. Rule-based 1.1 Parametric 1.2 Unit concatenation 2. Corpus-based 2.1 Unit selection and concatenation 2.2 Statistical parametric 2.3 Hybrid
7 7 Statistical parametric speech synthesis

1. Advantages: several voices, small data, small footprint, language portability, etc.
2. Disadvantage: unnatural synthesized speech
   2.1 Parametric model of speech production
   2.2 Parameters of the model are averaged

How to alleviate this unnaturalness?
1. Statistical modeling
2. Choice of the speech production model
3. Choice of the parameters to represent such a model
4. Way of synthesizing speech with these parameters
9 9 Statistical parametric speech synthesis

Training time
- Speech waveform s = [s(0) ... s(N−1)] → parameter extraction → speech parameters c = [c_0 ... c_{T−1}]
- Acoustic model training from labels l: λ̂_c = arg max_{λ_c} p(c | l, λ_c) → acoustic model parameters λ̂_c

Synthesis time
- Given labels l and acoustic model parameters λ̂_c, parameter generation: ĉ = arg max_c p(c | l, λ̂_c), with ĉ = [ĉ_0 ... ĉ_{T−1}]
- Waveform generation: ĉ → speech waveform s = [s(0) ... s(N−1)]
10 10 Waveform generation part

1. Choice of the speech production mechanism
   - Simple: speech synthesis filter + excitation
   - Complete: vocal tract, glottal and lip radiation filters + excitation
2. Appropriate parameters for the chosen speech mechanism, with good quantization/compression properties
3. Given the speech model and corresponding parameters, design the best way to synthesize the speech signal according to some criteria
12 12 Digital speech models

The complete model [Deller, Jr. et al., 2000]: a pulse train generator (driven by the pitch period) excites the glottal filter G(z) to produce the voiced volume velocity, while an uncorrelated noise generator provides the unvoiced source. A voiced/unvoiced (V/U) switch selects the source, which is scaled by a gain and passed through the vocal tract filter H(z) and the lip radiation filter L(z) to produce the speech signal.

The simplified model, assumed for LP analysis: a pulse train generator (voiced) or white noise generator (unvoiced), selected by the V/U switch, is scaled by a gain and passed through a single all-pole filter to produce the speech signal.
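The simplified model above can be sketched in a few lines of NumPy: an impulse train or white noise drives an all-pole filter. The sampling rate, pitch, and filter coefficients below are illustrative assumptions, not values from the slides.

```python
import numpy as np

fs = 16000                      # sampling rate (Hz), assumed
f0 = 100.0                      # pitch (Hz), assumed
N = 4000                        # number of samples

# Voiced excitation: impulse train with one pulse per pitch period.
pitch_period = int(round(fs / f0))
excitation_v = np.zeros(N)
excitation_v[::pitch_period] = 1.0

# Unvoiced excitation: zero-mean white noise.
rng = np.random.default_rng(0)
excitation_u = rng.standard_normal(N)

# Toy all-pole filter 1/A(z), A(z) = 1 - 1.3 z^-1 + 0.8 z^-2
# (stable: its poles lie inside the unit circle).
a = np.array([1.0, -1.3, 0.8])

def synthesize(excitation, a):
    """Run the excitation through the all-pole filter 1/A(z)."""
    y = np.zeros_like(excitation)
    for i in range(len(excitation)):
        acc = excitation[i]
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * y[i - k]
        y[i] = acc
    return y

voiced = synthesize(excitation_v, a)
unvoiced = synthesize(excitation_u, a)
```

In a real vocoder the V/U decision, gain, and filter coefficients would change frame by frame; here they are fixed to keep the sketch minimal.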
13 13 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
14 14 Standard vocoder for statistical parametric synthesis

Labels representing the text to be synthesized drive the trained acoustic models, which output F0 and spectral parameters. A pulse train t(n) (period P0 derived from F0) or white noise w(n), selected by the V/U decision, forms the excitation e(n), which is passed through the synthesis filter (built from the spectral parameters) to produce speech.

- Very simple
- Analysis: F0 extraction
- Synthesis: pulse/white noise switch
- Poor speech quality!
15 15 Improved vocoding methods for statistical parametric synthesis 1. Methods that focus solely on the excitation signal 1.1 Fully parametric excitation models 1.2 Methods that attempt to mimic the LP residual 2. Methods that focus on source and vocal tract modeling
16 16 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
17 17 MELP mixed excitation [Yoshimura et al., 2001]

MELP excitation building part: a pulse train t(n) undergoes position jittering and shaping by the Fourier magnitudes, then passes through H_p(z) to give the voiced excitation v(n); white noise w(n) passes through H_n(z) to give the unvoiced excitation u(n); their sum is the mixed excitation e(n). Bandpass voicing strengths control H_p(z) and H_n(z).

- Period jitter derived from voicing strengths for aperiodic frames
- Fourier magnitudes simulate the glottal filter
- Filters H_p(z) and H_n(z) control the amount of pulse and noise in the final excitation e(n)
18 18 Pulse and noise shaping filters

Filters H_p(z) and H_n(z) switch between pulse and noise excitation in each band:

H_p(z) = Σ_{j=0}^{J−1} β_j Σ_{m=0}^{M} h_j(m) z^{−m},   H_n(z) = Σ_{j=0}^{J−1} (1 − β_j) Σ_{m=0}^{M} h_j(m) z^{−m}

β_j = 1 if β̃_j ≥ 0.5, and β_j = β̃_j if β̃_j < 0.5

h_j(m): bandpass filter coefficients for the j-th band.

The bandpass voicing strength for the j-th band is obtained from a normalized correlation coefficient:

β̃_j = f(r_t),   r_t = Σ_{n=0}^{N−1} s(n) s(n+t) / sqrt( [Σ_{n=0}^{N−1} s²(n)] [Σ_{n=0}^{N−1} s²(n+t)] )
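The normalized correlation r_t above is straightforward to sketch. The function name and the test signals below are illustrative; a real MELP analyzer would compute this per bandpass channel at the estimated pitch lag.

```python
import numpy as np

def normalized_correlation(s, t):
    """r_t = sum s(n)s(n+t) / sqrt(sum s(n)^2 * sum s(n+t)^2)."""
    N = len(s) - t
    num = np.dot(s[:N], s[t:t + N])
    den = np.sqrt(np.dot(s[:N], s[:N]) * np.dot(s[t:t + N], s[t:t + N]))
    return num / den

# A perfectly periodic signal gives r_t = 1 at its pitch lag t = P;
# white noise gives a value near 0.
P = 80
n = np.arange(800)
periodic = np.sin(2 * np.pi * n / P)
noise = np.random.default_rng(1).standard_normal(800)
```

High r_t at the pitch lag marks a band as strongly voiced, which is exactly what the β̃_j quantization above exploits.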
19 19 Application to statistical parametric synthesis Additional parameters for acoustic modeling 1. Bandpass voicing strengths: 5 2. Fourier magnitudes: 10
20 20 STRAIGHT excitation [Zen et al., 2007a]

STRAIGHT vocoder, excitation construction (no-phase-manipulation case): a zero-phase pulse T(e^jω) is weighted by H_p(e^jω) to give the voiced excitation V(e^jω); random-phase noise W(e^jω) is weighted by H_n(e^jω) to give the unvoiced excitation U(e^jω). Both weighting filters are derived from the aperiodicity parameters, and the two components are summed into the mixed excitation E(e^jω) → e(n).
21 21 STRAIGHT vocoder for statistical parametric synthesis

- Aperiodicity parameters are extracted and averaged over specified frequency sub-bands → band-aperiodicity parameters (BAP)
- At synthesis time the generated BAP are converted into aperiodicity
- Speech is synthesized in the frequency domain
- Achieves very good quality
- Additional parameters for acoustic modeling: BAP, usually 5 coefficients
22 22 Pulse and noise weighting filters

Filters H_p(e^jω) and H_n(e^jω) shape the pulse and noise inputs, just like in MELP. Their frequency responses are obtained from the aperiodicity parameters a(ω):

|H_p(e^jω)| = 1 − a(ω),   |H_n(e^jω)| = a(ω),   0 ≤ ω ≤ π
23 23 Band aperiodicity parameters

Aperiodicity at frequency ω:

a(ω) = ∫ w_ERB(λ; ω) |S(e^jλ)|² Υ( |S_L(e^jλ)|² / |S_U(e^jλ)|² ) dλ / ∫ w_ERB(λ; ω) |S(e^jλ)|² dλ

- |S(e^jω)|: speech spectral envelope
- |S_U(e^jω)|: envelope constructed by connecting the peaks of |S(e^jω)|
- |S_L(e^jω)|: envelope constructed by connecting the valleys of |S(e^jω)|
- w_ERB(λ; ω): auditory filter to smooth |S(e^jω)|
- Υ(·): look-up table operation

Band aperiodicity:

b_j = (1/|Ω_j|) ∫_{Ω_j} a(ω) dω,   Ω_j: j-th frequency band
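The band-averaging step b_j can be sketched as a per-band mean over a sampled a(ω) curve. The aperiodicity curve below is synthetic; the 5-band layout matches the example bands used later in the deck.

```python
import numpy as np

fs = 16000
freqs = np.linspace(0, fs / 2, 513)            # FFT-bin frequencies (Hz)
aper = np.clip(freqs / (fs / 2), 0.0, 1.0)     # toy a(w): rises with frequency

# 5-band layout: 0-1 kHz, 1-2 kHz, 2-4 kHz, 4-6 kHz, 6-8 kHz.
bands = [(0, 1000), (1000, 2000), (2000, 4000), (4000, 6000), (6000, 8000)]
band_aper = []
for lo, hi in bands:
    mask = (freqs >= lo) & (freqs < hi)
    band_aper.append(aper[mask].mean())        # b_j: mean a(w) over band j
```

At synthesis time the inverse mapping expands the 5 generated b_j values back to a full-resolution a(ω) curve (typically by piecewise-constant or interpolated assignment).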
24 24 Aperiodicity and band aperiodicity: examples

[Plots of aperiodicity (log magnitude vs. frequency) with the band aperiodicity overlaid, for two band layouts: 5 bands (0-1 kHz, 1-2 kHz, 2-4 kHz, 4-6 kHz, 6-8 kHz) and 24 Bark critical bands.]
25 25 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
26 26 State-dependent mixed excitation [Maia et al., 2007]

Labels drive the trained HMMs, giving the state sequence {s_0, s_1, ..., s_{Q−1}} and state durations. For each state, spectral parameters and F0 are generated, and a state-dependent filter pair (H_v^(q)(z), H_u^(q)(z)) is selected. The pulse train t(n) is filtered by H_v(z) into the voiced excitation v(n); white noise w(n) is filtered by H_u(z) into the unvoiced excitation ũ(n); their sum, the mixed excitation ẽ(n), is passed through the synthesis filter H(z) to produce the synthesized speech.

Filters:

H_v(z) = Σ_{m=−M/2}^{M/2} h(m) z^{−m},   H_u(z) = 1 / ( Σ_{l=0}^{L} g(l) z^{−l} )
27 27 State-dependent mixed excitation: training

The residual e(n) is the target signal. A pulse train t(n), with positions {p_0, ..., p_{Z−1}} and amplitudes {a_0, ..., a_{Z−1}}, is filtered by H_v(z) to produce the voiced excitation v(n); white noise w(n), shaped by G(z) = 1/H_u(z), corresponds to the unvoiced excitation u(n) (the error signal).

The filter coefficients

h = [h(−M/2) ... h(M/2)],   g = [g(0) ... g(L)]

and the pulse positions and amplitudes {p_0, ..., p_{Z−1}}, {a_0, ..., a_{Z−1}} are optimized so that

{h, g, t(n)} = arg max_{g,h,t(n)} P(e(n) | g, h, t(n))
28 28 State-dependent mixed excitation: synthesis

Given F0 and the state sequence q = {s_0, ..., s_{Q−1}}: the pulse train t(n) is filtered by H_v^(q)(z) into ṽ(n) and scaled by a gain α; white noise w(n) is filtered by H_u^(q)(z) into u(n), then high-pass filtered by H_HP(z) and time-modulated by ρ(n) into ũ(n); the sum is the excitation ẽ(n).

The noise component is colored through:
1. High-pass filtering (F_c = 2 kHz)
2. Time modulation with a pitch-synchronous triangular window ρ(n)

H_v(z) is normalized in energy. The gain α adjusts the energy of the voiced component so that the power of the excitation signal ẽ(n) becomes one.
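The two noise-coloring steps above can be sketched as follows. The FFT-bin high-pass and the constant pitch period are simplifying assumptions (the slides specify only the 2 kHz cutoff and the triangular window, not the filter design).

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(2)
noise = rng.standard_normal(1600)

# 1. High-pass filter the noise with cutoff Fc = 2 kHz (done here by
#    zeroing low-frequency FFT bins; a recursive filter would also work).
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), d=1.0 / fs)
spec[freqs < 2000.0] = 0.0
hp_noise = np.fft.irfft(spec, n=len(noise))

# 2. Pitch-synchronous triangular modulation rho(n), assuming a
#    constant pitch period for the sketch.
pitch_period = 160
one_period = np.bartlett(pitch_period)   # triangular window, zero at ends
rho = np.tile(one_period, len(noise) // pitch_period)
colored = rho * hp_noise
```

The modulation concentrates the noise energy away from the pulse instants, which is what makes the aperiodic component sound natural rather than like a constant hiss.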
29 29 Voiced filter effect

[Plots: pulse train, voiced filter, and voiced excitation, shown as waveforms (amplitude vs. time) and as magnitude spectra (dB vs. frequency).]

30 30 Unvoiced filter effect

[Plots: white noise, unvoiced filter, and unvoiced excitation, shown as waveforms (amplitude vs. time) and as magnitude spectra (dB vs. frequency).]
31 31 Deterministic plus stochastic residual modeling [Drugman et al., 2009]

Assumed model of the LP residual e(n):

e(n) = e_d(n) + e_s(n)

- e_d(n): deterministic part
- e_s(n): stochastic part

Maximum voiced frequency F_m: the boundary between the deterministic and stochastic components, set to 4 kHz.
32 32 Deterministic modeling: eigenresidual calculation

Speech data → inverse filtering (using the spectral parameters) → residual → GCI detection and F0 extraction → pitch periods → windowing → residual segments → resampling and energy normalization → PCA → eigenresiduals.

- The residual segments are resampled at a normalized frequency F0* determined from F_Nyquist, F_m and F_0,min
- PCA: eigenresiduals explain about 80% of the total dispersion
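The resample-normalize-PCA pipeline above can be sketched with toy residual cycles. The cycle shapes are synthetic stand-ins for inverse-filtered residual segments, and PCA is done here via SVD on the mean-removed matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
target_len = 100                              # common length after resampling

cycles = []
for _ in range(50):
    P = rng.integers(60, 120)                 # varying pitch period (samples)
    n = np.arange(P)
    c = np.exp(-n / (P / 6)) + 0.05 * rng.standard_normal(P)  # toy cycle
    # Resample to a common length (linear interpolation for simplicity).
    c = np.interp(np.linspace(0, P - 1, target_len), n, c)
    c /= np.linalg.norm(c)                    # energy normalization
    cycles.append(c)

X = np.array(cycles)
X -= X.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
eigenresiduals = Vt                           # rows: principal residual shapes
explained = s**2 / np.sum(s**2)               # dispersion per eigenresidual
```

The "about 80% of the total dispersion" figure from the slide corresponds to the cumulative sum of `explained` over the retained components.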
33 33 Stochastic modeling

Stochastic component model:

e_s(n) = ρ(n) [h_u(n) ∗ w(n)]

- ρ(n): pitch-synchronous modulation window
- h_u(n): AR filter impulse response
- w(n): white noise

Unvoiced filter h_u(n):
- Fixed
- Auto-regressive (all-pole)
- Coefficients obtained through LP analysis
34 34 Application to statistical parametric synthesis

Additional parameters for acoustic modeling:
- PCA weights: 15
- Using eigenresiduals of higher ranks makes no difference ⇒ optionally, the stream of PCA weights can be removed

Synthesis part: the PCA weights linearly combine the eigenresiduals, and the result is resampled according to F0 to give the deterministic part r_d(n); white noise w(n) is filtered by H_u(z) and time-modulated by ρ(n) to give the stochastic part r_s(n); the two parts are combined by OLA into e(n).
35 35 Waveform interpolation [Sung et al., 2010]

Waveform interpolation (WI): each cycle of the excitation signal is represented by a characteristic waveform (CW)

e(n) = Σ_{k=0}^{P/2} [ A_k cos(2πkn/P) + B_k sin(2πkn/P) ]

- {A_k, B_k}: discrete-time Fourier series coefficients
- P: pitch period

The CW is extracted from the LP residual at a fixed rate. Information needed to reconstruct the excitation signal:
- Pitch period: P
- CW coefficients: {A_k, B_k}
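The CW equation above is a real Fourier series, so a cycle can be analyzed into {A_k, B_k} and reconstructed exactly. This sketch uses a toy one-period residual; the DC and Nyquist terms are halved so the synthesis sum matches the analysis convention.

```python
import numpy as np

P = 64                                   # pitch period in samples (even)
n = np.arange(P)
cycle = np.exp(-n / 8.0) - 0.2           # toy one-period residual

K = P // 2
A = np.array([2.0 / P * np.sum(cycle * np.cos(2 * np.pi * k * n / P))
              for k in range(K + 1)])
B = np.array([2.0 / P * np.sum(cycle * np.sin(2 * np.pi * k * n / P))
              for k in range(K + 1)])
A[0] /= 2.0                              # DC term counted once
A[K] /= 2.0                              # Nyquist term counted once (P even)

# Reconstruction: e(n) = sum_k A_k cos(2*pi*k*n/P) + B_k sin(2*pi*k*n/P)
recon = sum(A[k] * np.cos(2 * np.pi * k * n / P)
            + B[k] * np.sin(2 * np.pi * k * n / P)
            for k in range(K + 1))
```

In WI proper, the CWs are extracted at a fixed rate and interpolated between extraction instants; the sketch covers only the per-cycle representation.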
36 36 Application to statistical parametric synthesis

Analysis is similar to the eigenresidual calculation [Drugman et al., 2009]: speech data → WI analysis → CW coefficients → PCA → basis vector selection → basis vector coefficient computation.

Additional parameters for acoustic modeling:
- Coefficients of the basis vectors: 8

Synthesis: the basis vector coefficients and pitch drive CW reconstruction, followed by WI synthesis to produce speech.
37 37 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
38 38 Glottal inverse filtering [Raitio et al., 2008]

Uses Iterative Adaptive Inverse Filtering (IAIF) [Alku, 1992]. From the speech signal, the analysis extracts the energy, the vocal tract spectrum V(z), and, via glottal inverse filtering, the estimated voice source signal g(n), from which F0, the harmonic-to-noise ratio (HNR), and the voice source spectrum G(z) (by LP analysis) are obtained.

Features for acoustic modeling:
1. F0
2. Energy
3. HNR in 4 bands: 0-2 kHz, 2-4 kHz, 4-6 kHz, 6-8 kHz
4. Voice source spectrum (glottal flow): 10 LSPs
5. Vocal tract spectrum: 30 LSPs
39 39 At synthesis time

A library pulse is interpolated according to F0 and scaled by the gain (energy); noise is added per band according to the HNR; spectral matching with the glottal source spectrum G(z) shapes the pulse, and the result passes through the lip radiation filter L(z). For unvoiced segments, white noise with a gain is used instead, with the V/U switch selecting the excitation.

- The library pulse is extracted from the speech data through glottal inverse filtering
- Noise is separately added to each band of the voiced excitation according to the HNR
- G(z) implements the spectral shape of the glottal pulse
- L(z) is a fixed lip radiation filter
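The pulse interpolation step can be sketched as resampling a library pulse to the target pitch period and normalizing its energy. The pulse shape, gain handling, and function name below are illustrative assumptions, not details from the slides.

```python
import numpy as np

fs = 16000
# Toy stand-in for a library pulse obtained by glottal inverse filtering.
library_pulse = np.hanning(200) * np.sin(np.linspace(0, 3 * np.pi, 200))

def interpolate_pulse(pulse, f0, gain=1.0):
    """Resample the library pulse to the period of f0 and set its gain."""
    new_len = int(round(fs / f0))
    x_old = np.linspace(0.0, 1.0, len(pulse))
    x_new = np.linspace(0.0, 1.0, new_len)
    p = np.interp(x_new, x_old, pulse)
    p *= gain / (np.linalg.norm(p) + 1e-12)   # energy normalization, then gain
    return p

p100 = interpolate_pulse(library_pulse, 100.0)   # period of 160 samples
p200 = interpolate_pulse(library_pulse, 200.0)   # period of 80 samples
```

Concatenating such pulses at the generated F0 contour, then adding per-band noise according to the HNR, yields the voiced excitation described above.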
40 40 Glottal spectrum separation (GSS) [Cabral et al., 2008]

Speech production model:

S(e^jω) = D(e^jω) G(e^jω) V(e^jω) R(e^jω)

- D(e^jω): pulse train
- G(e^jω): glottal pulse
- V(e^jω): vocal tract
- R(e^jω): lip radiation

Simplified speech production model:

S(e^jω) = D(e^jω) V(e^jω)

What GSS does:
1. Estimate a model of the glottal flow derivative ⇒ E(e^jω)
2. Remove its effect from the speech spectral envelope Ĥ(e^jω) ⇒ V̂(e^jω) = Ĥ(e^jω) / E(e^jω)
3. Re-synthesize speech: S(e^jω) = D(e^jω) E(e^jω) V̂(e^jω)
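The separation step, removing the glottal contribution from the spectral envelope, is a per-bin division of magnitude spectra. The spectra below are synthetic stand-ins used only to show that the division recovers the vocal-tract part exactly when the model holds.

```python
import numpy as np

nbins = 257
w = np.linspace(1e-3, np.pi, nbins)        # avoid w = 0 for the 1/w toy source

glottal = 1.0 / w                          # toy glottal spectrum (~ -6 dB/oct)
vocal_tract = 1.0 + 0.5 * np.cos(3 * w)    # toy vocal-tract magnitude
envelope = glottal * vocal_tract           # observed envelope H(e^jw)

vt_estimate = envelope / glottal           # V(e^jw) = H(e^jw) / E(e^jw)
```

In practice E(e^jω) comes from an LF-model fit rather than being known exactly, so the estimate V̂ absorbs any glottal-model mismatch.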
41 41 Glottal flow model utilized in GSS

The LF model is used to represent the glottal pulse. [Plot: one period of the LF glottal flow derivative over time, marking t_0, t_p, t_e, t_a, t_c and the amplitude E_e.]

Parameters:
- t_c: instant of complete closure
- t_e: instant of maximum excitation
- T_0: fundamental period
- t_p: instant of maximum flow
- T_a = t_a − t_e
- E_e: amplitude of maximum excitation
42 42 Application to statistical parametric synthesis

Speech is inverse filtered to obtain the glottal flow derivative, which is parameterized by the LF model; the LF-model waveform e_LF(n) is transformed by FFT into E_LF(e^jω). STRAIGHT provides the speech spectral envelope Ĥ(e^jω). Spectral separation then yields the spectral parameters:

V(e^jω) = Ĥ(e^jω) / E_LF(e^jω)

Acoustic modeling:
1. Spectral parameters: mel-cepstral coefficients
2. Band aperiodicity parameters
3. 5 LF-model parameters: t_e, t_p, T_a, E_e, T_0
43 43 Synthesis part

The generated LF-model parameters drive an LF-model generator, whose output e_LF(n) is transformed by FFT into E_LF(e^jω) and weighted by H_p(e^jω); white noise w(n) is transformed by FFT into W(e^jω) and weighted by H_n(e^jω); both weighting filters come from the aperiodicity parameters. The sum is transformed by IFFT into the mixed excitation.

- The STRAIGHT vocoder is utilized
- The original delta pulse is replaced by the pulse created from the generated LF-model parameters
44 44 Vocal tract and LF-model plus noise separation [Lanchantin et al., 2010]

Assumed speech model:

S(e^jω) = [ D(e^jω) G(e^jω) + W(e^jω) ] V(e^jω) R(e^jω)

- D(e^jω): pulse train
- G(e^jω): glottal pulse
- W(e^jω): white noise
- V(e^jω): vocal tract
- R(e^jω): lip radiation

Parameterization:
1. Fundamental frequency → D(e^jω)
2. LF model parameter → G(e^jω)
3. Noise power → W(e^jω)
4. Mel-cepstral coefficients → V(e^jω)
45 45 Application to statistical parametric synthesis

1. Estimate a single parameter for a simplified LF model: R_d
2. Determine a maximum voiced frequency ω_VU
3. Estimate the power of the noise W(e^jω): σ_g²
4. Estimate the vocal tract parameters:

V(e^jω) = τ_o( S(e^jω) / (γ R(e^jω) G(e^jω)) ),  ω < ω_VU
V(e^jω) = C_o( S(e^jω) / (Γ R(e^jω) G(e^jω)) ),  ω ≥ ω_VU

- τ_o: cepstral analysis by fitting the harmonic peaks
- C_o: power cepstrum
- Γ, γ: normalization terms

Acoustic modeling:
1. Simplified LF-model parameter: R_d
2. Standard deviation of the noise component: σ_g
3. Cepstral coefficients that represent V(e^jω)
4. F0
46 46 Synthesis time

Excitation construction for voiced segments: R_d drives a glottal pulse generator, whose output g(n) is transformed by FFT into G(e^jω); white noise w(n) is windowed and time-modulated, transformed by FFT into W(e^jω), and high-pass filtered by H_HP(e^jω) above ω_VU; the sum gives the excitation E(e^jω).

Excitation construction for unvoiced segments: white noise w(n) is windowed and transformed by FFT into the excitation E(e^jω).

Synthesized speech signal:

S(e^jω) = E(e^jω) V(e^jω) R(e^jω)
47 Vocoding methods: summary and examples

Method      Description
Simple      Pulse train/white noise simple switch
MELP        MELP mixed excitation
STRAIGHT    STRAIGHT mixed excitation
SDF         State-dependent filtering mixed excitation
DSM         Deterministic plus stochastic model of the residual
WI          Waveform interpolation applied to statistical parametric synthesis
GlottHMM    Glottal inverse filtering
GSS         Glottal spectrum separation
SVLN        Separation of vocal tract and LF-model plus noise
48 48 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
49 49 Joint vocoding-acoustic modeling

The goal of any speech synthesizer is to reproduce the speech waveform. Parameters of a joint acoustic-excitation model are estimated by maximizing the probability of the speech waveform s = [s(0) ... s(N−1)] given the labels l:

λ̂ = arg max_λ p(s | l, λ) = arg max_λ ∫ p(s | c, λ) p(c | l, λ) dc

- c = [c_0 ... c_{T−1}]: spectral parameters, treated as a hidden variable
- λ = {λ_c, λ_e}: acoustic-excitation model parameters
- λ_c: acoustic model part
- λ_e: excitation model part
50 50 Another viewpoint: waveform-level modeling

Comparison with typical modeling for parametric synthesis:
- Typical: the state sequence q is the hidden variable, the spectrum c is the observation
- New model: the state sequence q and the spectrum c are hidden variables, the speech waveform s is the observation

Similar concepts:
- [Toda and Tokuda, 2008]: factor analyzed trajectory HMM for spectral estimation
- [Wu and Tokuda, 2009]: closed-loop training for HMM-based synthesis
51 51 A close look into the probabilities involved

Typical statistical modeling for parametric synthesis:

λ̂_c = arg max_{λ_c} Σ_q p(c | q, λ_c) p(q | l, λ_c)

Augmented statistical modeling:

λ̂ = arg max_λ Σ_q ∫ p(s | c, q, λ) p(c | q, λ) p(q | l, λ) dc

- p(s | c, q, λ): speech generative model (speech production from the spectrum)
- p(c | q, λ) p(q | l, λ): can be modeled by existing machines, e.g. HMM, HSMM, trajectory HMM
52 52 One possible speech generative model

A pulse train t(n), with positions {p_0, ..., p_{Z−1}} and amplitudes {a_0, ..., a_{Z−1}}, is filtered by H_v(z) into the voiced excitation v(n); Gaussian white noise w(n) (zero mean, variance one) is filtered by H_u(z) into the unvoiced excitation u(n); their sum, the excitation e(n), passes through H_c(z) to produce the speech s(n).

Probability of the speech signal:

p(s | H_c, q, λ_e) = |H_c|^{−1} N( H_c^{−1} s ; H_{v,q} t, Φ_q ),   Φ_q = (G_qᵀ G_q)^{−1}

λ_e = {H_v, G, t}: excitation model parameters
53 53 Vocal tract filter impulse response and spectral parameters: relationship We need p (s c, q, λ), not p (s H c, q, λ) Mapping between H c and c is necessary! Two possibilities 1. Relationship between H c and c represented as a Gaussian process 2. Relationship between H c and c is deterministic Cepstral coefficients!
54 54 Acoustic modeling

Trajectory HMM [Zen et al., 2007b] for acoustic modeling:

p(c | l, λ_c) = Σ_q p(c | q, λ_c) p(q | l, λ_c)

- p(c | q, λ_c) = N(c ; c̄_q, P_q)
- p(q | l, λ_c) = π_{q_0} Π_{t=0}^{T−1} α_{q_t q_{t+1}}

with

R_q = P_q^{−1} = Wᵀ Σ_q^{−1} W,   r_q = Wᵀ Σ_q^{−1} μ_q,   c̄_q = P_q r_q

- W: appends dynamic features to c
- Why: modeling of p(c | l, λ_c) instead of p(Wc | l, λ_c) (conventional HMM)
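The relations R_q = WᵀΣ_q⁻¹W, r_q = WᵀΣ_q⁻¹μ_q and c̄_q = P_q r_q can be sketched for a 1-D static stream with a single delta window. The unit variances and the window coefficients are simplifying assumptions for the sketch.

```python
import numpy as np

T = 6
# W stacks, for each frame t, a static row [c_t] and a delta row
# [0.5 (c_{t+1} - c_{t-1})], with clamped indices at the boundaries.
W = np.zeros((2 * T, T))
for t in range(T):
    W[2 * t, t] = 1.0                          # static window
    W[2 * t + 1, max(t - 1, 0)] -= 0.5         # delta window, left tap
    W[2 * t + 1, min(t + 1, T - 1)] += 0.5     # delta window, right tap

# State means: static target 1.0, delta target 0.0 at every frame.
mu = np.repeat([[1.0, 0.0]], T, axis=0).ravel()
sigma_inv = np.eye(2 * T)                      # unit variances (assumed)

R = W.T @ sigma_inv @ W                        # R_q = W' Sigma^-1 W
r = W.T @ sigma_inv @ mu                       # r_q = W' Sigma^-1 mu
c_bar = np.linalg.solve(R, r)                  # c_q = P_q r_q = R^-1 r
```

With constant static targets and zero delta targets, the smoothed trajectory c̄_q comes out constant, which is exactly the behavior the delta constraints are meant to enforce.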
55 55 Joint acoustic-excitation model

Final joint model:

p(s | l, λ) = Σ_q ∫ p(s | c, q, λ_e) p(c | q, λ_c) p(q | l, λ_c) dc

- Excitation model: p(s | c, q, λ_e) = |H_c|^{−1} N( H_c^{−1} s ; H_{v,q} t, Φ_q )
- Acoustic model: p(c | q, λ_c) p(q | l, λ_c) = N(c ; c̄_q, P_q) π_{q_0} Π_{t=0}^{T−1} α_{q_t q_{t+1}}
56 56 Training procedure

Initialization
1. Initial cepstral analysis of the speech gives the initial cepstrum vector c
2. Trajectory HMM training → state sequence q̂
3. Excitation model training (using R_q̂, r_q̂) → H_v(z), G(z), t(n)

Recursion
1. Estimation of the best cepstral coefficients → cepstrum vector ĉ
2. Trajectory HMM training → state sequence q̂
3. Excitation model training
57 57 Contents Introduction Vocoding methods for statistical parametric speech synthesis Fully parametric excitation methods Methods that attempt to mimic the LP residual Methods that work on source and vocal tract modeling Joint acoustic modeling and waveform generation for statistical parametric speech synthesis Conclusion
58 58 Conclusions

Quality improvement of statistical parametric synthesizers can be achieved through better waveform generation methods. Existing approaches that use source-filter modeling can be classified into:
- Methods that attempt to improve solely the excitation signal
- Methods that focus on the speech production model as a whole

Naturalness degradation of statistical parametric synthesizers has basically two causes:
- Acoustic modeling that produces averaged parameter trajectories
- Use of parametric speech production models

Methods which integrate acoustic modeling and speech production address both causes jointly.
59 59 Acknowledgments

Many thanks to
- João Cabral, University College Dublin, Ireland
- Tuomo Raitio, Helsinki University of Technology, Finland
- Thomas Drugman, Faculté Polytechnique de Mons, Belgium
- Pierre Lanchantin, IRCAM, France
- June Sig Sung, Seoul National University, South Korea
for providing samples of their systems.
60 References I

- Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2-3).
- Cabral, J., Renals, S., Richmond, K., and Yamagishi, J. (2008). Glottal spectral separation for parametric speech synthesis. In Proc. of Interspeech.
- Deller, Jr., J. R., Hansen, J. H. L., and Proakis, J. G. (2000). Discrete-Time Processing of Speech Signals. IEEE Press Classic Reissue, New York.
- Drugman, T., Wilfart, G., and Dutoit, T. (2009). A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis. In Proc. of Interspeech.
- Lanchantin, P., Degottex, G., and Rodet, X. (2010). An HMM-based speech synthesis system using a new glottal source and vocal-tract separation method. In Proc. of ICASSP.
- Maia, R., Toda, T., Zen, H., Nankaku, Y., and Tokuda, K. (2007). An excitation model for HMM-based speech synthesis based on residual modeling. In Proc. of the 6th ISCA Workshop on Speech Synthesis.
- Raitio, T., Suni, A., Pulakka, H., Vainio, M., and Alku, P. (2008). HMM-based Finnish text-to-speech system using glottal inverse filtering. In Proc. of Interspeech.
61 References II

- Sung, J., Kyung, D., Oh, H., and Kim, N. (2010). Excitation modeling based on waveform interpolation for HMM-based speech synthesis. In Proc. of Interspeech.
- Toda, T. and Tokuda, K. (2008). Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. In Proc. of ICASSP.
- Wu, Y. J. and Tokuda, K. (2009). Minimum generation error training by using original spectrum as reference for log spectral distortion measure. In Proc. of ICASSP.
- Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., and Kitamura, T. (2001). Mixed-excitation for HMM-based speech synthesis. In Proc. of EUROSPEECH.
- Zen, H., Toda, T., Nakamura, M., and Tokuda, K. (2007a). Details of the Nitech HMM-based speech synthesis for Blizzard Challenge. IEICE Trans. on Inf. and Systems, E90-D(1).
- Zen, H., Tokuda, K., and Kitamura, T. (2007b). Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language, 21(1).
More informationELEN E4810: Digital Signal Processing Topic 11: Continuous Signals. 1. Sampling and Reconstruction 2. Quantization
ELEN E4810: Digital Signal Processing Topic 11: Continuous Signals 1. Sampling and Reconstruction 2. Quantization 1 1. Sampling & Reconstruction DSP must interact with an analog world: A to D D to A x(t)
More informationApplication of the Bispectrum to Glottal Pulse Analysis
ISCA Archive http://www.isca-speech.org/archive ITRW on Non-Linear Speech Processing (NOLISP 3) Le Croisic, France May 2-23, 23 Application of the Bispectrum to Glottal Pulse Analysis Dr Jacqueline Walker
More informationFrequency Domain Speech Analysis
Frequency Domain Speech Analysis Short Time Fourier Analysis Cepstral Analysis Windowed (short time) Fourier Transform Spectrogram of speech signals Filter bank implementation* (Real) cepstrum and complex
More informationCS578- Speech Signal Processing
CS578- Speech Signal Processing Lecture 7: Speech Coding Yannis Stylianou University of Crete, Computer Science Dept., Multimedia Informatics Lab yannis@csd.uoc.gr Univ. of Crete Outline 1 Introduction
More informationDepartment of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions
Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions The complex cepstrum, ˆx[n], of a sequence x[n] is the inverse Fourier transform of the complex
More informationTinySR. Peter Schmidt-Nielsen. August 27, 2014
TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),
More informationRobust Speaker Identification
Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }
More informationTopic 6. Timbre Representations
Topic 6 Timbre Representations We often say that singer s voice is magnetic the violin sounds bright this French horn sounds solid that drum sounds dull What aspect(s) of sound are these words describing?
More informationLinear Prediction Coding. Nimrod Peleg Update: Aug. 2007
Linear Prediction Coding Nimrod Peleg Update: Aug. 2007 1 Linear Prediction and Speech Coding The earliest papers on applying LPC to speech: Atal 1968, 1970, 1971 Markel 1971, 1972 Makhoul 1975 This is
More informationAllpass Modeling of LP Residual for Speaker Recognition
Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:
More informationCOMP Signals and Systems. Dr Chris Bleakley. UCD School of Computer Science and Informatics.
COMP 40420 2. Signals and Systems Dr Chris Bleakley UCD School of Computer Science and Informatics. Scoil na Ríomheolaíochta agus an Faisnéisíochta UCD. Introduction 1. Signals 2. Systems 3. System response
More informationGMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System
GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System Snani Cherifa 1, Ramdani Messaoud 1, Zermi Narima 1, Bourouba Houcine 2 1 Laboratoire d Automatique et Signaux
More informationSinger Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers
Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Kumari Rambha Ranjan, Kartik Mahto, Dipti Kumari,S.S.Solanki Dept. of Electronics and Communication Birla
More informationModeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,
More informationOptimal Design of Real and Complex Minimum Phase Digital FIR Filters
Optimal Design of Real and Complex Minimum Phase Digital FIR Filters Niranjan Damera-Venkata and Brian L. Evans Embedded Signal Processing Laboratory Dept. of Electrical and Computer Engineering The University
More informationFusion of Spectral Feature Sets for Accurate Speaker Identification
Fusion of Spectral Feature Sets for Accurate Speaker Identification Tomi Kinnunen, Ville Hautamäki, and Pasi Fränti Department of Computer Science University of Joensuu, Finland {tkinnu,villeh,franti}@cs.joensuu.fi
More informationDesign of a CELP coder and analysis of various quantization techniques
EECS 65 Project Report Design of a CELP coder and analysis of various quantization techniques Prof. David L. Neuhoff By: Awais M. Kamboh Krispian C. Lawrence Aditya M. Thomas Philip I. Tsai Winter 005
More informationLab 4: Quantization, Oversampling, and Noise Shaping
Lab 4: Quantization, Oversampling, and Noise Shaping Due Friday 04/21/17 Overview: This assignment should be completed with your assigned lab partner(s). Each group must turn in a report composed using
More informationLECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES
LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch
More informationCausal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation
Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation Thomas Drugman, Baris Bozkurt, Thierry Dutoit To cite this version: Thomas Drugman, Baris Bozkurt, Thierry
More informationQUASI CLOSED PHASE ANALYSIS OF SPEECH SIGNALS USING TIME VARYING WEIGHTED LINEAR PREDICTION FOR ACCURATE FORMANT TRACKING
QUASI CLOSED PHASE ANALYSIS OF SPEECH SIGNALS USING TIME VARYING WEIGHTED LINEAR PREDICTION FOR ACCURATE FORMANT TRACKING Dhananjaya Gowda, Manu Airaksinen, Paavo Alku Dept. of Signal Processing and Acoustics,
More informationGMM-Based Speech Transformation Systems under Data Reduction
GMM-Based Speech Transformation Systems under Data Reduction Larbi Mesbahi, Vincent Barreaud, Olivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-22305 Lannion Cedex
More informationSPEECH COMMUNICATION 6.541J J-HST710J Spring 2004
6.541J PS3 02/19/04 1 SPEECH COMMUNICATION 6.541J-24.968J-HST710J Spring 2004 Problem Set 3 Assigned: 02/19/04 Due: 02/26/04 Read Chapter 6. Problem 1 In this problem we examine the acoustic and perceptual
More informationRe-estimation of Linear Predictive Parameters in Sparse Linear Prediction
Downloaded from vbnaaudk on: januar 12, 2019 Aalborg Universitet Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction Giacobello, Daniele; Murthi, Manohar N; Christensen, Mads Græsbøll;
More informationReview of Fundamentals of Digital Signal Processing
Solution Manual for Theory and Applications of Digital Speech Processing by Lawrence Rabiner and Ronald Schafer Click here to Purchase full Solution Manual at http://solutionmanuals.info Link download
More informationThe Equivalence of ADPCM and CELP Coding
The Equivalence of ADPCM and CELP Coding Peter Kabal Department of Electrical & Computer Engineering McGill University Montreal, Canada Version.2 March 20 c 20 Peter Kabal 20/03/ You are free: to Share
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationDesign Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
More informationE : Lecture 1 Introduction
E85.2607: Lecture 1 Introduction 1 Administrivia 2 DSP review 3 Fun with Matlab E85.2607: Lecture 1 Introduction 2010-01-21 1 / 24 Course overview Advanced Digital Signal Theory Design, analysis, and implementation
More informationNoise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm
EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic
More informationL6: Short-time Fourier analysis and synthesis
L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude
More informationSpeech Coding. Speech Processing. Tom Bäckström. October Aalto University
Speech Coding Speech Processing Tom Bäckström Aalto University October 2015 Introduction Speech coding refers to the digital compression of speech signals for telecommunication (and storage) applications.
More informationIntroduction to Biomedical Engineering
Introduction to Biomedical Engineering Biosignal processing Kung-Bin Sung 6/11/2007 1 Outline Chapter 10: Biosignal processing Characteristics of biosignals Frequency domain representation and analysis
More informationDigital Signal Processing. Midterm 1 Solution
EE 123 University of California, Berkeley Anant Sahai February 15, 27 Digital Signal Processing Instructions Midterm 1 Solution Total time allowed for the exam is 8 minutes Some useful formulas: Discrete
More informationEE538 Final Exam Fall :20 pm -5:20 pm PHYS 223 Dec. 17, Cover Sheet
EE538 Final Exam Fall 005 3:0 pm -5:0 pm PHYS 3 Dec. 17, 005 Cover Sheet Test Duration: 10 minutes. Open Book but Closed Notes. Calculators ARE allowed!! This test contains five problems. Each of the five
More informationModel-based unsupervised segmentation of birdcalls from field recordings
Model-based unsupervised segmentation of birdcalls from field recordings Anshul Thakur School of Computing and Electrical Engineering Indian Institute of Technology Mandi Himachal Pradesh, India Email:
More informationCHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME
CHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME Shri Mata Vaishno Devi University, (SMVDU), 2013 Page 13 CHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME When characterizing or modeling a random variable, estimates
More informationEstimation of Cepstral Coefficients for Robust Speech Recognition
Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment
More informationSource modeling (block processing)
Digital Speech Processing Lecture 17 Speech Coding Methods Based on Speech Models 1 Waveform Coding versus Block Waveform coding Processing sample-by-sample matching of waveforms coding gquality measured
More informationThursday, October 29, LPC Analysis
LPC Analysis Prediction & Regression We hypothesize that there is some systematic relation between the values of two variables, X and Y. If this hypothesis is true, we can (partially) predict the observed
More informationSpeaker Identification Based On Discriminative Vector Quantization And Data Fusion
University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou
More informationMITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS
17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS Muhammad Tahir AKHTAR
More informationarxiv: v1 [cs.sd] 25 Oct 2014
Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra
More informationCausal anticausal decomposition of speech using complex cepstrum for glottal source estimation
Available online at www.sciencedirect.com Speech Communication 53 (2011) 855 866 www.elsevier.com/locate/specom Causal anticausal decomposition of speech using complex cepstrum for glottal source estimation
More informationVoice Activity Detection Using Pitch Feature
Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech
More informationLecture 5: GMM Acoustic Modeling and Feature Extraction
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic
More informationSystem Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain
System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline
More informationFilter structures ELEC-E5410
Filter structures ELEC-E5410 Contents FIR filter basics Ideal impulse responses Polyphase decomposition Fractional delay by polyphase structure Nyquist filters Half-band filters Gibbs phenomenon Discrete-time
More informationVarious signal sampling and reconstruction methods
Various signal sampling and reconstruction methods Rolands Shavelis, Modris Greitans 14 Dzerbenes str., Riga LV-1006, Latvia Contents Classical uniform sampling and reconstruction Advanced sampling and
More informationSpeech Synthesis as A Statistical. Keiichi Tokuda Nagoya Institute of Technology
Speech Synthesis as A Statistical Machine Learning Problem Keiichi Tokuda Nagoya Institute of Technology ASRU2011, Hawaii December 14, 2011 Introduction Rule-based, formant synthesis (~ 90s) Hand-crafting
More informationA 600bps Vocoder Algorithm Based on MELP. Lan ZHU and Qiang LI*
2017 2nd International Conference on Electrical and Electronics: Techniques and Applications (EETA 2017) ISBN: 978-1-60595-416-5 A 600bps Vocoder Algorithm Based on MELP Lan ZHU and Qiang LI* Chongqing
More informationTime-varying quasi-closed-phase weighted linear prediction analysis of speech for accurate formant detection and tracking
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Time-varying quasi-closed-phase weighted linear prediction analysis of speech for accurate formant detection and tracking Dhananjaya Gowda and
More informationSource/Filter Model. Markus Flohberger. Acoustic Tube Models Linear Prediction Formant Synthesizer.
Source/Filter Model Acoustic Tube Models Linear Prediction Formant Synthesizer Markus Flohberger maxiko@sbox.tugraz.at Graz, 19.11.2003 2 ACOUSTIC TUBE MODELS 1 Introduction Speech synthesis methods that
More informationNew Statistical Model for the Enhancement of Noisy Speech
New Statistical Model for the Enhancement of Noisy Speech Electrical Engineering Department Technion - Israel Institute of Technology February 22, 27 Outline Problem Formulation and Motivation 1 Problem
More informationChapter 5 Frequency Domain Analysis of Systems
Chapter 5 Frequency Domain Analysis of Systems CT, LTI Systems Consider the following CT LTI system: xt () ht () yt () Assumption: the impulse response h(t) is absolutely integrable, i.e., ht ( ) dt< (this
More informationReview of Fundamentals of Digital Signal Processing
Chapter 2 Review of Fundamentals of Digital Signal Processing 2.1 (a) This system is not linear (the constant term makes it non linear) but is shift-invariant (b) This system is linear but not shift-invariant
More informationOn Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR
On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR Vivek Tyagi a,c, Hervé Bourlard b,c Christian Wellekens a,c a Institute Eurecom, P.O Box: 193, Sophia-Antipolis, France.
More informationKeywords: Vocal Tract; Lattice model; Reflection coefficients; Linear Prediction; Levinson algorithm.
Volume 3, Issue 6, June 213 ISSN: 2277 128X International Journal of Advanced Research in Comuter Science and Software Engineering Research Paer Available online at: www.ijarcsse.com Lattice Filter Model
More informationA comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information
Sādhanā Vol. 38, Part 4, August 23, pp. 59 62. c Indian Academy of Sciences A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information DEBADATTA
More informationEstimation of Relative Operating Characteristics of Text Independent Speaker Verification
International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,
More informationEvaluation of the modified group delay feature for isolated word recognition
Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and
More informationStress detection through emotional speech analysis
Stress detection through emotional speech analysis INMA MOHINO inmaculada.mohino@uah.edu.es ROBERTO GIL-PITA roberto.gil@uah.es LORENA ÁLVAREZ PÉREZ loreduna88@hotmail Abstract: Stress is a reaction or
More informationGaussian Processes for Audio Feature Extraction
Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline
More informationDiscrete-Time Signals and Systems. Frequency Domain Analysis of LTI Systems. The Frequency Response Function. The Frequency Response Function
Discrete-Time Signals and s Frequency Domain Analysis of LTI s Dr. Deepa Kundur University of Toronto Reference: Sections 5., 5.2-5.5 of John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing:
More informationA FAST AND ACCURATE ADAPTIVE NOTCH FILTER USING A MONOTONICALLY INCREASING GRADIENT. Yosuke SUGIURA
A FAST AND ACCURATE ADAPTIVE NOTCH FILTER USING A MONOTONICALLY INCREASING GRADIENT Yosuke SUGIURA Tokyo University of Science Faculty of Science and Technology ABSTRACT In this paper, we propose a new
More informationGlottal Modeling and Closed-Phase Analysis for Speaker Recognition
Glottal Modeling and Closed-Phase Analysis for Speaker Recognition Raymond E. Slyh, Eric G. Hansen and Timothy R. Anderson Air Force Research Laboratory, Human Effectiveness Directorate, Wright-Patterson
More informationMANY digital speech communication applications, e.g.,
406 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2007 An MMSE Estimator for Speech Enhancement Under a Combined Stochastic Deterministic Speech Model Richard C.
More informationSignals, Instruments, and Systems W5. Introduction to Signal Processing Sampling, Reconstruction, and Filters
Signals, Instruments, and Systems W5 Introduction to Signal Processing Sampling, Reconstruction, and Filters Acknowledgments Recapitulation of Key Concepts from the Last Lecture Dirac delta function (
More information