Improved Method for Epoch Extraction in High Pass Filtered Speech


D. Govind
Center for Computational Engineering & Networking, Amrita Vishwa Vidyapeetham (University), Coimbatore, Tamilnadu
d govind@cb.amrita.edu

S. R. Mahadeva Prasanna, Ramesh K
Department of Electronics & Electrical Engineering, Indian Institute of Technology Guwahati, Assam
{prasanna,kk.ramesh}@iitg.ernet.in

Abstract — The objective of the present work is to improve the epoch estimation performance in high pass filtered (HPF) speech using the conventional zero frequency filtering (ZFF) approach. The strength of the impulse at zero frequency is significantly attenuated in HPF speech, and hence the ZFF approach shows significant degradation in epoch estimation performance. Since the linear prediction (LP) residual of speech is characterized by sharper impulse-like discontinuities at the epoch locations than the speech waveform, the present work uses the LP residual of HPF speech for epoch estimation by the ZFF method. Gabor filtering of the LP residual is carried out to further increase the strength of the impulses at the epoch locations, and the epoch locations are estimated by ZFF of the Gabor filtered LP residual. The proposed method performs better than the existing Hilbert envelope based ZFF approach, with improved epoch identification accuracy.

I. INTRODUCTION

The epochs in speech are the time instants at which the excitation of the vocal tract is maximum [1], [2], [3], [4]. Epochs represent the instants of glottal closure in voiced speech and the onset of burst or frication in unvoiced speech. Due to the effect of the vocal tract characteristics, estimation of epochs from speech is a challenging task [2], and many methods have been proposed in the literature for reliable epoch estimation [5], [2], [3], [4]. Owing to their significance, many applications carry out processing anchored around the epoch locations [6], [7], [8].

Among the existing approaches, group delay (GD) based processing, DYPSA and zero frequency filtering (ZFF) are popular methods for extracting epochs. Of these, ZFF is a well known approach for reliable epoch estimation with reduced computational complexity [8], [2]. The ZFF method exploits the impulse-like characteristics of epochs [2]. The speech signal is first passed through a cascade of two zero frequency resonators. The trend in the resonator output is then removed by local mean subtraction to obtain the zero frequency filtered signal, and the negative-to-positive zero crossings of this signal are hypothesized as the epoch locations. The ZFF method provides the best epoch estimation performance for clean speech signals, which have sufficient energy near zero frequency. However, due to the significant attenuation of low frequency components near zero frequency, the performance degrades for band limited signals such as high pass filtered (HPF) and telephone recorded speech [9].

There have been attempts to improve epoch estimation in HPF speech by the ZFF method [9]. In that work, the low frequency nature of the Hilbert envelope is used to emphasize the energy near zero frequency, and improved epoch estimation for HPF speech is obtained by passing the Hilbert envelope of the HPF speech, or of its residual, through the zero frequency resonator.
Although the epoch estimation performance improves in terms of a higher epoch identification rate and lower miss and false alarm rates, the epoch identification error, measured as the deviation of the estimated epochs from the reference epochs, remains higher. The reduced epoch identification accuracy of this approach makes it less suitable for applications such as epoch based prosody modification, where the perceptual quality of the prosody modified speech depends mostly on the accuracy with which the epochs are estimated [8]. The smoothing tendency of the Hilbert envelope at the epoch locations increases the deviation of the estimated epochs from the actual reference epoch locations obtained from the differenced electro-glottogram (EGG). Hence epoch estimation using the Hilbert envelope of HPF speech results in poor temporal resolution. Figure 1 compares the epochs estimated by ZFF of the Hilbert envelope of HPF speech with the reference epochs estimated from the corresponding differenced EGG. It can be observed from the figure that, although the discontinuities at the epoch locations are enhanced by the Hilbert envelope, its low frequency nature smoothes the peaks, so the estimated epoch locations have poor temporal resolution. This can be confirmed by comparing the estimated epochs with the reference epoch locations represented by the differenced EGG peaks in Figure 1.

The samples of the LP residual are uncorrelated, and the higher prediction errors form strong impulse-like discontinuities at the epoch locations. Hence the present work focuses on exploiting the impulse-like discontinuities in the LP residual for epoch estimation from HPF speech. Even though ZFF of the LP residual of HPF speech performs better than conventional ZFF of HPF speech, its performance is not on par with ZFF of the Hilbert envelope of HPF speech; however, its epoch identification accuracy is found to be better. To further enhance the impulse-like discontinuities at the epoch locations, the LP residual of HPF speech is filtered using a Gabor filter whose shape is similar to the discontinuity at the glottal pulse. The epochs are then obtained by ZFF of the Gabor filtered residual sequence. The performance of the proposed method is confirmed by improved epoch identification accuracy compared to that of ZFF of the Hilbert envelope of HPF speech.

Fig. 1. Deviation of estimated epochs in the Hilbert envelope of HPF speech from the true locations: a voiced segment of HPF speech, its Hilbert envelope, the epochs estimated by ZFF of the Hilbert envelope of HPF speech, and the differenced EGG peaks showing the reference epoch locations.

The rest of the paper is organized as follows. Section II describes the algorithmic steps of the ZFF method. Section III compares the epoch estimation performance of ZFF applied to HPF speech and to the LP residual of HPF speech. Section IV describes the Gabor filter and the proposed ZFF method using Gabor filtering of the LP residual. Finally, Section V summarizes the present work along with the scope for future work.

II. ZERO FREQUENCY FILTERING OF SPEECH

This section reviews the ZFF method for epoch estimation and the performance measures used for evaluating epoch extraction methods.

A. Epoch Estimation Using the ZFF Method

The algorithm for estimating the epochs in clean speech by ZFF is as follows [2]:

1) Difference the input speech signal s(n):
x(n) = s(n) - s(n-1)   (1)

2) Compute the output of a cascade of two ideal digital resonators at 0 Hz:
y(n) = \sum_{k=1}^{4} a_k y(n-k) + x(n)   (2)
where a_1 = 4, a_2 = -6, a_3 = 4 and a_4 = -1.

3) Remove the trend:
\hat{y}(n) = y(n) - \bar{y}(n)   (3)
where \bar{y}(n) = \frac{1}{2N+1} \sum_{m=-N}^{N} y(n+m), and 2N+1 corresponds to the average pitch period computed over a longer segment of speech.

The trend removed signal \hat{y}(n) is termed the zero frequency filtered signal. The negative-to-positive zero crossings of the filtered signal give the epoch locations.

Fig. 2. Epoch estimation performance measures: epoch identification, miss, false alarm and identification accuracy.

B. Performance Measures for Epoch Estimation

The performance measures proposed in [4], namely the epoch identification rate, miss rate, false alarm rate and identification accuracy, are used for the epoch estimation performance analysis. They are defined as follows:

Larynx cycle: the range of samples (1/2)(l_{r-1} + l_r) < n < (1/2)(l_r + l_{r+1}), where l_r, l_{r-1} and l_{r+1} are the current, preceding and succeeding reference epoch locations, respectively.

Identification Rate (IDR): the percentage of larynx cycles for which exactly one epoch is detected.

Miss Rate (MR): the percentage of larynx cycles for which no epoch is detected.

False Alarm Rate (FAR): the percentage of larynx cycles for which more than one epoch is detected.

Identification Error (ζ): the timing error between the reference and detected epochs in larynx cycles for which exactly one epoch was detected.

Identification Accuracy (IDA, σ): the standard deviation of the identification error ζ. Small values of σ indicate high identification accuracy.

Figure 2 gives a graphical illustration of epoch identification, miss, false alarm and identification accuracy.
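As a concrete illustration of the steps in Sec. II-A, a minimal Python sketch of the ZFF epoch detector is given below. The function name, the choice of numpy/scipy routines and the default 8 ms averaging window (standing in for the average pitch period mentioned above) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np
from scipy.signal import lfilter

def zff_epochs(s, fs, avg_pitch_ms=8.0):
    """Sketch of zero frequency filtering (ZFF) epoch detection (Sec. II-A)."""
    s = np.asarray(s, dtype=float)
    # Eq. (1): difference the speech signal, x(n) = s(n) - s(n-1)
    x = np.diff(s, prepend=s[0])
    # Eq. (2): cascade of two ideal 0 Hz resonators,
    # y(n) = 4 y(n-1) - 6 y(n-2) + 4 y(n-3) - y(n-4) + x(n)
    y = lfilter([1.0], [1.0, -4.0, 6.0, -4.0, 1.0], x)
    # Eq. (3): remove the trend by subtracting the local mean over 2N+1 samples,
    # with 2N+1 derived here from an assumed average pitch period of 8 ms
    N = max(1, int(round(avg_pitch_ms * 1e-3 * fs)))
    win = np.ones(2 * N + 1) / (2 * N + 1)
    y_hat = y - np.convolve(y, win, mode="same")
    # Negative-to-positive zero crossings of the filtered signal are the epochs
    epochs = np.where((y_hat[:-1] < 0) & (y_hat[1:] >= 0))[0] + 1
    return epochs, y_hat
```

In practice the local mean subtraction is often applied more than once to fully suppress the polynomial growth of the 0 Hz resonator output; the single pass shown follows the description above.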

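Similarly, the measures of Sec. II-B can be computed with a short routine. The sketch below is a hedged illustration (the function name and the handling of the first and last reference epochs are assumptions); it takes reference and estimated epoch locations as sample indices.

```python
import numpy as np

def epoch_performance(ref_epochs, est_epochs, fs):
    """Sketch of IDR, MR, FAR and IDA (Sec. II-B) from epoch sample indices."""
    ref = np.asarray(ref_epochs, dtype=float)
    est = np.asarray(est_epochs, dtype=float)
    n_id = n_miss = n_false = 0
    timing_errors = []
    # One larynx cycle per interior reference epoch:
    # (l_{r-1} + l_r)/2 <= n < (l_r + l_{r+1})/2
    for r in range(1, len(ref) - 1):
        lo = 0.5 * (ref[r - 1] + ref[r])
        hi = 0.5 * (ref[r] + ref[r + 1])
        hits = est[(est >= lo) & (est < hi)]
        if hits.size == 0:
            n_miss += 1            # no epoch detected in this cycle
        elif hits.size == 1:
            n_id += 1              # exactly one epoch detected
            timing_errors.append((hits[0] - ref[r]) / fs)  # zeta, in seconds
        else:
            n_false += 1           # more than one epoch detected
    n_cycles = max(1, n_id + n_miss + n_false)
    idr = 100.0 * n_id / n_cycles
    mr = 100.0 * n_miss / n_cycles
    far = 100.0 * n_false / n_cycles
    ida = 1e3 * float(np.std(timing_errors)) if timing_errors else float("nan")  # sigma, ms
    return idr, mr, far, ida
```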
III. EPOCH ESTIMATION PERFORMANCE FOR CLEAN AND HIGH PASS FILTERED SPEECH

The performance is evaluated on the CMU Arctic database, which has simultaneous EGG recordings [10]. 1132 phonetically balanced utterances from each of three speakers (two male and one female) are used for evaluation. The reference epochs are obtained by ZFF of the differenced EGG. All utterances of the CMU Arctic database are converted from the original recorded sampling rate of 32 kHz to 8 kHz. The HPF speech signals are generated by filtering the Arctic utterances using a high pass filter with a cutoff frequency of 500 Hz [9], chosen so as to attenuate all frequency components in the human pitch range.

Table I compares the epoch estimation performance of the ZFF method on clean and HPF speech [9]. The table shows the effectiveness of the ZFF method in extracting accurate epoch locations from clean speech, whereas a significant degradation is observed for the epochs estimated from HPF speech. Since the LP residual shows sharp discontinuities at the epoch locations, the epochs estimated by ZFF of the LP residual of HPF speech give better performance than ZFF of HPF speech. The LP residual of HPF speech is computed by 10th order LP analysis with a frame size of 20 ms and a shift of 10 ms. However, the performance is still not on par with that of the clean speech case. A similar degradation of the epoch estimation performance in HPF speech using DYPSA is reported in [9]. The epoch estimation performance for HPF speech can be further improved by enhancing the impulse-like discontinuities at the epoch locations of the LP residual, and Section IV describes the proposed method for doing so using a Gabor filter.

TABLE I. Comparison of ZFF epoch estimation performance for clean speech, HPF speech and the LP residual of HPF speech on the CMU Arctic database (Speaker / IDR / MR / FAR / IDA (ms); rows: Speech, HPF speech, LP residual of HPF speech).

IV. EPOCH ESTIMATION FROM LP RESIDUAL USING GABOR FILTER

The impulse-like discontinuities at the epoch locations of the LP residual are sharpened by convolving the LP residual with a Gabor filter, i.e., a modulated Gaussian pulse. The expression for the Gabor filter is

g(n) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(n - N/2)^2}{2\sigma^2} + j\omega n}   (4)

where σ represents the spread of the Gaussian, ω is the frequency of the modulating sinusoid, n is the time index and N is the length of the filter [11], [12]. In the present work, σ, ω and the filter length N are set to 0.3, 0.75 and 8, respectively. From Figure 3 it can be observed that the shape of the Gabor filter is similar to the discontinuities at the reference epoch locations of the differenced EGG.

Fig. 3. Gabor filter with parameters σ = 0.3, ω = 0.75 and N = 8.

To further sharpen the discontinuities, the residual of HPF speech is filtered twice with the Gabor filter, and the filtered residual is then subtracted from the residual of HPF speech:

y(n) = r(n) - \bar{r}(n)   (5)

where \bar{r}(n) is obtained by convolving the residual of HPF speech, r(n), twice with the Gabor coefficients g(n) given in Eq. (4). Hereafter, the sequence y(n) is termed the Gabor filtered residual sequence. Figure 4 plots a voiced frame of HPF speech, its LP residual, and the residual sequences obtained by convolution with the Gabor filter coefficients.

Fig. 4. Gabor filtering of the LP residual: a voiced segment of HPF speech, the corresponding LP residual, the residual convolved twice with the Gabor filter, and the Gabor filtered residual sequence.
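The computations described in Sections III and IV can be sketched end to end as follows. The Butterworth high pass filter and its order, the Hamming analysis window, the hop-wise stitching of the frame residuals and the use of the real part of g(n) are assumptions made for illustration; the LP order, frame size, frame shift and Gabor parameters follow the values quoted above.

```python
import numpy as np
from scipy.signal import butter, filtfilt, lfilter
from scipy.linalg import solve_toeplitz

def highpass(s, fs, fc=500.0, order=4):
    """High pass filter the speech (Sec. III); the 4th-order Butterworth is an assumption."""
    b, a = butter(order, fc, btype="highpass", fs=fs)
    return filtfilt(b, a, s)

def lp_residual(s, fs, p=10, frame_ms=20.0, hop_ms=10.0):
    """Frame-wise LP residual by inverse filtering (10th order, 20 ms frames, 10 ms shift)."""
    flen, hop = int(frame_ms * 1e-3 * fs), int(hop_ms * 1e-3 * fs)
    res = np.zeros(len(s))
    win = np.hamming(flen)
    for start in range(0, len(s) - flen, hop):
        frame = s[start:start + flen] * win
        ac = np.correlate(frame, frame, mode="full")[flen - 1:flen + p]
        if ac[0] <= 0:
            continue
        a = solve_toeplitz(ac[:p], ac[1:p + 1])            # predictor coefficients
        e = lfilter(np.concatenate(([1.0], -a)), [1.0],     # A(z) = 1 - sum_k a_k z^{-k}
                    s[start:start + flen])
        res[start:start + hop] = e[:hop]                    # simple hop-wise stitching
    return res

def gabor_kernel(sigma=0.3, omega=0.75, N=8):
    """Gabor filter of Eq. (4); using its real part here is an assumption."""
    n = np.arange(N + 1)
    env = np.exp(-((n - N / 2.0) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return np.real(env * np.exp(1j * omega * n))

def gabor_filtered_residual(r, g):
    """Eq. (5): y(n) = r(n) - (r * g * g)(n), the Gabor filtered residual sequence."""
    return r - np.convolve(np.convolve(r, g, mode="same"), g, mode="same")
```

With the ZFF sketch of Sec. II, the proposed epoch estimates would then be obtained as zff_epochs(gabor_filtered_residual(lp_residual(highpass(s, fs), fs), gabor_kernel()), fs).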
A comparison of the LP residual and the Gabor filtered residual sequence in Figure 4 shows sharper impulse-like discontinuities for the Gabor filtered residual sequence than for the residual of HPF speech. It can also be noted from the plot that the impulse-like discontinuities in the other regions are suppressed in the Gabor filtered residual compared to the LP residual of HPF speech. The epochs in HPF speech are then estimated by ZFF of the Gabor filtered residual signal. Table II presents the epoch estimation performance obtained for each speaker of the CMU Arctic database, showing a significant improvement over ZFF of the LP residual of HPF speech given in Table I.

A. Comparison with Epoch Estimation by ZFF of the Hilbert Envelope of HPF Speech

Table III shows the epoch estimation performance obtained using the Hilbert envelope of HPF speech and the Hilbert envelope of the LP residual of HPF speech.
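For reference, the Hilbert envelope baseline of [9] used in this comparison amounts to computing the envelope and reusing the ZFF detector; a hedged sketch (the function name is an assumption) is given below.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal of x."""
    return np.abs(hilbert(x))

# Baseline epoch estimates from HPF speech (reusing the zff_epochs sketch of Sec. II):
# epochs_he, _ = zff_epochs(hilbert_envelope(hpf_speech), fs)
```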

Fig. 5. Comparison of epoch estimation by the ZFF method using HPF speech, the Gabor filtered residual and the Hilbert envelope of HPF speech: voiced segments of the three signals, the corresponding zero frequency filtered signals, and the epoch locations estimated from each.

TABLE II. Performance evaluation of epoch estimation by ZFF of the Gabor filtered residual from HPF speech, per CMU Arctic speaker (SLT, BDL, JMK) and total average (Speaker / IDR / MR / FAR / IDA (ms) / total reference epochs).

TABLE III. Epoch estimation performance of ZFF using the Hilbert envelope (HE) of HPF speech and of the LP residual of HPF speech, averaged over all speakers of the CMU Arctic database (Signal / IDR / MR / FAR / IDA (ms)).

Even though the Hilbert envelope of HPF speech gives significantly better epoch estimation performance in terms of a higher epoch identification rate and reduced miss and false alarm rates, it provides relatively poor epoch identification accuracy. In contrast, ZFF of the Gabor filtered LP residual gives better identification accuracy than either the Hilbert envelope of HPF speech or the Hilbert envelope of the residual of HPF speech.

Figure 5 compares the zero frequency filtered signals and the epochs estimated by ZFF of HPF speech, the Gabor filtered residual and the Hilbert envelope of HPF speech. The spurious zero crossings in the zero frequency filtered signal of HPF speech result in false epoch estimates by the conventional ZFF of HPF speech, shown in Figure 5(g). The zero frequency filtered segment obtained by ZFF of the Gabor filtered residual, shown in Figure 5(e), is free from spurious zero crossings. For the Hilbert envelope of HPF speech, its low pass nature smoothes the impulse-like discontinuity around the epoch locations, and hence a smooth zero frequency filtered signal without spurious zero crossings is obtained in Figure 5(f); however, the deviation of the estimated epochs from the true locations can be observed from the estimated epochs in Figure 5(h).

Figure 6 shows the probability distributions of the σ values for the Hilbert envelope of HPF speech case and the Gabor filtered residual case. Histograms of the standard deviation (σ) of the estimated epoch locations with respect to the reference epoch locations, obtained for each utterance of the CMU Arctic database, are used to compute the probability density functions; the plots indicate the probable deviation over the whole database. The epochs estimated by ZFF of the Hilbert envelope of HPF speech have an average identification accuracy of 0.59 ms, which is higher than the average deviation of 0.34 ms obtained for the proposed Gabor filtered residual case. The increased spread of the Hilbert envelope curve in Figure 6 indicates a higher deviation of the estimated epochs from the reference epoch locations.

Fig. 6. Comparison of the distributions of estimated epoch deviation (σ) values obtained by ZFF of the Hilbert envelope of HPF speech and by the proposed Gabor filtered residual of HPF speech.
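The Fig. 6 style comparison can be reproduced by collecting the per-utterance σ values and normalising their histograms. The sketch below assumes the σ values are already available in milliseconds and uses 50 bins, a value not stated in the paper.

```python
import numpy as np

def deviation_density(sigma_values_ms, bins=50):
    """Normalised histogram (empirical density) of per-utterance sigma values, as in Fig. 6."""
    density, edges = np.histogram(sigma_values_ms, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])   # bin centres for plotting
    return centres, density
```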
V. SUMMARY AND SCOPE FOR FUTURE WORK

A significant degradation in the epoch estimation performance of the ZFF method is observed in the case of HPF speech.

The use of the Hilbert envelope of HPF speech improves the epoch identification rate at the cost of reduced identification accuracy. To improve the epoch identification accuracy, the impulse-like discontinuities at the epoch locations of the LP residual of HPF speech are enhanced using a Gabor filter. The identification accuracy of the epochs estimated by ZFF of the Gabor filtered residual is found to improve over ZFF of the Hilbert envelope of HPF speech. As HPF speech is a special case of band limited telephone speech (300 Hz-3.4 kHz), the performance of the proposed epoch estimation method has to be evaluated on a large telephone speech database.

VI. ACKNOWLEDGEMENTS

The work presented in this paper is part of the DST Fast Track project titled "Analysis, processing and synthesis of emotions in speech". We are thankful to the funding agency, the Science and Engineering Research Board (SERB), New Delhi, for supporting this project.

REFERENCES

[1] T. Drugman and T. Dutoit, "Glottal closure and opening instant detection from speech signals," in Proc. INTERSPEECH, 2009.
[2] K. S. R. Murty and B. Yegnanarayana, "Epoch extraction from speech signals," IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 8, Nov. 2008.
[3] R. Smits and B. Yegnanarayana, "Determination of instants of significant excitation in speech using group delay function," IEEE Trans. Speech and Audio Processing, Sep. 1995.
[4] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, "Estimation of glottal closure instants in voiced speech using the DYPSA algorithm," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 1, 2007.
[5] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction from linear prediction residual for identification of closed glottis interval," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 4, 1979.
[6] K. S. Rao and B. Yegnanarayana, "Prosody modification using instants of significant excitation," IEEE Trans. Audio, Speech, and Language Processing, vol. 14, May 2006.
[7] E. A. P. Habets, N. D. Gaubitch, and P. A. Naylor, "Temporal selective dereverberation of noisy speech using one microphone," in Proc. ICASSP, 2008.
[8] S. R. M. Prasanna, D. Govind, K. S. Rao, and B. Yegnanarayana, "Fast prosody modification using instants of significant excitation," in Proc. Speech Prosody, May 2010.
[9] D. Govind, S. R. M. Prasanna, and D. Pati, "Epoch extraction in high pass filtered speech using Hilbert envelope," in Proc. INTERSPEECH, 2011.
[10] J. Kominek and A. Black, "CMU-Arctic speech databases," in 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004.
[11] D. Gabor, "Theory of communication," J. Inst. Elect. Eng., vol. 93, 1946.
[12] K. S. Rao, S. R. M. Prasanna, and B. Yegnanarayana, "Determination of instants of significant excitation in speech using Hilbert envelope and group delay function," IEEE Signal Processing Letters, vol. 14, Oct. 2007.
