Improved Method for Epoch Extraction in High Pass Filtered Speech
D. Govind
Center for Computational Engineering & Networking, Amrita Vishwa Vidyapeetham (University), Coimbatore, Tamilnadu
d govind@cb.amrita.edu

S. R. Mahadeva Prasanna, Ramesh K
Department of Electronics & Electrical Engineering, Indian Institute of Technology Guwahati, Assam
{prasanna,kk.ramesh}@iitg.ernet.in

Abstract — The objective of the present work is to improve epoch estimation performance for high pass filtered (HPF) speech using the conventional zero frequency filtering (ZFF) approach. The strength of the impulse at zero frequency is significantly attenuated in HPF speech, and hence the ZFF approach shows significant degradation in epoch estimation performance. Since the linear prediction (LP) residual of speech is characterized by sharper impulse discontinuities at the epoch locations than the speech waveform, the present work uses the LP residual of HPF speech for epoch estimation by the ZFF method. Gabor filtering of the LP residual is carried out to further increase the strength of the impulses at the epoch locations, and the epoch locations are then estimated by ZFF of the Gabor filtered LP residual. The proposed method performs better than the existing Hilbert envelope based ZFF approach, with improved epoch identification accuracy.

I. INTRODUCTION

The epochs in speech are the time instants at which the excitation of the vocal tract is maximum [1], [2], [3], [4]. The epochs represent the instants of glottal closure in voiced speech and the onset of burst or frication in unvoiced speech. Due to the effect of the vocal tract characteristics, estimation of epochs from speech is a challenging task [2]. Hence many methods have been proposed in the literature for reliable estimation of epochs from speech [5], [2], [3], [4]. Due to their significance, many applications carry out processing anchored around the epoch locations [6], [7], [8].
Among the existing approaches, group delay (GD) based processing, DYPSA and zero frequency filtering (ZFF) are the popular methods for extracting epochs. Among these, ZFF is a well known approach for reliable epoch estimation with reduced computational complexity [8], [2]. The ZFF method exploits the impulse-like characteristics of epochs [2]. In the ZFF method, the speech is first passed through a cascade of two zero frequency resonators. The trend in the resonator output is then removed by local mean subtraction to obtain the zero frequency filtered signal, and the negative to positive zero crossings of this signal are hypothesized as the epoch locations. The ZFF method provides the best epoch estimation performance for clean speech signals, which have sufficient energy near zero frequency. However, due to significant attenuation of the low frequency components near zero frequency, the performance degrades for band limited signals such as high pass filtered (HPF) and telephone recorded speech [9]. There have been attempts to improve epoch estimation in HPF speech by the ZFF method [9]. In that work, the low frequency nature of the Hilbert envelope is used to emphasize the energy near zero frequency, and improved epoch estimation performance for HPF speech is obtained by passing the Hilbert envelope of the HPF speech, or of its residual, through the zero frequency resonator. Although the epoch estimation performance improves in terms of a higher epoch identification rate and lower miss and false alarm rates, the epoch identification error, measured as the deviation of the estimated epochs from the reference epochs, remains higher.
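For reference, the Hilbert envelope used in [9] is simply the magnitude of the analytic signal. A minimal sketch follows; the modulated tone is an illustrative input of my own, not data from the paper:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_envelope(x):
    """Magnitude of the analytic signal: a smooth, low-frequency
    envelope, which is what restores energy near 0 Hz for HPF speech."""
    return np.abs(hilbert(x))

# 500 Hz tone with a slow 5 Hz amplitude modulation at 8 kHz
t = np.arange(0, 1, 1 / 8000.0)
x = (1.0 + 0.5 * np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 500 * t)
env = hilbert_envelope(x)
```

The envelope tracks the slow 5 Hz modulation while discarding the 500 Hz carrier, which illustrates the smoothing behaviour discussed next.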
The reduced epoch identification accuracy of this approach makes it less suitable for applications such as epoch based prosody modification, where the perceptual quality of the prosody modified speech depends largely on the accuracy with which the epochs are estimated [8]. The smoothing tendency of the Hilbert envelope at the epoch locations increases the deviation of the estimated epochs from the reference epoch locations obtained from the differenced electro-glottogram (EGG). Hence epoch estimation using the Hilbert envelope of HPF speech results in poor temporal resolution. Figure 1 compares the epochs estimated by ZFF of the Hilbert envelope of HPF speech with the reference epochs estimated from the corresponding differenced EGG. It can be observed from the figure that, though the discontinuities at the epoch locations are enhanced by the Hilbert envelope, its low frequency nature smooths the peaks, so the estimated epoch locations have poor temporal resolution. This can be confirmed by comparing them against the reference epoch locations given by the differenced EGG peaks in Figure 1. The samples of the LP residual are uncorrelated, and the higher prediction errors form strong impulse-like discontinuities at the epoch locations. Hence the present work focuses on exploiting the impulse-like discontinuities in the LP residual for epoch estimation from HPF speech. Even though ZFF of the LP residual of HPF speech gives better performance than conventional ZFF of HPF speech, it is not at par with ZFF of the Hilbert envelope of HPF speech; however, its epoch identification accuracy is found to be better. To further enhance the impulse-like discontinuities at the epoch locations, the LP residual of HPF speech is filtered using a Gabor filter having a shape similar to the discontinuity at the glottal pulse.
The epochs are then obtained by ZFF of the Gabor filtered residual sequence. The performance of the proposed method is confirmed by improved epoch identification accuracy compared to that of ZFF of the Hilbert envelope of HPF speech.

[Fig. 1. Deviation of the epochs estimated from the Hilbert envelope of HPF speech from the true locations: a voiced segment of HPF speech, its Hilbert envelope, the epochs estimated by ZFF of the Hilbert envelope, and the differenced EGG peaks showing the reference epoch locations.]

The rest of the paper is organized as follows. Section II describes the algorithmic steps of the ZFF method. The comparison of the epoch estimation performance of ZFF of HPF speech and of the LP residual of HPF speech is given in Section III. The description of the Gabor filter and the proposed ZFF method using Gabor filtering of the LP residual is given in Section IV. Finally, Section V summarizes the present work along with the scope for future work.

II. ZERO FREQUENCY FILTERING OF SPEECH

This section reviews the ZFF method for epoch estimation and the performance measures used for evaluating epoch extraction methods.

A. Epoch Estimation Using the ZFF Method

The algorithm for estimating the epochs in clean speech by ZFF is as follows [2]:

1) Difference the input speech signal s(n):

   x(n) = s(n) − s(n−1)   (1)

2) Compute the output of a cascade of two ideal digital resonators at 0 Hz:

   y(n) = Σ_{k=1}^{4} a_k y(n−k) + x(n)   (2)

   where a_1 = 4, a_2 = −6, a_3 = 4 and a_4 = −1.

3) Remove the trend:

   ŷ(n) = y(n) − ȳ(n)   (3)

   where ȳ(n) = (1/(2N+1)) Σ_{m=−N}^{N} y(n+m), and 2N+1 corresponds to the average pitch period computed over a longer segment of speech.

The trend removed signal ŷ(n) is termed the zero frequency filtered signal. Its negative to positive zero crossings give the locations of the epochs.

[Fig. 2. Illustration of the epoch estimation performance measures: epoch identification, miss, false alarm and identification accuracy.]
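The steps above can be sketched in code. This is my own illustration under common assumptions (8 kHz sampling, a 100 Hz toy excitation with glottal-pulse polarity, and the trend removal of Eq. (3) applied three times, as is usual in ZFF implementations, to fully cancel the polynomial growth of the resonator output):

```python
import numpy as np
from scipy.signal import lfilter

def zero_frequency_filter(s, fs, avg_pitch_s=0.01):
    """Eqs. (1)-(3): difference, cascade of two 0 Hz resonators,
    then local-mean trend removal."""
    x = np.diff(s, prepend=s[0])                       # Eq. (1)
    # Eq. (2): 1/(1 - z^-1)^4, i.e. a1 = 4, a2 = -6, a3 = 4, a4 = -1
    y = lfilter([1.0], [1.0, -4.0, 6.0, -4.0, 1.0], x)
    N = int(avg_pitch_s * fs) // 2                     # window = 2N+1 samples
    w = 2 * N + 1
    for _ in range(3):                                 # Eq. (3), applied thrice
        y = y - np.convolve(y, np.ones(w) / w, mode="same")
    return y

def epochs_from_zff(zff):
    """Negative-to-positive zero crossings of the filtered signal."""
    return np.where((zff[:-1] < 0) & (zff[1:] >= 0))[0] + 1

# Toy excitation: negative impulses every 10 ms (100 Hz pitch) at 8 kHz
fs = 8000
s = np.zeros(fs)
s[::80] = -1.0
epochs = epochs_from_zff(zero_frequency_filter(s, fs))
```

On this synthetic excitation the detected crossings fall within a sample or two of the impulse locations, away from the edge effects of the moving-average trend removal.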
B. Performance Measures for Epoch Estimation

The performance measures proposed in [4], namely epoch identification rate, miss rate, false alarm rate and identification accuracy, are used for the performance analysis. These measures are defined as follows:

Larynx cycle: the range of samples (1/2)(l_{r−1} + l_r) < n < (1/2)(l_r + l_{r+1}), where l_r, l_{r−1} and l_{r+1} are the current, preceding and succeeding reference epoch locations, respectively.

Identification Rate (IDR): the percentage of larynx cycles for which exactly one epoch is detected.

Miss Rate (MR): the percentage of larynx cycles for which no epoch is detected.

False Alarm Rate (FAR): the percentage of larynx cycles for which more than one epoch is detected.

Identification Error (ζ): the timing error between the reference and detected epochs in larynx cycles for which exactly one epoch was detected.

Identification Accuracy (IDA, σ): the standard deviation of the identification error ζ. Small values of σ indicate high identification accuracy.

Figure 2 gives a graphical illustration of epoch identification, miss, false alarm and identification accuracy.

III. EPOCH ESTIMATION PERFORMANCE FOR CLEAN AND HIGH PASS FILTERED SPEECH

The performance is evaluated on the CMU Arctic database, which has simultaneous EGG recordings [10].
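The measures of Section II-B translate directly into code. The helper below is my own sketch, not code from the paper; it scores only interior larynx cycles, with epochs given as sample indices:

```python
import numpy as np

def epoch_measures(ref, est, fs):
    """IDR, MR, FAR (percent) and IDA (seconds) over larynx cycles.
    A cycle spans the midpoints between consecutive reference epochs."""
    ref, est = np.asarray(ref, float), np.asarray(est, float)
    bounds = (ref[:-1] + ref[1:]) / 2.0            # cycle boundaries
    hits = miss = false_alarm = 0
    errors = []
    for i in range(1, len(ref) - 1):               # interior cycles only
        in_cycle = est[(est >= bounds[i - 1]) & (est < bounds[i])]
        if len(in_cycle) == 0:
            miss += 1
        elif len(in_cycle) == 1:
            hits += 1
            errors.append((in_cycle[0] - ref[i]) / fs)  # timing error zeta
        else:
            false_alarm += 1
    n = hits + miss + false_alarm
    idr, mr, far = 100.0 * hits / n, 100.0 * miss / n, 100.0 * false_alarm / n
    ida = float(np.std(errors)) if errors else float("nan")
    return idr, mr, far, ida
```

For example, with reference epochs every 80 samples, a constant two-sample detection offset, and one detection deleted, the helper reports IDR 87.5%, MR 12.5%, FAR 0% and IDA 0 ms (the surviving errors are all identical).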
32 phonetically balanced utterances of three speakers (two male and one female) are used for the evaluation. The reference epochs are obtained by ZFF of the differenced EGG. All utterances of the CMU Arctic database are converted from the original recorded sampling rate of 32 kHz to 8 kHz. The HPF speech signals are generated by filtering the Arctic utterances with a high pass filter with a cutoff frequency of 500 Hz [9], selected so as to attenuate all the frequency components in the human pitch range. Table I compares the epoch estimation performance for clean and HPF speech using the ZFF method [9]. The table shows the effectiveness of the ZFF method in extracting accurate epoch locations from clean speech; however, a significant degradation is observed for the epochs estimated from HPF speech. As the LP residual shows sharp discontinuities at the epoch locations, the epochs estimated by ZFF of the LP residual of HPF speech give better performance than ZFF of HPF speech directly. The LP residual of HPF speech is computed by 10th order LP analysis with a frame size of 20 ms and a shift of 10 ms. However, the performance is still not at par with the clean speech case. A similar degradation of epoch estimation performance on HPF speech using DYPSA is reported in [9].

[Table I. Comparison of ZFF epoch estimation performance for clean speech, HPF speech and the LP residual of HPF speech on the CMU Arctic database. Columns: Speaker, IDR, MR, FAR, IDA (ms); the numerical entries are lost in this transcription.]

[Fig. 3. Gabor filter with parameters σ = .3, ω = .75 and N = 8.]

[Fig. 4. Gabor filtering of the LP residual: a voiced segment of HPF speech, the corresponding LP residual, the residual convolved twice with the Gabor filter, and the Gabor filtered residual sequence.]
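One way to reproduce this pipeline is sketched below. The 4th order Butterworth high-pass and the autocorrelation method for LP analysis are my choices (the paper does not specify either); the 10th order and 20 ms / 10 ms framing follow the setup described above for 8 kHz speech:

```python
import numpy as np
from scipy.signal import butter, lfilter

def highpass(s, fs, cutoff_hz=500.0):
    """High pass filter (Butterworth chosen here as one option)."""
    b, a = butter(4, cutoff_hz / (fs / 2.0), btype="highpass")
    return lfilter(b, a, s)

def lp_residual(s, order=10, frame_len=160, hop=80):
    """Frame-wise LP residual via the autocorrelation method
    (20 ms frames, 10 ms shift at 8 kHz)."""
    res = np.zeros(len(s))
    win = np.hamming(frame_len)
    for start in range(0, len(s) - frame_len + 1, hop):
        frame = s[start:start + frame_len] * win
        # autocorrelation lags 0..order, then solve the normal equations
        r = np.correlate(frame, frame, "full")[frame_len - 1:frame_len + order]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:])
        # inverse filter with `order` samples of history from the signal
        lo = max(start - order, 0)
        ctx = s[lo:start + frame_len]
        pred = lfilter(np.concatenate(([0.0], a)), [1.0], ctx)  # sum_k a_k s(n-k)
        err = ctx - pred
        res[start:start + hop] = err[start - lo:start - lo + hop]
    return res
```

Usage would be `r = lp_residual(highpass(x, fs))`. On an autoregressive test signal the residual variance drops well below the signal variance, as inverse filtering should achieve.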
The epoch estimation performance for HPF speech can be further improved by enhancing the impulse-like discontinuities at the epoch locations of the LP residual. Section IV describes the proposed method of improving the epoch estimation performance of ZFF of the LP residual of HPF speech using a Gabor filter.

IV. EPOCH ESTIMATION FROM LP RESIDUAL USING GABOR FILTER

The impulse-like discontinuities at the epoch locations of the LP residual are sharpened by convolving the LP residual with a Gabor filter, i.e., a modulated Gaussian pulse. The expression for the Gabor filter is given by

   g(n) = (1/(√(2π) σ)) e^{−(n − N/2)²/(2σ²) + jωn}   (4)

where σ represents the spread of the Gaussian, ω is the frequency of the modulating sinusoid, n is the time index and N is the length of the filter [11], [12]. In the present work, the values of σ, ω and the filter length N are selected as .3, .75 and 8, respectively. From Figure 3, it can be observed that the shape of the Gabor filter is similar to the discontinuities at the reference epoch locations of the differenced EGG. To further sharpen the discontinuities, the residual of HPF speech is filtered twice with the Gabor filter, and the filtered residual is then subtracted from the residual of HPF speech. This is represented mathematically as

   y(n) = r(n) − r̃(n)   (5)

where r̃(n) is obtained by convolving the residual of HPF speech, r(n), twice with the Gabor filter coefficients g(n) given in Eq. (4). Hereafter, the sequence y(n) is termed the Gabor filtered residual sequence. Figure 4 plots a voiced frame of HPF speech, its LP residual, and the residual sequences obtained by convolution with the Gabor filter coefficients. Comparing the panels of Figure 4 shows sharper impulse-like discontinuities in the Gabor filtered residual sequence than in the residual of HPF speech. It should also be noted that the impulse-like discontinuities in other regions are suppressed in the Gabor filtered residual compared to the LP residual of HPF speech. The epochs in HPF speech are estimated by ZFF of the Gabor filtered residual signal.
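Eqs. (4) and (5) can be sketched as follows, using the parameter values listed above (σ read as 0.3, ω as 0.75, N = 8) and taking the real part of g(n) for filtering — an assumption, since the paper does not state how the complex-valued filter is applied:

```python
import numpy as np

def gabor_filter(sigma=0.3, omega=0.75, n_taps=8):
    """Eq. (4): modulated Gaussian; the real part is used below."""
    n = np.arange(n_taps)
    g = np.exp(-(n - n_taps / 2.0) ** 2 / (2.0 * sigma ** 2) + 1j * omega * n)
    return np.real(g) / (np.sqrt(2.0 * np.pi) * sigma)

def gabor_filtered_residual(r, g):
    """Eq. (5): y(n) = r(n) - r~(n), with r~ the residual
    convolved twice with the Gabor coefficients."""
    r2 = np.convolve(np.convolve(r, g, mode="same"), g, mode="same")
    return r - r2

# Toy residual: unit impulses every 100 samples
r = np.zeros(1000)
r[::100] = 1.0
y = gabor_filtered_residual(r, gabor_filter())
```

Because the double convolution spreads each impulse only over the short filter support, y stays zero away from the impulses while the impulse neighbourhoods are reshaped.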
Table II presents the epoch estimation performance obtained for each speaker of the CMU Arctic database. A significant improvement in epoch estimation performance is observed over ZFF of the HPF residual given in Table I.

[Table II. Performance evaluation of epoch estimation by ZFF of the Gabor filtered residual of HPF speech on CMU Arctic. Rows: speakers SLT, BDL, JMK and the total/average; columns: IDR, MR, FAR, IDA (ms), total reference epochs. The numerical entries are lost in this transcription.]

A. Comparison with Epoch Estimation by ZFF of the Hilbert Envelope of HPF Speech

Table III shows the epoch estimation performance of the Hilbert envelope (HE) of HPF speech and of the HE of the LP residual of HPF speech. Even though the Hilbert envelope of HPF speech gives significantly better epoch estimation performance in terms of a higher epoch identification rate and reduced miss and false alarm rates, it provides relatively poor epoch identification accuracy. ZFF of the Gabor filtered LP residual, in contrast, gives better identification accuracy than either the Hilbert envelope of HPF speech or the Hilbert envelope of the residual of HPF speech.

[Table III. Epoch estimation performance of ZFF using the Hilbert envelope of speech and of the LP residual of HPF speech, averaged over all speakers of the CMU Arctic database. Rows: HE-HPF speech, HE-LP residual of HPF speech; columns: IDR, MR, FAR, IDA (ms). The numerical entries and the total number of reference epochs are lost in this transcription.]

Figure 5 compares the zero frequency filtered signals and the epochs estimated by ZFF of HPF speech, of the Gabor filtered residual, and of the Hilbert envelope of HPF speech. The spurious zero crossings in the zero frequency filtered signal of HPF speech, shown in Figure 5(d), result in the false estimation of epochs in the conventional ZFF of HPF speech, given in Figure 5(g). The zero frequency filtered signal segment obtained by ZFF of the Gabor filtered residual, shown in Figure 5(e), is free from spurious zero crossings.

[Fig. 5. Comparison of epoch estimation by the ZFF method using HPF speech, the Gabor filtered residual and the Hilbert envelope of HPF speech: (a)-(c) a voiced segment of HPF speech, the Gabor filtered residual and the Hilbert envelope of HPF speech; (d)-(f) the corresponding zero frequency filtered signals; (g)-(i) the estimated epoch locations.]
Figure 5(c) shows the Hilbert envelope of HPF speech. It can be observed that the low pass nature of the Hilbert envelope smooths the impulse-like discontinuity around the epoch locations, and hence a smooth zero frequency filtered signal without spurious zero crossings is obtained in Figure 5(f). However, the deviation of the estimated epochs from the true locations can be observed by comparing the estimated epochs given in Figures 5(h) and 5(i).

Figure 6 shows the probability distributions of the σ values for the Hilbert envelope of HPF speech case and for the Gabor filtered residual case. Histograms of the standard deviation (σ) of the estimated epoch locations with respect to the reference epoch locations, computed for each utterance in the CMU Arctic database, are used to estimate the probability density functions. The plot indicates the deviations likely to occur over the utterances of the whole database. The epochs estimated by ZFF of the Hilbert envelope of HPF speech have an average identification accuracy of .59 ms, which is higher than the average deviation of .34 ms obtained for the proposed Gabor filtered residual case. The larger spread of the Hilbert envelope of HPF speech case in Figure 6 indicates a higher deviation of the estimated epochs from the reference epoch locations.

[Fig. 6. Comparison of the distributions of the estimated epoch deviation (σ) obtained by ZFF of the Hilbert envelope of HPF speech and by the proposed Gabor filtered residual of HPF speech.]

V. SUMMARY AND SCOPE FOR FUTURE WORK

A significant degradation in the epoch estimation performance of the ZFF method is observed for HPF speech.
The use of the Hilbert envelope of HPF speech improves the epoch identification rate at the cost of reduced identification accuracy. To improve the epoch identification accuracy, the strength of the impulse-like discontinuities at the epoch locations of the LP residual of HPF speech is enhanced using a Gabor filter. The identification accuracy of the epochs estimated by ZFF of the Gabor filtered residual is found to improve over that of ZFF of the Hilbert envelope of HPF speech. As HPF speech is a special case of bandlimited telephone speech (350 Hz–3.4 kHz), the performance of the proposed epoch estimation method has to be evaluated on a large telephone speech database.

VI. ACKNOWLEDGEMENTS

The work presented in this paper is part of the DST Fast Track project titled "Analysis, processing and synthesis of emotions in speech". We are thankful to the funding agency, the Science and Engineering Research Board (SERB), New Delhi, for supporting this project.

REFERENCES

[1] T. Drugman and T. Dutoit, "Glottal closure and opening instant detection from speech signals," in Proc. INTERSPEECH, 2009.
[2] K. S. R. Murty and B. Yegnanarayana, "Epoch extraction from speech signals," IEEE Trans. Audio, Speech and Language Process., vol. 16, no. 8, Nov. 2008.
[3] R. Smits and B. Yegnanarayana, "Determination of instants of significant excitation in speech using group delay function," IEEE Trans. Acoustics, Speech and Signal Processing, Sep. 1995.
[4] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, "Estimation of glottal closure instants in voiced speech using the DYPSA algorithm," IEEE Trans. Audio, Speech and Lang. Process., vol. 15, no. 1, 2007.
[5] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction from linear prediction residual for identification of closed glottis interval," IEEE Trans. Acoust., Speech and Signal Process., vol. ASSP-27, no. 4, 1979.
[6] K. S. Rao and B. Yegnanarayana, "Prosody modification using instants of significant excitation," IEEE Trans. Audio, Speech and Language Processing, vol. 14, May 2006.
[7] E. A. P. Habets, N. D. Gaubitch, and P. A. Naylor, "Temporal selective dereverberation of noisy speech using one microphone," in Proc. ICASSP, 2008.
[8] S. R. M. Prasanna, D. Govind, K. S. Rao, and B. Yegnanarayana, "Fast prosody modification using instants of significant excitation," in Proc. Speech Prosody, May 2010.
[9] D. Govind, S. R. M. Prasanna, and D. Pati, "Epoch extraction in high pass filtered speech using Hilbert envelope," in Proc. INTERSPEECH, 2011.
[10] J. Kominek and A. Black, "CMU Arctic speech databases," in 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004.
[11] D. Gabor, "Theory of communication," J. Inst. Elect. Eng., vol. 93, 1946.
[12] K. S. Rao, S. R. M. Prasanna, and B. Yegnanarayana, "Determination of instants of significant excitation in speech using Hilbert envelope and group delay function," IEEE Signal Processing Letters, vol. 14, Oct. 2007.
More informationGlottal Modeling and Closed-Phase Analysis for Speaker Recognition
Glottal Modeling and Closed-Phase Analysis for Speaker Recognition Raymond E. Slyh, Eric G. Hansen and Timothy R. Anderson Air Force Research Laboratory, Human Effectiveness Directorate, Wright-Patterson
More informationLinear Prediction 1 / 41
Linear Prediction 1 / 41 A map of speech signal processing Natural signals Models Artificial signals Inference Speech synthesis Hidden Markov Inference Homomorphic processing Dereverberation, Deconvolution
More informationA comparative study of time-delay estimation techniques for convolutive speech mixtures
A comparative study of time-delay estimation techniques for convolutive speech mixtures COSME LLERENA AGUILAR University of Alcala Signal Theory and Communications 28805 Alcalá de Henares SPAIN cosme.llerena@uah.es
More informationImproved system blind identification based on second-order cyclostationary statistics: A group delay approach
SaÅdhanaÅ, Vol. 25, Part 2, April 2000, pp. 85±96. # Printed in India Improved system blind identification based on second-order cyclostationary statistics: A group delay approach P V S GIRIDHAR 1 and
More informationMULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh
MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES Mingzi Li,Israel Cohen and Saman Mousazadeh Department of Electrical Engineering, Technion - Israel
More informationDigital Signal Processing
Digital Signal Processing 0 (010) 157 1578 Contents lists available at ScienceDirect Digital Signal Processing www.elsevier.com/locate/dsp Improved minima controlled recursive averaging technique using
More informationLECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES
LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch
More information5Nonlinear methods for speech analysis
5Nonlinear methods for speech analysis and synthesis Steve McLaughlin and Petros Maragos 5.1. Introduction Perhaps the first question to ask on reading this chapter is why should we consider nonlinear
More informationLAB 6: FIR Filter Design Summer 2011
University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering ECE 311: Digital Signal Processing Lab Chandra Radhakrishnan Peter Kairouz LAB 6: FIR Filter Design Summer 011
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short
More informationImproved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR
Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR Bengt J. Borgström, Student Member, IEEE, and Abeer Alwan, IEEE Fellow Abstract This paper
More informationGMM-Based Speech Transformation Systems under Data Reduction
GMM-Based Speech Transformation Systems under Data Reduction Larbi Mesbahi, Vincent Barreaud, Olivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-22305 Lannion Cedex
More informationDesign Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
More informationIndependent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood
More informationBayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement
Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Patrick J. Wolfe Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK pjw47@eng.cam.ac.uk Simon J. Godsill
More informationA REVERBERATOR BASED ON ABSORBENT ALL-PASS FILTERS. Luke Dahl, Jean-Marc Jot
Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00), Verona, Italy, December 7-9, 000 A REVERBERATOR BASED ON ABSORBENT ALL-PASS FILTERS Lue Dahl, Jean-Marc Jot Creative Advanced
More informationA SPARSENESS CONTROLLED PROPORTIONATE ALGORITHM FOR ACOUSTIC ECHO CANCELLATION
6th European Signal Processing Conference (EUSIPCO 28), Lausanne, Switzerland, August 25-29, 28, copyright by EURASIP A SPARSENESS CONTROLLED PROPORTIONATE ALGORITHM FOR ACOUSTIC ECHO CANCELLATION Pradeep
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More informationThursday, October 29, LPC Analysis
LPC Analysis Prediction & Regression We hypothesize that there is some systematic relation between the values of two variables, X and Y. If this hypothesis is true, we can (partially) predict the observed
More informationApplication of the Bispectrum to Glottal Pulse Analysis
ISCA Archive http://www.isca-speech.org/archive ITRW on Non-Linear Speech Processing (NOLISP 3) Le Croisic, France May 2-23, 23 Application of the Bispectrum to Glottal Pulse Analysis Dr Jacqueline Walker
More informationReal-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization Fei Sha and Lawrence K. Saul Dept. of Computer and Information Science University of Pennsylvania, Philadelphia,
More informationNon-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology
Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation
More informationDIGITAL SIGNAL PROCESSING LECTURE 1
DIGITAL SIGNAL PROCESSING LECTURE 1 Fall 2010 2K8-5 th Semester Tahir Muhammad tmuhammad_07@yahoo.com Content and Figures are from Discrete-Time Signal Processing, 2e by Oppenheim, Shafer, and Buck, 1999-2000
More informationDesign of a CELP coder and analysis of various quantization techniques
EECS 65 Project Report Design of a CELP coder and analysis of various quantization techniques Prof. David L. Neuhoff By: Awais M. Kamboh Krispian C. Lawrence Aditya M. Thomas Philip I. Tsai Winter 005
More informationEmpirical Mean and Variance!
Global Image Properties! Global image properties refer to an image as a whole rather than components. Computation of global image properties is often required for image enhancement, preceding image analysis.!
More informationIntroduction to Computer Vision. 2D Linear Systems
Introduction to Computer Vision D Linear Systems Review: Linear Systems We define a system as a unit that converts an input function into an output function Independent variable System operator or Transfer
More informationChapter 10 Applications in Communications
Chapter 10 Applications in Communications School of Information Science and Engineering, SDU. 1/ 47 Introduction Some methods for digitizing analog waveforms: Pulse-code modulation (PCM) Differential PCM
More informationLOW COMPLEXITY WIDEBAND LSF QUANTIZATION USING GMM OF UNCORRELATED GAUSSIAN MIXTURES
LOW COMPLEXITY WIDEBAND LSF QUANTIZATION USING GMM OF UNCORRELATED GAUSSIAN MIXTURES Saikat Chatterjee and T.V. Sreenivas Department of Electrical Communication Engineering Indian Institute of Science,
More informationNovelty detection. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University
Novelty detection Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University Novelty detection Energy burst Find the start time (onset) of new events (notes) in the music signal. Short
More informationAntialiased Soft Clipping using an Integrated Bandlimited Ramp
Budapest, Hungary, 31 August 2016 Antialiased Soft Clipping using an Integrated Bandlimited Ramp Fabián Esqueda*, Vesa Välimäki*, and Stefan Bilbao** *Dept. Signal Processing and Acoustics, Aalto University,
More informationAUTOREGRESSIVE (AR) modeling identifies and exploits
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 11, NOVEMBER 2007 5237 Autoregressive Modeling of Temporal Envelopes Marios Athineos, Student Member, IEEE, and Daniel P. W. Ellis, Senior Member, IEEE
More informationText-to-speech synthesizer based on combination of composite wavelet and hidden Markov models
8th ISCA Speech Synthesis Workshop August 31 September 2, 2013 Barcelona, Spain Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models Nobukatsu Hojo 1, Kota Yoshizato
More informationFrequency Domain Speech Analysis
Frequency Domain Speech Analysis Short Time Fourier Analysis Cepstral Analysis Windowed (short time) Fourier Transform Spectrogram of speech signals Filter bank implementation* (Real) cepstrum and complex
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationSource/Filter Model. Markus Flohberger. Acoustic Tube Models Linear Prediction Formant Synthesizer.
Source/Filter Model Acoustic Tube Models Linear Prediction Formant Synthesizer Markus Flohberger maxiko@sbox.tugraz.at Graz, 19.11.2003 2 ACOUSTIC TUBE MODELS 1 Introduction Speech synthesis methods that
More informationMonaural speech separation using source-adapted models
Monaural speech separation using source-adapted models Ron Weiss, Dan Ellis {ronw,dpwe}@ee.columbia.edu LabROSA Department of Electrical Enginering Columbia University 007 IEEE Workshop on Applications
More informationRe-estimation of Linear Predictive Parameters in Sparse Linear Prediction
Downloaded from vbnaaudk on: januar 12, 2019 Aalborg Universitet Re-estimation of Linear Predictive Parameters in Sparse Linear Prediction Giacobello, Daniele; Murthi, Manohar N; Christensen, Mads Græsbøll;
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationSCELP: LOW DELAY AUDIO CODING WITH NOISE SHAPING BASED ON SPHERICAL VECTOR QUANTIZATION
SCELP: LOW DELAY AUDIO CODING WITH NOISE SHAPING BASED ON SPHERICAL VECTOR QUANTIZATION Hauke Krüger and Peter Vary Institute of Communication Systems and Data Processing RWTH Aachen University, Templergraben
More informationIll-Conditioning and Bandwidth Expansion in Linear Prediction of Speech
Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech Peter Kabal Department of Electrical & Computer Engineering McGill University Montreal, Canada February 2003 c 2003 Peter Kabal 2003/02/25
More informationEfficient Use Of Sparse Adaptive Filters
Efficient Use Of Sparse Adaptive Filters Andy W.H. Khong and Patrick A. Naylor Department of Electrical and Electronic Engineering, Imperial College ondon Email: {andy.khong, p.naylor}@imperial.ac.uk Abstract
More informationwhere =0,, 1, () is the sample at time index and is the imaginary number 1. Then, () is a vector of values at frequency index corresponding to the mag
Efficient Discrete Tchebichef on Spectrum Analysis of Speech Recognition Ferda Ernawan and Nur Azman Abu Abstract Speech recognition is still a growing field of importance. The growth in computing power
More informationDIRECTION ESTIMATION BASED ON SOUND INTENSITY VECTORS. Sakari Tervo
7th European Signal Processing Conference (EUSIPCO 9) Glasgow, Scotland, August 4-8, 9 DIRECTION ESTIMATION BASED ON SOUND INTENSITY VECTORS Sakari Tervo Helsinki University of Technology Department of
More informationSound 2: frequency analysis
COMP 546 Lecture 19 Sound 2: frequency analysis Tues. March 27, 2018 1 Speed of Sound Sound travels at about 340 m/s, or 34 cm/ ms. (This depends on temperature and other factors) 2 Wave equation Pressure
More informationChapter 2 Speech Production Model
Chapter 2 Speech Production Model Abstract The continuous speech signal (air) that comes out of the mouth and the nose is converted into the electrical signal using the microphone. The electrical speech
More informationModeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,
More informationAdapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017
Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017 I am v Master Student v 6 months @ Music Technology Group, Universitat Pompeu Fabra v Deep learning for acoustic source separation v
More informationImage Enhancement in the frequency domain. GZ Chapter 4
Image Enhancement in the frequency domain GZ Chapter 4 Contents In this lecture we will look at image enhancement in the frequency domain The Fourier series & the Fourier transform Image Processing in
More informationSTATISTICAL MODELLING OF MULTICHANNEL BLIND SYSTEM IDENTIFICATION ERRORS. Felicia Lim, Patrick A. Naylor
STTISTICL MODELLING OF MULTICHNNEL BLIND SYSTEM IDENTIFICTION ERRORS Felicia Lim, Patrick. Naylor Dept. of Electrical and Electronic Engineering, Imperial College London, UK {felicia.lim6, p.naylor}@imperial.ac.uk
More informationA POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL
A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig
More information