Digital Signal Processing

Size: px
Start display at page:

Download "Digital Signal Processing"

Transcription

1 Digital Signal Processing 0 (010) Contents lists available at ScienceDirect Digital Signal Processing Improved minima controlled recursive averaging technique using conditional maximum a posteriori criterion for speech enhancement Jong-Mo Kum, Yun-Sik Park, Joon-yuk Chang School of Electronic and Electrical Engineering, Inha University, Incheon , Republic of Korea article info abstract Article history: Available online February 010 Keywords: Speech enhancement Minima controlled recursive averaging (MCRA) Conditional maximum a posteriori (CMAP) In this paper, we propose a novel approach to improve the performance of minima controlled recursive averaging (MCRA) based on a conditional maximum a posteriori (MAP) criterion. From an investigation of the MCRA scheme, it is discovered that the MCRA method cannot take full consideration of the inter-frame correlation of voice activity since the noise power estimate is adjusted by the speech presence probability depending on an observation of the current frame. To avoid this phenomenon, the proposed MCRA approach incorporates the conditional MAP criterion in which the noise power estimate is obtained using the speech presence probability conditioned on both the current observation and the speech activity decision in the previous frame. Experimental results show that compared to the conventional MCRA method the proposed MCRA technique based on conditional MAP obtains low estimation error and when integrated into a speech enhancement system achieves improved speech quality. 010 Elsevier Inc. All rights reserved. 1. Introduction The robustness of a speech enhancement system is significantly affected by the capability to reliably track noise statistics under adverse environments involving nonstationary noise, weak speech components and low signal-to-noise ratio (SNR) [1 9]. Recently, there has appeared a new class of speech enhancement technique based on the minimum statistics (MS) which tracks the minima values of a smoothed power estimate of the noisy signal, and multiply the result by a factor that compensates the bias. owever, this noise estimate is sensitive to the minimal tracking which is a primary procedure. And, a computationally efficient minimal tracking scheme is proposed by Doblinger [5]. This method is known to be slow for updating the noise estimate especially in the case of a sudden change in the noise level [6]. Recently, one of the successful noise power estimation is the minima controlled recursive averaging (MCRA) technique, which obtains the noise power estimate by averaging past spectral power values using a smoothing parameter adjusted by the speech presence probability [10 13]. The resultant noise power estimate depending on the probability of speech presence is known to be computationally efficient and robust with respect to noise environments. owever, this method is sensitive to temporal variation since the presence of speech in each subband is inherently estimated by the ratio of the local energy and its minimum of a given frame. Recently, Shin et al. proposed a novel technique to voice activity detection (VAD) considering the inter-frame correlation of voice activity based on the conditional maximum a posteriori (MAP) criterion [9]. The conditional MAP scheme was originally motivated by the observation that the speech presence of a given frame is dependent not only This work was supported by an Inha University Research Grant. * Corresponding author. address: changjh@inha.ac.kr (J.-. Chang) /$ see front matter 010 Elsevier Inc. All rights reserved. doi: /j.dsp

2 J.-M. Kum et al. / Digital Signal Processing 0 (010) on the current observation but also on the voice activity in the previous frame. This method is considered to be efficient by exploiting the inter-frame correlation to enhance the performance of the VAD dramatically [14,15]. In this paper, we propose a MCRA-based noise power estimation incorporating the conditional MAP criterion for speech enhancement. To determine efficiently the speech presence probability of the MCRA method, we consider the inter-frame correlation of speech activity based on the conditional MAP, which depends on both the current observation of a given frame and the speech presence decision in the previous frame. Based on this, we first derive the multiple thresholds for updating the noise power in the MCRA framework and further improve the performance by incorporating the soft decision scheme, which results in time varying thresholds. The proposed technique is substantially adopted for a speech enhancement technique and is evaluated with objective and subjective quality test experiments under various noise conditions.. Review of minima controlled recursive averaging technique Let y(n) denote a noisy speech signal that is the sum of a clean speech signal x(n) and an uncorrelated additive noise signal d(n); y(n) = x(n) + d(n). Applying a discrete Fourier transform (DFT), we have in the time-frequency domain Y (k,l) = X(k,l) + D(k,l) (1) where k is the frequency bin and l is the frame index, respectively. Given two hypotheses, 0 (k,l) and 1 (k,l), which respectively indicate speech absence and presence, it is assumed that 0 (k,l): 1 (k,l): Y (k,l) = D(k,l), Y (k,l) = X(k,l) + D(k,l). () Letting λ d (k,l) = E[ D(k,l) ] be the variance of noise, the MCRA method obtains the noise power estimate by applying a temporal recursive smoothing to the noise measurement during periods of speech absence as follows: 0 (k,l): ˆλ d (k,l + 1) = α d ˆλ d (k,l) + (1 α d ) Y (k,l), (k,l): ˆλ d (k,l + 1) = ˆλ d (k,l) (3) where α d (0 < α d < 1) is a smoothing parameter. Also, 0 and designate hypothetical speech absence and presence, respectively. In [10], a clear distinction is made between the hypotheses in (), which is used for estimation of clean speech, and the hypotheses in (3), which controls adaptation of the noise statistics. In other words, deciding speech is absent ( 0 ) when speech is present ( 1 ) is more destructive when estimating the signal than when estimation noise power spectrum. Therefore, different decision rules are employed, in that 1 is chosen to get a higher confidence than i.e., P( 1 (k,l) Y (k,l)) P( (k,l) Y (k,l)) [10], where P( (k,l) Y (k,l)) denote the a posteriori probability of speech presence. Then, (3) implies ˆλ d (k,l + 1) = ˆλ ( d (k,l)p (k,l) Y (k,l)) + [ α d ˆλ d (k,l) + (1 α d ) Y (k,l) ]( ( 1 P 1 (k,l) Y (k,l))) = ˆα d (k,l)ˆλ d (k,l) + [ 1 ˆα d (k,l) ] Y (k,l) (4) where ˆα d is a time-varying smoothing parameter that is adjusted by the speech presence probability as follows: ˆα d (k,l) = α d + (1 α d )P ( (k,l) Y (k,l)). (5) The conditional speech presence probability, P( (k,l) Y (k,l)), iscalculated ˆP ( { (k,l) Y (k,l)) αp ˆP( 1 = (k,l 1) Y (k,l 1)) + (1 α p), if S r (k,l)>δ α p ˆP( 1 (k,l 1) Y (k,l 1)), (6) otherwise where α p (0 < α p < 1) is a smoothing parameter and δ is a probability threshold of speech signal presence. Also, S r (k,l) S(k,l) S min (k,l) denotes the ratio between the local energy of noisy speech, S(k,l) and its determined minimum, S min(k,l) for the current frame. To obtain S(k, l) recursive averaging is employed such that S(k,l) = ζ s S(k,l 1) + (1 ζ s ) Y (k,l) (7) where ζ s (0 <ζ s < 1) is a smoothing parameter. In addition, to obtain the minimum value S min (k,l) of the current frame, a samplewise comparison of the local energy and the local minimum value of the previous frame is performed [10] such that S min (k,l) = min{s min (k,l 1), S(k,l)}. ere, S r (k,l)-based decision rule is inherently derived from a Bayes minimum-cost decision rule given by P(S r (k,l) 1 (k,l)) P(S r (k,l) 0 (k,l)) 0 α C 10 P((k,l) = 0 ) C 01 P((k,l) = 1 ) (8)

3 1574 J.-M. Kum et al. / Digital Signal Processing 0 (010) where P ((k,l) = 0 ) = (1 P((k,l) = 1 )) is the a priori probability of speech absence and C ij is the cost of deciding when i j. Since (8) is a monotonic function, the decision rule (8) is expressed by S 1 r(k,l) δ like (6), in which δ is a given threshold. 3. Enhanced MCRA based on conditional MAP Intheprevioussection,wenotethatthecrucialparameterintheMCRAmethodisP( (k,l) Y (k,l)), controlling the smoothing parameter for the recursively averaged power spectrum of the noisy signal. owever, P( (k,l) Y (k,l)) does not consider the inter-frame correlation carefully since the conditional probability of speech presence incorporating a Bayes decision rule is simply based on a current observation Y (k,l). Since it is known that speech activities in the adjacent frames have a strong correlation representing P((k,l) = 1 (k,l 1) = 1 )>P((k,l) = 1 ) where (k,l) denotes the correct hypothesis at the kth frequency bin in the lth frame [9,16], we first consider the decision rule conditioned on both the current observation and the speech presence decision in the previous (l 1) frame as follows: P((k,l) = 1 S r (k,l), (k,l 1) = i ) P((k,l) = 0 S r (k,l), (k,l 1) = i ) 0 α, i = 0, 1. (9) Using Bayes rule, this criterion results in the following likelihood ratio test (LRT) which is analogous to (8). P(S r (k,l) (k,l) = 1, (k,l 1) = i ) P(S r (k,l) (k,l) = 0, (k,l 1) = i ) 0 α i, i = 0, 1 (10) where α i = α P((k,l)= 0 (k,l 1)= i ) P((k,l)= 1 (k,l 1)= i ). It is assumed that speech presence of the current frame dominantly determines the distribution of the observed signal in the current frame [9]. Therefore (10) can be simplified as follows: P(S r (k,l) (k,l) = 1 ) P(S r (k,l) (k,l) = 0 ) 0 α i, i = 0, 1. (11) From this, we know that the threshold α 0 (= α P((k,l)= 0 (k,l 1)= 0 ) P((k,l)= 1 (k,l 1)= 0 ) ) is used if absence of the speech signal is detected in the previous frame (i.e., P((k,l 1) = 0 Y (k,l 1)) 1) while the threshold α (= α P((k,l)= 0 (k,l 1)= 1 ) P((k,l)= 1 (k,l 1)= 1 ) ) is adopted otherwise. It should be noted that multiple thresholds can provide more accurate speech presence probabilities producing the improved noise power estimator. Finally, the proposed method combines the above double decision rules using the soft decision scheme as follows: 0 S r (k,l) η 0 (1) where η = P((k,l 1) = 1 Y (k,l 1))g(α 0 ) + (1 P((k,l 1) = 1 Y (k,l 1))g(α ) and g( ) is a mapping function between the LRT in (11) and S r (k,l) in (1). From (1), we can see that α 0 replaces η in the case of speech presence at the previous frame and α becomes more dominant as the speech presence probability of the previous frame decreases. This method can be considered desirable because of taking full considerations on the soft decision scheme. In this regard, it is not difficult to see from Fig. 1 that the speech presence probability seems to be more accurate than the conventional method. Based on this, from Fig. showing a typical example of the estimated noise power contour, we can see that the proposed method offers more accurate noise power estimate than the previous method. As a main application of the proposed technique, we adopt a speech enhancement algorithm based on a minimum mean-square error (MMSE) as follows [1,8]: ˆX(k,l) = G ( ξ(k,l), γ (k,l) ) Y (k,l) (13) where ˆX(k,l) is the estimated clean speech spectrum and G( ) indicates the noise suppression gain. Also, γ (k,l) is the a posteriori SNR and ξ(k,l) is the a priori SNR defined by γ (k,l) Y (k,l) λ d (k,l), ξ(k,l) λ x(k,l) λ d (k,l). ere, the MMSE noise suppression gain is given by (ˆξ(k,l), G ˆγ (k,l) ) ( πν(k,l) = exp ν(k,l) ˆγ (k,l) )[ (1 + ν(k,l) ) I0 ( ν(k,l) ) ( )] ν(k,l) + ν(k,l)i 1 (14) (15) (16)

4 J.-M. Kum et al. / Digital Signal Processing 0 (010) Fig. 1. Comparison of the speech presence probability (k = 5, around 310 z) under F16 noise (SNR = 10 db). (a) Clean speech waveform. (b) Speech presence probability in short-time frames: probability of conventional MCRA (dashed line), probability of proposed algorithm (bold line). Fig.. Comparison of noise power estimation (k = 5, around 310 z) under F16 noise (SNR = 10 db). (a) Actual noise power (solid line), estimated noise power based on MCRA (dashed line), estimated noise power based on proposed algorithm (bold line). (b) Noisy speech. (c) Clean speech.

5 1576 J.-M. Kum et al. / Digital Signal Processing 0 (010) Table 1 Relative estimation error obtained from the MCRA and proposed method. Noise Method SNR (db) White MCRA Proposed Babble MCRA Proposed F16 MCRA Proposed Office MCRA Proposed Street MCRA Proposed in which I 0 and I 1 are the modified Bessel functions of zero and first order. Also, ν(k,l) is defined based on (4), (14) and (15) as given by ν(k,l) = ˆξ(k,l) ˆγ (k,l) 1 + ˆξ(k,l) (17) where ˆξ(k,l) and ˆγ (k,l) is inherently obtained by the estimator ˆλ d (k,l) for λ d (k,l) with the result in (4). 4. Experiments and results The proposed approach was adopted for the MCRA-based speech enhancement method as given in [1,8,10] and was evaluated with a quantitative comparison and subjective quality test experiment under various noise environments. One hundred test phrases from the NTT database, spoken by four male and four female speakers, were used as experimental data. Each phrase consisted of two different meaningful sentences and lasted 8 s. The MCRA-based noise power estimation was performed with a trapezoidal window of length 13 ms for each frame of 10 ms at a sampling frequency of 8 kz. Subsequently, each frame of the windowed signal was transformed to the corresponding spectrum through the 18-point DFT after zero-padding. Three types of noise sources such as white, babble and F16 noises from the NOISEX-9 database were added to the clean speech waveform at SNRs of 5, 10 and 15 db. In all cases, speech enhancement was conducted with the experimentally optimized parameter values according to a variety of the experimental data; α d = 0.95, α p = 0., ζ s = To determine the parameters α 0 and α, we investigated the joint probabilities P((l) = 0, (l 1) = 0 )(= 39.1%), P((l) = 0, (l 1) = 1 )(= 1.%), P((l) = 1, (l 1) = 0 )(= 1.%) and P((l) = 1, (l 1) = 1 )(= 58.5%) based on the speech material which length was 456 s. In this regard, α 0 and α used in the experiment were experimentally chosen to yield the effective conditional MAP such that α 0 = 9 and α = 3.5. Also, we incorporated two different noises such as office and street to demonstrate that the selected parameters are not specially biased to the given noise used for the parameter setting. For the first time, the performance of noise power estimation was evaluated frame by frame based on the normalized relative estimation error ε n defined by [10] ε n = 1 N N 1 l=0 k [ˆλ d (k,l) λ d (k,l)] k λ d (k,l) (18) where λ d (k,l) is the actual noise power estimate directly obtained by the noise signal [10], ˆλ d (k,l) is the estimated noise power by the conventional MCRA method and the proposed method for the number of frames N. Table 1 shows the results of the relative estimation error for the given noise power estimation methods under the various noise conditions. From the results, it can be seen that the proposed scheme gave us an improvement over the previous MCRA approach because, in all of the environmental conditions including the office and street noises, the proposed method gave us much lower estimation error. Especially, it should be noted that the performance gain becomes larger as the SNR gets higher [10]. For the purpose of evaluating the performance of the presented method in terms of speech quality, we first measured a perceptual evaluation of speech quality (PESQ) based on the ITU-T P.86 tests [17]. Even though the PESQ cannot access the quality of the signal including acoustic terminals and background noises, we employed the PESQ due to its widespread use as a decent objective measure of subjective quality in a speech enhancement system [18]. Table presenting the results of the PESQ shows that the proposed conditional MAP-based MCRA approach outperformed the original MCRA-based scheme under the given noise conditions, where we can see that incorporation of the conditional MAP definitely has a positive effective in terms of the objective measure of subjective quality. Secondly, to evaluate the subjective quality of the proposed scheme, we carried out a set of informal tests under the same noise conditions. Subjective opinions were decided by a group of 0 listeners; each listener gave a score for each test sentence: 5 (Excellent), 4 (Good), 3 (Fair), (Poor), 1 (Bad). All listener scores were then averaged to yield a mean opinion score (MOS) [8]. The MOS test results are summarized in Table 3, in which a higher value indicates preference. Table 3

6 J.-M. Kum et al. / Digital Signal Processing 0 (010) Table PESQ scores of the MCRA and proposed method. Noise Method SNR (db) White MCRA Proposed Babble MCRA Proposed F16 MCRA Proposed Office MCRA Proposed Street MCRA Proposed Table 3 MOS of the MCRA and proposed method. Noise Method SNR (db) White MCRA Proposed Babble MCRA Proposed F16 MCRA Proposed Office MCRA Proposed Street MCRA Proposed demonstrates that the proposed approach improved over the conventional MCRA method under the given conditions. It is noted that performance improvement was found to be greater for the F16 noise case at all SNRs. This phenomenon could be attributed to the fact that the speech presence probability can be effectively estimated since the F16 noise spectra are dominantly located in the low frequency range. These results confirm that the proposed conditional MAP scheme substantially improves the MCRA method in speech quality enhancement. 5. Conclusions In this paper, we have proposed a novel approach to incorporate the conditional MAP into the conventional MCRA-based noise power estimation scheme for speech enhancement. The noise power estimate was given by the recursive smoothing parameter incorporating the speech absence probability conditioned on both the current observation and the voice activity decision in the previous frame. Compared to the conventional MCRA method, it has been demonstrated that the proposed technique provides better noise power estimates in the speech enhancement systems. References [1] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-3 (6) (Dec. 1984) [] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-3 () (Apr. 1985) [3] S.F. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process. ASSP-7 () (Apr. 1979) [4] R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Trans. Speech Audio Process. 9 (5) (Jul. 001) [5] G. Doblinger, Computationally efficient speech enhancement by spectral minima tracking in subbands, in: Proc. EUROSPEEC, Madrid, Spain, Sept. 1995, pp [6] J. Meyer, K.U. Simmer, K.D. Kammeyer, Comparison of one- and two-channel noise-estimation techniques, in: Proc. IWAENC, London, UK, Sept. 1997, pp [7] I. Cohen, B. Berdugo, Speech enhancement for non-stationary noise environments, Signal Process. 81 (Nov. 001) [8] N.S. Kim, J.-. Chang, Spectral enhancement based on global soft decision, IEEE Signal Process. Lett. 7 (5) (May 000) [9] J.W. Shin,.J. Kwon, S.. Jin, N.S. Kim, Voice activity detection based on conditional MAP criterion, IEEE Signal Process. Lett. 15 (Feb. 008) [10] I. Cohen, B. Berdugo, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Process. Lett. 9 (1) (Jan. 00) [11] I. Cohen, Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging, IEEE Trans. Speech Audio Process. 11 (5) (Sept. 003) [1] V. Stouten,.V. amme, P. Wambacq, Application of minimum statistics and minima controlled recursive averaging methods to estimate a spectral noise model for robust ASR, in: Proc. ICASSP, Toulouse, France, May 006, pp

7 1578 J.-M. Kum et al. / Digital Signal Processing 0 (010) [13] N. Fan, J. Rosca, R. Balan, Speech noise estimation using enhanced minima controlled recursive averaging, in: Proc. ICASSP, onolulu, awaii, USA, Apr. 007, pp [14] Y.D. Cho, K. Al-Naimi, A. Kondoz, Improved voice activity detection based on a smoothed statistical likelihood ratio, in: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol., Salt Lake City, UT, May 001, pp [15] A. Davis, S. Nordholm, R. Togneri, Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold, IEEE Trans. Audio Speech Lang. Process. 14 () (Mar. 006) [16] J. Sohn, N.S. Kim, W. Sung, A statistical model-based voice activity detection, IEEE Signal Process. Lett. 6 (1) (Jan. 1999) 1 3. [17] ITU-T P.86, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, 001. [18] J.-. Chang, Q.-. Jo, D.K. Kim, N.S. Kim, Global soft decision employing support vector machine for speech enhancement, IEEE Signal Process. Lett. 16 (1) (Jan. 009) Jong-Mo Kum received B.S. degree in electronic engineering from Inha University, Incheon, Korea, in 008. e is now pursuing toward M.S. degree at the same university. Yun-Sik Park received B.S. and M.S. degrees in electronics engineering from Inha Univeristy, Incheon, Korea, in 006 and 008, respectively. e is now pursuing toward Ph.D. degree at the same university. Joon-yuk Chang received B.S. degree in electronics engineering from Kyungpook National University, Daegu, in 1998 and M.S. and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 000 and 004, respectively. From March 000 to April 005, he was with Netdus Corp., Seoul, as a Chief Engineer. From May 004 to April 005, he was with the University of California, Santa Barbara, in a postdoctoral position to work on adaptive signal processing and audio coding. In May 005, he joined Korea Institute of Science and Technology, Seoul, as a Research Scientist to work on speech recognition. Currently, he is an assistant professor with the Faculty of Electronic Engineering, Inha University, Incheon, Korea. is research interests are in speech coding, speech enhancement, speech recognition, audio coding, and adaptive signal processing.

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator 1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il

More information

Modifying Voice Activity Detection in Low SNR by correction factors

Modifying Voice Activity Detection in Low SNR by correction factors Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir

More information

MANY digital speech communication applications, e.g.,

MANY digital speech communication applications, e.g., 406 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2007 An MMSE Estimator for Speech Enhancement Under a Combined Stochastic Deterministic Speech Model Richard C.

More information

Improved noise power spectral density tracking by a MAP-based postprocessor

Improved noise power spectral density tracking by a MAP-based postprocessor Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer

More information

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS 204 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS Hajar Momeni,2,,

More information

Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR

Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR Bengt J. Borgström, Student Member, IEEE, and Abeer Alwan, IEEE Fellow Abstract This paper

More information

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION M. Schwab, P. Noll, and T. Sikora Technical University Berlin, Germany Communication System Group Einsteinufer 17, 1557 Berlin (Germany) {schwab noll

More information

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud To cite this version: Xiaofei Li, Laurent Girin, Sharon Gannot,

More information

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig

More information

New Statistical Model for the Enhancement of Noisy Speech

New Statistical Model for the Enhancement of Noisy Speech New Statistical Model for the Enhancement of Noisy Speech Electrical Engineering Department Technion - Israel Institute of Technology February 22, 27 Outline Problem Formulation and Motivation 1 Problem

More information

A Priori SNR Estimation Using a Generalized Decision Directed Approach

A Priori SNR Estimation Using a Generalized Decision Directed Approach A Priori SNR Estimation Using a Generalized Decision Directed Approach Aleksej Chinaev, Reinhold Haeb-Umbach Department of Communications Engineering, Paderborn University, 3398 Paderborn, Germany {chinaev,haeb}@nt.uni-paderborn.de

More information

2D Spectrogram Filter for Single Channel Speech Enhancement

2D Spectrogram Filter for Single Channel Speech Enhancement Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 007 89 D Spectrogram Filter for Single Channel Speech Enhancement HUIJUN DING,

More information

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig 386 Braunschweig,

More information

Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs

Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach Department of Communications

More information

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION Robert Rehr, Timo Gerkmann Speech Signal Processing Group, Department of Medical Physics and Acoustics

More information

A priori SNR estimation and noise estimation for speech enhancement

A priori SNR estimation and noise estimation for speech enhancement Yao et al. EURASIP Journal on Advances in Signal Processing (2016) 2016:101 DOI 10.1186/s13634-016-0398-z EURASIP Journal on Advances in Signal Processing RESEARCH A priori SNR estimation and noise estimation

More information

Modeling speech signals in the time frequency domain using GARCH

Modeling speech signals in the time frequency domain using GARCH Signal Processing () 53 59 Fast communication Modeling speech signals in the time frequency domain using GARCH Israel Cohen Department of Electrical Engineering, Technion Israel Institute of Technology,

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

A SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS

A SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS Proc. of the 1 th Int. Conference on Digital Audio Effects (DAFx-9), Como, Italy, September 1-4, 9 A SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS Matteo

More information

Enhancement of Noisy Speech. State-of-the-Art and Perspectives

Enhancement of Noisy Speech. State-of-the-Art and Perspectives Enhancement of Noisy Speech State-of-the-Art and Perspectives Rainer Martin Institute of Communications Technology (IFN) Technical University of Braunschweig July, 2003 Applications of Noise Reduction

More information

SINGLE-CHANNEL speech enhancement methods based

SINGLE-CHANNEL speech enhancement methods based IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 6, AUGUST 2007 1741 Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors Jan S.

More information

A Priori SNR Estimation Using Weibull Mixture Model

A Priori SNR Estimation Using Weibull Mixture Model A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University

More information

Voice activity detection based on conjugate subspace matching pursuit and likelihood ratio test

Voice activity detection based on conjugate subspace matching pursuit and likelihood ratio test Deng and Han EURASIP Journal on Audio, Speech, and Music Processing, : http://asmp.eurasipjournals.com/content/// RESEARCH Open Access Voice activity detection based on conjugate subspace matching pursuit

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction

A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction Mehrez Souden, Jingdong Chen, Jacob Benesty, and Sofiène Affes Abstract We propose a second-order-statistics-based

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information

A Survey on Voice Activity Detection Methods

A Survey on Voice Activity Detection Methods e-issn 2455 1392 Volume 2 Issue 4, April 2016 pp. 668-675 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com A Survey on Voice Activity Detection Methods Shabeeba T. K. 1, Anand Pavithran 2

More information

NOISE reduction is an important fundamental signal

NOISE reduction is an important fundamental signal 1526 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 5, JULY 2012 Non-Causal Time-Domain Filters for Single-Channel Noise Reduction Jesper Rindom Jensen, Student Member, IEEE,

More information

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN Yu ang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London, UK Email: {yw09,

More information

Voice Activity Detection Using Pitch Feature

Voice Activity Detection Using Pitch Feature Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech

More information

SPEECH ENHANCEMENT USING PCA AND VARIANCE OF THE RECONSTRUCTION ERROR IN DISTRIBUTED SPEECH RECOGNITION

SPEECH ENHANCEMENT USING PCA AND VARIANCE OF THE RECONSTRUCTION ERROR IN DISTRIBUTED SPEECH RECOGNITION SPEECH ENHANCEMENT USING PCA AND VARIANCE OF THE RECONSTRUCTION ERROR IN DISTRIBUTED SPEECH RECOGNITION Amin Haji Abolhassani 1, Sid-Ahmed Selouani 2, Douglas O Shaughnessy 1 1 INRS-Energie-Matériaux-Télécommunications,

More information

Model-based unsupervised segmentation of birdcalls from field recordings

Model-based unsupervised segmentation of birdcalls from field recordings Model-based unsupervised segmentation of birdcalls from field recordings Anshul Thakur School of Computing and Electrical Engineering Indian Institute of Technology Mandi Himachal Pradesh, India Email:

More information

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline

More information

VOICE ACTIVITY DETECTION IN PRESENCE OF TRANSIENT NOISE USING SPECTRAL CLUSTERING AND DIFFUSION KERNELS

VOICE ACTIVITY DETECTION IN PRESENCE OF TRANSIENT NOISE USING SPECTRAL CLUSTERING AND DIFFUSION KERNELS 2014 IEEE 28-th Convention of Electrical and Electronics Engineers in Israel VOICE ACTIVITY DETECTION IN PRESENCE OF TRANSIENT NOISE USING SPECTRAL CLUSTERING AND DIFFUSION KERNELS Oren Rosen, Saman Mousazadeh

More information

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors IEEE SIGNAL PROCESSING LETTERS 1 Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors Nasser Mohammadiha, Student Member, IEEE, Rainer Martin, Fellow, IEEE, and Arne Leijon,

More information

Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data

Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan Ming Hsieh Department of Electrical Engineering University

More information

Improved Method for Epoch Extraction in High Pass Filtered Speech

Improved Method for Epoch Extraction in High Pass Filtered Speech Improved Method for Epoch Extraction in High Pass Filtered Speech D. Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham (University) Coimbatore, Tamilnadu 642 Email: d

More information

Recent Advancements in Speech Enhancement

Recent Advancements in Speech Enhancement Recent Advancements in Speech Enhancement Yariv Ephraim and Israel Cohen 1 May 17, 2004 Abstract Speech enhancement is a long standing problem with numerous applications ranging from hearing aids, to coding

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 11, November 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Acoustic Source

More information

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose ON SCALABLE CODING OF HIDDEN MARKOV SOURCES Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa Barbara, CA, 93106

More information

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli

More information

SPEECH enhancement algorithms are often used in communication

SPEECH enhancement algorithms are often used in communication IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 25, NO. 2, FEBRUARY 2017 397 An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation Robert Rehr, Student

More information

Robust Sound Event Detection in Continuous Audio Environments

Robust Sound Event Detection in Continuous Audio Environments Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University

More information

SNR Features for Automatic Speech Recognition

SNR Features for Automatic Speech Recognition SNR Features for Automatic Speech Recognition Philip N. Garner Idiap Research Institute Martigny, Switzerland pgarner@idiap.ch Abstract When combined with cepstral normalisation techniques, the features

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract

More information

A Priori SNR Estimation Using Weibull Mixture Model. Abstract. 2 MAP estimation of the a priori SNR based on Weibull mixture models.

A Priori SNR Estimation Using Weibull Mixture Model. Abstract. 2 MAP estimation of the a priori SNR based on Weibull mixture models. A Priori SNR Estimation Using Weibull ixture odel Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering, Paderborn University, 33100 Paderborn, Germany Email:

More information

IMPROVED NOISE CANCELLATION IN DISCRETE COSINE TRANSFORM DOMAIN USING ADAPTIVE BLOCK LMS FILTER

IMPROVED NOISE CANCELLATION IN DISCRETE COSINE TRANSFORM DOMAIN USING ADAPTIVE BLOCK LMS FILTER SANJAY KUMAR GUPTA* et al. ISSN: 50 3676 [IJESAT] INTERNATIONAL JOURNAL OF ENGINEERING SCIENCE & ADVANCED TECHNOLOGY Volume-, Issue-3, 498 50 IMPROVED NOISE CANCELLATION IN DISCRETE COSINE TRANSFORM DOMAIN

More information

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540 REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION Scott Rickard, Radu Balan, Justinian Rosca Siemens Corporate Research Princeton, NJ 84 fscott.rickard,radu.balan,justinian.roscag@scr.siemens.com

More information

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

A geometric approach to spectral subtraction

A geometric approach to spectral subtraction Available online at www.sciencedirect.com Speech Communication 5 (8) 53 66 www.elsevier.com/locate/specom A geometric approach to spectral subtraction Yang Lu, Philipos C. Loizou * Department of Electrical

More information

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Seema Sud 1 1 The Aerospace Corporation, 4851 Stonecroft Blvd. Chantilly, VA 20151 Abstract

More information

SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS

SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS TO APPEAR IN IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 22 25, 23, UK SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS Nasser Mohammadiha

More information

Efficient Use Of Sparse Adaptive Filters

Efficient Use Of Sparse Adaptive Filters Efficient Use Of Sparse Adaptive Filters Andy W.H. Khong and Patrick A. Naylor Department of Electrical and Electronic Engineering, Imperial College ondon Email: {andy.khong, p.naylor}@imperial.ac.uk Abstract

More information

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen Robust Speech Recognition in the Presence of Additive Noise Svein Gunnar Storebakken Pettersen A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of PHILOSOPHIAE DOCTOR

More information

Hidden Markov Model Based Robust Speech Recognition

Hidden Markov Model Based Robust Speech Recognition Hidden Markov Model Based Robust Speech Recognition Vikas Mulik * Vikram Mane Imran Jamadar JCEM,K.M.Gad,E&Tc,&Shivaji University, ADCET,ASHTA,E&Tc&Shivaji university ADCET,ASHTA,Automobile&Shivaji Abstract

More information

AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS. Sai Han and Tim Fingscheidt

AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS. Sai Han and Tim Fingscheidt AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS Sai Han and Tim Fingscheidt Institute for Communications Technology, Technische Universität Braunschweig Schleinitzstr.

More information

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION Xiaofei Li 1, Laurent Girin 1,, Radu Horaud 1 1 INRIA Grenoble

More information

Feature extraction 2

Feature extraction 2 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods

More information

Determining the Optimal Decision Delay Parameter for a Linear Equalizer

Determining the Optimal Decision Delay Parameter for a Linear Equalizer International Journal of Automation and Computing 1 (2005) 20-24 Determining the Optimal Decision Delay Parameter for a Linear Equalizer Eng Siong Chng School of Computer Engineering, Nanyang Technological

More information

Soft-Output Trellis Waveform Coding

Soft-Output Trellis Waveform Coding Soft-Output Trellis Waveform Coding Tariq Haddad and Abbas Yongaçoḡlu School of Information Technology and Engineering, University of Ottawa Ottawa, Ontario, K1N 6N5, Canada Fax: +1 (613) 562 5175 thaddad@site.uottawa.ca

More information

CEPSTRAL analysis has been widely used in signal processing

CEPSTRAL analysis has been widely used in signal processing 162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior

More information

502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016

502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016 502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016 Discriminative Training of NMF Model Based on Class Probabilities for Speech Enhancement Hanwook Chung, Student Member, IEEE, Eric Plourde,

More information

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach Jesper Jensen Abstract In this work we consider the problem of feature enhancement for noise-robust

More information

MODERN video coding standards, such as H.263, H.264,

MODERN video coding standards, such as H.263, H.264, 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006 Analysis of Multihypothesis Motion Compensated Prediction (MHMCP) for Robust Visual Communication Wei-Ying

More information

Adaptive Filter Theory

Adaptive Filter Theory 0 Adaptive Filter heory Sung Ho Cho Hanyang University Seoul, Korea (Office) +8--0-0390 (Mobile) +8-10-541-5178 dragon@hanyang.ac.kr able of Contents 1 Wiener Filters Gradient Search by Steepest Descent

More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

A low intricacy variable step-size partial update adaptive algorithm for Acoustic Echo Cancellation USNRao

A low intricacy variable step-size partial update adaptive algorithm for Acoustic Echo Cancellation USNRao ISSN: 77-3754 International Journal of Engineering and Innovative echnology (IJEI Volume 1, Issue, February 1 A low intricacy variable step-size partial update adaptive algorithm for Acoustic Echo Cancellation

More information

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki

More information

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization Nasser Mohammadiha*, Student Member, IEEE, Paris Smaragdis,

More information

CURRENT state-of-the-art automatic speech recognition

CURRENT state-of-the-art automatic speech recognition 1850 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 6, AUGUST 2007 Switching Linear Dynamical Systems for Noise Robust Speech Recognition Bertrand Mesot and David Barber Abstract

More information

Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017

Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017 Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017 I am v Master Student v 6 months @ Music Technology Group, Universitat Pompeu Fabra v Deep learning for acoustic source separation v

More information

IN THE FIELD of array signal processing, a class of

IN THE FIELD of array signal processing, a class of 960 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 4, APRIL 1997 Distributed Source Modeling Direction-of-Arrival Estimation Techniques Yong Up Lee, Jinho Choi, Member, IEEE, Iickho Song, Senior

More information

Exemplar-based voice conversion using non-negative spectrogram deconvolution

Exemplar-based voice conversion using non-negative spectrogram deconvolution Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore

More information

A Brief Survey of Speech Enhancement 1

A Brief Survey of Speech Enhancement 1 A Brief Survey of Speech Enhancement 1 Yariv Ephraim, Hanoch Lev-Ari and William J.J. Roberts 2 August 2, 2003 Abstract We present a brief overview of the speech enhancement problem for wide-band noise

More information

The effect of speaking rate and vowel context on the perception of consonants. in babble noise

The effect of speaking rate and vowel context on the perception of consonants. in babble noise The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu

More information

Dominant Feature Vectors Based Audio Similarity Measure

Dominant Feature Vectors Based Audio Similarity Measure Dominant Feature Vectors Based Audio Similarity Measure Jing Gu 1, Lie Lu 2, Rui Cai 3, Hong-Jiang Zhang 2, and Jian Yang 1 1 Dept. of Electronic Engineering, Tsinghua Univ., Beijing, 100084, China 2 Microsoft

More information

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction "Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:

More information

A Low-Cost Robust Front-end for Embedded ASR System

A Low-Cost Robust Front-end for Embedded ASR System A Low-Cost Robust Front-end for Embedded ASR System Lihui Guo 1, Xin He 2, Yue Lu 1, and Yaxin Zhang 2 1 Department of Computer Science and Technology, East China Normal University, Shanghai 200062 2 Motorola

More information

Asymptotic Analysis of the Generalized Coherence Estimate

Asymptotic Analysis of the Generalized Coherence Estimate IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 1, JANUARY 2001 45 Asymptotic Analysis of the Generalized Coherence Estimate Axel Clausen, Member, IEEE, and Douglas Cochran, Senior Member, IEEE Abstract

More information

JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS

JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS JOINT ACOUSTIC AND SPECTRAL MODELING FOR SPEECH DEREVERBERATION USING NON-NEGATIVE REPRESENTATIONS Nasser Mohammadiha Paris Smaragdis Simon Doclo Dept. of Medical Physics and Acoustics and Cluster of Excellence

More information

Application of the Tuned Kalman Filter in Speech Enhancement

Application of the Tuned Kalman Filter in Speech Enhancement Application of the Tuned Kalman Filter in Speech Enhancement Orchisama Das, Bhaswati Goswami and Ratna Ghosh Department of Instrumentation and Electronics Engineering Jadavpur University Kolkata, India

More information

Noise Reduction. Two Stage Mel-Warped Weiner Filter Approach

Noise Reduction. Two Stage Mel-Warped Weiner Filter Approach Noise Reduction Two Stage Mel-Warped Weiner Filter Approach Intellectual Property Advanced front-end feature extraction algorithm ETSI ES 202 050 V1.1.3 (2003-11) European Telecommunications Standards

More information

Single Channel Signal Separation Using MAP-based Subspace Decomposition

Single Channel Signal Separation Using MAP-based Subspace Decomposition Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,

More information

Sound Source Tracking Using Microphone Arrays

Sound Source Tracking Using Microphone Arrays Sound Source Tracking Using Microphone Arrays WANG PENG and WEE SER Center for Signal Processing School of Electrical & Electronic Engineering Nanayang Technological Univerisy SINGAPORE, 639798 Abstract:

More information

Short-Time ICA for Blind Separation of Noisy Speech

Short-Time ICA for Blind Separation of Noisy Speech Short-Time ICA for Blind Separation of Noisy Speech Jing Zhang, P.C. Ching Department of Electronic Engineering The Chinese University of Hong Kong, Hong Kong jzhang@ee.cuhk.edu.hk, pcching@ee.cuhk.edu.hk

More information

SC-Fano Decoding of Polar Codes

SC-Fano Decoding of Polar Codes SC-Fano Decoding of Polar Codes Min-Oh Jeong and Song-Nam Hong Ajou University, Suwon, Korea, email: {jmo0802, snhong}@ajou.ac.kr arxiv:1901.06791v1 [eess.sp] 21 Jan 2019 Abstract In this paper, we present

More information

MAXIMUM LIKELIHOOD BASED NOISE COVARIANCE MATRIX ESTIMATION FOR MULTI-MICROPHONE SPEECH ENHANCEMENT. Ulrik Kjems and Jesper Jensen

MAXIMUM LIKELIHOOD BASED NOISE COVARIANCE MATRIX ESTIMATION FOR MULTI-MICROPHONE SPEECH ENHANCEMENT. Ulrik Kjems and Jesper Jensen 20th European Signal Processing Conference (EUSIPCO 202) Bucharest, Romania, August 27-3, 202 MAXIMUM LIKELIHOOD BASED NOISE COVARIANCE MATRIX ESTIMATION FOR MULTI-MICROPHONE SPEECH ENHANCEMENT Ulrik Kjems

More information

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Patrick J. Wolfe Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK pjw47@eng.cam.ac.uk Simon J. Godsill

More information

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline

More information

Exploring Non-linear Transformations for an Entropybased Voice Activity Detector

Exploring Non-linear Transformations for an Entropybased Voice Activity Detector Exploring Non-linear Transformations for an Entropybased Voice Activity Detector Jordi Solé-Casals, Pere Martí-Puig, Ramon Reig-Bolaño Digital Technologies Group, University of Vic, Sagrada Família 7,

More information

A Subspace Approach to Estimation of. Measurements 1. Carlos E. Davila. Electrical Engineering Department, Southern Methodist University

A Subspace Approach to Estimation of. Measurements 1. Carlos E. Davila. Electrical Engineering Department, Southern Methodist University EDICS category SP 1 A Subspace Approach to Estimation of Autoregressive Parameters From Noisy Measurements 1 Carlos E Davila Electrical Engineering Department, Southern Methodist University Dallas, Texas

More information

A Fast Algorithm for. Nonstationary Delay Estimation. H. C. So. Department of Electronic Engineering, City University of Hong Kong

A Fast Algorithm for. Nonstationary Delay Estimation. H. C. So. Department of Electronic Engineering, City University of Hong Kong A Fast Algorithm for Nonstationary Delay Estimation H. C. So Department of Electronic Engineering, City University of Hong Kong Tat Chee Avenue, Kowloon, Hong Kong Email : hcso@ee.cityu.edu.hk June 19,

More information

Adaptive beamforming for uniform linear arrays with unknown mutual coupling. IEEE Antennas and Wireless Propagation Letters.

Adaptive beamforming for uniform linear arrays with unknown mutual coupling. IEEE Antennas and Wireless Propagation Letters. Title Adaptive beamforming for uniform linear arrays with unknown mutual coupling Author(s) Liao, B; Chan, SC Citation IEEE Antennas And Wireless Propagation Letters, 2012, v. 11, p. 464-467 Issued Date

More information

Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum

Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum Downloaded from vbn.aau.dk on: marts 31, 2019 Aalborg Universitet Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum Karimian-Azari, Sam; Mohammadiha, Nasser; Jensen, Jesper

More information

Acomplex-valued harmonic with a time-varying phase is a

Acomplex-valued harmonic with a time-varying phase is a IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 46, NO. 9, SEPTEMBER 1998 2315 Instantaneous Frequency Estimation Using the Wigner Distribution with Varying and Data-Driven Window Length Vladimir Katkovnik,

More information

1 Introduction 1 INTRODUCTION 1

1 Introduction 1 INTRODUCTION 1 1 INTRODUCTION 1 Audio Denoising by Time-Frequency Block Thresholding Guoshen Yu, Stéphane Mallat and Emmanuel Bacry CMAP, Ecole Polytechnique, 91128 Palaiseau, France March 27 Abstract For audio denoising,

More information