Overview of Single Channel Noise Suppression Algorithms

Matías Zañartu Salas
Post Doctoral Research Associate, Purdue University
October 4, 2010

General notes

Only single-channel speech enhancement schemes are reviewed in the present outline. A complete discussion of single- and multichannel algorithms will be included in the first trimester report.

Signal model and notation: $y[n] = x[n] + v[n]$, which in vector form, $\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k$, allows computing the STFT $Y(\omega) = X(\omega) + V(\omega)$, where $\omega$ is the DFT frequency bin and $k$ is the frame time index (sometimes the notation $\omega_k$ is used). For each frequency, the amplitudes of the signal spectra are denoted $Y_k$, $X_k$, and $V_k$.

Noise and speech are assumed to be zero mean and uncorrelated.

Frequency bins are generally considered independent, yet assigning joint distributions to them is possible.

No effects of reverberation are considered in this initial review.

Phase is generally not recovered and is assumed to follow that of the corrupted speech. Perceptually, this is acceptable for SNR > 5 dB [10, 1]. The effects of phase distortion on ASR remain unknown.

Most techniques use non-overlapping frames of short duration: win = 4-40 ms (most use [...] ms). Thus, for $f_s$ = [...] Hz, the window length in samples is $m$ = [...] samples.

The following classification of single-channel noise suppression algorithms is an extension of that proposed in [1].

Notes: algorithm proposed for acoustic evaluation; no MATLAB code available from [10].

A Filtering schemes

All of the following algorithms can be represented by a linear (but possibly non-causal) transfer function $H(\omega)$, normally referred to as the gain function, such that $\hat{X}(\omega) = H(\omega) Y(\omega)$. These methods depend heavily on the ability to estimate the time-varying noise component and the presence of speech.
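The gain-function framework above can be sketched in a few lines. This is only an illustration under stated assumptions: frames are non-overlapping with a rectangular window, the DFT is written out naively in pure Python so the block is self-contained (a practical implementation would use an FFT to reach $O(m \log m)$), and the names `dft`, `idft`, `apply_gain`, and the frame length of 8 samples are hypothetical choices that do not come from the report.

```python
import cmath
import math


def dft(frame):
    """Naive O(m^2) DFT of one frame (stand-in for an FFT)."""
    m = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * w * n / m) for n in range(m))
            for w in range(m)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time sample."""
    m = len(spectrum)
    return [(sum(spectrum[w] * cmath.exp(2j * math.pi * w * n / m)
                 for w in range(m)) / m).real
            for n in range(m)]


def apply_gain(y, gain, m=8):
    """Apply a real gain H(w) to every non-overlapping frame of y.

    gain holds m real values, one per DFT bin. Multiplying by a real,
    non-negative gain changes only the magnitude, so the enhanced
    spectrum keeps the phase of the noisy observation.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x_hat = []
    for start in range(0, len(y) - m + 1, m):
        Y = dft(y[start:start + m])
        X_hat = [h * Yw for h, Yw in zip(gain, Y)]  # X_hat(w) = H(w) Y(w)
        x_hat.extend(idft(X_hat))
    return x_hat


# Unity gain passes the signal through; zero gain removes it completely.
y = [math.sin(0.3 * n) for n in range(16)]
passthrough = apply_gain(y, [1.0] * 8)
silenced = apply_gain(y, [0.0] * 8)
```

Every scheme in this section then amounts to a different rule for choosing the per-bin values passed as `gain`.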

1. Spectral subtraction: Based on heuristic principles, such as $|\hat{X}(\omega)| = |Y(\omega)| - |\hat{V}(\omega)|$ if this difference is positive and $|\hat{X}(\omega)| = 0$ otherwise. Thus, $H(\omega) = 1 - |\hat{V}(\omega)| / |Y(\omega)|$. Overall complexity: $O(m \log m)$. The subtraction is normally made frequency by frequency. Imposing $\hat{X}(\omega) = 0$, together with sharp frame-to-frame differences in the spectral estimates, creates musical noise. Some techniques to reduce the musical noise are:
(a) Alternative forms: $|\hat{X}(\omega)|^p = |Y(\omega)|^p - |\hat{V}(\omega)|^p$. Cross terms are ignored, which introduces low-frequency distortions. Schemes to retrieve the cross terms normally require further statistical assumptions and yield other types of distortion.
(b) Undersubtraction: Provides a more gradual attenuation by using $|\hat{X}(\omega)| = \frac{1}{2}|Y(\omega)| + \frac{1}{2}\left(|Y(\omega)|^2 - |\hat{V}(\omega)|^2\right)^{1/2}$. Less noise removal overall, but minor distortion.
(c) Smoothing: Averages estimates of $\hat{X}$ from neighboring frames to reduce the spectral changes between them. In spectral subtraction, this method can introduce delays. Thus, it is preferred in the context of the Wiener filter, where such delays are better handled.
(d) Oversubtraction: Increases the amount of noise to be subtracted while defining a noise floor larger than 0. That is, $|\hat{X}(\omega)|^2 = |Y(\omega)|^2 - \alpha |\hat{V}(\omega)|^2$ if $|Y(\omega)|^2 > (\alpha + \beta)|\hat{V}(\omega)|^2$ and $|\hat{X}(\omega)|^2 = \beta |\hat{V}(\omega)|^2$ otherwise, where $\alpha \geq 1$ and $0 < \beta \ll 1$. Less noise removal during silent portions but higher suppression during voiced ones.
(e) Nonlinear: Similar to oversubtraction but using a frequency-dependent $\alpha(\omega)$ and smoothed estimates of noise and speech, thus improving the subtraction of colored noise.
(f) Multiband: Makes use of bands instead of single frequencies to reduce sharp spectral variations between frames. The filter bank can be spaced linearly or on a mel frequency scale.

2. Wiener filtering: Based on an optimal solution that minimizes the MSE and yields maximum reduction in terms of the noise reduction factor.
This could be considered the theoretical gold standard of noise suppression, provided the noise and SNR estimates are perfect. In the time domain (assuming full rank), $\mathbf{h}_0 = \mathbf{R}_y^{-1}\mathbf{r}_{yy} - \mathbf{R}_y^{-1}\mathbf{r}_{vv} = [1\ 0\ \cdots\ 0]^T - \mathbf{R}_y^{-1}\mathbf{r}_{vv} = \mathbf{h}_1 - \mathbf{R}_y^{-1}\mathbf{r}_{vv}$. When applied, this filter yields the unaffected observation minus the ideal noise. In the frequency domain, $H_0(\omega) = P_{xy}(\omega)/P_{yy}(\omega) = P_{xx}(\omega)/(P_{xx}(\omega) + P_{vv}(\omega))$, where both $P_{xx}(\omega)$ and $P_{vv}(\omega)$ need to be estimated. Alternatively, it can be noted that $H_0(\omega) = 1/(1 + 1/\xi_k(\omega))$, where $\xi_k(\omega) = P_{xx}(\omega)/P_{vv}(\omega)$ is the a priori SNR. Overall complexity: $O(m^2)$ in the time domain and $O(m \log m)$ in the frequency domain. Musical noise is also present, as in spectral subtraction, and the following methods are used to reduce it:
(a) 1-frame smoothing: The goal is to smooth sharp frame transitions that affect $P_{xx}(\omega)$. Thus, the estimate for the $k$-th frame is $\hat{P}_{xx}(\omega,k) = (1 - \alpha) P_{xx}(\omega,k) + \alpha \hat{P}_{xx}(\omega,k-1)$, where $P_{xx}$ can be obtained from spectral subtraction or from the noisy observation, and $\alpha \in [0,1]$. The process can be repeated iteratively if needed. Tradeoff: musical noise vs. onset distortion.
(b) Gain-adaptive smoothing: As in 1-frame smoothing, but using a time-varying $\alpha$ that follows spectral transients. Underlying assumption: noise in regions of rapid spectral change is easily masked. Thus, the goal is to apply less smoothing to transients and significant smoothing to stationary regions. Then, $\alpha_k = f(1 - 2(\hat{Y}_k - \bar{Y}_k))$, where $f(x) = 1$ if $x \geq 1$, $f(x) = 0$ if $x \leq 0$, and $f(x) = x$ if $0 < x < 1$. $\hat{Y}_k$ is a mean spectral distortion measure, i.e., $\hat{Y}_k = \left[\frac{1}{\pi}\int_0^{\pi} \big|\,|Y_k(\omega)|^2 - |Y_{k-1}(\omega)|^2\,\big|\, d\omega\right]^{1/2}$, and $\bar{Y}_k$ is the mean of $\hat{Y}_k$ in a noisy segment. Best results were obtained using a short frame size (4 ms) to better capture the desired transients [11].
(c) Suboptimal design: Simple tradeoff between suppression and distortion with a single coefficient $\alpha$. Better understood in the time domain: $\mathbf{h}_0 = \mathbf{h}_1 - \alpha \mathbf{R}_y^{-1}\mathbf{r}_{vv}$, with $\alpha \in [0,1]$. That is, the unaffected observation minus a non-ideal noise.
(d) Mean adaptive: Based on an adaptive scheme in which the signal is modeled as a Gaussian random process represented by the sum of its mean and variance. Both quantities are computed and updated online. The technique is widely used in image processing but seldom used in speech. Thus, little information is available regarding its performance.
(e) Multiband: See the description for spectral subtraction.

3. ETSI scheme: Two-stage Wiener filter combining the schemes described above. The filter is estimated in the frequency domain, where 1-frame smoothing is used along with a mel-scaled multiband approach. The filter magnitude is decomposed with the log-scale filter banks and its linear coefficients are retrieved using a mel-warped inverse discrete cosine transform. These coefficients are used to filter the signal. After filtering, the process is repeated once (second stage), but using an additional adaptive gain that is increased if the input contains noise only (over-suppression). The ETSI standard contains additional features for ASR (e.g., cepstral coefficient computation and enhancement) which, taken together, provide a notable improvement in WER (10 % lower than the best possible solution [4]). However, its noise suppression module has not been evaluated separately. The scheme handles 4 frames of 5 ms at a time for smoothing. Overall complexity: $O(m \log m)$.
Complete details for implementation can be found in [7].

4. Subspace methods: Similar to the Wiener filter but with a constrained MMSE optimization, i.e., since $\hat{\mathbf{x}}_k = \mathbf{H}\mathbf{y}_k$, then $\mathbf{e}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k = (\mathbf{H} - \mathbf{I})\mathbf{x}_k + \mathbf{H}\mathbf{v}_k = \mathbf{e}_x + \mathbf{e}_v$. Thus, the two error terms (speech distortion $\mathbf{e}_x$ and residual noise $\mathbf{e}_v$) are treated independently. Wiener: minimizes $\mathbf{e}_k$. Subspace: minimizes $\mathbf{e}_x$ while limiting the noise residual $\mathbf{e}_v$. The constrained optimization yields a solution that uses the SVD, i.e., $\mathbf{H}_0 = \mathbf{B}^{-T}\boldsymbol{\Lambda}(\boldsymbol{\Lambda} + \mu\mathbf{I})^{-1}\mathbf{B}^{T}$, where $\mathbf{B}$ is an invertible matrix such that $\mathbf{R}_v = \mathbf{B}^{-T}\mathbf{B}^{-1}$, $\mathbf{R}_y = \mathbf{B}^{-T}(\boldsymbol{\Lambda} + \mathbf{I})\mathbf{B}^{-1}$, and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_L)$. Overall complexity: $O(m^2)$, but it can be reduced with iterative implementations. This approach has gained attention in recent years (particularly in cochlear implant applications). Its performance is variable, but can be considered comparable with that of the MMSE-LSA estimator.

B Statistical spectral estimation schemes

These algorithms generally estimate the spectral amplitude and assume that the phase follows that of the noisy observation (which was shown to be an optimal estimate [5, 1]). The estimation is performed in the frequency domain for each frequency bin. All schemes assume probability density functions (pdfs) for the speech and noise and search for estimates that minimize a certain distortion measure via some optimization algorithm. The choice of distortion measure, pdfs, and optimization algorithm constitutes the main difference between these algorithms. Unfortunately, pdf assumptions can vary from case to case, and simplifying assumptions are needed to obtain tractable mathematical expressions. Under simplifying assumptions, all schemes in this section yield (generally nonlinear) filter gains. The overall asymptotic complexity of these schemes is $O(m \log m)$, yet their implementation may not always be trivial due to the presence of nonlinear terms.

1. MMSE estimator: This approach estimates the real and imaginary components of the signal spectrum (which would allow estimating both phase and amplitude). Thus, $X(\omega) = X_R(\omega) + jX_I(\omega)$, where $\hat{X}_{MMSE} = E[X(\omega) \mid Y(\omega)] = E[X_R(\omega) \mid Y(\omega)] + jE[X_I(\omega) \mid Y(\omega)]$. Assuming Gaussian pdfs for both noise and speech, the MMSE estimate becomes the Wiener filter. The Gaussian assumption generally does not hold for short-term speech signals (< 40 ms), for which either Laplacian or Gamma distributions are used. An expression for $H_{MMSE}(\omega)$ can be found in all cases, being nonlinear for the latter pdfs. This method is not widely used, since estimating only the amplitude is more efficient and yields the same results.

2. MMSE-SA estimator [5]: Assuming that $Y(\omega_k) = Y_k e^{i\theta_{Y_k}}$ and $X(\omega_k) = X_k e^{i\theta_{X_k}}$, then $\hat{X}_{k,MMSE} = \int_0^{\infty} X_k\, p[X_k \mid Y(\omega_k)]\, dX_k$. In other terms, $\hat{X}_{k,MMSE}$ minimizes the distortion measure $d = E[(X_k - \hat{X}_k)^2]$ given the noisy observation $y[n]$. Assuming Gaussian pdfs, it is shown that $\hat{\theta}_{X_k} = \theta_{Y_k}$ and that the amplitude estimator is a function of the a priori SNR ($\xi_k = \sigma_x^2/\sigma_v^2 = P_{xx}(\omega_k)/P_{vv}(\omega_k)$) and the a posteriori SNR ($\gamma_k = Y_k^2/\sigma_v^2$). This feature allows the filter to adapt its noise suppression to the instantaneous SNR, which suppresses more residual noise. Even under the Gaussian pdf assumption, the expression for $H_{MMSE\text{-}SA}(\xi_k, \gamma_k)$ is highly nonlinear. However, under high-SNR conditions, the MMSE-SA estimator converges to the Wiener filter.

3. MMSE-LSA estimator (a.k.a. Log-MMSE) [6]:
This estimator is almost identical in nature to the MMSE-SA, but uses a different distortion measure, such that $\hat{X}_{k,MMSE\text{-}LSA}$ minimizes $d = E[(\log X_k - \log \hat{X}_k)^2]$ given the noisy observation $y[n]$. This is shown to be $\hat{X}_{k,MMSE\text{-}LSA} = \exp(E[\ln X_k \mid Y(\omega_k)])$. This estimator can only be solved assuming Gaussian pdfs, and yields an $H_{MMSE\text{-}LSA}$ that also depends on the a priori and a posteriori SNRs. However, $H_{MMSE\text{-}LSA}$ provides further reduction, particularly when the instantaneous SNR is low. This yields a much lower noise residual than the MMSE-SA estimator, with minor speech distortion.

4. Optimally-modified MMSE-LSA estimator: Follows the same principles as MMSE-LSA, but uses smoothing techniques for noise estimation and includes the speech presence probability in subbands.

5. ML-A estimator: The maximum likelihood estimator is attractive due to its asymptotic optimality property. $\hat{X}_{k,MLA} = \arg\max_{X_k} \{\ln(p[Y(\omega_k) \mid X_k])\}$. Assuming Gaussian pdfs, this estimate has a simple gain function, $H_{MLA}(\omega_k) = \frac{1}{2}\left(1 + \left\{(Y_k^2 - \sigma_v^2)/Y_k^2\right\}^{1/2}\right)$. The simplicity of the filter is the advantage of this approach. Little testing has been done with it, though.

6. MAP-A estimator: The maximum a posteriori estimator is similar to ML-A. $\hat{X}_{k,MAP\text{-}A} = \arg\max_{X_k} \{\ln(p[X_k \mid Y(\omega_k)])\}$. Assuming Gaussian pdfs, this estimate yields a simple gain function that depends on the a priori ($\xi_k$) and a posteriori ($\gamma_k$) SNRs: $H_{MAP\text{-}A}(\omega_k) = \left(\xi_k + \left\{\xi_k^2 + (1 + \xi_k)\xi_k/\gamma_k\right\}^{1/2}\right) / \{2(1 + \xi_k)\}$. The simple, closed form of the filter and its dependency on the a priori and a posteriori SNRs are the advantages of this approach.

The performance of this scheme has been shown to be almost the same as that of MMSE-SA [10].

7. Perceptually-motivated Bayesian estimators: These schemes modify the distortion measure to introduce perceptually-based ideas. The distortion measures that emphasize the spectral valleys more than the spectral peaks were the ones that outperformed MMSE-SA and MMSE-LSA, in terms of a lower noise residual and less speech distortion [10]. These algorithms assume Gaussian pdfs for simplicity. The selected distortion measures are:
(a) Weighted Euclidean: The proposed distortion measure is $d_{WE} = X_k^p (X_k - \hat{X}_k)^2$. The gain function depends on the a priori and a posteriori SNRs and is highly nonlinear. Best results were observed when $p = -1$ [10].
(b) Weighted Cosh: The proposed distortion measure is $d_{WCOSH} = [X_k/\hat{X}_k + \hat{X}_k/X_k - 1]\, X_k^p$. The gain function depends on the a priori and a posteriori SNRs and is highly nonlinear. Best results were observed when $p = -0.5$, outperforming those from MMSE-SA, MMSE-LSA, and the weighted Euclidean distortion measure [10].

C Model-based schemes

1. Harmonic: Retrieves the harmonic structure of voiced speech by using a comb filter such as $h_{COMB}[n] = \sum_{i=0}^{N} h_i\, \delta(n - Ti)$. Challenge: estimating $f_0$, the spectral slope ($h_i$), and the number of harmonics. Even when performed correctly, it generally introduces distortion in unvoiced portions that is considered worse than musical noise [2].

2. Linear prediction: Aims to retrieve the AR coefficients. Two main approaches are used, both based on an ML estimate obtained via iterative EM algorithms. Both potentially converge to an optimal estimate in the MMSE sense.
(a) Wiener-EM: Assumes Gaussian distributions. E-step: uses a Wiener filter constructed from the AR coefficients to estimate the signal. M-step: uses a MAP algorithm based on the previous coefficient and clean speech estimates. Overall complexity: $O(pm)$, where $p$ is the AR order.
(b) Kalman-EM: E-step: uses a Kalman filter on the noisy observations.
M-step: solves the Yule-Walker equations, but using the previous AR coefficients instead of correlation coefficients. ASR performance with this algorithm was evaluated in terms of WER; it was found to outperform Log-MMSE, Wiener-EM, and HMM, but not the optimally-modified Log-MMSE. An initial discussion is presented in [9], with further details in [8]. This scheme estimates not only the AR coefficients but also the complete enhanced speech signal and the (possibly colored) background noise. This algorithm is the natural extension, for a single-channel implementation, of the modified Kalman filter we studied last year. Overall complexity: $O(pm)$, where $p$ is the highest AR order between the speech and noise models.

3. HMM: Statistical model that makes use of a finite number of states and state transitions to estimate the desired signals. It uses the same approach as the statistical spectral estimation schemes, but estimates specific pdfs from training data. For noise reduction, two HMMs are required, one for noise and one for speech, both needing training. It uses an EM algorithm during training and an iterative MAP algorithm via an AR-Wiener filter during estimation. A different HMM enhancement scheme combined ideas from the HMM and the harmonic model [3]. Noise reduction was achieved by applying an HMM-based MMSE estimator to find the harmonic sinusoidal model parameters of clean speech from speech corrupted by additive noise. The model is considered to outperform traditional HMM-based enhancement. HMM overall complexity: $O(mK)$, where $K$ is the larger of the total number of Gaussian distributions and the codebook size, a count that is generally larger than $O(m^3)$. Thus, the scheme is computationally expensive and requires training, which does not appear compatible with a pre-processing, front-end noise suppression scheme for low-power applications. Furthermore, it has been outperformed in SNR and WER by simpler methods (i.e., Kalman-EM) [8].

Key building blocks

1. Voice activity detection (VAD): An important component of noise suppression algorithms, as the processing can differ when the speech signal is absent. It is also important for beamforming algorithms, and it can be used in silence compression schemes in speech coding (e.g., in a two-way conference each participant utters speech about 35% of the time [10]). Overall complexity: between $O(m)$ when no FFT is computed and $O(m \log m)$ otherwise.
(a) Heuristic approach: The main trend is based on thresholding the log-energy combined with a zero-crossing count. The underlying assumption is that voiced segments have more energy and periodicity; thus, the scheme has some problems with low-energy unvoiced consonants. The scheme can be made iterative and adaptive to improve this. Similar heuristic methods follow the same principles but make use of cepstral coefficients and other spectral distance measures.
(b) Bayesian VUS (voiced-unvoiced-silence): Statistical scheme based on a multivariate Gaussian distribution over a feature vector containing five key features (short-time log energy, zero-crossing count, normalized autocorrelation coefficient at unit sample delay, first predictor coefficient of a $p$th-order LPC, and normalized energy of a $p$th-order LPC). The approach requires training, but this could be skipped, as parameter sets can be taken from a study for English speakers [12].
(c) Model-based VAD: A thresholding scheme based on the likelihood ratio. Speech presence ($H_1^k$) is decided when $\frac{1}{N}\sum_{k=1}^{N-1} \log \Lambda_k$ is above the threshold $\delta$ (otherwise speech absence, $H_0^k$, is decided). The likelihood ratio is computed assuming Gaussian distributions and simplifies to $\Lambda_k = \frac{1}{1 + \xi_k}\exp\{\gamma_k \xi_k/(1 + \xi_k)\}$. Performance is higher than that of most methods at low SNR conditions.
(d) Speech presence probability estimation: Designed to work with statistical spectral estimation schemes, where an additional speech presence probability $P(H_1^k \mid Y_k)$ is used to multiply the gain functions. Expressions depend on each spectral estimation scheme, and they are generally a function of the a priori SNR ($\xi_k$) and the a posteriori SNR ($\gamma_k$) for each frequency bin. Although less common, they can also be used in a thresholding scheme to define a VAD. However, since they are generally based on SNRs (i.e., energy based), they can have problems with low-energy unvoiced speech portions.
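The likelihood-ratio test of the model-based VAD in (c) can be sketched as follows. This is a hedged illustration, not a reference implementation: it assumes the per-bin a priori and a posteriori SNRs are already available, it averages the log-likelihood ratio over all supplied bins, and the function names and the default threshold `delta=0.15` are hypothetical values chosen only for the example.

```python
import math


def log_likelihood_ratio(xi, gamma):
    """log of Lambda_k = 1/(1 + xi) * exp(gamma * xi / (1 + xi)) for one bin."""
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def is_speech(xi_bins, gamma_bins, delta=0.15):
    """Decide speech presence when the mean log-likelihood ratio exceeds delta.

    delta is an assumed threshold for this sketch; in practice it is tuned.
    """
    llr = [log_likelihood_ratio(xi, g) for xi, g in zip(xi_bins, gamma_bins)]
    return sum(llr) / len(llr) > delta


# Noise-only frame: a priori SNR near 0, a posteriori SNR near 1.
noise_frame = is_speech([0.1] * 4, [1.0] * 4)
# Speech frame: both SNRs are high in every bin.
speech_frame = is_speech([10.0] * 4, [11.0] * 4)
```

Working in the log domain turns the product of per-bin ratios into a sum and avoids overflow of the exponential at high SNR.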

2. Noise estimator: An estimate of $\sigma_v^2 = P_{vv}(\omega)$ is fundamental for noise suppression algorithms. The basic approach makes use of a VAD to identify silent portions (with only noise) and may use averaging or a recursion to update the estimate. However, this approach does not work for nonstationary noises, where continuous estimation is needed. Continuous noise estimation schemes require segments long enough to include speech pauses and low-energy portions, yet short enough that the noise is still more stationary than speech. These competing assumptions yield a tradeoff between stationarity and temporal resolution as a function of the segment size. Most techniques use short overlapping windows of [...] ms along with longer non-overlapping segments of [...] s. Interestingly, preliminary tests by [10] do not show much improvement in objective measures (only tested with some of these estimators) with respect to a simple VAD-based scheme. Overall complexity: $O(m \log m)$.
(a) Spectral minimum tracking: The power of noisy speech decays to the power of the noise. Thus, tracking the minimum level in each frequency band yields a (biased) estimate of the noise. A simple single-frame smoothing is applied to the periodogram of the noisy observation to enhance the estimates, such as $P_{yy}(\omega,k) = \alpha P_{yy}(\omega,k-1) + (1 - \alpha)|Y_k(\omega)|^2$. A bias correction (increasing the noise floor for each band) can be obtained by assuming a Gaussian distribution and observing the variance of the noisy speech. A modified version makes use of a single-sample recursive method (instead of a long segment) and a nonlinear smoothing scheme. It uses the same initial smoothing with $\alpha$, but considers $P_{min}(\omega,k) = \gamma P_{min}(\omega,k-1) + \frac{1 - \gamma}{1 - \beta}\{P_{yy}(\omega,k) - \beta P_{yy}(\omega,k-1)\}$ when $P_{yy}(\omega,k) > P_{min}(\omega,k-1)$, and $P_{min}(\omega,k) = P_{yy}(\omega,k)$ otherwise. Typical values for the smoothing parameters are $\alpha = 0.7$, $\beta = 0.96$, and $\gamma$ = [...]. This latter algorithm yields good performance in an MSE sense when estimating the true background noise [10].
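The recursive minimum-tracking rule in (a) can be illustrated for a single frequency band with the sketch below. It is a minimal example under stated assumptions: the input is a list of periodogram values $|Y_k(\omega)|^2$ for one band, `alpha` and `beta` use the typical values quoted above, and `gamma=0.998` is an assumed value picked for the example because no value for it is given in this text. The function name `track_minimum` is hypothetical.

```python
def track_minimum(power, alpha=0.7, beta=0.96, gamma=0.998):
    """Recursive minimum tracking of one band's periodogram values |Y_k|^2.

    alpha and beta follow the typical values quoted in the text;
    gamma=0.998 is an assumption of this sketch.
    """
    p_yy = power[0]   # smoothed periodogram P_yy(k)
    p_min = power[0]  # running minimum P_min(k), used as the noise estimate
    estimates = [p_min]
    for y2 in power[1:]:
        p_prev = p_yy
        p_yy = alpha * p_prev + (1.0 - alpha) * y2
        if p_yy > p_min:
            # Power is above the minimum: let the estimate rise slowly.
            p_min = (gamma * p_min
                     + (1.0 - gamma) / (1.0 - beta) * (p_yy - beta * p_prev))
        else:
            # Power fell to or below the minimum: follow it immediately.
            p_min = p_yy
        estimates.append(p_min)
    return estimates


# Noise floor of 1.0 with a short burst of speech-like power 100.
power = [1.0] * 5 + [100.0] * 5 + [1.0] * 40
noise = track_minimum(power)
```

In this toy input the estimate rises only slightly during the burst and returns to the floor afterwards, which is the behavior the minimum-tracking argument relies on.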
(b) Histogram-based: The most frequent level in each band (within a frame) corresponds to the noise level in that band. It is obtained from the histogram of a smoothed noisy observation $P_{yy}(\omega)$, and the noise estimate is smoothed using $\hat{\sigma}_v^2(\omega,k) = \alpha_m \hat{\sigma}_v^2(\omega,k-1) + (1 - \alpha_m)\, h_{max}(\omega,k)$, where $h_{max}$ is the peak of the histogram distribution for frequency bin $\omega$ during the $k$-th frame, and $\alpha_m$ is a smoothing constant. This algorithm yields consistently good performance in an MSE sense when estimating the true background noise [10].
(c) Time-recursive, SNR dependent: The noise spectrum can be estimated for each frequency with good precision when the a posteriori SNR ($\gamma_k$) is low. That means that each frequency band can be updated as a function of the SNR. This leads to a recursive structure given by $\hat{\sigma}_v^2(\omega,k) = \alpha(\omega,k)\, \hat{\sigma}_v^2(\omega,k-1) + (1 - \alpha(\omega,k))\, |Y_k(\omega)|^2$ (**). All the subsequent algorithms in this section (d-f) use the same type of recursion, but propose different methods to compute $\alpha(\omega,k)$. In the SNR-dependent scheme, $\alpha(\omega,k) = 1 - \min(1/\gamma_k^p, 1)$, among other options.
(d) Time-recursive, weighted spectral averaging: Based on the same principles as the SNR-dependent case, but uses a hard threshold $\beta$ on $\gamma_k$ to decide whether the estimate is updated with (**) (as in the SNR case). Otherwise, $\hat{\sigma}_v^2(\omega,k) = \hat{\sigma}_v^2(\omega,k-1)$. Typical values are $\beta = 2.5$ and $\alpha = 0.9$. Updates to this technique represent $\beta$ as a function of the variance of the noisy observation. This algorithm yields one of the best performances in an MSE sense when estimating the true background noise [10].
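A minimal sketch of the thresholded recursion in (d), for one frequency band, is given below. It makes two assumptions that are not stated in the text: the a posteriori SNR is computed against the previous noise estimate, and an initial noise estimate is supplied by the caller. The function names `update_noise` and `track_noise` are hypothetical; `alpha=0.9` and `beta=2.5` are the typical values quoted above.

```python
def update_noise(noise_prev, y2, alpha=0.9, beta=2.5):
    """One step of recursion (**) with a hard threshold on the a posteriori SNR.

    gamma is computed against the previous noise estimate, which is an
    assumption of this sketch.
    """
    gamma = y2 / noise_prev
    if gamma < beta:
        # Low a posteriori SNR: the band likely holds noise only, so update.
        return alpha * noise_prev + (1.0 - alpha) * y2
    # High a posteriori SNR: speech is likely present, so hold the estimate.
    return noise_prev


def track_noise(power, initial):
    """Run the update over a sequence of periodogram values |Y_k|^2."""
    estimates = []
    noise = initial
    for y2 in power:
        noise = update_noise(noise, y2)
        estimates.append(noise)
    return estimates


# Noise power 1.0, a 5-frame speech burst of power 50, then noise again.
power = [1.0] * 50 + [50.0] * 5 + [1.0] * 5
noise = track_noise(power, initial=0.5)
```

The estimate converges toward the noise power during the noise-only frames and is frozen during the burst, since the a posteriori SNR exceeds the threshold there.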

(e) Signal presence uncertainty, likelihood ratio: Uses the same principles and recursion as the time-recursive scheme. However, the estimation problem can be regarded as updating individual frequency bands of the noise estimate whenever the probability of speech being present is low. Thus, it can be shown that $\alpha = P(H_1 \mid Y(\omega,k))$. Assuming a Gaussian distribution and using the likelihood ratio method, this probability is given by $\alpha = r\Lambda/(1 + r\Lambda)$, where $\Lambda = P(Y(\omega,k) \mid H_1)/P(Y(\omega,k) \mid H_0)$ is the likelihood ratio and $r = P(H_1)/P(H_0)$ is the ratio of the prior probabilities. The likelihood ratio can be computed, for instance, as $\log(\Lambda_G) = \frac{1}{L}\sum_{k=0}^{L-1}(\gamma_k - \log(\gamma_k) - 1)$. This value of $\alpha$ can then be used in (**), as described in (c).
(f) Minima-controlled recursive averaging (MCRA): As in the previous case, this method is based on time-recursive averaging governed by the signal presence probability. However, it combines it with minimum tracking in the following fashion: the minimum of a smoothed version of $P_{yy}(\omega)$ is used to obtain a normalized periodogram of the noisy observation, $P_{norm}(\omega) = P_{yy}(\omega)/P_{min}(\omega)$. A threshold is applied to this normalized periodogram to obtain an estimate $\hat{p} = P(H_1 \mid Y_k(\omega))$, a probability that is then smoothed to obtain the final parameter $\alpha(\omega,k)$ to be used in (**), as described in (c). Modifications have been proposed to the way the minimum is tracked and the probability $\hat{p}$ is computed. However, multiple smoothing parameters are still needed. Note that this method is available via a function in the Intel IPP (Integrated Performance Primitives) library.

3. A priori SNR estimation: The a priori SNR ($\xi_k = \mathrm{var}(X_k)/\sigma_v^2$) is more difficult to estimate than its counterpart, the a posteriori SNR ($\gamma_k = Y_k^2/\sigma_v^2$). For the latter, a noise estimation scheme can be used along with the periodogram of the noisy observation for each frequency. Overall complexity: $O(m \log m)$.
(a) Spectral subtraction, optimal ML: Starting from power spectral subtraction, $\hat{X}_k^2 = Y_k^2 - \sigma_v^2$, and dividing by $\sigma_v^2$, then $\hat{\xi}_k = \gamma_k - 1$.
This is also the optimal ML estimate assuming Gaussian distributions. In practice, use $\hat{\xi}_k = \max(\bar{\gamma}_k - 1, 0)$, where $\bar{\gamma}_k$ is a smoothed version of $\gamma_k$ obtained using a one-frame running average.
(b) Decision-directed approach: Originally proposed for the MMSE-SA estimator [5]. It makes use of the estimate $\hat{X}_k$ provided by any desired algorithm. By combining it with $\xi_k = E[X_k^2]/\sigma_v^2 = E[\gamma_k] - 1$, a smoothed version can be constructed, such as $\hat{\xi}_k = a\, \hat{X}_{k-1}^2/\sigma_{v,k-1}^2 + (1 - a)\max(\gamma_k - 1, 0)$. A common value for the smoothing constant is $a$ = [...]. Improvements of this scheme considered limiting $\hat{\xi}_k$ to $\hat{\xi}_{min} = -15$ dB, and making the smoothing constant $a = a(\omega,k)$ time and frequency dependent.

References

[1] J. Chen, J. Benesty, Y. Huang, and E. J. Diethorn. Handbook of Speech Processing, chapter Fundamentals of Noise Reduction. Springer-Verlag, Berlin Heidelberg, 1st edition.
[2] I. Cohen and S. Gannot. Handbook of Speech Processing, chapter Spectral Enhancement Methods. Springer-Verlag, Berlin Heidelberg, 1st edition.
[3] Michael E. Deisher and Andreas S. Spanias. Speech enhancement using state-based estimation and sinusoidal modeling. J. Acoust. Soc. Am., 102(2).
[4] J. Droppo and A. Acero. Handbook of Speech Processing, chapter Environmental Robustness. Springer-Verlag, Berlin Heidelberg, 1st edition.
[5] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Sig. Process., 32(6).
[6] Y. Ephraim and D. Malah. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Sig. Process., 33(2).
[7] European Telecommunications Standards Institute. Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms. ETSI ES V1.1.5. European Telecommunications Standards Institute, Sophia Antipolis, France.
[8] S. Gannot. Speech Enhancement, chapter Speech Enhancement Application of the Kalman Filter in the Estimate-Maximize Framework. Springer-Verlag, Berlin Heidelberg, 1st edition.
[9] S. Gannot and A. Yeredor. Handbook of Speech Processing, chapter The Kalman Filter. Springer-Verlag, Berlin Heidelberg, 1st edition.
[10] Philipos C. Loizou. Speech enhancement: theory and practice. CRC Press, Boca Raton, FL, 1st edition.
[11] T. F. Quatieri. Discrete-time speech signal processing: Principles and practice. Prentice-Hall signal processing series. Prentice Hall, Upper Saddle River, NJ.
[12] L. R. Rabiner and R. W. Schafer. Theory and applications of digital speech processing. Prentice Hall, Upper Saddle River, NJ.


Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation Emmanuel Vincent METISS Team Inria Rennes - Bretagne Atlantique E. Vincent (Inria) Artifact reduction

More information

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig

More information

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information

A Subspace Approach to Estimation of. Measurements 1. Carlos E. Davila. Electrical Engineering Department, Southern Methodist University

A Subspace Approach to Estimation of. Measurements 1. Carlos E. Davila. Electrical Engineering Department, Southern Methodist University EDICS category SP 1 A Subspace Approach to Estimation of Autoregressive Parameters From Noisy Measurements 1 Carlos E Davila Electrical Engineering Department, Southern Methodist University Dallas, Texas

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Linear Prediction 1 / 41

Linear Prediction 1 / 41 Linear Prediction 1 / 41 A map of speech signal processing Natural signals Models Artificial signals Inference Speech synthesis Hidden Markov Inference Homomorphic processing Dereverberation, Deconvolution

More information

Improved noise power spectral density tracking by a MAP-based postprocessor

Improved noise power spectral density tracking by a MAP-based postprocessor Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer

More information

Median Filter Based Realizations of the Robust Time-Frequency Distributions

Median Filter Based Realizations of the Robust Time-Frequency Distributions TIME-FREQUENCY SIGNAL ANALYSIS 547 Median Filter Based Realizations of the Robust Time-Frequency Distributions Igor Djurović, Vladimir Katkovnik, LJubiša Stanković Abstract Recently, somenewefficient tools

More information

Acoustic MIMO Signal Processing

Acoustic MIMO Signal Processing Yiteng Huang Jacob Benesty Jingdong Chen Acoustic MIMO Signal Processing With 71 Figures Ö Springer Contents 1 Introduction 1 1.1 Acoustic MIMO Signal Processing 1 1.2 Organization of the Book 4 Part I

More information

Noise Reduction. Two Stage Mel-Warped Weiner Filter Approach

Noise Reduction. Two Stage Mel-Warped Weiner Filter Approach Noise Reduction Two Stage Mel-Warped Weiner Filter Approach Intellectual Property Advanced front-end feature extraction algorithm ETSI ES 202 050 V1.1.3 (2003-11) European Telecommunications Standards

More information

Speech Signal Representations

Speech Signal Representations Speech Signal Representations Berlin Chen 2003 References: 1. X. Huang et. al., Spoken Language Processing, Chapters 5, 6 2. J. R. Deller et. al., Discrete-Time Processing of Speech Signals, Chapters 4-6

More information

CS578- Speech Signal Processing

CS578- Speech Signal Processing CS578- Speech Signal Processing Lecture 7: Speech Coding Yannis Stylianou University of Crete, Computer Science Dept., Multimedia Informatics Lab yannis@csd.uoc.gr Univ. of Crete Outline 1 Introduction

More information

Evaluation of the modified group delay feature for isolated word recognition

Evaluation of the modified group delay feature for isolated word recognition Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and

More information

Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum

Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum Downloaded from vbn.aau.dk on: marts 31, 2019 Aalborg Universitet Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum Karimian-Azari, Sam; Mohammadiha, Nasser; Jensen, Jesper

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

Introduction to Biomedical Engineering

Introduction to Biomedical Engineering Introduction to Biomedical Engineering Biosignal processing Kung-Bin Sung 6/11/2007 1 Outline Chapter 10: Biosignal processing Characteristics of biosignals Frequency domain representation and analysis

More information

SPEECH ANALYSIS AND SYNTHESIS

SPEECH ANALYSIS AND SYNTHESIS 16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques

More information

Application of the Tuned Kalman Filter in Speech Enhancement

Application of the Tuned Kalman Filter in Speech Enhancement Application of the Tuned Kalman Filter in Speech Enhancement Orchisama Das, Bhaswati Goswami and Ratna Ghosh Department of Instrumentation and Electronics Engineering Jadavpur University Kolkata, India

More information

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch

More information

Digital Signal Processing

Digital Signal Processing Digital Signal Processing 0 (010) 157 1578 Contents lists available at ScienceDirect Digital Signal Processing www.elsevier.com/locate/dsp Improved minima controlled recursive averaging technique using

More information

SPEECH ENHANCEMENT USING A LAPLACIAN-BASED MMSE ESTIMATOR OF THE MAGNITUDE SPECTRUM

SPEECH ENHANCEMENT USING A LAPLACIAN-BASED MMSE ESTIMATOR OF THE MAGNITUDE SPECTRUM SPEECH ENHANCEMENT USING A LAPLACIAN-BASED MMSE ESTIMATOR OF THE MAGNITUDE SPECTRUM APPROVED BY SUPERVISORY COMMITTEE: Dr. Philipos C. Loizou, Chair Dr. Mohammad Saquib Dr. Issa M S Panahi Dr. Hlaing Minn

More information

Speaker Tracking and Beamforming

Speaker Tracking and Beamforming Speaker Tracking and Beamforming Dr. John McDonough Spoken Language Systems Saarland University January 13, 2010 Introduction Many problems in science and engineering can be formulated in terms of estimating

More information

Factor Analysis and Kalman Filtering (11/2/04)

Factor Analysis and Kalman Filtering (11/2/04) CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 11, November 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Acoustic Source

More information

CEPSTRAL analysis has been widely used in signal processing

CEPSTRAL analysis has been widely used in signal processing 162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior

More information

Single and Multi Channel Feature Enhancement for Distant Speech Recognition

Single and Multi Channel Feature Enhancement for Distant Speech Recognition Single and Multi Channel Feature Enhancement for Distant Speech Recognition John McDonough (1), Matthias Wölfel (2), Friedrich Faubel (3) (1) (2) (3) Saarland University Spoken Language Systems Overview

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

Estimation of Cepstral Coefficients for Robust Speech Recognition

Estimation of Cepstral Coefficients for Robust Speech Recognition Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment

More information

Elec4621 Advanced Digital Signal Processing Chapter 11: Time-Frequency Analysis

Elec4621 Advanced Digital Signal Processing Chapter 11: Time-Frequency Analysis Elec461 Advanced Digital Signal Processing Chapter 11: Time-Frequency Analysis Dr. D. S. Taubman May 3, 011 In this last chapter of your notes, we are interested in the problem of nding the instantaneous

More information

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

More information

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,

More information

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction "Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:

More information

Speech Enhancement with Applications in Speech Recognition

Speech Enhancement with Applications in Speech Recognition Speech Enhancement with Applications in Speech Recognition A First Year Report Submitted to the School of Computer Engineering of the Nanyang Technological University by Xiao Xiong for the Confirmation

More information

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline

More information

CCNY. BME I5100: Biomedical Signal Processing. Stochastic Processes. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Stochastic Processes. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Stochastic Processes Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

A Low-Cost Robust Front-end for Embedded ASR System

A Low-Cost Robust Front-end for Embedded ASR System A Low-Cost Robust Front-end for Embedded ASR System Lihui Guo 1, Xin He 2, Yue Lu 1, and Yaxin Zhang 2 1 Department of Computer Science and Technology, East China Normal University, Shanghai 200062 2 Motorola

More information

MMSE DECODING FOR ANALOG JOINT SOURCE CHANNEL CODING USING MONTE CARLO IMPORTANCE SAMPLING

MMSE DECODING FOR ANALOG JOINT SOURCE CHANNEL CODING USING MONTE CARLO IMPORTANCE SAMPLING MMSE DECODING FOR ANALOG JOINT SOURCE CHANNEL CODING USING MONTE CARLO IMPORTANCE SAMPLING Yichuan Hu (), Javier Garcia-Frias () () Dept. of Elec. and Comp. Engineering University of Delaware Newark, DE

More information

BME 50500: Image and Signal Processing in Biomedicine. Lecture 5: Correlation and Power-Spectrum CCNY

BME 50500: Image and Signal Processing in Biomedicine. Lecture 5: Correlation and Power-Spectrum CCNY 1 BME 50500: Image and Signal Processing in Biomedicine Lecture 5: Correlation and Power-Spectrum Lucas C. Parra Biomedical Engineering Department CCNY http://bme.ccny.cuny.edu/faculty/parra/teaching/signal-and-image/

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fifth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada International Edition contributions by Telagarapu Prabhakar Department

More information

Bayesian estimation of chaotic signals generated by piecewise-linear maps

Bayesian estimation of chaotic signals generated by piecewise-linear maps Signal Processing 83 2003 659 664 www.elsevier.com/locate/sigpro Short communication Bayesian estimation of chaotic signals generated by piecewise-linear maps Carlos Pantaleon, Luis Vielva, David Luengo,

More information

Voice Activity Detection Using Pitch Feature

Voice Activity Detection Using Pitch Feature Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech

More information

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM Kutluyıl Doğançay Reza Arablouei School of Engineering, University of South Australia, Mawson Lakes, SA 595, Australia ABSTRACT

More information

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors IEEE SIGNAL PROCESSING LETTERS 1 Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors Nasser Mohammadiha, Student Member, IEEE, Rainer Martin, Fellow, IEEE, and Arne Leijon,

More information

Feature extraction 2

Feature extraction 2 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods

More information

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline

More information

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose ON SCALABLE CODING OF HIDDEN MARKOV SOURCES Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa Barbara, CA, 93106

More information

Timbral, Scale, Pitch modifications

Timbral, Scale, Pitch modifications Introduction Timbral, Scale, Pitch modifications M2 Mathématiques / Vision / Apprentissage Audio signal analysis, indexing and transformation Page 1 / 40 Page 2 / 40 Modification of playback speed Modifications

More information

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN Yu ang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London, UK Email: {yw09,

More information

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach Jesper Jensen Abstract In this work we consider the problem of feature enhancement for noise-robust

More information

Signal representations: Cepstrum

Signal representations: Cepstrum Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

Lab 9a. Linear Predictive Coding for Speech Processing

Lab 9a. Linear Predictive Coding for Speech Processing EE275Lab October 27, 2007 Lab 9a. Linear Predictive Coding for Speech Processing Pitch Period Impulse Train Generator Voiced/Unvoiced Speech Switch Vocal Tract Parameters Time-Varying Digital Filter H(z)

More information

Time Series: Theory and Methods

Time Series: Theory and Methods Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition Seema Sud 1 1 The Aerospace Corporation, 4851 Stonecroft Blvd. Chantilly, VA 20151 Abstract

More information

A Priori SNR Estimation Using Weibull Mixture Model

A Priori SNR Estimation Using Weibull Mixture Model A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University

More information

Convolutive Transfer Function Generalized Sidelobe Canceler

Convolutive Transfer Function Generalized Sidelobe Canceler IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL. XX, NO. Y, MONTH 8 Convolutive Transfer Function Generalized Sidelobe Canceler Ronen Talmon, Israel Cohen, Senior Member, IEEE, and Sharon

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

On Moving Average Parameter Estimation

On Moving Average Parameter Estimation On Moving Average Parameter Estimation Niclas Sandgren and Petre Stoica Contact information: niclas.sandgren@it.uu.se, tel: +46 8 473392 Abstract Estimation of the autoregressive moving average (ARMA)

More information

SPEECH enhancement has been studied extensively as a

SPEECH enhancement has been studied extensively as a JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2017 1 Phase-Aware Speech Enhancement Based on Deep Neural Networks Naijun Zheng and Xiao-Lei Zhang Abstract Short-time frequency transform STFT)

More information

Efficient Use Of Sparse Adaptive Filters

Efficient Use Of Sparse Adaptive Filters Efficient Use Of Sparse Adaptive Filters Andy W.H. Khong and Patrick A. Naylor Department of Electrical and Electronic Engineering, Imperial College ondon Email: {andy.khong, p.naylor}@imperial.ac.uk Abstract

More information

Optimal Time-Domain Noise Reduction Filters

Optimal Time-Domain Noise Reduction Filters SpringerBriefs in Electrical and Computer Engineering Optimal Time-Domain Noise Reduction Filters A Theoretical Study Bearbeitet von Jacob Benesty, Jingdong Chen 1. Auflage 2011. Taschenbuch. vii, 79 S.

More information

A Brief Survey of Speech Enhancement 1

A Brief Survey of Speech Enhancement 1 A Brief Survey of Speech Enhancement 1 Yariv Ephraim, Hanoch Lev-Ari and William J.J. Roberts 2 August 2, 2003 Abstract We present a brief overview of the speech enhancement problem for wide-band noise

More information

On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR

On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR Vivek Tyagi a,c, Hervé Bourlard b,c Christian Wellekens a,c a Institute Eurecom, P.O Box: 193, Sophia-Antipolis, France.

More information

Statistical and Adaptive Signal Processing

Statistical and Adaptive Signal Processing r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory

More information

Detection and Estimation Theory

Detection and Estimation Theory Detection and Estimation Theory Instructor: Prof. Namrata Vaswani Dept. of Electrical and Computer Engineering Iowa State University http://www.ece.iastate.edu/ namrata Slide 1 What is Estimation and Detection

More information

Allpass Modeling of LP Residual for Speaker Recognition

Allpass Modeling of LP Residual for Speaker Recognition Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:

More information

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION Xiaofei Li 1, Laurent Girin 1,, Radu Horaud 1 1 INRIA Grenoble

More information

Gaussian Processes for Audio Feature Extraction

Gaussian Processes for Audio Feature Extraction Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig 386 Braunschweig,

More information

III.C - Linear Transformations: Optimal Filtering

III.C - Linear Transformations: Optimal Filtering 1 III.C - Linear Transformations: Optimal Filtering FIR Wiener Filter [p. 3] Mean square signal estimation principles [p. 4] Orthogonality principle [p. 7] FIR Wiener filtering concepts [p. 8] Filter coefficients

More information

Time-domain noise reduction based on an orthogonal decomposition for desired signal extraction

Time-domain noise reduction based on an orthogonal decomposition for desired signal extraction Time-domain noise reduction based on an orthogonal decomposition for desired signal extraction Jacob Benesty INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, Suite 6900, Montreal, Quebec H5A

More information

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION Robert Rehr, Timo Gerkmann Speech Signal Processing Group, Department of Medical Physics and Acoustics

More information

RAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS

RAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS RAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS Frédéric Mustière e-mail: mustiere@site.uottawa.ca Miodrag Bolić e-mail: mbolic@site.uottawa.ca Martin Bouchard e-mail: bouchard@site.uottawa.ca

More information

SIMON FRASER UNIVERSITY School of Engineering Science

SIMON FRASER UNIVERSITY School of Engineering Science SIMON FRASER UNIVERSITY School of Engineering Science Course Outline ENSC 810-3 Digital Signal Processing Calendar Description This course covers advanced digital signal processing techniques. The main

More information

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece, Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech

More information

MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh

MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES Mingzi Li,Israel Cohen and Saman Mousazadeh Department of Electrical Engineering, Technion - Israel

More information

SPEECH enhancement algorithms are often used in communication

SPEECH enhancement algorithms are often used in communication IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 25, NO. 2, FEBRUARY 2017 397 An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation Robert Rehr, Student

More information