Modeling speech signals in the time frequency domain using GARCH

Similar documents
New Statistical Model for the Enhancement of Noisy Speech

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS

MANY digital speech communication applications, e.g.,

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR

Modifying Voice Activity Detection in Low SNR by correction factors

MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh

Digital Signal Processing

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL

2D Spectrogram Filter for Single Channel Speech Enhancement

Volatility. Gerald P. Dwyer. February Clemson University

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

Improved noise power spectral density tracking by a MAP-based postprocessor

CEPSTRAL analysis has been widely used in signal processing

Proc. of NCC 2010, Chennai, India

Bootstrap tests of multiple inequality restrictions on variance ratios

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

Diagnostic Test for GARCH Models Based on Absolute Residual Autocorrelations

Generalized Autoregressive Score Models

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

SINGLE-CHANNEL speech enhancement methods based

Enhancement of Noisy Speech. State-of-the-Art and Perspectives

13. Estimation and Extensions in the ARCH model. MA6622, Ernesto Mordecki, CityU, HK, References for this Lecture:

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics

Analytical derivates of the APARCH model

The autocorrelation and autocovariance functions - helpful tools in the modelling problem

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications

Recent Advancements in Speech Enhancement

Single Channel Signal Separation Using MAP-based Subspace Decomposition

ECONOMETRIC REVIEWS, 5(1), (1986) MODELING THE PERSISTENCE OF CONDITIONAL VARIANCES: A COMMENT

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

Lecture 8: Multivariate GARCH and Conditional Correlation Models

The Instability of Correlations: Measurement and the Implications for Market Risk

NOISE reduction is an important fundamental signal

Revisiting linear and non-linear methodologies for time series prediction - application to ESTSP 08 competition data

Time Series Models for Measuring Market Risk

Stochastic Volatility Models with Auto-regressive Neural Networks

Introduction to ARMA and GARCH processes

The Unscented Particle Filter

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540

GARCH Models Estimation and Inference

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Nonlinear Parameter Estimation for State-Space ARCH Models with Missing Observations

DynamicAsymmetricGARCH

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

A priori SNR estimation and noise estimation for speech enhancement

The Size and Power of Four Tests for Detecting Autoregressive Conditional Heteroskedasticity in the Presence of Serial Correlation

Application of the Tuned Kalman Filter in Speech Enhancement

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

Empirical properties of large covariance matrices in finance

International Journal of Advanced Research in Computer Science and Software Engineering

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M.

A Priori SNR Estimation Using Weibull Mixture Model

Multivariate GARCH models.

Testing for Regime Switching: A Comment

A Brief Survey of Speech Enhancement 1

A Heavy-Tailed Distribution for ARCH Residuals with Application to Volatility Prediction

Statistics of Stochastic Processes

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

A Priori SNR Estimation Using a Generalized Decision Directed Approach

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

Independent Component Analysis and Unsupervised Learning

ECON3327: Financial Econometrics, Spring 2016

GARCH processes probabilistic properties (Part 1)

DETECTION theory deals primarily with techniques for

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

DEPARTMENT OF ECONOMICS

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Analysis of audio intercepts: Can we identify and locate the speaker?

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Wavelet Based Image Restoration Using Cross-Band Operators

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Introduction to Econometrics

Convolutive Transfer Function Generalized Sidelobe Canceler

Automatic Speech Recognition (CS753)

Gaussian process for nonstationary time series prediction

Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A radial basis function artificial neural network test for ARCH

ASYMPTOTIC NORMALITY OF THE QMLE ESTIMATOR OF ARCH IN THE NONSTATIONARY CASE

SPEECH ENHANCEMENT USING A LAPLACIAN-BASED MMSE ESTIMATOR OF THE MAGNITUDE SPECTRUM

BDS ANFIS ANFIS ARMA

Estimation of Cepstral Coefficients for Robust Speech Recognition

The GARCH Analysis of YU EBAO Annual Yields Weiwei Guo1,a

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis

SPEECH ENHANCEMENT USING PCA AND VARIANCE OF THE RECONSTRUCTION ERROR IN DISTRIBUTED SPEECH RECOGNITION

Forecasting exchange rate volatility using conditional variance models selected by information criteria

Voice Activity Detection Using Pitch Feature

Modelling non-stationary variance in EEG time. series by state space GARCH model

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Hidden Markov Models and Gaussian Mixture Models

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Heteroskedasticity in Time Series

Sparse Time-Frequency Transforms and Applications.

Robust Speaker Identification

Transcription:

Signal Processing () 53 59 Fast communication Modeling speech signals in the time frequency domain using GARCH Israel Cohen Department of Electrical Engineering, Technion Israel Institute of Technology, Technion City, Haifai 3, Israel Received July www.elsevier.com/locate/sigpro Abstract In this paper, we introduce a novel modeling approach for speech signals in the short-time Fourier transform (STFT) domain. We define the conditional variance of the STFT expansion coefficients, and model the one-frame-ahead conditional variance as a generalized autoregressive conditional heteroscedasticity (GARCH) process. The proposed approach offers a reasonable model on which to base the estimation of the variances of the STFT expansion coefficients, while taking into consideration their heavy-tailed distribution. r Elsevier B.V. All rights reserved. Keywords: Speech modeling; Time frequency analysis; GARCH 1. Introduction Modeling speech signals in the short-time Fourier transform (STFT) domain is a fundamental problem in designing speech processing systems. The Gaussian model, proposed by Ephraim and Malah [7], describes the individual STFT expansion coefficients of the speech signal as zero-mean statistically independent Gaussian random variables. It facilitates a mathematically tractable design of useful speech enhancement algorithms in the STFT domain. However, the Gaussian approximation can be very inaccurate in the tail Tel.: +97 9731; fax: +97 95757. E-mail address: icohen@ee.technion.ac.il (I. Cohen). regions of the probability density function [9,11,1]. Martin [11] proposed to model the real and imaginary parts of the expansion coefficients as either Gamma or Laplacian random variables. Lotter and Vary [1] proposed a parametric probability density function (pdf) for the magnitude of the expansion coefficients, which approximates, with a proper choice of the parameters, the Gamma and Laplacian densities. Statistical models based on HMP try to circumvent the assumption of specific distributions []. The probability distribution is estimated from long training sequences of the speech samples. In this paper, we introduce a novel modeling approach for speech signals in the STFT domain. 15-1/$ - see front matter r Elsevier B.V. All rights reserved. doi:1.11/j.sigpro..9.1

5 I. Cohen / Signal Processing () 53 59 Autoregressive conditional heteroscedasticity (ARCH) models, introduced by Engle [] and generalized by Bollerslev [1], are widely used in various financial applications such as risk management, option pricing, foreign exchange, and the term structure of interest rates []. They explicitly parameterize the time-varying volatility in terms of past conditional variances and past squared innovations (prediction errors), while taking into account excess kurtosis (i.e., heavy tail behavior) and volatility clustering, two important characteristics of financial time-series. Speech signals in the STFT domain demonstrate both variability clustering and heavy tail behavior. When observing a time series of successive expansion coefficients in a fixed frequency bin, successive magnitudes of the expansion coefficients are highly correlated, whereas successive phases can be assumed uncorrelated. Hence, the variability of the expansion coefficients is clustered in the sense that large magnitudes tend to follow large magnitudes and small magnitudes tend to follow small magnitudes, while the phase is unpredictable. Modeling the time trajectories of the expansion coefficients as generalized ARCH (GARCH) processes offers a reasonable model on which to base the variance estimation, while taking into consideration the heavy-tailed distribution. This paper is organized as follows. In Section, we review the GARCH model. In Section 3, we define conditional variance in the STFT domain, and model the one-frame-ahead conditional variance as a GARCH process. In Section, we address the problem of estimating the model parameters. Finally, in Section 5, we demonstrate the application of the proposed model to speech enhancement.. GARCH model Let fy t g denote a real-valued discrete-time stochastic process, and let c t denote the information set available at time t (e.g., fy t g may represent a sequence of observations, and c t may include the observed data through time t). Then, the innovation (prediction error) t at time t in the minimum mean-squared error (MMSE) sense is obtained by subtracting from y t its conditional expectation given the information through time t 1; t ¼ y t Efy t jc t 1 g: (1) The conditional variance (volatility) of y t given the information through time t 1 is by definition the conditional expectation of t ; s t ¼ varfy tjc t 1 g¼ef t jc t 1g: () The ARCH and GARCH models provide a rich class of possible parametrization of conditional heteroscedasticity (i.e., time-varying volatility). They explicitly recognize the difference between the unconditional variance Ef½y t Efy t gš g and the conditional variance s t ; allowing the latter to change over time. The fundamental characteristic of these models is that magnitudes of recent innovations provide information about future volatility. Let fz t g be a zero-mean unit-variance white noise process with some specified probability distribution. Then a GARCH model of order ðp; qþ; denoted by t GARCHðp; qþ; has the following general form: t ¼ s t z t ; (3) s t ¼ f ðs t 1 ;...; s t p ; t 1...; t qþ; () where s t is the conditional standard deviation given by the square root of (). That is, the conditional variance s t is determined by the values of p past conditional variances and q past squared innovations, and the innovation t is generated by scaling a white noise sample with the conditional standard deviation. The ARCHðqÞ model is a special case of the GARCHðp; qþ model with p ¼ : The most widely used GARCH model specifies a linear function f in () as follows: s t ¼ k þ Xq i¼1 a i t i þ Xp j¼1 b j s t j ; (5) where the values of the parameters are constrained by k; a i X; b j X; i ¼ 1;...; q; j ¼ 1;...; p; X q i¼1 a i þ Xp j¼1 b j o1:

I. Cohen / Signal Processing () 53 59 55 The first three constraints are sufficient to ensure that the conditional variances fs t g are strictly positive. The fourth constraint is a covariance stationarity constraint, which is necessary and sufficient for the existence of a finite unconditional variance of the innovations process [1]. Many financial time-series such as exchange rates and stock returns exhibit a volatility clustering phenomenon, i.e., large changes tend to follow large changes of either sign and small changes tend to follow small changes. Eq. (5) captures the volatility clustering phenomenon, since large innovations of either sign increase the variance forecasts for several samples. This in return increases the likelihood of large innovations in the succeeding samples, which allows the large innovations to persist. The degree of persistence is determined by the lag lengths p and q; as well as the magnitudes of the coefficients fa i g and fb j g: Furthermore, the innovations of financial timeseries are typically distributed with heavier tails than a Gaussian distribution. Bollerslev [1] showed that GARCH models are appropriate for heavytailed distributions. where fv tk jh tk 1 g are statistically independent complex Gaussian random variables with zero-mean, unit variance, and independent and identically distributed (IID) real and imaginary parts: EfV tk jh tk 1 g¼; EfjV tkj jh tk 1 g¼1: () Accordingly, the expansion coefficients fx tk jh tk 1 g are conditionally zero-mean statistically independent Gaussian random variables given their standard deviations fs tk g: The real and imaginary parts of X tk under H tk 1 are conditionally IID random variables given s tk : Let X t ¼fX tkjt ¼ ;...; t; k ¼ ;...; K 1g represent the set of expansion coefficients up to frame t; and let s tkjt ¼n EfjX tk j jh tk 1 ; Xt g denote the conditional variance of X tk under H tk 1 given the expansion coefficients up to frame t: Then, we assume that the one-frame-ahead conditional variance s tkjt 1 evolves according to a GARCHð1; 1Þ process: 3. Speech modeling Let xðnþ denote a speech signal, and let its STFT expansion coefficients be given by X tk ¼ XK 1 xðn þ tmþhðnþ e jðp=kþnk ; () n¼ where t is the time frame index (t ¼ ; 1;...), k is the frequency-bin index (k ¼ ; 1;...; K 1), hðnþ is an analysis window of size K (e.g., Hamming window), and M is the framing step (number of samples separating two successive frames). Let H tk and H tk 1 denote, respectively, hypotheses of signal absence and presence, and let s tk denote a binary state variable which indicates signal presence, i.e., s tk ¼ when H tk ; and s tk ¼ 1 when H tk 1 : Let s tk ¼n EfjX tk j jh tk 1 expansion coefficient X tk under H tk g denote the variance of an We assume that given fs tk g and fs tk g; the expansion coefficients fx tk g are generated by X tk ¼ s tk V tk ; (7) 1 : s tkjt 1 ¼ s min þ mjx t 1;kj þ dðs t 1;kjt s min Þ; (9) where s min; mx; dx; m þ do1 (1) are the standard constraints imposed on the parameters of the GARCH model. The parameters m and d are, respectively, the moving average and autoregressive parameters of the GARCH(1,1) model, and s min is a lower bound on the variance of X tk under H tk 1 : The variances fs tkg are hidden from direct observation, in the sense that even under perfect conditions of zero noise, their values are not directly observable, but have to be estimated from the available information. If the available information includes the set of expansion coefficients up to frame t 1; then an MMSE estimator for s tk can be obtained by cs tk ¼ Efs tk jhtk 1 ; Xt 1 g: (11)

5 I. Cohen / Signal Processing () 53 59 From (7), () and the definition of the conditional variance s tkjt ; we have s tkjt 1 ¼n EfjX tk j jh tk 1 ; Xt 1 g ¼ Efs tk jv tkj jh tk 1 ; Xt 1 g ¼ Efs tk jhtk 1 ; Xt 1 gefjv tk j jh tk 1 g ¼ s c tk : ð1þ Therefore, given X t 1 ; the conditional variance s tkjt 1 is an MMSE estimator for s tk : It should be noted that in speech enhancement applications, the available information is generally the set of noisy spectral components up to frame t; rather than the clean spectral components up to frame t 1: In that case, an estimate for s tk is derived from the available noisy data [].. Model estimation In this section we address the problem of estimating the model parameters s min ; m and d: The maximum-likelihood (ML) estimation approach is commonly used for estimating the parameters of a GARCH model. We derive the ML function of the model parameters by using the expansion coefficients in some interval t ½; TŠ: For simplicity, we assume that the parameters are constant during the above interval and are independent of the frequency-bin index k: In practice, the speech signal can be divided into short time segments and split in frequency into narrow subbands, such that the parameters can be assumed to be constant in each time frequency region. Let H 1 ¼ftkjs tk ¼ 1g denote the set of time - frequency bins where the signal is present, and let / ¼½s min ; m; dš denote the vector of unknown parameters. Then for tk H 1 ; the conditional distribution of X tk given its standard deviation s tk is Gaussian: pðx tk js tk Þ¼ 1 ps exp jx tkj tk s ; tk H 1 : tk (13) Furthermore, fx tk js tk ; tk H 1 g are statistically independent. We showed in Section 3 that the MMSE estimate of s tk given the expansion coefficients up to frame t 1 is s tkjt 1 : The conditional variance s tkjt 1 can recursively be calculated from past expansion coefficients X t 1 by using (9) and the parameter vector /: Hence, the logarithm of the conditional density of X tk given the expansion coefficients up to frame t 1 can be expressed for tk H 1 as log pðx tk jx t 1 ; /Þ ¼ jx tkj s log s tkjt 1 log p: tkjt 1 (1) It is convenient to regard the expansion coefficients in the first frame fx ;k g as deterministic, and initialize the conditional variances in the first frame fs ;kj 1 g to their minimal values s min : Then, the log-likelihood can be maximized when conditioned on the first frame (for sufficiently large sample size, the expansion coefficients of the first frame make a negligible contribution to the total likelihood). The log-likelihood conditional on the expansion coefficients of the first frame is given by X Lð/Þ ¼ log pðx tk jh tk 1 ; Xt 1 ; /Þ: (15) tkh 1 \t½1;tš Substituting (1) into (15) and imposing the constraints in (1) on the estimated parameters, the ML estimates of the model parameters can be obtained by solving the following constrained minimization problem: minimize ^s min ;^m;^d P tkh 1 \t½1;tš jx tk j s tkjt 1 þ log s tkjt 1 subject to ^s min ; ^mx; ^dx; ^m þ ^do1: (1) Such a problem is generally referred to as constrained nonlinear optimization or nonlinear programming. For given numerical values of the parameters, the sequences of conditional variances fs tkjt 1g can be calculated from (9) and used to evaluate the series in (1). The result can then be minimized numerically as in Bollerslev [1]. Alternatively, the function fmincon of the Optimization Toolbox in MATLAB s can be used to find the minimum of the constrained nonlinear function of the model parameters, similar to its use within the function garchfit of the GARCH Toolbox. The

I. Cohen / Signal Processing () 53 59 57 latter function provides ML estimates for the parameters of a univariate (scalar) one-state GARCH process. It cannot be used directly in the present work, since the expansion coefficients are complex and generated from a two-state model (speech presence and absence states). 5. Experimental results In this section, we demonstrate the application of the proposed model to speech enhancement. The evaluation includes three objective quality measures, namely segmental SNR, log-spectral distortion (LSD) and perceptual evaluation of speech quality (PESQ) score (ITU-T P.). The speech signals, taken from the TIMIT database, include different utterances from different speakers, half male and half female. The signals are sampled at 1 khz and degraded by white Gaussian noise with SNR in the range ½; Š db: The noisy signals are transformed into the STFT domain using half overlapping Hamming analysis windows of 3 ms length. The parameters of the GARCH model are estimated independently for each speaker from the clean signal of that speaker, as described in Section. A speech enhancement algorithm, which is based on the proposed model and log-spectral amplitude (LSA) estimation [5], is applied to each noisy speech signal (the details of the algorithm can be found in [], and are not presented here due to space limitations). Alternatively, the variances of the speech STFT expansion coefficients are estimated by using the decisiondirected method [7] with parameters as suggested in [3] (specifically, the weighting factor is a ¼ :9 and the lower bound on the a priori signal-to-noise ratio (SNR) is x min ¼ 15 db). Table 1 summarizes the results of the segmental SNR, the LSD and the PESQ scores achieved by both algorithms. It shows that speech enhancement based on the proposed GARCH modeling method yields a higher segmental SNR, lower LSD, and higher PESQ scores than that based on the decision-directed method. A subjective study of speech spectrograms and informal listening tests confirm that the quality of the enhanced speech obtained by using the proposed modeling method is better than that obtained by using the decisiondirected method. Fig. 1 demonstrates the spectrograms and waveforms of the clean signal, noisy signal ðsnr ¼ 5 dbþ and the enhanced speech signals obtained by using the two methods. It shows that weak speech components and unvoiced sounds are more emphasized in the signal enhanced by the proposed method than in the signal enhanced by using the decision-directed method.. Conclusion We have proposed a novel approach for statistically modeling speech signals in the STFT domain. It provides an explicit model for the conditional variance and conditional distribution of the expansion coefficients. It offers a reasonable model on which to base the estimation of the variances of the STFT expansion coefficients, while taking into consideration their heavy-tailed distribution. To capture a more significant heavy tail behavior, the Gaussian distribution of fv tk jh tk 1 g may be replaced with a heavy tail Table 1 Segmental SNR, log-spectral distortion, and PESQ scores obtained by using the GARCH modeling and the decision-directed methods Input SNR (db) GARCH modeling method Decision-directed method SegSNR LSD PESQ SegSNR LSD PESQ 7.9.7.55.73.7.1 5 1.7 3.15.9 9..7.1 1 1.9. 3.39 1. 3.5.9 15 1.9 1.1 3.9 1.3. 3.31 3.3 1.1 3.9.3.13 3.

5 I. Cohen / Signal Processing () 53 59.... 1 1. (a).... 1 1. (b).... 1 1. (c).... 1 1. (d) Fig. 1. Speech spectrograms and waveforms. (a) Original clean speech signal: Now forget all this other. ; (b) noisy signal (SNR =5 db, SegSNR =.35 db, LSD =13.75 db, PESQ = 1.7); (c) speech enhanced using the decision-directed method (SegSNR = 11.7 db, LSD = 3. db, PESQ =.5); (d) speech enhanced using the GARCH modeling method (SegSNR = 1.5 db, LSD =3. db, PESQ =.). distribution, such as Gamma, Laplacian or Student-t: Furthermore, GARCH models of higher orders may be utilized. However, the choice of the particular distribution and order of the GARCH model is a matter of trial and error. Acknowledgements The author thanks Prof. Murat Kunt and the anonymous reviewers for their helpful comments. References [1] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (3) (April 19) 37 37. [] T. Bollerslev, R.Y. ChouKenneth, F. Kroner, ARCH modeling in finance: A review of the theory and empirical evidence, J. Econometrics 5 (1 ) (April May 199) 5 59. [3] O. Cappé, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Trans. Acoust. Speech Signal Process. () (April 199) 35 39. [] I. Cohen, Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity model, also Technical Report, EE PUB 15, Technion, Israel Institute of Technology, Haifa, Israel, April, submitted for publication. [5] I. Cohen, B. Berdugo, Speech enhancement for nonstationary noise environments, Signal Process. 1 (11) (November 1) 3 1. [] R.F. Engle, Autoregressive conditional heteroskedasticity with estimates of the variance of united kingdom inflation, Econometrica 5 () (July 19) 97 17.

I. Cohen / Signal Processing () 53 59 59 [7] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-3 () (December 19) 119 111. [] Y. Ephraim, N. Merhav, Hidden Markov processes, IEEE Trans. Inf. Theory () (June ) 151 15. [9] S. Gazor, W. Zhang, Speech probability distribution, IEEE Signal Process. Lett. 1 (7) (July 3) 7. [1] T. Lotter, C. Benien, P. Vary, Multichannel speech enhancement using bayesian spectral amplitude estimation, in: Proceedings of the th IEEE International Conference on Acoustics Speech Signal Processing, ICASSP-3, Hong Kong, 1 April 3, pp. I_3 I_35. [11] R. Martin, Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors, in: Proceedings of the 7th IEEE International Conference on Acoustics Speech Signal Processing, ICASSP-, Orlando, Florida, 13 17 May, pp. I-53 I-5. [1] J. Porter, S. Boll, Optimal estimators for spectral restoration of noisy speech, in: Proceedings of the IEEE International Conference on Acoustics Speech, Signal Processing, San Diego, California, 19 1 March 19, pp. 1A..1 1A...