NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group


NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION

M. Schwab, P. Noll, and T. Sikora
Technical University Berlin, Germany, Communication System Group
Einsteinufer 17, 10587 Berlin (Germany)
{schwab|noll|sikora}@nue.tu-berlin.de

ABSTRACT

Microphone arrays are suitable for a large range of applications; two important ones are speaker localization and speech enhancement. Both require the transfer functions from one microphone to the other microphones as a building block of the respective algorithms. In this paper we present a new transfer function estimator optimized for speech sources in a noisy environment. To achieve this, we integrate a new covariance matrix estimation algorithm for the noisy speech as well as for the additive, correlated noise signals received by the microphones. Results indicate that our algorithm outperforms other state-of-the-art algorithms.

1. INTRODUCTION

System identification is a very useful tool for signal processing tasks. In speech signal processing in particular, it is needed for various applications. Two important ones are beamforming [1], where system identification is used to form the blocking matrix of an adaptive beamformer, and acoustic source localization, where it is used to determine the time difference of arrival (TDOA) as a step prior to the localization itself [2], [3].

The traditional method for estimating a transfer function is the cross-correlation method. Its disadvantage is that it is biased in the presence of noise. Weinstein and Shalvi proposed in [4] an unbiased estimator of the transfer function, assuming that the desired signal is non-stationary while the cross-correlation between the additive interfering signals at the sensors is stationary. They divide the observed time interval into subintervals and analyze the cross power spectral density (PSD) in each subinterval.
Hence, they obtain one equation per subinterval, leading to an overdetermined set of linear equations, and the error variance is minimized with a weighted least-squares (WLS) approach. Although this solves the bias problem, the error variance of the estimator increases because of the reduced number of samples available for estimating the cross PSD in each subinterval. Cohen extended the algorithm of Weinstein and Shalvi and adapted it to noisy speech signals [5], using a single-channel noise reduction algorithm [6]. He reduces the error variance of the cross PSD estimation with a recursive smoothing algorithm. This, however, increases the correlation between the cross PSD estimates at adjacent subintervals, so the resulting equations are correlated.

In this paper we introduce a new estimation algorithm for the covariance matrices, optimized for noisy speech signals. The new algorithm yields an estimate of the covariance matrix of the noisy speech signals as well as of the covariance matrix of the additive noise. We incorporate all the information comprised in these covariance matrices to improve the unbiased estimation of the transfer function.

2. PROBLEM FORMULATION

Consider a two-microphone system with

    x_1(t) = h_1(t) * s(t) + n_1(t)
    x_2(t) = h_2(t) * s(t) + n_2(t)                                (1)

where x_1(t) and x_2(t) are the microphone signals, s(t) is the desired speech signal, and h_1(t) and h_2(t) model the impulse responses from a point source (in this case a speaker) to microphones 1 and 2, respectively. The operator * denotes convolution. The additive noise signals n_1(t) and n_2(t) comprise a correlated part and an uncorrelated part:

    n_1(t) = h_n1(t) * n(t) + n_d1(t)
    n_2(t) = h_n2(t) * n(t) + n_d2(t)                              (2)

where n(t) is a directed noise source with impulse responses h_n1(t) and h_n2(t) to the two microphones. The parts n_d1(t) and n_d2(t)

are uncorrelated diffuse noise. The speech source s(t) is assumed to be uncorrelated with the additive noise parts n_1(t) and n_2(t).

2.1. Relative transfer function system identification

In relative transfer function system identification, one microphone signal serves as a reference and is subtracted from the filtered other microphone signal. Figure 1 shows such a system, with x_2(t) used as the reference signal.

[Fig. 1. Reference-based system identification: x_1(t) is filtered by w(t), and x_2(t) is subtracted from the result to form the error signal e(t).]

The aim of the system identification (the calculation of w(t)) is that the error signal

    e(t) = w(t) * x_1(t) - x_2(t)                                  (3)

does not contain any part of the speech source signal s(t). Rewriting equation (3) using equation (1) yields

    e(t) = (w(t) * h_1(t) - h_2(t)) * s(t) + w(t) * n_1(t) - n_2(t).   (4)

To eliminate the speech signal s(t) from e(t), the system identification has to estimate

    w(t) = h_2(t) * h_1(t)^{-1}.                                   (5)

Thus, w(t) is the relative impulse response from the first to the second microphone in the time domain. Transforming equation (5) into the frequency domain, we obtain the relative transfer function (RTF):

    W(ω) = H_2(ω) / H_1(ω).                                        (6)

The simplest algorithm to identify the RTF is the so-called cross-correlation method:

    Ŵ_CC(ω) = φ_x1x2(ω) / φ_x1x1(ω)                                (7)

in which φ_x1x2(ω) = E{X_1*(ω) X_2(ω)} is the cross power spectral density (PSD) of the two microphone signals and φ_x1x1(ω) = E{|X_1(ω)|²} is the auto power spectral density of the first microphone signal; X_1*(ω) denotes the complex conjugate of X_1(ω). With the above assumptions, the cross PSD and auto PSD become (for simplicity, the argument ω is omitted in the following):

    φ_x1x2 = H_1* H_2 φ_ss + φ_n1n2
           = H_1* H_2 φ_ss + H_n1* H_n2 φ_nn                       (8)

    φ_x1x1 = |H_1|² φ_ss + φ_n1n1
           = |H_1|² φ_ss + |H_n1|² φ_nn + φ_nd1nd1                 (9)

Equations (8) and (9) show that Ŵ_CC(ω) is a biased estimator of W in a noisy environment; the cross-correlation method remains biased even if only diffuse noise (n_d1 and n_d2) is present. For an unbiased estimator of W(ω) it is necessary to estimate φ_x1x2, φ_x1x1, φ_n1n2, and φ_n1n1. Then an unbiased estimator is given by

    Ŵ_unbiased,1 = (φ_x1x2 - φ_n1n2) / (φ_x1x1 - φ_n1n1).          (10)

Another unbiased estimate of W is

    Ŵ_unbiased,2 = (φ_x2x2 - φ_n2n2) / (φ_x2x1 - φ_n2n1)           (11)

which additionally uses estimates of φ_x2x2 and φ_n2n2. The final unbiased estimate is the weighted mean of both:

    Ŵ_unbiased = f_1 Ŵ_unbiased,1 + f_2 Ŵ_unbiased,2               (12)

where f_1 and f_2 are weighting functions depending on the SNR. They are defined as

    f_i = (φ_xixi - φ_nini) / ((φ_x1x1 - φ_n1n1) + (φ_x2x2 - φ_n2n2))   (13)

for i = 1, 2. The covariance matrix of the noisy speech signals and the covariance matrix of the noise signals contain all the required cross and auto power spectra:

    C_x1x2 = | φ_x1x1  φ_x1x2 |
             | φ_x2x1  φ_x2x2 |                                    (14)

    C_n1n2 = | φ_n1n1  φ_n1n2 |
             | φ_n2n1  φ_n2n2 |                                    (15)
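As a single-frequency sanity check of equations (10)-(13), the following Python sketch (an illustration of ours, not code from the paper; the variable names and numeric values are arbitrary assumptions) builds the PSD matrices of equations (14) and (15) from known transfer functions and verifies that the weighted estimate recovers W = H_2/H_1:

```python
import numpy as np

def unbiased_rtf(phi_x, phi_n):
    # Equation (10): estimate based on the first microphone's auto PSD.
    # Entry [0, 1] of phi_x is phi_x1x2, entry [0, 0] is phi_x1x1, etc.
    w1 = (phi_x[0, 1] - phi_n[0, 1]) / (phi_x[0, 0] - phi_n[0, 0])
    # Equation (11): estimate based on the second microphone's auto PSD.
    w2 = (phi_x[1, 1] - phi_n[1, 1]) / (phi_x[1, 0] - phi_n[1, 0])
    # Equation (13): SNR-dependent weights; by construction f1 + f2 = 1.
    denom = (phi_x[0, 0] - phi_n[0, 0]) + (phi_x[1, 1] - phi_n[1, 1])
    f1 = (phi_x[0, 0] - phi_n[0, 0]) / denom
    f2 = (phi_x[1, 1] - phi_n[1, 1]) / denom
    # Equation (12): weighted mean of the two unbiased estimates.
    return f1 * w1 + f2 * w2

# Build the PSD matrices of equations (14)/(15) for one frequency bin
# from known transfer functions H1, H2 and a known speech PSD phi_ss.
H = np.array([1.0 + 0.5j, 0.3 - 0.8j])           # [H1, H2], arbitrary values
phi_ss = 2.0
phi_n = np.array([[0.5, 0.1 + 0.2j],
                  [0.1 - 0.2j, 0.4]])             # Hermitian noise PSD matrix
phi_x = phi_ss * np.outer(H.conj(), H) + phi_n    # entries H_i* H_j phi_ss + noise

W_hat = unbiased_rtf(phi_x, phi_n)                # should equal H2 / H1
```

Because f_1 + f_2 = 1 and, for exact PSDs, both partial estimates equal H_2/H_1, the combination is exact here; with estimated PSDs, the SNR-dependent weights trade off the error variances of the two estimates.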

3. IMPLEMENTATION

In our implementation the system identification is carried out in the frequency domain. We use the short-time Fourier transform (STFT), where k and l denote the frequency bin and the time block index, respectively. As shown in the previous section, we need to estimate the covariance matrix of the noisy speech signals and the covariance matrix of the pure noise signals. The estimation is carried out with recursive smoothing. If X(k,l) and Y(k,l) are the short-time spectra of the signals x(n) and y(n), the smoothing of the covariance matrix can be expressed as

    C_XY(k,l) = α C_XY(k,l-1) + (1-α) C_XY^inst(k,l)               (16)

where α is a smoothing factor and C_XY^inst(k,l), the outer product of the stacked spectra [X(k,l), Y(k,l)] with their complex conjugates, is the instantaneous estimate of the covariance matrix.

For the traditional algorithms the smoothing factor is kept constant at α = 0.95. In our scenario we deal with speech signals corrupted by background noise. We have therefore developed a new, optimized covariance matrix estimation for the noisy speech and for the background noise, based on recursive smoothing. In contrast to the previous estimation, where the smoothing factor remains constant, the smoothing factor becomes time and frequency dependent, i.e. α(k,l). Depending on the actual signal-to-noise ratio SNR(k,l), the smoothing factor for the noisy speech is defined as follows:

    α_Opt(k,l) = 1                                                            if SNR(k,l) ≤ SNR_min
                 α_min                                                        if SNR(k,l) ≥ SNR_max     (17)
                 1 - (1 - α_min) (SNR(k,l) - SNR_min) / (SNR_max - SNR_min)   otherwise

The a priori SNR estimator derived in [7] is used here. Finally, the optimized estimate of the noisy speech covariance matrix is given by

    C_Opt(k,l) = α_Opt(k,l) C_Opt(k,l-1) + (1 - α_Opt(k,l)) C^inst(k,l)       (18)

If SNR(k,l) is low, the adaptation of the covariance matrix is stopped because the smoothing factor α(k,l) is set to one.
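This SNR-dependent smoothing can be sketched in a few lines of Python (our own illustration; in particular, the SNR_min and SNR_max thresholds below, and the choice of a dB scale for them, are assumptions, since the paper does not report the values it used):

```python
import numpy as np

ALPHA_MIN = 0.95       # alpha_min from the paper
SNR_MIN_DB = -10.0     # assumed threshold SNR_min (not given in the paper)
SNR_MAX_DB = 20.0      # assumed threshold SNR_max (not given in the paper)

def alpha_opt(snr_db):
    """Time/frequency dependent smoothing factor of equation (17)."""
    if snr_db <= SNR_MIN_DB:
        return 1.0                       # low SNR: freeze the adaptation
    if snr_db >= SNR_MAX_DB:
        return ALPHA_MIN                 # high SNR: fastest allowed update
    # Linear interpolation between the two extremes.
    frac = (snr_db - SNR_MIN_DB) / (SNR_MAX_DB - SNR_MIN_DB)
    return 1.0 - (1.0 - ALPHA_MIN) * frac

def update_cov(C_prev, X_vec, snr_db):
    """One recursive smoothing step, equations (16)/(18):
    C(k,l) = a C(k,l-1) + (1-a) C_inst, with a = alpha_opt(SNR(k,l)).
    C_inst[i,j] = X_i*(k,l) X_j(k,l), matching phi_x1x2 = E{X1* X2}."""
    a = alpha_opt(snr_db)
    C_inst = np.outer(X_vec.conj(), X_vec)
    return a * C_prev + (1.0 - a) * C_inst
```

At or below SNR_min the update freezes (α = 1); at or above SNR_max the covariance adapts at the fastest allowed rate (α = α_min); in between, α decreases linearly with the SNR.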
On the other hand, if SNR(k,l) is high, the instantaneous estimate C^inst(k,l) gets a high weight because α(k,l) is set to a small value. The estimate of SNR(k,l) is provided by a single-channel noise reduction system.

The estimation of the noise covariance matrix is also controlled by a smoothing factor. The speech presence probability p(k,l) defined in [8] controls the update equation for the noise covariance matrix estimation. The algorithm of [8] is extended with the optimally smoothed power spectral density estimation according to [9]. The proposed update rule for the noise covariance matrix is

    C_n1n2(k,l) = α_N(k,l) C_n1n2(k,l-1) + (1 - α_N(k,l)) C^inst(k,l)     (19)

with

    α_N(k,l) = p(k,l)(1 - α_min) + α_min.                          (20)

Figure 2 illustrates the performance of the noise tracking algorithm. The noisy signal is a speech signal with additive cockpit noise at an SNR of 10 dB. The thin line represents the optimally smoothed PSD of the noisy speech [9], and the bold line shows the tracking of the noise PSD.

[Fig. 2. Non-stationary noise tracking example for the frequency bin k = 15. Thin line: optimally smoothed PSD of the noisy signal; bold line: estimated noise PSD.]

4. EXPERIMENTAL SETUP AND RESULTS

For the evaluation of the different system identification algorithms we generated two microphone signals with the following parameters. The impulse responses are given by the discrete sequences:

    h_1(n)  = [1, 0.4, 0.1, 0.3, 0.2, 0.1, 0, 0]
    h_2(n)  = [0, 1, 0.5, 0.3, 0, 0.2, 0, 0.1]
    h_n1(n) = [0, 1, 0, 0.5, 0.3, 0.1, 0.2]
    h_n2(n) = [1, 0, 0.5, 0.1, 0, 0.2, 0]

In figure 3, an example of a reverberated clean speech signal (upper signal waveform) and the noisy speech signal

(lower signal waveform) with an SNR of -2.54 dB are plotted. The additive noise is stationary white Gaussian noise.

[Fig. 3. Signal waveforms. Upper plot: reverberated clean speech; lower plot: noisy speech signal.]

As an objective measure we use the signal blocking factor (SBF) defined in [5]:

    SBF = 10 log10 ( Σ_n (h_1(n) * s(n))² / Σ_n e(n)² )            (21)

where Σ_n (h_1(n) * s(n))² is the energy of the speech signal contained in the first microphone signal. This measure can be used to assess the suitability of an algorithm for the blocking matrix of an adaptive beamformer.

Table 1 summarizes the results of the performance evaluation of four different system identification algorithms with respect to their SBF. The tested algorithms are the traditional biased cross-correlation method with a constant smoothing factor (W_CC), the adaptive online algorithm proposed by Cohen in [5] (W_Cohen), the cross-correlation method with the proposed covariance matrix estimation of equation (18), optimized for noisy speech signals (W_proposed1), and the proposed algorithm implementing the unbiased estimator of equation (12) (W_proposed2). The parameter α_min is set to 0.95, and the step size µ of the adaptive online algorithm is chosen as 0.1. The algorithms are tested with two different noise types, stationary white Gaussian noise and non-stationary cockpit noise, each at two different noise levels. The global SNR is the mean SNR between the first and the second microphone signal.

    SBF [dB]        stationary              non-stationary
    global SNR      -0.52 dB    8.97 dB     0.96 dB     8.92 dB
    W_CC            -0.12       -5.14       -1.71       -7.69
    W_Cohen         -6.47       -9.63       -8.30       -16.45
    W_proposed1     -10.21      -17.77      -15.11      -22.91
    W_proposed2     -12.54      -18.84      -17.42      -25.10

    Table 1. Comparison of the different system identification algorithms with respect to their SBF performance.

In figure 4 the resulting error signals of the different system identification algorithms are visualized. The generated microphone signals are corrupted with the non-stationary noise at a level of 10 dB. The final estimate of the RTF is used for the whole sequence; the adaptation process is not considered in the error signals.

[Fig. 4. First plot: reverberated clean signal; second plot: error signal of the system identification algorithm of [5]; third plot: error signal of the system identification with the optimized covariance matrix estimation; fourth plot: error signal of the unbiased system identification algorithm.]

5. CONCLUSIONS

In this paper we have introduced a new relative transfer function estimator for a two-microphone system, optimized for noisy speech signals. Components of a noise reduction system are used to design the covariance matrix estimation algorithm. It has been shown that this algorithm outperforms other state-of-the-art algorithms. Changes in the transfer function resulting from speaker movement can be handled, since the update rule for the covariance matrices is adaptive. The algorithm is able to cope with competing noise sources. If there is a competing speaker, however, the algorithm will fail, because it relies on the assumption that speech and noise can be differentiated. In that case a multiple-input multiple-output (MIMO) system has to be considered.
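For readers who want to reproduce the evaluation, the SBF of equation (21) is straightforward to compute from the clean source, the first-channel impulse response, and the error signal. A minimal Python helper (our own sketch, not the paper's code):

```python
import numpy as np

def signal_blocking_factor(h1, s, e):
    """Signal blocking factor of equation (21), in dB: the ratio of the
    energy of the speech component contained in the first microphone
    signal, h_1(n) * s(n), to the energy of the error signal e(n)."""
    speech = np.convolve(h1, s)[:len(s)]    # h_1(n) * s(n), truncated to len(s)
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(e ** 2))
```

For a fixed clean-speech energy, the SBF moves with the energy left in the error signal, so it can be compared across the algorithms of Table 1 on identical input signals.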

For future work, we plan to extend the proposed algorithm to a multi-channel system. This will increase the degrees of freedom of the identification process; hence, we expect a better performance in minimizing the final error signal of the system.

6. REFERENCES

[1] S. Gannot, I. Cohen, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Transactions on Signal Processing, vol. 49, pp. 1614-1626, August 2001.
[2] Y. Huang, J. Benesty, and G. W. Elko, "An efficient linear-correction least-squares approach to source localization," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 2001, pp. 67-70.
[3] J. Benesty, "Adaptive eigenvalue decomposition algorithm for passive acoustic source localization," Journal of the Acoustical Society of America, vol. 107, no. 1, pp. 384-391, January 2000.
[4] O. Shalvi and E. Weinstein, "System identification using nonstationary signals," IEEE Transactions on Signal Processing, vol. 44, pp. 2055-2063, August 1996.
[5] I. Cohen, "Relative transfer function identification using speech signals," IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 451-459, September 2004.
[6] I. Cohen, "On speech enhancement under signal presence uncertainty," in Proc. International Conference on Acoustics, Speech, and Signal Processing, May 2001, pp. 167-170.
[7] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, pp. 443-445, April 1985.
[8] I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466-475, September 2003.
[9] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July 2001.