DIRECTION ESTIMATION BASED ON SOUND INTENSITY VECTORS. Sakari Tervo

Similar documents
Estimation of Reflections from Impulse Responses

Source localization and separation for binaural hearing aids

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

Acoustic Vector Sensor based Speech Source Separation with Mixed Gaussian-Laplacian Distributions

Cepstral Deconvolution Method for Measurement of Absorption and Scattering Coefficients of Materials

arxiv: v1 [cs.sd] 30 Oct 2015

Experimental investigation of multiple-input multiple-output systems for sound-field analysis

Characterisation of the directionality of reflections in small room acoustics

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Proceedings of Meetings on Acoustics

Acoustic Signal Processing. Algorithms for Reverberant. Environments

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540

A Probability Model for Interaural Phase Difference

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

A SPARSENESS CONTROLLED PROPORTIONATE ALGORITHM FOR ACOUSTIC ECHO CANCELLATION

Enhancement of Noisy Speech. State-of-the-Art and Perspectives

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Acoustic MIMO Signal Processing

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis

APPLICATION OF MVDR BEAMFORMING TO SPHERICAL ARRAYS

USING STATISTICAL ROOM ACOUSTICS FOR ANALYSING THE OUTPUT SNR OF THE MWF IN ACOUSTIC SENSOR NETWORKS. Toby Christian Lawin-Ore, Simon Doclo

A LINEAR OPERATOR FOR THE COMPUTATION OF SOUNDFIELD MAPS

International Journal of Advanced Research in Computer Science and Software Engineering

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

A LOCALIZATION METHOD FOR MULTIPLE SOUND SOURCES BY USING COHERENCE FUNCTION

THEORY AND DESIGN OF HIGH ORDER SOUND FIELD MICROPHONES USING SPHERICAL MICROPHONE ARRAY

Modeling Measurement Uncertainty in Room Acoustics P. Dietrich

LIMITATIONS AND ERROR ANALYSIS OF SPHERICAL MICROPHONE ARRAYS. Thushara D. Abhayapala 1, Michael C. T. Chan 2

Proceedings of Meetings on Acoustics

IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS

MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS

Introduction to Acoustics Exercises

ON THE NOISE REDUCTION PERFORMANCE OF THE MVDR BEAMFORMER IN NOISY AND REVERBERANT ENVIRONMENTS

Source counting in real-time sound source localization using a circular microphone array

Spatial sound. Lecture 8: EE E6820: Speech & Audio Processing & Recognition. Columbia University Dept. of Electrical Engineering

Loudspeaker Choice and Placement. D. G. Meyer School of Electrical & Computer Engineering

Efficient Use Of Sparse Adaptive Filters

A comparative study of time-delay estimation techniques for convolutive speech mixtures

Sparseness-Controlled Affine Projection Algorithm for Echo Cancelation

AN SVD-BASED MIMO EQUALIZER APPLIED TO THE AURALIZATION OF AIRCRAFT NOISE IN A CABIN SI- MULATOR

Sound Listener s perception

Reverberation Impulse Response Analysis

BLIND SOURCE EXTRACTION FOR A COMBINED FIXED AND WIRELESS SENSOR NETWORK

TinySR. Peter Schmidt-Nielsen. August 27, 2014

GAUSSIANIZATION METHOD FOR IDENTIFICATION OF MEMORYLESS NONLINEAR AUDIO SYSTEMS

AN INVESTIGATION ON THE TRANSITION FROM EARLY REFLECTIONS TO A REVERBERATION TAIL IN A BRIR

Proceedings of Meetings on Acoustics

MANY digital speech communication applications, e.g.,

2-D SENSOR POSITION PERTURBATION ANALYSIS: EQUIVALENCE TO AWGN ON ARRAY OUTPUTS. Volkan Cevher, James H. McClellan

DESIGN OF MULTI-DIMENSIONAL DERIVATIVE FILTERS. Eero P. Simoncelli

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Single Channel Signal Separation Using MAP-based Subspace Decomposition

A R T A - A P P L I C A T I O N N O T E

Last time: small acoustics

ODEON APPLICATION NOTE Calibration of Impulse Response Measurements

Research Article Doppler Velocity Estimation of Overlapping Linear-Period-Modulated Ultrasonic Waves Based on an Expectation-Maximization Algorithm

A NEW BAYESIAN LOWER BOUND ON THE MEAN SQUARE ERROR OF ESTIMATORS. Koby Todros and Joseph Tabrikian

E : Lecture 1 Introduction

Co-Prime Arrays and Difference Set Analysis

Sound, acoustics Slides based on: Rossing, The science of sound, 1990, and Pulkki, Karjalainen, Communication acoutics, 2015

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Echo cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa

ACOUSTIC VECTOR SENSOR BASED REVERBERANT SPEECH SEPARATION WITH PROBABILISTIC TIME-FREQUENCY MASKING

ACOUSTIC SOURCE LOCALIZATION BY FUSING DISTRIBUTED MICROPHONE ARRAYS MEASUREMENTS

Headphone Auralization of Acoustic Spaces Recorded with Spherical Microphone Arrays. Carl Andersson

Analysis and synthesis of room reverberation based on a statistical time-frequency model

STATISTICAL MODELLING OF MULTICHANNEL BLIND SYSTEM IDENTIFICATION ERRORS. Felicia Lim, Patrick A. Naylor

NIH Public Access Author Manuscript Proc Soc Photo Opt Instrum Eng. Author manuscript; available in PMC 2013 December 17.

A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues

FAST AND ACCURATE DIRECTION-OF-ARRIVAL ESTIMATION FOR A SINGLE SOURCE

DIFFUSION-BASED DISTRIBUTED MVDR BEAMFORMER

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

Research Letter An Algorithm to Generate Representations of System Identification Errors

New Statistical Model for the Enhancement of Noisy Speech

Robust Speaker Identification

BAYESIAN INFERENCE FOR MULTI-LINE SPECTRA IN LINEAR SENSOR ARRAY

ECE 598: The Speech Chain. Lecture 5: Room Acoustics; Filters

Improved Method for Epoch Extraction in High Pass Filtered Speech

FUNDAMENTAL LIMITATION OF FREQUENCY DOMAIN BLIND SOURCE SEPARATION FOR CONVOLVED MIXTURE OF SPEECH

Spherical harmonic analysis of wavefields using multiple circular sensor arrays

A wavenumber approach to characterizing the diffuse field conditions in reverberation rooms

Free-Field Pressure-Based Sound Power Measurement Procedure with Low Spatial-Sampling- and Near-Field-Induced Uncertainty

STONY BROOK UNIVERSITY. CEAS Technical Report 829

ROBUSTNESS OF PARAMETRIC SOURCE DEMIXING IN ECHOIC ENVIRONMENTS. Radu Balan, Justinian Rosca, Scott Rickard

On Information Maximization and Blind Signal Deconvolution

Optimal Time Division Multiplexing Schemes for DOA Estimation of a Moving Target Using a Colocated MIMO Radar

A REVERBERATOR BASED ON ABSORBENT ALL-PASS FILTERS. Luke Dahl, Jean-Marc Jot

Sound Source Tracking Using Microphone Arrays

Horizontal Local Sound Field Propagation Based on Sound Source Dimension Mismatch

TIME DOMAIN ACOUSTIC CONTRAST CONTROL IMPLEMENTATION OF SOUND ZONES FOR LOW-FREQUENCY INPUT SIGNALS

A low intricacy variable step-size partial update adaptive algorithm for Acoustic Echo Cancellation USNRao

Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector

Modal Beamforming for Small Circular Arrays of Particle Velocity Sensors

Chirp Transform for FFT

Course content (will be adapted to the background knowledge of the class):

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES

Transcription:

7th European Signal Processing Conference (EUSIPCO 9) Glasgow, Scotland, August 4-8, 9 DIRECTION ESTIMATION BASED ON SOUND INTENSITY VECTORS Sakari Tervo Helsinki University of Technology Department of Media Technology P.O.Box 54, FI-5 TKK, Finland sakari.tervo@tkk.fi ABSTRACT The direction of a sound source in an enclosure can be estimated with a microphone array and some proper signal processing. Earlier, in applications and in research the use of time delay estimation methods, such as the cross correlation, has been popular. Recently, techniques for direction estimation that involve sound intensity vectors have been developed and used in applications, e.g. in teleconferencing. Unlike in time delay estimation, these methods have not been compared widely. In this article, five methods for direction estimation in the concept of sound intensity vectors are compared with real data from a concert hall. The results of the comparison indicate that the methods that are based on convolutive mixture models perform slightly better than some of the simple averaging methods. The convolutive mixture model based methods are also more robust against additive noise.. INTRODUCTION Direction or location of a sound source is of interest in several applications that aim to capture or to reproduce the sound [,, ]. This is of interest for example in teleconferencing []. Moreover, when exploring concert hall impulse responses it is of interest from where and when the direct sound and the so-called early reflections arrive [4]. Direction estimation with traditional methods, such as the cross correlation, has been researched widely over several decades (see e.g. [5] and references within). Nowadays, sound intensity vectors are used for direction estimation in increasingly many applications [,,, 6]. Not much research, if any, has been published in comparing the different methods used in direction estimation from sound intensity vectors. In this article, five direction estimation methods are compared in a real concert hall environment. A sound intensity vector has a length and a direction and it is a function of time and frequency. In this work, the focus is on the direction estimation over some set of frequencies during a certain time frame. Direction estimation methods can be broadly classified into two classes: ) direct ) and mixture estimation. The first class includes methods such as averaging. The second class contains convolutive mixture models where the direction of the sound source is found by fitting two or more probability distribution of a certain shape to the directional data, as for example in [] and [7]. Naturally, the methods in the second class are much more complex than in the first, since some optimization method has to be used to obtain the mixtures. Generally, sound intensity vectors are obtained either with microphone pair measurements or from B- format signals [4]. In practice both of these measurement techniques introduce a bias to the direction of the intensity vector. For example, in Soundfield microphones, the bias is caused by the non-idealities in the directivity patterns of the microphones [4]. In microphone pair measurements the direction estimation is biased, since the gradient of the sound pressure is not constant within the sensor array [6]. In the case of microphone pair measurement, also the non-idealities of the pressure microphones affect the measurement. The article is organized as follows. In Sec. the calculation of sound intensity vectors as well the bias compensation of the vectors are formulated. Section introduces the direction estimation methods. In Sec. 4 the experimental setup is described and in Sec. 5 the results from the experiments are presented and discussed. In Sec. 6 the final conclusions of this study are given.. THEORY In a room environment the sound s(t) traveling from the sound source to the receiver n is affected by the path h n (t): p n (t) = h n (t) s(t) + w(t), () where denotes convolution and w(t) is measurement noise, independent and identically distributed for each receiver.. Sound Intensity On a certain axis a, the sound intensity is given in the frequency domain as I a (ω) = Re{P (ω)u a (ω)}, () where P(ω) and U a (ω) are the frequency presentations of the sound pressure and of the particle velocity with angular frequency ω. In addition, Re{ } and is the real part of a complex number and denotes the complex conjugate [4]. Here, Cartesian coordinate system with x -and y coordinate axis, shown in Fig., is used. Corresponding polar coordinate presentation of the Cartesian coordinates is denoted with azimuth angle θ and radius r. The procedure for obtaining the sound intensity proceeds as follows and is similar for both axes. The sound pressure at the center point between four microphones, shown in Fig., can be approximated as EURASIP, 9 7

the average pressure of the microphones [6]: P(ω) 4 4 P n (ω). () n= For x-axis, in frequency domain the particle velocity is estimated as U x (ω) j ωρ d [P (ω) P (ω)], (4) where d is the distance between the two receivers and ρ =. kg/m is the median density of the air and j is the imaginary unit. Now, the sound intensity in () can be estimated with the approximations in () and (4). For obtaining the y-component of the sound intensity, the microphones and are replaced in (4) with microphones and 4.. Bias Compensation for the Square Grid Since the pressure signals are subtracted from each other in the approximation made in (4), the azimuth angle θ of the intensity suffers from a systematic bias [6]. This bias can be formulated for a four-microphone square grid as [6] θ biased = sin(ω d c sin(θ)) sin(ω d (5) c cos(θ)), where c = 4 m/s is the speed of sound. The bias can be compensated by finding the inverse of (5). However, this equation does not have any closed form solution. Kallinger et al. [6] estimate the inverse function with linear interpolation. In addition, the inverse function is known [6] to have an upper limit at f max = c/(d ), where ω = πf, i.e. < ω d c < π. In this work the linear interpolation for finding the solution to the inverse function of (5) is included and used for compensating the azimuth angles. The unbiased estimate, i.e. the compensated angle at i:th frequency bin is denoted with θ i.. METHODS FOR DIRECTION ESTIMATION In a certain time window, sound intensity vectors are estimated over a set of frequencies. Each intensity vector at frequency bin i, consists of a radial component r i and of an angular component θ i. In this article, the focus is on finding the direction of a sound source from a set of radial and angular components. Next, five methods for direction estimation are formulated.. Circular Mean and Median In direction estimation one has to take into account the fact that the data is circular. Therefore, for example the mean of a set of angles θ = [θ, θ,..., θ N ], θ i ( π, π] is defined as the circular mean (CME) [8]: { N ˆθ CME = arg w i e }, jθi (6) y x Figure : The square grid with four microphones and the coordinate system. d where arg{ } is the argument of a complex number, and w i = /N is the weighting function. In principle, the weighting function can be chosen freely. If the weighting function is chosen as w i = r i, with r i =, where r i is the corresponding radial component of an intensity vector, then CME is equal to the mean of the Cartesian presentation (MCA) of the sound intensity, i.e.: { N arg r i e jθi } = arctan 4 d { } E{Iy } := E{I x } ˆθ MCA, (7) where E{ } is the expected value, I x = [x, x,..., x N ], and I y = [y, y,..., y N ] are the Cartesian presentation of the vectors. Here, the circular median (CMD) is defined analogously to CME: { { } { ˆθ CMD = arg Me Re{w i e jθi } + jme Im{w i e }} } jθi, (8) where Me{ } is the median. The mode could also be used instead of the median, but this is not considered in this article. Again, the weighting function can be selected freely. Here, w i = /N is selected as earlier with CME.. Mixture Models When inspecting the distribution of the azimuth angle of the intensity vectors, for example in Fig., one can notice that the azimuth angle is a mixture of two distributions rather than one. In the example in Fig. the distribution that has smaller variance (or higher concentration) is caused by the sound source. The second distribution models the noise floor. The shape of these distributions is defined by the impulse response, i.e. the room and the frequency content of the source signal. A new sound source in the room always introduces a new distribution to the total azimuth distribution [, 7]. Since the azimuth angle is circular, wrapped distributions have to be used for the fitting. Here, two distributions are tested. The first one is the von Mises probability distribution (VM) [, 8]: 7 f VM (θ µ, κ) = eκcos(x µ)) πi(, κ), (9)

PDF pvmm(θ) pwgm(θ).4....4....4... -5 - -5 5 5 θ, [ ] Figure : Examples of the normalized histogram of the estimated azimuth angles (top, ), the von Mises mixture model (middle, - -), and the wrapped Gaussian mixture model (bottom, - -), with mixture components shown separately ( ). The vertical lines ( ) illustrate the true angle θ s = 9 (top), and estimated source directions ˆθ VMM = (middle), ˆθWGM = (bottom). Source position S and receiver position R was used (see Sec. 4). where κ is a measure of concentration, µ is the mean, and I(, κ) is the modified Bessel function of order. The second tested distribution is the wrapped Gaussian probability distribution (WG)[9]: f WG (θ µ, σ) = πσ K k= K e (θ µ πk) σ, () where σ is the variance, µ is the mean, and K + is the number of Gaussian to be wrapped, here K =. The mixture model is formulated as the sum of the distributions: p(θ µ, ρ) = M m = a m f (m) (θ µ m, ρ m ), () where, m indicates the index of a mixture, f = f WG for the wrapped Gaussian mixture model (WGM), and f = f VM for the von Mises mixture model (VMM). Parameters µ = [µ, µ,..., µ M ] and ρ = [ρ, ρ,..., ρ M ] depend also on the model. The weighting factor is here selected as a m = /M and the number of mixtures is M =, since there is only a single sound source. Thus, one of the mixture components models the noise floor and the other one the sound source. The direction of the source is then estimated as the mean parameter µ of the mixture component which has smaller variance (WGM) or higher concentration (VMM). These estimated directions are noted here with ˆθ WGM and ˆθ VMM according to the used model. In order to find the parameters of the mixture model, an optimization algorithm has to be used. The search criteria is the same as in maximum-likelihood estimation, where the goal is to maximize the likelihood L(µ, ρ) = N log p(θ i µ, ρ). () Here, the parameters are sought with MATLAB s fminsearch which uses the Nelder-Mead method []. Other optimization algorithms that are more optimal for this problem could be used as well, e.g. the expectationmaximization algorithm. Figure shows examples of distributions estimated with both of the models and an example of the original probability distribution function (PDF), i.e. the normalized histogram of the estimated directions. 4. EXPERIMENTAL SETUP A microphone array consisting of two four-microphone grids (see Fig. ) was used. The four-microphone grids share the same center point, having d = mm and d = mm. The smaller grid is used for the frequencies from khz to 5 khz and the larger for frequencies from Hz to khz. The bias compensation is done separately for both arrays. The methods were tested with real concert hall data. The impulse response measurement setup in the concert hall (Pori, located in Finland) is given in Fig.. This concert hall has 7 seats and a reverberation time of approximately. seconds. The signals were recorded at 48 khz with three sound source locations (S), and five receiving (R) locations for the array. Here, the first three receiver positions are used (R-R in Fig. ). The sound source was an omnidirectional loudspeaker of 6 cm diameter, consisting of driver elements. Two source signals, seconds of violin playing a signal tone and seconds of white noise, were convolved with the measured impulse responses. Then, signal-tonoise ratio (SNR) was varied by adding white noise (w(t) in ()) to the signals. Next, sound intensity vectors were calculated and compensated as stated in Sec.. The direction was estimated from the intensity vectors with the methods introduced in Sec. on a frame by frame basis. Frames with 4 samples in length and 5 % overlap were used. For each method and test condition, this leads to 9 87 = 68 direction estimates. Moreover, 89 bins was used in the fast Fourier transform. This implies that the direction was estimated from 87 frequency bins, since the frequency band was from to 5 khz. 5. RESULTS AND DISCUSSION In order to compare the methods, the estimated directions are processed as follows. Firstly, the anomalies are 7

4 CME MCA CMD VMM WGM µθ, [ ] Figure : Receiver (R) and source (S) positions in the concert hall of Pori. Receiver positions R, R, and R, and source positions S, S, and S were used in the experiments. removed from the direction estimates. Here, the threshold criteria for an anomaly is, i.e. if ˆθ i θ s >, where θ s is the true angle of the sound source, the estimate ˆθ i is considered to be an anomaly. The goodness of a method is evaluated with three measures. The first measure is the percentage of anomalies P AN, the second measure is the circular bias of the non-anomalous estimates θ i : { Ñ µ θ = arg e j( θ i θ s)} and the third is the circular standard deviation: [ ( σ θ = Ñ Ñ e j( θ i θ s) / )], where Ñ is the number of non-anomalous estimates, is the absolute value of a real number and is the length of a complex number. The results of the experiments for a white noise source signal are presented in Fig. 4 and for a violin source signal in Fig. 5. As expected, since white noise has energy in all frequencies it provides more robustness against additive noise and gives more accurate direction estimation results than a violin source signal. As one can see from Figs. 4 and 5, the mixture model based estimation methods, VMM and WGM, perform the best in all conditions, VMM having lowest number of anomalies in total, and therefore performing the best of all the methods. The best estimator from the first class (introduced in Sec..) is CME. However, the differences in the performance between the mixture model based estimation methods and two simple averaging methods, CME, and CMD, are not drastical. The worst estimator in all conditions is clearly MCA. Even with the highest SNR the percentage of anomalies is high (P AN 8 %). σθ, [ ] PAN, [%] 5 5 8 6 4 5 5 5 5 4 SNR, [db] Figure 4: Bias (top) and standard deviation (middle) of the non-anomalous estimates, as well as the percentage of anomalies (bottom) against SNR. White noise was used as the source signal and the reverberation time is about. seconds. VMM is the most robust method against additive noise. In Fig. 5, VMM has less than 5 % of anomalies in all the test conditions. Thus, even with high noise level VMM is able to find the distribution caused by the sound source. One reason for this might be that, although the total energy of the noise is higher than the energy of the sound source, the distribution caused by the noise has much smaller concentration than the distribution caused by the sound source. The bias of all methods is less than 4 in all cases. With noise source the bias increases in general when the SNR increases. Thus, the error distribution is biased more with high SNR. The reasons behind this unexpected behaviour are not clear. When the SNR is high the bias is introduced by the reverberation. Perhaps, this is caused by the non-idealities in the microphones. However, with violin source signal the trend of the bias is the opposite. The general trend of the standard deviation is that it decreases when the SNR increases, as expected. As the results indicated, MCA is not a good method to estimate the direction of arrival from a continuous signal. MCA gives very often anomalous estimates. This is caused by the weighting with the radial components, since when the weighting is not used, as in CME, the 7

ACKNOWLEDGEMENTS µθ, [ ] 4 CME MCA CMD VMM WGM The author wishes to thank Dr. Tapio Lokki for the supervision of this work. The research leading to these results has received funding from the Academy of Finland, project no. [99], the European Research Council under the European Community s Seventh Framework Programme (FP7/7-) / ERC grant agreement no. [66], and Helsinki Graduate School in Computer Science and Engineering. σθ, [ ] PAN, [%] 5 5 8 6 4 5 5 5 5 4 SNR, [db] Figure 5: Bias (top) and standard deviation (middle) of the non-anomalous estimates, as well as the percentage of anomalies (bottom) against SNR. The source signal was violin and the reverberation time is about. seconds. estimation is close to the actual direction. So, at least in thecase of circular mean it is not a good idea to use the radial components as the weighting function. Perhaps, if one would select only a certain set of the azimuth and of the radial components one would arrive to better result. Also, one could use a maximum-likelihood weighting (with respect to the spectrums of the signal and the noise) for the cross spectral components, as in time delay estimation [5]. 6. CONCLUSIONS Five methods for estimating the direction of a sound source from a set of sound intensity vectors was considered. There were two classes of methods in test. First class includes simple methods such as the circular mean, and the second class contains methods that are based on convolutive mixture models. The methods were tested in a real concert hall environment. The results indicate that the methods from the second class perform better and provide more robustness against additive noise than the methods from the first class. Especially, the von Mises distribution was found to suit well for the problem. References [] J. Merimaa and V. Pulkki, Spatial Impulse Response Rendering I: Analysis and Synthesis, J. Audio Eng. Soc., vol. 5, no., pp. 5 7, 5. [] B. Günel, H. Haclhabiboğlu, and A.M. Kondoz, Acoustic Source Separation of Convolutive Mixtures Based on Intensity Vector Statistics, IEEE Trans. Audio, Speech, and Language Processing, vol. 6, no. 4, pp. 748 756, 8. [] V. Pulkki, Spatial Sound Reproduction with Directional Audio Coding, J. Audio Eng. Soc., vol. 55, no. 6, pp. 5 56, 7. [4] J. Merimaa, Analysis, Synthesis and Perception of spatial sound-binaural Auditory modeling and multichannel loudspeaker reproduction, Ph.D. thesis, Helsinki University of Technology, 6. [5] J. Chen, J. Benesty, and Y. Huang, Time delay estimation in room acoustic environments: an overview, EURASIP J. Applied Signal Processing, vol. 6, no., pp. 7 7, 6. [6] M. Kallinger, F. Kuech, R. Schultz-Amling, G. del Galdo, J. Ahonen, and V. Pulkki, Enhanced Direction Estimation Using Microphone Arrays for Directional Audio Coding, in Proc. Hands-Free Speech Communication and Microphone Arrays, 8, pp. 45 48. [7] N. Madhu and R. Martin, A Scalable Framework for Multiple Speaker Localization and Tracking, in Proc. Int. Workshop on Acoustic Echo and Noise Cancellation, 8. [8] N.I. Fisher, Statistical Analysis of Circular Data, New York: Cambridge University Press, 99. [9] Y. Agiomyrgiannakis and Y. Stylianou, Stochastic Modeling and Quantization of Harmonic Phases in Speech using Wrapped Gaussian Mixture Models, in Proc. Int. Conf. Acoustics, Speech and Signal Processing, 7, vol. 4, pp. 4. [] J.A. Nelder and R. Mead, A Simplex Method for Function Minimization, The Computer Journal, vol. 7, no. 4, pp. 8, 965. 74