Albenzio Cirillo INFOCOM Dpt. Università degli Studi di Roma, Sapienza


1 Albenzio Cirillo, INFOCOM Dpt., Università degli Studi di Roma, Sapienza. ET2010, XXVI Riunione Annuale dei Ricercatori di Elettrotecnica, Naples, 9-11 June 2010

2 Motivation.
Sound Source Localization (SSL): Time-Delay Estimation (TDE) approach; Steered Response Power (SRP) approach.
TDE anomalies and secondary peaks: Smoothed Likelihood Function (SLF); Optimal Line Selection (OLS): parameters and robustness.
Simplified Optimal Line Selection: computational savings.
Conclusion.

3 The ultimate goal of engineering is to create context-aware machines. Context-awareness depends on knowing the basic information that describes events: the position of a sound source is fundamental information for refining further audio signal processing algorithms. The performance of state-of-the-art SSL estimators is hampered by reverberation (quantified by the reverberation time T60), so a robust solution is still needed. The effect of reverberation is commonly represented by an FIR filter, and the signal acquired by each microphone is corrupted by additive white noise:
$$x_i(t) = h_i(t) * s(t) + n_i(t), \qquad i = 1, \dots, n$$
A typical approach consists of estimating the time delay between microphone pairs and combining the delays to retrieve the position information.
[Figure: a source s(t) at position s, microphones m1, m2, ..., mn, and the SSL estimator fed by x1(t), x2(t), ..., xn(t).]
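The signal model above can be sketched in discrete time as follows. This is only an illustration: the toy impulse response (a direct path plus one echo), the SNR parameter, and the function name `simulate_microphone` are assumptions and do not come from the slides.

```python
import numpy as np

def simulate_microphone(s, h, snr_db, rng):
    """Return x = h * s + n for one microphone (discrete-time sketch)."""
    clean = np.convolve(s, h)                       # reverberation as FIR filtering
    noise_power = np.mean(clean ** 2) / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise                            # additive white noise

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)                       # stand-in for the source signal
h = np.zeros(64)
h[5], h[40] = 1.0, 0.4                              # toy response: direct path + one echo
x = simulate_microphone(s, h, snr_db=20.0, rng=rng)
```

A longer reverberation time would correspond to a longer and denser `h`.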

4 Blind Channel Identification (BCI) Method
[Figure: block diagram in which the source s(t) produces x1[n] and x2[n]; x1[n] is filtered by \hat{h}_2[n], x2[n] by \hat{h}_1[n], and the difference of the two outputs is the error e[n].]
$$\hat{\tau} = \arg\max_n \hat{h}_2[n] - \arg\max_n \hat{h}_1[n]$$
Pro: reverberation is taken into account by modeling the channel as an FIR filter.
Cons: h1[n] and h2[n] may have common zeros, in which case the BCI problem has no unique solution. The filter length must be increased as the reverberation time grows. A speech signal does not guarantee the convergence of the adaptive algorithm.

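5 Generalized Cross Correlation (GCC)
$$R^{(g)}_{x_1 x_2}(\tau) = \int_{-\infty}^{+\infty} \psi_g(f)\, G_{x_1 x_2}(f)\, e^{j 2 \pi f \tau}\, df$$
where $G_{x_1 x_2}(f)$ is the cross-power spectral density and $\psi_g(f)$ is a generic pre-filtering function.
Phase Transform (PHAT):
$$\psi_{PHAT}(f) = \frac{1}{|G_{x_1 x_2}(f)|}, \qquad \hat{\tau} = \arg\max_{\tau} R^{PHAT}_{x_1 x_2}(\tau)$$
PHAT mitigates the effect of reverberation on GCC.
[Figure: $R^{PHAT}_{x_1 x_2}(\tau)$ for T60 = 0.15 s and for T60 = 0.9 s.]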
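A minimal NumPy sketch of GCC-PHAT delay estimation is given below. It assumes a single-frame FFT estimate of the cross-power spectrum and the convention that a positive delay means x1 lags x2; the function name `gcc_phat` and its parameters are illustrative, not taken from the slides.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_delay=None):
    """Return (delay of x1 relative to x2 in seconds, GCC-PHAT values)."""
    n = len(x1) + len(x2)                    # zero-pad to avoid circular wrap-around
    G = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))  # cross-power spectrum
    G /= np.abs(G) + 1e-12                   # PHAT weighting: keep the phase only
    r = np.fft.irfft(G, n=n)
    max_shift = n // 2
    if max_delay is not None:                # restrict to the feasible range of delays
        max_shift = max(1, min(int(fs * max_delay), n // 2))
    # Reorder so that the lags run from -max_shift to +max_shift.
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    lag = int(np.argmax(r)) - max_shift
    return lag / fs, r

rng = np.random.default_rng(1)
x2 = rng.standard_normal(1024)
x1 = np.concatenate((np.zeros(5), x2[:-5]))  # x1 is x2 delayed by 5 samples
tau_hat, _ = gcc_phat(x1, x2, fs=16000.0)
```

In this noiseless, anechoic toy case the peak is essentially a single spike; with reverberation, secondary peaks appear and may exceed the true one.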

6 There is a TDE anomaly whenever
$$R^{PHAT}_{x_1 x_2}(kT) > R^{PHAT}_{x_1 x_2}(\tau_{TRUE}), \qquad \text{with } kT \neq \tau_{TRUE}$$
The Probability of Anomaly (PA) can be evaluated by treating the GCC samples as normally distributed variables whose variance depends on the noise level [Ianniello, 1982] or on the reverberation time [Gustafsson, Trivedi, Rao, 2003]:
$$P_A = 1 - \int p(z_0) \left[ \int_{-\infty}^{z_0} p(z_l)\, dz_l \right]^{L-1} dz_0$$
where L is the number of samples in the feasible range of delays, $z_0$ is the variable associated with the max GCC value, and $z_l$ is the variable associated with the generic GCC value.
The effects of reverberation depend on the distance between the source and the microphones.
[Figure: plot against the source-microphone distance r_i [m].]

7 A different approach to SSL is SRP-PHAT, which does not require TDE. Given the theoretical delay
$$\tau(s, m_1, m_2) = \frac{\|s - m_1\| - \|s - m_2\|}{c}, \qquad c \text{ the speed of propagation,}$$
the SRP-PHAT function for M microphone pairs is
$$F_{SRP}(s) = \sum_{i=1}^{M} \int \psi_{PHAT}(f)\, G_{x_{1i} x_{2i}}(f)\, e^{j 2 \pi f \tau(s, m_{1i}, m_{2i})}\, df, \qquad \hat{s} = \arg\max_{s} F_{SRP}(s)$$
This filter-and-sum beamformer is robust to reverberation, but it requires an exhaustive search over a grid of points in the investigated environment. By scaling the SRP values, it is possible to obtain an image of the acoustic event known as a Sound Map.
[Figure: Sound Map of the ISPAC Lab.]
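The exhaustive grid search can be sketched as below. This is a simplified illustration under stated assumptions: one signal frame, delays rounded to the nearest sample, a speed of sound of 343 m/s, and a toy geometry chosen so that all delays are whole samples. The names `phat_correlation` and `srp_phat` are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of propagation in m/s

def phat_correlation(x1, x2):
    """GCC-PHAT of one pair; index k holds lag k, negative lags wrap to the end."""
    n = len(x1) + len(x2)
    G = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    G /= np.abs(G) + 1e-12
    return np.fft.irfft(G, n=n)

def srp_phat(signals, mics, pairs, grid, fs):
    """Accumulate GCC-PHAT values at the theoretical delay of each grid point."""
    power = np.zeros(len(grid))
    for a, b in pairs:
        r = phat_correlation(signals[a], signals[b])
        # tau(s, m_a, m_b) = (||s - m_a|| - ||s - m_b||) / c for every candidate s
        tau = (np.linalg.norm(grid - mics[a], axis=1)
               - np.linalg.norm(grid - mics[b], axis=1)) / C
        lags = np.rint(tau * fs).astype(int)
        power += r[lags % len(r)]
    return grid[np.argmax(power)], power

# Toy scene: a source at (3, 4, 0) lies 5, 4 and 3 m from the three microphones.
fs = 3430.0                                   # one sample corresponds to 0.1 m
mics = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
grid = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]])
rng = np.random.default_rng(2)
s = rng.standard_normal(1024)
signals = np.zeros((3, 1200))
for i, delay in enumerate((50, 40, 30)):      # propagation delays in samples
    signals[i, delay:delay + 1024] = s
best, power = srp_phat(signals, mics, [(0, 1), (0, 2), (1, 2)], grid, fs)
```

Rescaling `power` over a dense grid and plotting it as an image gives a sound map in the sense used above.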

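8 Smoothed Likelihood Function (SLF)
Given the first k peaks of GCC-PHAT for each of the P microphone pairs:
$$SLF(s) = \sum_{p=1}^{P} \max_k \; V_k^{(p)}\, \phi\!\left(\tau(s, m_{1p}, m_{2p});\, \tau_k^{(p)}, \sigma^2\right), \qquad V_k = R^{PHAT}_{x_1 x_2}(\tau_k)$$
where $\tau_k^{(p)}$ is the delay of the k-th peak for pair p, $V_k^{(p)}$ is its GCC-PHAT value, and $\phi$ is the smoothing function centred on $\tau_k^{(p)}$ with spread $\sigma^2$.
[Figure: SLF for a single mic pair with k = 3.]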
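The following sketch evaluates an SLF-like map from a handful of GCC-PHAT peaks per pair. The Gaussian shape of the smoothing function, the speed of sound, and the function name `slf` are assumptions made for illustration; the slides do not spell out the kernel.

```python
import numpy as np

C = 343.0  # assumed speed of propagation in m/s

def slf(grid, mic_pairs, peaks, sigma):
    """grid: (n_points, 3) candidate positions.
    mic_pairs: list of (m1, m2) position arrays.
    peaks: one (taus, values) tuple per pair with the k largest GCC-PHAT peaks.
    sigma: spread of the assumed Gaussian smoothing, in seconds."""
    total = np.zeros(len(grid))
    for (m1, m2), (taus, vals) in zip(mic_pairs, peaks):
        tau_s = (np.linalg.norm(grid - m1, axis=1)
                 - np.linalg.norm(grid - m2, axis=1)) / C
        # One bump per retained peak, evaluated at every grid point.
        bumps = np.exp(-0.5 * ((tau_s[:, None] - taus[None, :]) / sigma) ** 2)
        total += np.max(vals[None, :] * bumps, axis=1)   # keep the best peak
    return total

m1, m2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
grid = np.array([[0.5, 1.0, 0.0], [2.0, 0.0, 0.0]])
peaks = [(np.array([0.0, 1.0 / C]), np.array([1.0, 0.5]))]
values = slf(grid, [(m1, m2)], peaks, sigma=1e-4)
```

Because only k peaks per pair are kept and each is smoothed, the resulting map has far fewer spurious ripples than one built from the full correlation function.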

9 SLF is a much more regular function than SRP-PHAT. ISPAC Lab: 4.22 x 5.5 x 2.93 m; T60 = 0.3 s.
[Figure: SLF and SRP-PHAT maps shown side by side.]

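10 OLS is an extension of the Linear Intersection (LI) technique for SSL [Brandstein, 1995]. Each mic quadruple consists of 2 orthogonal pairs. For a pair with spacing d and delay τ, the bearing angle is
$$\theta = \cos^{-1}\!\left(\frac{c\,\tau}{d}\right) \qquad \text{s.t.} \quad \cos^2(\theta_{12}) + \cos^2(\theta_{34}) \le 1$$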
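As a rough illustration of how a quadruple turns two delays into a bearing line, the sketch below assumes that the two pairs lie along the local x and y axes, that the source is on the positive-z side, and that c = 343 m/s. The function name `bearing_direction` is hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of propagation in m/s

def bearing_direction(tau12, tau34, d12, d34):
    """Unit direction of the bearing line, or None if the delays are inconsistent."""
    cx = C * tau12 / d12            # cos(theta_12)
    cy = C * tau34 / d34            # cos(theta_34)
    rest = 1.0 - cx ** 2 - cy ** 2
    if rest < 0.0:                  # violates cos^2(theta_12) + cos^2(theta_34) <= 1
        return None
    return np.array([cx, cy, np.sqrt(rest)])

d = 0.5                                             # pair spacing in metres
ok = bearing_direction(0.5 * d / C, 0.5 * d / C, d, d)
bad = bearing_direction(0.9 * d / C, 0.9 * d / C, d, d)
```

A delay pair that fails the constraint cannot come from a single physical direction, so it can be discarded before any line is drawn.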

11 Considering k delays for each mic pair, it is possible to draw up to k^2 lines for each quadruple. The candidate source positions are the points at minimum distance between skew lines.
[Figure: LI geometry for a mic quadruple.]
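The point at minimum distance between two skew lines has a closed form. A minimal sketch, with each line given by a point and a direction (the function name `closest_point_between_lines` is illustrative):

```python
import numpy as np

def closest_point_between_lines(p1, d1, p2, d2):
    """Return (midpoint of the shortest segment, its length) for the lines
    p1 + t*d1 and p2 + u*d2."""
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    w = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel")
    t = (b * e - c * d) / denom
    u = (a * e - b * d) / denom
    q1, q2 = p1 + t * d1, p2 + u * d2       # closest points on each line
    return 0.5 * (q1 + q2), float(np.linalg.norm(q1 - q2))

# The x axis and a line parallel to the y axis passing through (2, 3, 1).
mid, gap = closest_point_between_lines([0, 0, 0], [1, 0, 0], [2, 3, 1], [0, 1, 0])
```

With noisy delays the bearing lines of different quadruples rarely intersect, which is why the midpoint of the shortest segment is used instead of an exact intersection.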

12 Select the set of lines that generates the most compact set of points at minimum distance:
$$C_L = \frac{1}{\binom{N/2}{2}} \sum_{(i,j) \in L} \left\| b_L - s_{ij} \right\|^2$$
where $s_{ij}$ is the point at minimum distance between lines i and j, N is the number of mic pairs, and $b_L$ is the mid-point of the L-th set of points. There are up to k^{2Q} sets of lines, with Q quadruples, so a sub-optimal search is needed.
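The selection step can be sketched as follows, assuming the points s_ij of each candidate set are already available and that the measure is the mean squared distance from their mid-point. The function names and the toy candidate sets are illustrative.

```python
import numpy as np

def compactness(points):
    """Return (mid-point b_L, mean squared distance of the points from it)."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    return centre, float(np.mean(np.sum((pts - centre) ** 2, axis=1)))

def select_most_compact(candidate_sets):
    """Pick the candidate set of points with the smallest compactness value."""
    scores = [compactness(pts)[1] for pts in candidate_sets]
    best = int(np.argmin(scores))
    return best, compactness(candidate_sets[best])[0], scores

spread_out = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
tight = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.2]]
best, estimate, scores = select_most_compact([spread_out, tight])
```

The mid-point of the winning set serves as the position estimate; comparing the winning score with a threshold would give one possible way to flag a frame in which no set is compact enough.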

13 Simplified OLS (S-OLS) is a sub-optimal version of the OLS estimator. Considering just 2 quadruples, it is possible to select up to k^4 initial guesses of the source position and use them to find the best set of lines. The number of comparisons needed to select the best set is (Q-2) k^2 * k^4, whereas OLS requires k^{2Q} comparisons. S-OLS is therefore cheaper whenever
$$(Q-2)\, k^2 \cdot k^4 \le k^{2Q} \iff (Q-2)\, k^{6-2Q} \le 1$$
Example with k = 3, Q = 4:
$$OLS_{ev} = k^{2Q} = 6561, \qquad S\text{-}OLS_{ev} = (Q-2)\, k^2 \cdot k^4 = 1458, \qquad \frac{S\text{-}OLS_{ev}}{OLS_{ev}} \approx 0.2$$
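The comparison counts are easy to tabulate for other values of k and Q. The helper names below are illustrative; the formulas are the ones stated on the slide.

```python
def ols_comparisons(k, q):
    """Number of candidate sets of lines for full OLS: k^(2Q)."""
    return k ** (2 * q)

def s_ols_comparisons(k, q):
    """Comparisons for S-OLS: (Q - 2) * k^2 for each of the k^4 initial guesses."""
    return (q - 2) * k ** 2 * k ** 4

k, q = 3, 4
ratio = s_ols_comparisons(k, q) / ols_comparisons(k, q)
print(ols_comparisons(k, q), s_ols_comparisons(k, q), round(ratio, 2))
# prints: 6561 1458 0.22
```

The saving grows quickly with the number of quadruples, since the OLS count is exponential in Q while the S-OLS count is linear in Q.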

14 Root Mean Square Error of localization versus reverberation time. Room dimensions: Lx = 10 m, Ly = 6.6 m, Lz = 3 m. The performance of OLS and S-OLS is similar to that of SRP-PHAT.

15 If none of the sets of lines produces a sufficiently compact set of points, there is a missed localization event. Increasing k does make sense!
[Figure: two plots labelled K(1:3).]

16 Are we really solving the problem of time-delay ambiguities? In most cases, increasing k disambiguates the time delays.
[Figure: results on 6 different mic pairs, K(1:3).]

17 OLS is a real-time algorithm that provides good robustness to varying environmental conditions, but it still requires a good microphone placement. The next goal is to produce a joint audio-video estimator, in order to build a generic framework for audio-video processing and classification.
Frames from the Automatic Cameraman Demo: http://ispac.ing.uniroma1.it/albenzio/index.htm
Venue: SmartSpaces Lab, CalIt2 Building, University of California, San Diego. Description: the red square marks the position of the sound source estimated with the OLS method. The system is made of 16 microphones.

18
I. Parisi R., Cirillo A., Panella M., Uncini A., SOURCE LOCALIZATION IN REVERBERANT ENVIRONMENTS BY CONSISTENT PEAK SELECTION, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Honolulu, Hawaii, USA, April 15-20.
II. Musolino A., Raugi M., Turcu F., Parisi R., Uncini A., Cirillo A., LOCALIZATION OF DEFECTS IN CONCRETE STRUCTURES VIA THE CROSS POWER SPECTRUM PHASE, Proc. of PIERS 2007, Prague, Czech Republic, August 27-30.
III. Cirillo A., Parisi R., Uncini A., SOUND MAPPING IN REVERBERANT ROOMS BY A ROBUST DIRECT METHOD, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, USA, March 30-April 4.
IV. Cirillo A., Parisi R., Uncini A., PREFILTERING TECHNIQUES ON CONSISTENT PEAK SELECTION FOR TALKER POSITION ESTIMATION IN REVERBERANT ROOMS, Proc. of Hands-Free Speech Communication and Microphone Arrays, 2008, Trento, Italy, May 6-8.
V. Cirillo A., Parisi R., Uncini A., A NEW CONSISTENCY MEASURE FOR LOCALIZATION OF SOUND SOURCES IN THE PRESENCE OF REVERBERATION, Proceedings of the IEEE Digital Signal Processing conference (DSP 2009), Santorini, Greece, July 5-7.
VI. Zannini C. M., Cirillo A., Parisi R., Uncini A., IMPROVED TDOA DISAMBIGUATION TECHNIQUES FOR SOUND SOURCE LOCALIZATION IN REVERBERANT ENVIRONMENTS, Proceedings of IEEE International Symposium on Circuits And Systems, Paris, France, May 28-Jun 2.
VII. Cirillo A., Scarpiniti M., Parisi R., Uncini A., SIMPLIFIED OPTIMAL LINE SELECTION FOR ACOUSTIC LOCALIZATION IN THE PRESENCE OF REVERBERATION, EUSIPCO 2010, Aalborg, Denmark, August 23-27.
