Advanced Multimedia Signal Processing #5: Speech Signal Processing -Processing-
Takayuki Nagai
Intelligent Electronic Systems Group, Dept. of Electronic Engineering, UEC

Outline
  - Basis of Speech Processing
  - Noise Removal
  - Spectral Subtraction
  - Microphone Array
  - DOA
  - Beamformer
  - Speech Separation
  - Frequency Assignment
  - Multimodal Speech Separation

Why speech processing?
  - Improve the speech recognition rate
  - Improve the speech quality
  - speech communications (cell phone, etc.)
  - noise removal
  - dereverberation
  - blind source separation
  - compression (coding)

Speech signal processing
  - How many microphones?
    - single channel
      - characteristics: harmonic structure, smoothness
      - denoising, speech separation, dereverberation
      - visual information
    - multi-channel (microphone array)
      - spatial information
      - source localization, denoising, speech separation, dereverberation
  - Location of the sound source and sensors
    - near field: spherical wave
    - far field: plane wave
  - Noise characteristics
    - many methods assume the noise is uncorrelated with speech signals: $E[s(t)\,n(t)] = 0$
    - white noise
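
The first step in speech processing
  - frame decomposition
    - speech signals are nonstationary, but can be regarded as stationary within a short period of time
    - processing on a frame basis
    - the idea of time-frequency representation
  - windowing
    - a window function is utilized for the decomposition
    - frame length: 5-30 ms
    - shift: normally, half of the frame length

The first step in SP (cont'd)
  - [figure: a window function sliding over the signal to cut out a frame]
  - $x_{seg}(n) = x(n)\, w(n)$

The first step in SP (cont'd)
  - window functions
    - Rectangular window: $w_{rc}(n) = 1$ for $0 \le n \le L-1$, and $0$ otherwise
    - Hamming window: $w_{hamming}(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{L-1}\right)$ for $0 \le n \le L-1$, and $0$ otherwise
    - Hanning window: $w_{hanning}(n) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi n}{L-1}\right)\right]$ for $0 \le n \le L-1$, and $0$ otherwise

Spectral Subtraction (SS)
  - well-known denoising method (S. F. Boll, 1979)
  - easy to implement and low computational load
  - estimate the noise magnitude spectrum and subtract it from the input signal in each frame!
  - observation (additive noise model): $x(n) = s(n) + w(n)$, where $s(n)$ is the clean speech (desired) and $w(n)$ is the noise
  - the magnitude of the noise spectrum $|W(\omega)|$ is estimated in some way
  - $\hat{S}(\omega) = \left(|X(\omega)| - |W(\omega)|\right) e^{j \arg X(\omega)}$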
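
The frame decomposition and the three window functions from the first-step slides can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names (`rectangular`, `hamming`, `hanning`, `split_frames`) are made up here, the cosine denominator is assumed to be L - 1, and the shift is fixed to half of the frame length as stated on the slide.

```python
import math


def rectangular(L):
    """Rectangular window: 1 for 0 <= n <= L-1."""
    return [1.0] * L


def hamming(L):
    """Hamming window, 0.54 - 0.46 cos(2 pi n / (L-1)); assumes L >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


def hanning(L):
    """Hanning window, 0.5 (1 - cos(2 pi n / (L-1))); assumes L >= 2."""
    return [0.5 * (1.0 - math.cos(2 * math.pi * n / (L - 1))) for n in range(L)]


def split_frames(x, L, window):
    """Cut x into windowed frames x_seg(n) = x(n) w(n).

    The shift is half of the frame length; samples at the end that do not
    fill a whole frame are dropped (an assumption of this sketch).
    """
    shift = L // 2
    w = window(L)
    frames = []
    for start in range(0, len(x) - L + 1, shift):
        frames.append([x[start + n] * w[n] for n in range(L)])
    return frames
```

For example, `split_frames(signal, 8, hamming)` returns overlapping frames of 8 samples, each starting 4 samples after the previous one.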
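
Spectral Subtraction (cont'd)
  - simple but works pretty well!
  - very fast (easy)
  - $x(n) = s(n) + w(n)$, hence $X(\omega) = S(\omega) + W(\omega)$
  - $|X(\omega)| - |W(\omega)|$ can be $< 0$

Spectral Subtraction (cont'd)
  - The parameter $\alpha$ controls the amount of noise subtracted from the noisy signal
  - $\hat{S}(\omega) = \left(|X(\omega)| - \alpha |W(\omega)|\right) e^{j \arg X(\omega)}$
  - if $|X(\omega)| - \alpha |W(\omega)| < 0$ then $\hat{S}(\omega) = 0$

Spectral Subtraction (cont'd): flow graph
  - [figure: $x(n)$ is transformed to $X(\omega) = \mathrm{FFT}[x(n)]$; $\alpha |W(\omega)|$ is subtracted from $|X(\omega)|$, the phase of $X(\omega)$ is reused, giving $\hat{S}(\omega)$ and the output $\hat{s}(n)$]

Spectral Subtraction (cont'd): meaning of SS
  - $\hat{S}(\omega) = \left(|X(\omega)| - \alpha |W(\omega)|\right) e^{j \arg X(\omega)} = \left(1 - \frac{\alpha |W(\omega)|}{|X(\omega)|}\right) X(\omega) = H(\omega)\, X(\omega)$
  - SS is nothing but filtering!
  - time varying (depends on the input and noise spectrum)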
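
The filtering view of SS can be checked on a single frame. The sketch below is only an illustration under stated assumptions: it uses a slow textbook DFT instead of an FFT so that it needs nothing beyond the standard library, the noise magnitude `noise_mag` is assumed to be given (the slides leave its estimation open), and the names `ss_gain` and `spectral_subtraction_frame` are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(N^2) DFT, standing in for the FFT of the flow graph."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def ss_gain(x_mag, w_mag, alpha=1.0):
    """H = 1 - alpha |W| / |X|, set to 0 when |X| - alpha |W| < 0."""
    if x_mag == 0.0:
        return 0.0
    return max(1.0 - alpha * w_mag / x_mag, 0.0)


def spectral_subtraction_frame(x, noise_mag, alpha=1.0):
    """Apply S_hat = H X bin by bin; the phase of X is kept unchanged.

    noise_mag holds one noise magnitude per DFT bin. Only the real part of
    the inverse transform is returned (the input frame is assumed real).
    """
    X = dft(x)
    S_hat = [ss_gain(abs(Xk), Wk, alpha) * Xk for Xk, Wk in zip(X, noise_mag)]
    return [v.real for v in idft(S_hat)]
```

Because the gain is real and non-negative, multiplying X by it gives the magnitude |X| - alpha |W| with the original phase, which is the subtraction rule written as a time-varying filter.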

An example of SS
  - MATLAB source code can be downloaded (MMSP web site)
  - [figure: example signals, labeled 4.47 [dB] and 7.7 dB]

Magnitude & Phase problem
  - Combination of magnitude and phase info. from two different speech signals
  - After subtraction there remain peaks in the noise spectrum. The narrower peaks are perceived as time-varying tones, which we refer to as musical noise.

Multi-channel speech processing
  - Multi-user environment
  - Man-machine interface
  - Human-Robot communication
  - Cellular phone
  - Multi-channel speech separation

Two-channel speech mixing model
  - time-frequency domain
  - $\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} \begin{bmatrix} S_1 \\ S_2 \end{bmatrix}$
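
There exist many methods
  - Microphone array (multi-channel): spatial information
  - Independent Component Analysis: ICA (multi-channel): statistical information
  - W-disjoint orthogonality (single or multi-channel): time-frequency sparsity (mask)
    - Multi-channel: SAFIA, DUET, Frequency Assignment
    - Single-channel: Factorial HMM

Microphone array
  - Direction of Arrival (DOA)
    - important cues
      - ITD (interaural time difference)
      - IID (interaural intensity difference)
      - HRTF (Head-Related Transfer Function)
    - algorithms
      - Delay and sum
      - Phase Transform (PHAT)
      - Crosspower Spectrum Phase (CSP)
      - High-resolution methods
        - Minimum variance method
        - MUSIC (MUltiple SIgnal Classification)
  - Beamforming
    - Delay and sum
    - Adaptive microphone array

CSP method
  - Time delay estimation using crosspower spectrum phase
  - cross correlation
  - [figure: a plane wave arriving at angle $\theta$ at two microphones $x_1(t)$ and $x_2(t)$ spaced $d$ apart, with delay $\tau$ between them]
  - $\mathrm{CSP} = \mathrm{IFFT}\left[\frac{X_1(\omega)\, X_2^{*}(\omega)}{|X_1(\omega)|\, |X_2(\omega)|}\right]$
  - ideal case: $X_1(\omega) = S(\omega)\, e^{-j\omega k}$, $X_2(\omega) = S(\omega)\, e^{-j\omega m}$
  - $p = \arg\max(\mathrm{CSP})$
  - $\tau = pT$
  - $\theta = \sin^{-1}\left(\frac{c\tau}{d}\right)$

Delay and sum beamformer
  - [figure: the microphone signals $x_1(t), x_2(t), x_3(t), \dots, x_M(t)$ pass through the delays $D_1, D_2, D_3, \dots, D_M$, giving $x_1(t - D_1), \dots, x_M(t - D_M)$, which are summed to $y(t)$]
  - $D_i = (L_i - L_{\min}) / c$
  - $y(t) = \frac{1}{M} \sum_{i=1}^{M} x_i(t - D_i)$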
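
The CSP delay estimate, the DOA formula and the delay-and-sum output can be tried on toy signals. The sketch below rests on several assumptions that are not in the slides: a slow DFT replaces the FFT, delays are whole samples and circular, T is taken to be the sampling period, the speed of sound defaults to 340 m/s, and a positive peak index p means that channel 1 lags channel 2. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def csp(x1, x2):
    """CSP = IFFT[X1 X2* / (|X1| |X2|)]: only the phase of the cross spectrum is kept."""
    cross = []
    for a, b in zip(dft(x1), dft(x2)):
        c = a * b.conjugate()
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0.0)
    return [v.real for v in idft(cross)]


def estimate_delay(x1, x2):
    """p = argmax(CSP), mapped to a signed lag in samples."""
    c = csp(x1, x2)
    N = len(c)
    p = max(range(N), key=lambda k: c[k])
    return p - N if p > N // 2 else p


def doa_angle(p, T, d, c=340.0):
    """theta = asin(c tau / d) with tau = p T; the ratio is clipped to [-1, 1]."""
    ratio = max(-1.0, min(1.0, c * p * T / d))
    return math.asin(ratio)


def delay_and_sum(channels, delays):
    """y(t) = (1/M) sum_i x_i(t - D_i) for integer sample delays D_i."""
    M, N = len(channels), len(channels[0])
    y = [0.0] * N
    for x, D in zip(channels, delays):
        for t in range(N):
            if 0 <= t - D < N:  # samples shifted in from outside are treated as 0
                y[t] += x[t - D] / M
    return y
```

Delaying the earlier channel by the lag returned from `estimate_delay` lines the two channels up before they are averaged, which is the compensation role the delays D_i play in the beamformer figure.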
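
Adaptive array
  - if we know the DOA of the noise, a "null" can be steered to that direction!
  - [figure: a target source and a noise source arriving at two microphones $x_1(t)$ and $x_2(t)$ spaced $d$ apart, with angle $\theta$ and delay $\tau$; one channel is delayed by $D$ and subtracted from the other to give $y(t)$, and a compensation (filter) yields $\hat{y}(t)$]

Adaptive array (cont'd)
  - [figure: delays $D_1, D_2, D_3, \dots, D_M$, filters $H_1, H_2, H_3, \dots, H_M$ driven by an adaptive controller, and subtraction stages producing the output $y(t)$]
  - Griffiths-Jim array
  - minimize the power of the output

Characteristics of speech signals
  - Assumption (W-disjoint orthogonality): signals from different sources do not share the same frequency at the same time
  - Examples
  - [figure: mixed input speech on a time-frequency plane and the corresponding masks; for each component the question is "channel 1 or channel 2?"]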
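
Frequency Assignment
  - [figure: the input signal passes through Frequency Assignment; each frequency component goes to an output signal with weight = 1 or weight = 0]

Frequency Assignment (cont'd)
  - $\tau_{ij}$: time delay of arrival from source $j$ to the microphone $i$
  - Transfer function: $H_{ij} = \beta_{ij}\, e^{-j\omega\tau_{ij}}$
  - $Z = \frac{Y_2}{Y_1}$, where $Z = \alpha(l, \omega)\, e^{j\theta(l, \omega)}$
  - if only $S_1$ is present: $Z = \frac{H_{21} S_1}{H_{11} S_1} = \frac{\beta_{21}}{\beta_{11}}\, e^{-j\omega(\tau_{21} - \tau_{11})}$
  - if only $S_2$ is present: $Z = \frac{H_{22} S_2}{H_{12} S_2} = \frac{\beta_{22}}{\beta_{12}}\, e^{-j\omega(\tau_{22} - \tau_{12})}$

Frequency Assignment (cont'd)
  - $\theta$: phase difference
  - $\tau_1$: $s_1$'s time delay
  - $\tau_2$: $s_2$'s time delay
  - $\omega\tau_i$ should be an ideal phase diff. at $\omega$
  - [figure: the observed phase difference $\theta$ at frequency $\omega$ compared with $\omega\tau_1$ and $\omega\tau_2$]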
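
The weight = 1 / weight = 0 decision can be sketched as a nearest-phase rule per frequency bin. This is an assumed reading of the Frequency Assignment slides rather than the published algorithm: the observed phase difference is taken as the phase of Y2/Y1, each delay in `taus` is defined so that omega * tau is the ideal phase difference for that source, and the names `assign_frequencies` and `apply_masks` are hypothetical.

```python
import cmath
import math


def wrap(phi):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def assign_frequencies(Y1, Y2, omegas, taus):
    """For each bin, pick the source whose ideal phase difference is closest.

    Y1, Y2 : complex spectra of one frame of the two channels
    omegas : angular frequency of each bin
    taus   : one time delay per source (sign convention assumed as above)
    """
    labels = []
    for y1, y2, w in zip(Y1, Y2, omegas):
        theta = cmath.phase(y2 * y1.conjugate())  # phase of Y2 / Y1
        errors = [abs(wrap(theta - w * tau)) for tau in taus]
        labels.append(errors.index(min(errors)))
    return labels


def apply_masks(Y, labels, n_sources):
    """Binary masks: weight 1 for the assigned source, weight 0 for the others."""
    return [[y if lab == s else 0j for y, lab in zip(Y, labels)]
            for s in range(n_sources)]
```

Each bin is handed entirely to one source, which is only reasonable under the W-disjoint orthogonality assumption; bins where both speakers are active will be assigned to whichever phase happens to be closer.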
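
Word recognition result
  - [figure: word recognition rate [%] for the original, mixed speech and processed speech, plotted against the location of the other speaker]

Multimodal (media) information

Integration (speaker model + FA)
  - Visual information
  - probabilistic integration
  - [equation: $\log \Pr(\omega \in S_i \mid \cdot)$ written as the sum of a speaker model term, conditioned on the observations $y$ and the speaker models $\lambda_i$, and a frequency assignment term $\log \Pr(y_1, y_2 \mid \omega \in S_i)$]

Use of visual information
  - face detection, recognition, tracking (color-based)
  - Estimation of $\tau_i$ (speaker localization)
    - CSP method (half-sample accuracy)
    - find multiple peaks according to the face locations
  - Phase error: i.i.d. Gaussian PDF
  - $\log \Pr(y_1, y_2 \mid \omega \in S_i) = -\frac{(\theta - \omega\tau_i)^2}{2\Sigma_{FA}} + \mathrm{const}$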

Multimodal speech

Experimental settings
  - [figure: locations]

Results

Results
  - Example