Chinese Journal of Scientific Instrument. High frequency weighted MFCC extraction for noise robust speaker verification

Chinese Journal of Scientific Instrument, Vol. 29, No. 3, Mar. 2008

Chen Di (1), Gong Weiguo (1), Li Bo (2)
(1. Key Laboratory for Optoelectronic Technology and System of the Education Ministry of China, Chongqing University, Chongqing 400044, China; 2. Modern Physics Centre, Material Science and Engineering Postdoctoral Workstation, Chongqing 400044, China)

Abstract: This paper proposes a high frequency weighted MFCC extraction method to improve the performance of speaker verification under noisy conditions. Because the Mel frequency is logarithmically related to the linear frequency, spectral resolution declines in the high-frequency range. Frames of a purely periodic speech signal avoid harmonic leakage, so more high-frequency information is preserved. To enhance the speech, a method that weights the high-frequency energy amplitude is proposed. The method was applied to MFCC feature extraction with pitch synchronous preprocessing, and speaker verification experiments were conducted. The results show that the recognition rates are improved in several kinds of noise environments, even when the SNR is low.

Key words: high frequency weighted; speaker verification; pitch synchronous; robust; MFCC

CLC number: TP192.3    Document code: A    National standard discipline classification code: 520.2040

Received Date: 2007-04

Approaches to noise robustness include model-based compensation [1], multisensor signal enhancement [2], and the combination of feature weighting with speech enhancement [3].

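Mel-frequency cepstral coefficients (MFCC) [4] are computed on the Mel frequency scale. Human perception of pitch is approximately linear in frequency below 1000 Hz and approximately logarithmic above 1000 Hz [5]. One mel is defined as 1/1000 of the perceived pitch of a 1000 Hz tone, and the Mel frequency is related to the linear frequency f (in Hz) by

$$\mathrm{Mel}(f) = 2595\,\log_{10}\left(1 + \frac{f}{700}\right) \tag{1}$$

Fig. 1 plots this relationship. Because the curve flattens as f increases, a fixed interval on the Mel scale covers an ever wider band of linear frequency, so the spectral resolution of Mel-based features declines in the high-frequency range.

Fig. 1 Mel frequency vs. linear frequency

For two speech frames s(n) and t(n) with spectra S(ω) and T(ω), the log spectral distance is

$$D(S,T) = \int_{-\pi}^{\pi} \left[\log S(\omega) - \log T(\omega)\right]^2 d\omega \tag{2}$$

For pitch-synchronous frames s_p(n) and t_p(n), which contain whole pitch periods and are therefore purely periodic, the corresponding distance is

$$D(S_p,T_p) = \int_{-\pi}^{\pi} \left[\log S_p(\omega) - \log T_p(\omega)\right]^2 d\omega \tag{3}$$

Frames of a purely periodic signal avoid harmonic leakage [7]. To compare the two kinds of frames, a frame s(n) is measured against 200 frames t_1(n), t_2(n), ..., t_200(n):

$$D_1(S,T) = \sum_{j=1}^{200} \left[\int_{-\pi}^{\pi} \left(\log S(\omega) - \log T_j(\omega)\right)^2 d\omega\right] \tag{4}$$

where S(ω) and T_j(ω) are the spectra of s(n) and t_j(n), and

$$D_2(S,T) = \sum_{j=1}^{200} \left[\int_{-\pi}^{\pi} \left(\log S_p(\omega) - \log T_{pj}(\omega)\right)^2 d\omega\right] \tag{5}$$

where S_p(ω) and T_pj(ω) are the spectra of s_p(n) and t_pj(n). The distances are evaluated separately in the high-frequency band (4000-8000 Hz) and the low-frequency band (0-4000 Hz). Pitch-synchronous MFCC extraction has also been studied in [6]. Fig. 2 shows the resulting spectral distances.

Fig. 2 Spectral distances of speech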
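As a minimal sketch of Eqs. (1)-(5), the Python/NumPy code below evaluates the Mel mapping, the linear bandwidth spanned by one mel, and band-limited versions of the spectral distances. It assumes a 16 kHz sampling rate (so that 0-8000 Hz is the full band), takes S(ω) to be the DFT magnitude, and approximates the integral by a sum over the positive-frequency DFT bins. The function names `hz_per_mel`, `log_spectral_distance` and `summed_distance` are hypothetical and not from the paper.

```python
import numpy as np


def mel(f_hz):
    """Eq. (1): Mel(f) = 2595 * log10(1 + f / 700), with f in Hz."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def hz_per_mel(f_hz):
    """Slope df/dMel of the inverse of Eq. (1): bandwidth in Hz covered by one mel at f."""
    return (700.0 + np.asarray(f_hz, dtype=float)) * np.log(10.0) / 2595.0


def log_spectral_distance(s, t, fs=16000, band=(0.0, 8000.0), n_fft=512, eps=1e-12):
    """Discrete approximation of Eq. (2), restricted to one frequency band.

    Assumptions (not from the paper): S(w) is the DFT magnitude, only the
    positive-frequency half is summed (for real signals the [-pi, pi]
    integral is about twice this), and eps guards against log(0).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    S = np.abs(np.fft.rfft(s, n_fft)) + eps
    T = np.abs(np.fft.rfft(t, n_fft)) + eps
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    diff_sq = (np.log(S[in_band]) - np.log(T[in_band])) ** 2
    d_omega = 2.0 * np.pi / n_fft  # DFT bin spacing in rad/sample
    return float(np.sum(diff_sq) * d_omega)


def summed_distance(s, references, **kwargs):
    """Eqs. (4)/(5): sum of Eq. (2) distances from frame s to each reference frame t_j."""
    return sum(log_spectral_distance(s, t, **kwargs) for t in references)
```

For example, `summed_distance(s, refs, band=(4000.0, 8000.0))` and `summed_distance(s, refs, band=(0.0, 4000.0))` give high- and low-band counterparts of Eq. (4); passing pitch-synchronous frames instead gives Eq. (5). `hz_per_mel(6000.0) / hz_per_mel(1000.0)` equals 6700/1700 ≈ 3.9, which quantifies how much coarser the Mel scale is at 6 kHz than at 1 kHz.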

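Fig. 3 shows the sound pressure level in the noise environments used [8].

Fig. 3 Sound pressure level in noise environments

In Fig. 2, results are plotted with the markers 'o' and '+' in panels (a), (b) and (c). The comparison indicates that purely periodic (pitch-synchronous) frames avoid harmonic leakage and retain more high-frequency information. Since the Mel scale has its lowest resolution in the high-frequency band (4000-8000 Hz), the proposed method weights the energy amplitude in this band.

The high frequency weighted MFCC extraction is shown in Fig. 4. The pitch is first detected [9-10] and the speech is framed pitch-synchronously; the spectral energy amplitude in the 4000-8000 Hz band is weighted, and the weighted spectrum is passed through the Mel filter bank, followed by the logarithm and the discrete cosine transform (DCT). The resulting features are called pitch synchronous preprocessing weighted MFCC (PSPWMFCC).

Fig. 4 High frequency weighted MFCC extraction

In the 4000-8000 Hz band, the energy amplitude is multiplied by the weighting factor $A e^{f}$, where $A e^{f} > 1$. To select A, spectral distances were computed for the ten values A = 1.1, 1.2, ..., 2.0.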
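The extraction chain can be sketched in Python/NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the frame is assumed to be already cut pitch-synchronously (pitch detection [9-10] is not shown, and `pitch_synchronous_frame` is a hypothetical helper that takes the pitch period in samples as given), the sampling rate is assumed to be 16 kHz, no pre-emphasis or window is applied, and the paper's factor A e^f is replaced by a constant A applied to the magnitude spectrum in the 4000-8000 Hz band. The default A = 1.4 is the value the paper selects from Fig. 5; the filter-bank size (24) and the number of cepstral coefficients (13) are assumptions.

```python
import numpy as np


def mel(f_hz):
    """Eq. (1): Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of Eq. (1)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def pitch_synchronous_frame(x, start, period, n_periods=2):
    """Hypothetical helper: cut a frame holding exactly n_periods pitch periods."""
    return np.asarray(x[start:start + period * n_periods], dtype=float)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale between 0 and fs/2."""
    hz_pts = mel_to_hz(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def pspwmfcc(frame, fs=16000, n_fft=512, A=1.4, band=(4000.0, 8000.0),
             n_filters=24, n_ceps=13, eps=1e-12):
    """High-frequency weighted MFCC for one (pitch-synchronous) frame.

    Simplification: a constant weight A replaces the paper's A*e^f factor.
    Frames longer than n_fft are truncated by np.fft.rfft.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    amplitude = np.abs(np.fft.rfft(frame, n_fft))
    # Weight the magnitude spectrum inside the high band; leave the rest unchanged.
    weights = np.where((freqs >= band[0]) & (freqs <= band[1]), A, 1.0)
    power = (weights * amplitude) ** 2
    # Mel filter bank, then the log of the filter energies.
    log_energies = np.log(mel_filterbank(n_filters, n_fft, fs) @ power + eps)
    # DCT-II of the log energies gives the cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    return dct_basis @ log_energies
```

With `A=1.0` the function reduces to a plain MFCC of the same frame, so comparing `pspwmfcc(frame)` against `pspwmfcc(frame, A=1.0)` isolates the effect of the high-frequency weighting. Weighting the magnitude by A scales the power in the band by A squared, which adds 2 log A to the log energies of filters lying entirely inside 4000-8000 Hz.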

Fig. 5 shows the spectral distances for the different values of A; A = 1.4 is selected.

Fig. 5 Spectral distances for different values of A

Speaker verification experiments were conducted on the NTT database. Noisy speech at SNRs of 0, 10, 20, 30 and 40 dB was used with MFCC-based Gaussian mixture model (GMM) speaker models, and tests were carried out at 0, 5 and 10 dB. Fig. 6 compares the recognition rates (%) obtained with MFCC and with the proposed features: the recognition rates are improved in several kinds of noise environments, even at the low SNRs of 0, 5 and 10 dB.

Fig. 6 Recognition rates in different noise environments

References

[1] GALES M J F. Predictive model-based compensation schemes for robust speech recognition[J]. Speech Communication, 1998, 25(1-3): 49-74.
[2] WEINSTEIN E, OPPENHEIM A V, FEDER M, et al. Iterative and sequential algorithms for multisensor signal enhancement[J]. IEEE Trans. on Signal Processing, 1994, 42(4): 846-859.
[3] XU T, CAO Z G. Combination of feature weight and speech enhancement for robust ASR at low SNRs[C]. Proceedings of IEEE TENCON'02, 2002: 441-444.

[4] DAVIS S B, MERMELSTEIN P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences[J]. IEEE Trans. Acoustics, Speech and Signal Processing, 1980, ASSP-28(4): 357-366.
[5] CAI L H, HUANG D ZH, CAI R. Groundwork and application of modern speech technology[M]. Beijing: Tsinghua University Publishing House, 2003: 236.
[6] KIM S, ERIKSSON T. A pitch synchronous feature extraction method for speaker recognition[C]. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2004, 1: 405-408.
[7] YI K CH, TIAN B. Speech signal processing[M]. Beijing: National Defence Industry Publishing House, 2003: 79.
[8] YANG L P, GONG W G. Multi-SNR GMMs-based noise-robust speaker verification using 1/f noises[C]. IEEE, The 18th International Conference on Pattern Recognition, 2006, 4: 241-244.
[9] BAO CH CH, FAN CH X. Pitch detection algorithm based on normalized cross-correlation function[J]. Journal of Communication, 1998, 19(10): 27-31.
[10] YI K CH, TIAN B. Speech signal processing[M]. Beijing: National Defence Industry Publishing House, 2003: 60-66.

Chen Di received his master's degree from the College of Optoelectronic Engineering, Chongqing University, China, in 2007. His research area is speech recognition. He is an engineer in the Chongqing Department, Coal Science Research Institute.
Address: 150, Long Feng Er Cun, Beibei District, Chongqing 400700, China
Tel: +86-23-68348536; E-mail: cdw869@163.com

Gong Weiguo obtained his PhD from Tokyo Institute of Technology, Japan, in 1996. He is a professor and doctoral supervisor in the College of Optoelectronic Engineering, Chongqing University, China. His research areas are pattern recognition, machine vision, and intelligent information technology and systems.
Address: 1303, Main Building, District A, Chongqing University, Chongqing 400044, China