Chinese Journal of Scientific Instrument. High frequency weighted MFCC extraction for noise robust speaker verification

Transcription:

Chinese Journal of Scientific Instrument, Vol. 29 No. 3, Mar. 2008

High frequency weighted MFCC extraction for noise robust speaker verification

Chen Di 1, Gong Weiguo 1, Li Bo 2
(1 Key Laboratory for Optoelectronic Technology and System of the Education Ministry of China, Chongqing University, Chongqing 400044, China; 2 Modern Physics Centre, Material Science and Engineering Post-doctoral Workstation, Chongqing 400044, China)

Abstract: This paper proposes a high frequency weighted MFCC extraction method to improve the performance of speaker verification in noisy conditions. Because Mel frequency has a logarithmic relationship with linear frequency, spectral resolution declines in the high frequency domain. Frames of purely periodic speech signal avoid harmonic leakage, so more high frequency information is preserved. To enhance the speech, a high frequency energy amplitude weighting method is proposed. This method was applied in pitch synchronous preprocessing MFCC feature extraction, and speaker verification experiments were conducted. The results show that the recognition rates are improved in several kinds of noise environments even when the SNR is low.

Key words: high frequency weighted; speaker verification; pitch synchronous; robust; MFCC

CLC number: TP192.3  Document code: A  Subject classification code: 520.2040

Received Date: 2007-04

1 [...]

[...] [1] [2] [3] [...]

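2 [...] Mel [...]

[...] Mel [...] [4] (MFCC) [...] NTT [...] [5] [...] One Mel is defined as 1/1 000 of the perceived pitch of a 1 000 Hz tone. Mel frequency is related to linear frequency by

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \quad (1)$$

where f is the linear frequency in Hz.

Fig. 1 Mel frequency vs. linear frequency

3 [...] MFCC

[...] [6] [...] s(n) [...] s_p(n) [...] s_r(n) [...] For s(n) and t(n), the spectral distance is

$$D(S, T) = \int_{-\pi}^{\pi} \left[\log S(\omega) - \log T(\omega)\right] d\omega \quad (2)$$

and for s_p(n) and t_p(n) it is

$$D(S_p, T_p) = \int_{-\pi}^{\pi} \left[\log S_p(\omega) - \log T_p(\omega)\right] d\omega \quad (3)$$

[...] [7] [...] 210 (5, 5) [...] s(n) [...] 200 [...] t_1(n), t_2(n), ..., t_200(n):

$$D_1(S, T) = \sum_{j=1}^{200} \left[\int_{-\pi}^{\pi} \left(\log S(\omega) - \log T_j(\omega)\right) d\omega\right] \quad (4)$$

where S(\omega) and T_j(\omega) are the spectra of s(n) and t_j(n), and

$$D_2(S, T) = \sum_{j=1}^{200} \left[\int_{-\pi}^{\pi} \left(\log S_p(\omega) - \log T_{pj}(\omega)\right) d\omega\right] \quad (5)$$

where S_p(\omega) and T_{pj}(\omega) are the spectra of s_p(n) and t_{pj}(n).

[...] (4 000 to 8 000 Hz) [...] (0 to 4 000 Hz) [...]

Fig. 2 Spectral distances of speech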
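As a rough illustration of Eq. (1) and of why Mel spacing costs resolution at high frequencies, the following Python sketch converts between Hz and Mel and lists the bandwidths of filters spaced uniformly on the Mel axis. The function names, the 24-filter count, and the 0 to 8 000 Hz range are assumptions chosen for illustration, not values taken from the paper.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Eq. (1): Mel(f) = 2595 * log10(1 + f / 700), with f in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Algebraic inverse of Eq. (1)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low: float, f_high: float, n_filters: int) -> List[float]:
    """Return n_filters + 2 edge frequencies (Hz) spaced uniformly in Mel."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]


# Assumed setup: 24 filters over 0-8000 Hz (illustrative, not from the paper).
N_FILTERS = 24
edges = mel_band_edges(0.0, 8000.0, N_FILTERS)
# Filter i spans edges[i]..edges[i + 2]; its width in Hz grows with frequency.
widths = [edges[i + 2] - edges[i] for i in range(N_FILTERS)]

if __name__ == "__main__":
    print(f"1000 Hz -> {hz_to_mel(1000.0):.1f} Mel")
    print(f"narrowest filter: {widths[0]:.1f} Hz, widest: {widths[-1]:.1f} Hz")
```

The filter widths grow monotonically with centre frequency, which is the declining high frequency resolution that the abstract refers to.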

[...] [8] [...]

Fig. 3 Sound pressure level in noise environments

[...] Fig. 2(a), Fig. 2(b), Fig. 2(c) [...] (4 000 to 8 000 Hz) [...]

4 [...] MFCC

[...] [9-10] [...] Mel [...] (DCT) [...] MFCC (PSPWMFCC) [...]

Fig. 4 High frequency weighted MFCC extraction

[...] 4 000 to 8 000 Hz [...] A e f (A, f; A e f > 1) [...] A = 1.1, 1.2, ..., 2.0 [...]
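The weighting step can be sketched as follows. The exact weighting function is not recoverable from this transcription, so the sketch assumes the simplest form consistent with the abstract: a constant gain `a` > 1 applied to magnitude-spectrum bins in the 4 000 to 8 000 Hz band. The function name `weight_high_band`, its parameters, and the default `a = 1.4` are hypothetical.

```python
from typing import List, Sequence


def weight_high_band(
    magnitudes: Sequence[float],
    sample_rate: float,
    a: float = 1.4,          # assumed constant gain; must exceed 1
    f_low: float = 4000.0,   # lower edge of the weighted band in Hz
    f_high: float = 8000.0,  # upper edge of the weighted band in Hz
) -> List[float]:
    """Scale one-sided magnitude-spectrum bins inside [f_low, f_high] by a.

    magnitudes[k] is assumed to be the magnitude at
    k * (sample_rate / 2) / (len(magnitudes) - 1) Hz, i.e. the bins from
    0 Hz up to the Nyquist frequency of a real-input FFT.
    """
    if len(magnitudes) < 2:
        raise ValueError("need at least two spectrum bins")
    if a <= 1.0:
        raise ValueError("the weight is assumed to be greater than 1")
    bin_hz = (sample_rate / 2.0) / (len(magnitudes) - 1)
    return [
        m * a if f_low <= k * bin_hz <= f_high else m
        for k, m in enumerate(magnitudes)
    ]


# Toy example: 9 bins at 16 kHz sampling, so bins sit at 0, 1000, ..., 8000 Hz.
flat = [1.0] * 9
weighted = weight_high_band(flat, sample_rate=16000.0)
```

In an MFCC pipeline a step like this would presumably sit between the FFT magnitude and the Mel filterbank, so that the boosted band feeds the filterbank energies; that placement is an assumption, since the block diagram of Fig. 4 did not survive extraction.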

[...] A [...] 1.4 [...]

Fig. 5 Spectral distances for different values of A

6 [...]

[...] NTT [...] 20 [...] 10 [...] 10 [...] 10 [...] 0 dB, 10 dB, 20 dB, 30 dB, 40 dB [...] MFCC [...] GMM [...] 0 dB, 5 dB, 10 dB [...]

Fig. 6 Recognition rates in different noise environments

7 [...]

[...] MFCC [...] Mel [...]

References

[1] GALES M F J. Predictive model-based compensation schemes for robust speech recognition [J]. Speech Communication, 1998, 25(1-3): 49-74.
[2] WEINSTEIN E, OPPENHEIM A V, FEDER M, et al. Iterative and sequential algorithms for multisensor signal enhancement [C]. IEEE Trans. on Signal Processing, 1994, 42(4): 846-859.
[3] XU T, CAO Z G. Combination of feature weight and speech enhancement for robust ASR at low SNRs [C]. Proceedings of IEEE TENCON 02, 2002: 441-444.

[4] DAVIS S B, MERMELSTEIN P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences [C]. IEEE Trans. Acoustics, Speech and Signal Processing, 1980, ASSP-28(4): 357-366.
[5] CAI L H, HUANG D ZH, CAI R. Groundwork and application of modern speech technology [M]. Beijing: Tsinghua University Publishing House, 2003: 236.
[6] KIM S, ERIKSSON T. A pitch synchronous feature extraction method for speaker recognition [C]. IEEE, Acoustics, Speech and Signal Processing Proceedings, 2004, 1: 405-408.
[7] YI K CH, TIAN B. Speech signal processing [M]. Beijing: National Defence Industry Publishing House, 2003: 79.
[8] YANG L P, GONG W G. Multi-SNR GMMs-based noise-robust speaker verification using 1/f noises [C]. IEEE, The 18th International Conference on Pattern Recognition, 2006, 4: 241-244.
[9] BAO CH CH, FAN CH X. Pitch detection algorithm based on normalized cross-correlation function [J]. Journal of Communication, 1998, 19(10): 27-31.
[10] YI K CH, TIAN B. Speech signal processing [M]. Beijing: National Defence Industry Publishing House, 2003: 60-66.

Chen Di received his master's degree from the College of Optoelectronic Engineering, Chongqing University, China in 2007. His research area is speech recognition. He is an engineer in the Chongqing Department, Coal Science Research Institute.
Address: 150, Long Feng Er Cun, Beibei District, Chongqing 400700, China
Tel: +86-23-68348536; E-mail: cdw869@163.com

Gong Weiguo, PhD, obtained his PhD from Tokyo Institute of Technology, Japan in 1996. He is a professor and supervisor for PhD candidates in the College of Optoelectronic Engineering, Chongqing University, China. His research areas are pattern recognition, machine vision, and intelligent information technology and systems.
Address: 1303, main building, A district, Chongqing University, Chongqing 400044, China