The Electronic PSC Testing System


Journal of Chinese Information Processing, Vol. 20, No. 6, 2006
Article ID: 1003-0077(2006)06-0089-08

WEI Si, LIU Qing-sheng, HU Yu, WANG Ren-hua
(Man Machine Voice Communication Laboratory, University of Science & Technology of China, Hefei, Anhui 230027, China)

Abstract: This paper develops an automatic PSC testing system that aims to evaluate spoken Chinese efficiently. Based on a 100-hour standard Chinese database, it uses the characteristics of Chinese and linguists' expert knowledge to optimize the traditional speech evaluation algorithm. A corpus-adaptive method is also proposed to enhance the robustness and performance of the algorithm. Experiments on a 500-person PSC testing database show that the new algorithm is much better than the original algorithm. After linear mapping, the error between the machine score and the human score is 2.44, almost equal to the error between human raters. The result indicates that the automatic PSC testing system can replace human raters in evaluating spoken Chinese under the text-dependent condition.

Key words: computer application; Chinese information processing; Putonghua shuiping ceshi; pronunciation evaluation; PSC testing database; automatic testing system

CLC number: TP391. Document code: A.
Received: 2005-11-02. Revised: 2006-06-19. Project: (ZDI105-B02). Author: […] (1982) […]

1 […] 100 […]

[…] (SRI) VILT [1, 2], SRI SCILL [3, 4], VICK [5, 6] […] [9, 10] […] (2.44) […] (2.30) […]

2 […] 1 […]

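3 […]

3.1 […] 16 kHz, 16 bit […] 1 […] 1 […] 30 15 15 203025 305 3000/ 4500/ 60/(400) 3/ […] 100 […]

3.2 […] 16 kHz, 16 bit […] 2 […] 2 […] 500 236 290 259 251 120 84 3 223 277 6% 71% 23% […] 3 […]

3.3 […] 264 […] 236 […] 3 […] A, B […] $\{S_i,\ i = 1, 2, \ldots, n\}$ […] (1):

$$
r = \frac{\sum_{i=1}^{n} \left(S_i^A - \bar{S}^A\right)\left(S_i^B - \bar{S}^B\right)}{\sqrt{\sum_{i=1}^{n} \left(S_i^A - \bar{S}^A\right)^2 \, \sum_{i=1}^{n} \left(S_i^B - \bar{S}^B\right)^2}} \tag{1}
$$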
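Equation (1) has the form of the usual correlation coefficient between two score lists. The sketch below shows one way to compute it in Python; the function name and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def rater_correlation(scores_a, scores_b):
    """Correlation coefficient between two raters' scores, in the form of equation (1)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 entries")
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    # Numerator: sum of products of deviations from each rater's mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    # Denominator: square root of the product of the two sums of squared deviations.
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical scores from two raters for the same five samples.
rater_a = [92.0, 85.5, 78.0, 88.0, 70.5]
rater_b = [90.0, 87.0, 75.5, 89.5, 72.0]
print(round(rater_correlation(rater_a, rater_b), 3))
```

A value close to 1 means the two raters rank the samples almost identically, even if their absolute scores differ by an offset.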

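Here $S_i^A$ is the $i$-th score given by A, $S_i^B$ is the $i$-th score given by B, $\bar{S}^A$ is the mean of A's scores, and $\bar{S}^B$ is the mean of B's scores.

Table 4 […] (correlation, error) […] / […]

Rater  1                            2                            3                            Mean
1      (1.0, 0.0) / (1.0, 0.0)      (0.91, 1.88) / (0.90, 1.97)  (0.88, 2.54) / (0.89, 2.47)
2      (0.91, 1.88) / (0.90, 1.97)  (1.0, 0.0) / (1.0, 0.0)      (0.91, 2.19) / (0.89, 2.47)  (0.90, 2.20) / (0.89, 2.30)
3      (0.88, 2.54) / (0.89, 2.47)  (0.91, 2.19) / (0.89, 2.47)  (1.0, 0.0) / (1.0, 0.0)

[…] 4 […] 0.8 […] 3 […]

4 […]

4.1 […] HMM […] 25 ms […] 10 ms […] MFCC […] 39 […] HMM […] T […] O […] HMM […] [13, 5] […] $P(T \mid O)$ […] (2) [3]:

$$
P(T \mid O) = \frac{1}{N}\sum_{i=1}^{N} \frac{\log P\left(T_i \mid O^{(T_i)}\right)}{F(T_i)}
= \frac{1}{N}\sum_{i=1}^{N} \frac{1}{F(T_i)} \log \frac{P\left(O^{(T_i)} \mid T_i\right) P(T_i)}{\sum_{q \in Q} P\left(O^{(T_i)} \mid q\right) P(q)}
\approx \frac{1}{N}\sum_{i=1}^{N} \frac{1}{F(T_i)} \log \frac{P\left(O^{(T_i)} \mid T_i\right)}{\max_{q \in Q} P\left(O^{(T_i)} \mid q\right)} \tag{2}
$$

Here $Q$ is the set of candidate models, $q$ ranges over the models competing with $T_i$, $F(T_i)$ is the number of frames of $T_i$, and $P\left(O^{(T_i)} \mid T_i\right)$ is the likelihood under $T_i$ of the observation segment $O^{(T_i)}$ aligned to $T_i$. […] 0.58 […] 0.88 […]

4.2 […]

4.2.1 […] (2) […]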
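The max approximation in equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-segment log-likelihoods are taken as already computed by some HMM decoder, and the function name, the tuple layout and the sample numbers are hypothetical.

```python
def posterior_score(segments):
    """Frame-normalised log-posterior score, max approximation of equation (2).

    Each segment is a tuple (target_loglik, competitor_logliks, n_frames):
      target_loglik       log P(O^(T_i) | T_i) for the expected unit T_i
      competitor_logliks  log P(O^(T_i) | q) for every q in the candidate set Q
      n_frames            F(T_i), the number of frames aligned to T_i
    """
    if not segments:
        raise ValueError("at least one segment is required")
    total = 0.0
    for target_loglik, competitor_logliks, n_frames in segments:
        if n_frames <= 0 or not competitor_logliks:
            raise ValueError("each segment needs frames and at least one competitor")
        # log of the likelihood ratio, with the sum over Q replaced by its maximum.
        best_competitor = max(competitor_logliks)
        total += (target_loglik - best_competitor) / n_frames
    # Average over the N segments of the utterance.
    return total / len(segments)


# Hypothetical values: two segments, the second one better matched by a competitor.
utterance = [
    (-100.0, [-100.0, -120.0], 10),
    (-110.0, [-110.0, -100.0], 10),
]
print(posterior_score(utterance))
```

When the candidate set contains the target itself, each term is at most zero, and a score of zero means the expected unit was the best-matching model in every segment.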

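[…] (3):

$$
P(T \mid O) = \frac{1}{N}\sum_{i=1}^{N} \frac{\log P\left(T_i \mid O^{(T_i)}\right)}{F(T_i)}
= \frac{1}{N}\sum_{i=1}^{N} \frac{1}{F(T_i)} \log \frac{P\left(O^{(T_i)} \mid T_i\right) P(T_i)}{\sum_{q \in Q^{T_i}_{error}} P\left(O^{(T_i)} \mid q\right) P(q)}
\approx \frac{1}{N}\sum_{i=1}^{N} \frac{1}{F(T_i)} \log \frac{P\left(O^{(T_i)} \mid T_i\right)}{\max_{q \in Q^{T_i}_{error}} P\left(O^{(T_i)} \mid q\right)} \tag{3}
$$

[…] (3) […] (2) […] (2) […] (3) […] [7] […]

4.2.2 […] [8] […] (4):

$$
G_{sent} = \frac{1}{N}\sum_{i=1}^{N} G_i, \qquad G_i = G_i^{initial} + G_i^{final} \tag{4}
$$

Here $G_{sent}$ is the sentence score, $G_i$ is the score of syllable $i$, $G_i^{initial}$ is the score of the initial of syllable $i$, and $G_i^{final}$ is the score of the final of syllable $i$. […] (5):

$$
G_{sent} = \frac{1}{N}\sum_{i=1}^{N} G_i, \qquad G_i = G_i^{initial}\left(1 + \frac{Dur_i^{final}}{Dur_i^{initial}} \cdot COEF\right) + G_i^{final} \tag{5}
$$

Here $G_{sent}$ is the sentence score, $G_i$ is the score of syllable $i$, $Dur_i^{final}$ is the duration of the final of syllable $i$, $Dur_i^{initial}$ is the duration of the initial of syllable $i$, and $COEF$ is a coefficient. […] $COEF$ […]

4.2.3 […] MLLR (Maximum Likelihood Linear Regression) [11] […] MLLR […]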

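[…] $T_i$ […] HTK […] HMM […] (6) […] $T_i$ […] $THRESH_i$ […] $T_i < THRESH_i$ […] $THRESH$ […] MLLR […]

5 […]

5.1 […] (3) […] HMM […] 5 […]

Table 5 […] / […]
[…]  0.65 / 0.61
[…]  0.77 / 0.73

5.2 […] (6) […] 5 […] [8] […] 6 […] HMM […]

Table 6 […] / […]
[…]  0.77 / 0.73
[…]  0.81 / 0.77

[…] 6 […] [8] […]

5.3 […] 4.2.3 […] 7 […]

Table 7 […] / […] HMM […]
[…]  0.77 / 0.73
[…]  0.78 / 0.73
[…]  0.82 / 0.79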

[…] 7 […]

5.4 […] 8 […] HMM […] 8 […]

Table 8 […] (/)
                   […]          […] + […] + […]
Machine vs. human  0.65 / 0.61  0.83 / 0.81
Human vs. human    0.90 / 0.89

[…] 8 […]

6 […] (7):

$$
Score_{machine} = \sum_{i}^{3} [\ldots]_{ki}\, P(o_i) + Score_4\ [\ldots]\ C, \qquad k = 1, 2, 3 \tag{7}
$$

Here $P(o_i)$ […] $i$ […], $[\ldots]_{1i}$ […], $Score_4$ […], $C$ […], and $Score_{machine}$ […]. […] 9 […]

Table 9 […] ( ) / […]
                   […]                        […]
Machine vs. human  (0.83, -) / (0.81, -)      (0.95, 1.28) / (0.84, 2.44)
Human vs. human    (0.90, 2.20) / (0.89, 2.30)

[…] 9 […]
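The idea of mapping machine scores linearly onto the human scoring scale and then measuring the remaining error can be illustrated with a one-variable least-squares fit. This is a simplified sketch, not the paper's mapping: the function names and sample values are hypothetical, a single machine score per speaker is assumed, and the mean absolute difference is assumed as the error measure.

```python
def fit_linear_map(machine_scores, human_scores):
    """Least-squares fit of human ~= slope * machine + intercept."""
    n = len(machine_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    mean_m = sum(machine_scores) / n
    mean_h = sum(human_scores) / n
    sxx = sum((m - mean_m) ** 2 for m in machine_scores)
    if sxx == 0:
        raise ValueError("machine scores are constant, the slope is undefined")
    sxy = sum((m - mean_m) * (h - mean_h) for m, h in zip(machine_scores, human_scores))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_m
    return slope, intercept


def mean_abs_error(predicted, reference):
    """Mean absolute difference between two score lists (assumed error measure)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equally long, non-empty lists")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical raw machine scores (log-posterior scale) and human scores (0-100 scale).
machine = [-0.8, -1.5, -2.9, -1.1, -3.4]
human = [91.0, 86.0, 74.5, 88.0, 71.0]
slope, intercept = fit_linear_map(machine, human)
mapped = [slope * m + intercept for m in machine]
print(round(mean_abs_error(mapped, human), 2))
```

In practice the mapping would be fitted on one set of speakers and the error measured on a different set, so that the reported error is not optimistic.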

[…] (2.44) […] (2.30) […]

7 […] 0.65/0.61 […] 0.83/0.81 […] 0.95/0.84 […] 1.28/2.44 […] 0.90/0.89 […] 2.20/2.30 […]

References:
[1] H. L. Franco, L. Neumeyer, Y. Kim, O. Ronen. Automatic pronunciation scoring for language instruction [A]. ICASSP [C], 1997, 1465-1468.
[2] L. Neumeyer, H. Franco, V. Digalakis, M. Weintraub. Automatic scoring of pronunciation quality. Speech Communication 30 [J], 2000, 83-93.
[3] S. M. Witt, S. J. Young. Phone-level pronunciation scoring and assessment for interactive language learning [A]. In: Speech Communication 30, 2000, 95-108.
[4] S. M. Witt. Use of speech recognition in computer-assisted language learning, Doctor's Dissertation of Cambridge [D], 1999.
[5] C. Cucchiarini, F. D. Wet, H. Strik, L. Boves. Assessment of Dutch pronunciation by means of automatic speech recognition technology [A]. ICSLP, Vol. 5 [C], 1998, 1739-1742.
[6] C. Cucchiarini, H. Strik, L. Boves. Automatic evaluation of Dutch pronunciation by using speech recognition technology [A]. Proceedings of the IEEE Workshop ASRU [C], Santa Barbara, 1997, 622-629.
[7] Aijun Li, Xia Wang. A Contrastive Investigation of Standard Mandarin and Accented [A]. EuroSpeech [C], 2003, 1139-1142.
[8] […] [A]. […] [C], 2005, 22-25.
[9] […] [A]. […] [J], 1998, 48-53.
[10] […] [A]. […] [C], 2005, 26-30.
[11] C. J. Leggetter, P. C. Woodland. Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models, Computer Speech and Language [J], 1995, 171-185.