ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC, VQ AND HMM


Mayukh Bhaowal and Kunal Chawla (Students), Indian Institute of Information Technology, Allahabad, India

Abstract: Speech recognition has always been regarded as a fascinating field in human-computer interaction; it is one of the fundamental steps towards understanding human cognition and behaviour. This paper explicates the theory and implementation of ASR, a speaker-dependent real-time isolated word recognizer. The approach was first to obtain feature vectors using LPC, followed by vector quantization; the quantized vectors were then recognized by a suitable modelling technique, namely HMM. In the recognition phase the Baum-Welch algorithm was used. However, it was soon realized that unless some normalization or scaling was carried out, the results were highly inaccurate. The paper proposes certain significant modifications to the existing scaling algorithms, arrived at after extensive experimentation. The results suggest that optimal scaling computations significantly improve recognition, and the proposed modified algorithm offers new insight into speech recognition techniques.

Key words: Speech Recognition, LPC, VQ, HMM, Language Processing

1. INTRODUCTION

Contemporary ASR systems are composed of a feature preprocessing stage, which aims at extracting the linguistic message while suppressing non-linguistic sources of variability, and a classification stage (including language modelling), which identifies the feature vectors with linguistic classes. The ultimate goal is to estimate the sufficient statistics to discriminate among different phonetic units while minimizing the computational demands of the classifier.

Steps in ASR:
1. Convert audio/wave files to sequences of multi-dimensional feature vectors (e.g. DFT, PLP, LPC).
2. Quantize the feature vectors into sequences of symbols (e.g. VQ).
3. Train a model for each recognition object (i.e. word, phoneme) from the sequences of symbols (e.g. HMM).
4. Constrain the models using grammar information.

Paper organization: The Implementation section gives the details of the implementation and the theory used to achieve the results. The Results and Conclusion sections discuss the outcome of the tests conducted.

2. IMPLEMENTATION

2.1 Noise Elimination and Word Boundary Detection

We have to isolate the word utterance from the leading and trailing noise. This was done using an energy-threshold comparison method: whenever the energy in a frame of speech exceeds a certain threshold, that point is marked as the start of speech.

2.2 Pre-emphasis

The digitized (sampled) speech signal s(n) is put through a low-order digital system to spectrally flatten the signal. The first-order filter used had the transfer function

H(z) = 1 - a z^{-1},  where a = 0.9375.

2.3 Frame Blocking

In this step the pre-emphasised speech signal is blocked into frames of N samples, with adjacent frames separated by M samples. Frame blocking is done to reduce the mean-squared prediction error over a short segment of the speech waveform.

2.4 Windowing

Each frame of pre-emphasized speech is then multiplied by a Hamming window. Windows of length 256 samples were used; to obtain a smooth estimate, more windows are needed, so an overlap of 156 samples was also incorporated. The Hamming window used was [1]

w(n) = 0.54 - 0.46 \cos\big( 2\pi n / (N - 1) \big),  0 <= n <= N - 1.
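To make the front end concrete, the following sketch (illustrative only, not the authors' implementation) applies pre-emphasis, frame blocking and Hamming windowing in Python with the parameter values quoted above (a = 0.9375, frame length 256, overlap 156 samples, i.e. a shift of 100 samples):

# Illustrative front-end sketch: pre-emphasis, frame blocking and windowing.
import numpy as np

def preemphasize(signal: np.ndarray, a: float = 0.9375) -> np.ndarray:
    """First-order pre-emphasis filter H(z) = 1 - a z^-1."""
    return np.append(signal[0], signal[1:] - a * signal[:-1])

def frame_and_window(signal: np.ndarray, frame_len: int = 256,
                     overlap: int = 156) -> np.ndarray:
    """Block the signal into overlapping frames and apply a Hamming window."""
    shift = frame_len - overlap                      # 100 samples per step
    n_frames = 1 + max(0, (len(signal) - frame_len) // shift)
    window = np.hamming(frame_len)                   # 0.54 - 0.46 cos(2*pi*n/(N-1))
    frames = np.empty((n_frames, frame_len))
    for t in range(n_frames):
        frames[t] = signal[t * shift : t * shift + frame_len] * window
    return frames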

2.5 Autocorrelation Analysis

Each frame of the windowed signal is then autocorrelated; the highest autocorrelation lag, p, is the order of the LPC analysis.

2.6 LPC Analysis

The next processing step is the LPC analysis, which converts each frame of p+1 autocorrelations [1] into an LPC parameter set consisting of the LPC coefficients. The formal method of converting from autocorrelation coefficients to an LPC parameter set is known as Durbin's method.

2.7 Cepstral Coefficient Extraction

Cepstral coefficients are the coefficients of the Fourier-transform representation of the log-magnitude spectrum. They are more robust and reliable than the LPC coefficients, and they can be estimated directly from the LPC coefficients.

2.8 Parameter Weighting

Low-order cepstral coefficients are sensitive to the overall spectral slope, and high-order cepstral coefficients are sensitive to noise. It has therefore become a standard technique to weight the cepstral coefficients by a tapered window so as to minimize these sensitivities:

\hat{c}_m = w(m) \, c_m

Figure 1. Noise/Speech Detection
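A possible implementation of this LPC-cepstral analysis is sketched below (illustrative only, not the authors' code); it computes the frame autocorrelation, runs Durbin's recursion, converts the LPC set to cepstral coefficients, and applies a tapered weighting window. The raised-sine lifter w(m) = 1 + (Q/2) sin(pi m / Q) from [1] is assumed here, since the paper does not give w(m) explicitly.

# Illustrative LPC-cepstrum sketch following Rabiner & Juang [1].
import numpy as np

def lpc_durbin(frame: np.ndarray, p: int) -> np.ndarray:
    """LPC coefficients a_1..a_p via Durbin's (Levinson-Durbin) recursion."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    a = np.zeros(p + 1)
    e = r[0]
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i-1:0:-1])) / e   # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i-1:0:-1]
        a = a_new
        e *= (1.0 - k * k)                             # prediction error update
    return a[1:]

def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstral coefficients c_1..c_Q computed recursively from the LPC set."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for m in range(1, n_ceps + 1):
        c[m] = (a[m-1] if m <= p else 0.0) + sum(
            (k / m) * c[k] * (a[m-k-1] if m - k <= p else 0.0) for k in range(1, m))
    return c[1:]

def weight_cepstrum(c: np.ndarray) -> np.ndarray:
    """Assumed tapered window w(m) = 1 + (Q/2) sin(pi*m/Q) applied to c_m."""
    q = len(c)
    m = np.arange(1, q + 1)
    return (1.0 + (q / 2.0) * np.sin(np.pi * m / q)) * c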

Figure 2. Original Signal, Filtered Signal and Pre-emphasized Signal

Figure 3. Frame-Blocked Signal, Windowed Signal and Autocorrelated Signal

3. VECTOR QUANTIZATION

The result of the feature extraction is a series of vectors characteristic of the time-varying spectral properties of the speech signal. These vectors are 24-dimensional and continuous. We can map them to discrete symbols by quantizing them; since we are quantizing vectors, this is vector quantization. VQ is potentially an extremely efficient representation of the spectral information in the speech signal.

3.1 Clustering Algorithms

Assume that we have a set of L training vectors and we need a codebook of size M. A procedure that performs the clustering is the K-means algorithm [3]:

1. Initialization: Arbitrarily choose M vectors (these can be chosen from the training set) as the initial set of code words in the codebook.
2. Nearest-Neighbour Search: For each training vector, find the code word in the current codebook that is closest in terms of spectral distance and assign the vector to the corresponding cell.
3. Centroid Update: Update the code word in each cell using the centroid of the training vectors assigned to that cell.

4. Iteration: Repeat steps 2 and 3 until the average distance (distortion) falls below a preset threshold.

The disadvantage of this method is that we need a very good initial estimate of the codebook vectors. It may happen that the random initial selection is clustered in one area of the vector space; if so, the final codebook will not be global, which can be a serious problem. An alternative procedure is the binary split algorithm.

3.2 The Binary Split Algorithm [3]

1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors.
2. Double the size of the codebook by splitting each current code word y_n according to the rule

   y_n^+ = y_n (1 + e)
   y_n^- = y_n (1 - e)

   where n varies from 1 to the current size of the codebook and e is a splitting parameter chosen in the range 0.001 <= e <= 0.05. We chose e = 0.001.
3. Use the K-means iterative algorithm to get the best set of centroids for the split codebook.
4. Iterate steps 2 and 3 until the required codebook size is reached.

Figure 4. Partitioned Vector Space
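The codebook design just described can be sketched as follows (illustrative only, not the authors' code); the K-means refinement here runs for a fixed number of passes rather than testing the distortion threshold, and e = 0.001 as chosen in the paper.

# Illustrative binary-split (LBG) codebook design [3] with K-means refinement.
import numpy as np

def kmeans_refine(train: np.ndarray, codebook: np.ndarray,
                  n_iter: int = 20) -> np.ndarray:
    """Nearest-neighbour search + centroid update, repeated n_iter times."""
    for _ in range(n_iter):
        # Assign each training vector to its closest code word (Euclidean).
        d = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each code word to the centroid of its cell (if non-empty).
        for m in range(len(codebook)):
            cell = train[labels == m]
            if len(cell):
                codebook[m] = cell.mean(axis=0)
    return codebook

def binary_split_codebook(train: np.ndarray, size: int,
                          eps: float = 0.001) -> np.ndarray:
    """Grow a codebook from 1 vector to `size` vectors by repeated splitting."""
    codebook = train.mean(axis=0, keepdims=True)     # 1-vector codebook
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps),  # y+ = y(1 + e)
                              codebook * (1 - eps)]) # y- = y(1 - e)
        codebook = kmeans_refine(train, codebook)
    return codebook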

4. HMM (HIDDEN MARKOV MODEL)

A hidden Markov model is defined as a pair of stochastic processes (X, Y). The X process is a first-order Markov chain and is not directly observable, while the Y process is a sequence of random variables taking values in the space of acoustic parameters, or observations. Two formal assumptions characterize HMMs as used in speech recognition. The first-order Markov hypothesis states that history has no influence on the chain's future evolution if the present is specified, and the output independence hypothesis states that neither chain evolution nor past observations influence the present observation if the last chain transition is specified.

Letting y ∈ Υ be a variable representing observations and i, j ∈ Χ be variables representing model states, the model can be represented by the following parameters:

A ≡ {a_{i,j} | i, j ∈ Χ}   transition probabilities
B ≡ {b_{i,j}(·) | i, j ∈ Χ}   output distributions
Π ≡ {π_i | i ∈ Χ}   initial probabilities

with the following definitions:

a_{i,j} \equiv p(X_t = j \mid X_{t-1} = i)
b_{i,j}(y) \equiv p(Y_t = y \mid X_{t-1} = i, X_t = j)
\pi_i \equiv p(X_0 = i)

In matrix form:

A  - transition probability matrix (N x N)
B  - observation symbol probability distribution matrix (N x M)
PI - initial state distribution vector (N x 1)

where N is the number of states in the HMM and M is the number of observation symbols.

ASR used a feed-forward (or Bakis) HMM topology for recognition. The way the general HMM is obtained from the models of individual utterances can be critical to the recognition level and to the speed of learning of the model. A hidden Markov model is a finite state machine having a fixed number of states [1].

Figure 5. Finite State Machine for HMM
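As an illustration of these parameters, the sketch below (not the authors' code) builds a discrete HMM parameter set lambda = (A, B, PI); the left-to-right (Bakis) initialization, with equal probability of staying in a state or moving one state ahead, is an assumption for the example.

# Illustrative discrete HMM parameter container with Bakis initialization.
import numpy as np

class DiscreteHMM:
    def __init__(self, n_states: int, n_symbols: int):
        N, M = n_states, n_symbols
        # Left-to-right topology: each state may stay or move one state ahead.
        self.A = np.zeros((N, N))
        for i in range(N):
            if i + 1 < N:
                self.A[i, i] = 0.5
                self.A[i, i + 1] = 0.5
            else:
                self.A[i, i] = 1.0                 # final state is absorbing
        # Uniform output distributions; the chain starts in state 0.
        self.B = np.full((N, M), 1.0 / M)
        self.pi = np.zeros(N)
        self.pi[0] = 1.0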

4.1 HMMs and Speech Recognition

In order to apply HMMs to speech recognition, we need to address three problems [1][2].

4.1.1 PROBLEM 1

Given the observation sequence O = {o_1, o_2, o_3, ..., o_T} and the model \lambda = (A, B, \pi), how do we efficiently compute P(O \mid \lambda), the probability of the observation sequence given the model? This problem is solved by calculating the forward and backward variables.

The forward procedure [1]:

1. Initialization: \alpha_1(i) = \pi_i b_i(o_1), \quad 1 \le i \le N
2. Recursion: \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \Big] b_j(o_{t+1}), \quad 1 \le t \le T-1
3. Termination: P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)

The backward procedure [1]:

1. Initialization: \beta_T(i) = 1, \quad 1 \le i \le N
2. Recursion: \beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad t = T-1, \dots, 1
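The two procedures translate directly into code; the sketch below (illustrative only, not the authors' implementation) gives the unscaled recursions for a discrete HMM, where A is N x N, B is N x M, pi has length N, and obs is a sequence of VQ symbol indices.

# Illustrative forward and backward procedures for a discrete HMM.
import numpy as np

def forward(A, B, pi, obs):
    """Return alpha (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):                              # recursion
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                      # termination

def backward(A, B, obs):
    """Return beta (T x N)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                             # initialization
    for t in range(T - 2, -1, -1):                     # recursion
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return beta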

4.1.2 PROBLEM 2

This is the problem of finding the optimal state sequence associated with a given observation sequence: we want to find the most likely state sequence for a given sequence of observations O = o_1, o_2, ..., o_T and a model \lambda = (A, B, \pi). The solution depends on how "most likely state sequence" is defined. One approach is to find the individually most likely state q_t at each time t and to concatenate all such q_t's, but this sometimes does not give a physically meaningful state sequence. We therefore use another method, which has no such problem: the Viterbi algorithm, in which the whole state sequence with the maximum likelihood is found. The Viterbi algorithm [1][2] finds the single best sequence q for the given observation sequence O:

1. Initialization: \delta_1(i) = \pi_i b_i(o_1), \quad \psi_1(i) = 0
2. Recursion: \delta_t(j) = \max_i \big[ \delta_{t-1}(i) a_{ij} \big] b_j(o_t), \quad \psi_t(j) = \arg\max_i \big[ \delta_{t-1}(i) a_{ij} \big]
3. Termination: P^* = \max_i \delta_T(i), \quad q_T^* = \arg\max_i \delta_T(i)
4. Backtracking: q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \dots, 1

4.1.3 PROBLEM 3

This is the problem of parameter estimation, by far the toughest problem of HMMs. There is no way to analytically solve in closed form for the model parameter set that maximizes the probability of the observation sequence. The Baum-Welch algorithm is the solution to this problem [1]. In this algorithm the parameters are re-estimated as

\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}

where \xi_t(i,j) = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j) / P(O \mid \lambda) and \gamma_t(i) = \alpha_t(i) \beta_t(i) / P(O \mid \lambda).
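The Viterbi recursion can be sketched as follows (illustrative only, not the authors' code); it is worked in the log domain, which also sidesteps the underflow problem addressed by scaling in the next section.

# Illustrative Viterbi decoding for a discrete HMM (log domain).
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the single best state sequence for the observation sequence."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A + 1e-300), np.log(B + 1e-300)
    delta = np.log(pi + 1e-300) + logB[:, obs[0]]      # initialization
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):                              # recursion
        scores = delta[:, None] + logA                 # scores[i, j]
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]                       # termination
    for t in range(T - 1, 0, -1):                      # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1]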

4.2 Scaling Computation

Because \alpha_t(i) and \beta_t(i) are probabilities between 0 and 1 and a large number of multiplications is computed, the output probability decreases exponentially to 0 as the sequence length T increases. We therefore need to scale the forward and backward variables to avoid underflow in the computation. The new forward and backward variables are \hat{\alpha}_t(i) and \hat{\beta}_t(i), and we define a scale coefficient as

c_t = \frac{1}{\sum_{i=1}^{N} \alpha_t(i)}

Using these definitions, the scaled forward algorithm is:

1. Initialization: \hat{\alpha}_1(i) = c_1 \pi_i b_i(o_1)
2. Recursion: \hat{\alpha}_t(j) = c_t \Big[ \sum_{i=1}^{N} \hat{\alpha}_{t-1}(i) a_{ij} \Big] b_j(o_t)
3. Termination: P(O \mid \lambda) = 1 \big/ \prod_{t=1}^{T} c_t

Because this output probability is too small to represent directly (the product of the scale coefficients in the denominator grows very large), we use the log probability instead:

\log P(O \mid \lambda) = - \sum_{t=1}^{T} \log c_t

4.3 Scaled Backward Algorithm

1. Initialization: \hat{\beta}_T(i) = c_T
2. Recursion: \hat{\beta}_t(i) = c_t \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \hat{\beta}_{t+1}(j)

The scaled Baum-Welch algorithm is modified accordingly: the re-estimation formulas of Problem 3 keep their form, with

\xi_t(i,j) = \hat{\alpha}_t(i) a_{ij} b_j(o_{t+1}) \hat{\beta}_{t+1}(j), \qquad
\gamma_t(i) = \hat{\alpha}_t(i) \hat{\beta}_t(i) / c_t

so that the scale coefficients cancel and no explicit division by P(O \mid \lambda) is required.
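The scaled recursions and one re-estimation pass can be sketched as below (illustrative only; this follows the standard scaled formulation of [1], not the authors' modified algorithm). A is N x N, B is N x M, pi has length N, and obs is a sequence of VQ symbol indices.

# Illustrative scaled forward/backward and one Baum-Welch re-estimation pass.
import numpy as np

def scaled_forward(A, B, pi, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = 1.0 / alpha[0].sum()
    alpha[0] *= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        c[t] = 1.0 / alpha[t].sum()
        alpha[t] *= c[t]
    log_prob = -np.sum(np.log(c))                      # log P(O | lambda)
    return alpha, c, log_prob

def scaled_backward(A, B, obs, c):
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = c[-1]
    for t in range(T - 2, -1, -1):
        beta[t] = c[t] * (A @ (B[:, obs[t+1]] * beta[t+1]))
    return beta

def baum_welch_step(A, B, pi, obs):
    """One re-estimation pass; returns updated (A, B, pi) and log P(O|lambda)."""
    alpha, c, log_prob = scaled_forward(A, B, pi, obs)
    beta = scaled_backward(A, B, obs, c)
    gamma = alpha * beta / c[:, None]                  # gamma_t(i)
    # xi_t(i, j) built from the scaled variables; the scale factors cancel.
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :])
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, log_prob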

5. RESULTS AND DISCUSSION

The training set for the vector quantizer was obtained by recording utterances of a set of country names. The recordings were made by three male speakers. The recognition vocabulary consisted of the names India, Spain, Germany, Zambia and Mexico. The results obtained are shown in the table below.

Table 1. Recognition results

Word       Speaker 1   Speaker 2   Speaker 3
India        60%         64%         64.5%
Spain        97%         99.2%       98.3%
Germany      95.5%       91%         92%
Zambia       65%         70%         61.4%
Mexico       98%         99%         97.23%

Much of the error can be attributed to the presence of plosives at the beginning and end of some of the words. For example, India and Zambia are similar sounding: they share the same vowel part and differ only in their unvoiced beginning, hence the low accuracy of 65%. Words like Spain and Mexico, being distinct from the others, recorded very high percentages.

6. CONCLUSION

In this implementation the results were found to be satisfactory, considering the small amount of training data collected under different and varied conditions. The accuracy of the real-time system can be increased significantly by using an improved speech detection/noise elimination algorithm. Further improvement can also be achieved by a better VQ codebook design, with a training set of utterances from a large number of speakers varying in age and accent. The scaling computation, which is part of the original work in this paper, modifies the existing scaling algorithms to ameliorate the results further and to remove inaccuracies.

7. REFERENCES

[1] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall (Signal Processing Series), 1993.
[2] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, John Wiley & Sons (Asia) Pte Ltd.
[3] Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Communications, COM-28, January 1980.