Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm


EngOpt International Conference on Engineering Optimization
Rio de Janeiro, Brazil, 0-05 June

Dmitry Shalymov
Department of Mathematics & Mechanics, Saint-Petersburg State University, Saint-Petersburg, Russia
shalydim@mail.ru

1. Abstract

This paper presents the use of a new simultaneous perturbation stochastic approximation (SPSA) algorithm for solving the noise-robust isolated word recognition problem. A noise-robust speech recognition method based on mel-frequency cepstral coefficients (MFCC) is briefly described. The main features of the SPSA algorithm are shown, and the effectiveness of the proposed method is demonstrated.

2. Keywords: Speech Recognition, Stochastic Optimization, Artificial Intelligence

3. Introduction

Speech recognition remains an important problem. Many modern methods used to solve it are computationally intensive, while the available computing resources are often limited, so many algorithms cannot be used in portable devices. This motivates the search for more efficient methods.

This paper presents the use of a new simultaneous perturbation stochastic approximation (SPSA) algorithm for solving the noise-robust isolated word recognition problem. Because SPSA is simple and requires few operations per iteration, it can serve as an alternative method for real-time speech recognition. A noise-robust speech recognition method based on mel-frequency cepstral coefficients (MFCC) is briefly described.

Every sound wave that enters the recognition system contains some noise. When the loss function is measured with noise, the SPSA algorithm keeps its estimates reliable under almost arbitrary noise.
This is very important for speech recognition, where the noise often comes from phase or spectrum shifts of the signal, the external environment, recording device settings, and so on. The SPSA algorithm is based on trial simultaneous perturbations, which provide adequate estimates under almost arbitrary noise. Its main characteristic is that only two measurements of the loss function are needed to approximate its gradient, whatever the dimension of the unknown feature vector. This makes SPSA convenient for speech recognition, where feature vectors of large dimension are used, and more generally for optimization problems with a large number of variables. It therefore becomes possible to operate with many words at once. Moreover, the algorithm is simple to understand and to embed in electronic devices.

4. Isolated words recognition problem

Digital processing of an acoustic signal assumes that the analog speech signal is represented in digital form. As a result of A/D conversion, the continuous signal is converted into a sequence of values taken at discrete time intervals. Each time interval is represented by one value (a signal measurement), which characterizes the signal at that point with a certain precision. The accuracy of the representation depends on the width of the range of obtained numbers and hence on the bit depth of the A/D conversion. Extracting numeric values from the signal is called quantization; splitting the signal into time intervals is called sampling. Digital processing of the acoustic signal is shown in Fig. 1.

Fig. 1. Steps of acoustic signal processing

The analog acoustic signal coming from the microphone undergoes quantization and sampling during A/D conversion. The result is a word realization: a digital record of the pronounced word in the form of a sequence of acoustic signal measurements {s_k}.
During digital processing the word realization is divided into a sequence of frames {X_i}. A frame X of length N is a sequence of acoustic signal measurements s_1, s_2, ..., s_N. The length of each frame is strictly fixed in time. For example, if N = 100 and the sampling rate is 8000 Hz, the frame length is 12.5 msec. Frames are often shifted relative to each other to prevent information loss at the frame borders.
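The relation between frame length, sampling rate, and frame shift can be illustrated with a short Python sketch. The helper names `frame_duration_ms` and `split_into_frames` are hypothetical and introduced only for this illustration; they are not part of the system described in the paper.

```python
def frame_duration_ms(frame_len, sampling_rate_hz):
    """Duration of a frame of frame_len samples, in milliseconds."""
    return 1000.0 * frame_len / sampling_rate_hz


def split_into_frames(samples, frame_len, shift):
    """Cut a list of samples into frames of frame_len values.

    A new frame starts every `shift` samples; shift < frame_len
    gives overlapping frames. Incomplete trailing frames are dropped.
    """
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame_len and shift must be positive")
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, shift)]


# A 100-sample frame at 8000 Hz lasts 12.5 ms.
duration = frame_duration_ms(100, 8000)

# Frames of 4 samples shifted by 2 samples overlap by half their length.
frames = split_into_frames(list(range(10)), frame_len=4, shift=2)
```

Dropping the incomplete trailing frame is a design choice made for this sketch; zero-padding the last frame would be an equally reasonable alternative.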

The frame shift step is the number of signal measurements between the beginnings of two consecutive frames. A shift step smaller than N (the frame length) means that the frames overlap.

Further, in tasks such as speech recognition or speaker identification, each frame is associated with a few data values that best characterize the sound. These data form a feature vector (or attribute vector). From the mathematical point of view it could be a vector in the space R^M, a group of functions, or a single function.

The objective of the recognition system is to assign each input word to one of the predefined classes. Unfortunately, many factors can reduce the accuracy of the recognition system, for example the mood and state of the speaker, external environment noise, and the rate of pronunciation.

A recognition system is speaker independent if it recognizes words correctly regardless of who pronounces them. Such a system is hard to implement in practice because acoustic signals strongly depend on the loudness and timbre of the voice and on the mood and state of the speaker. To extract information from such signals, mel-scale filters are used quite often. These filters average the spectral components of the signal over specific frequency ranges, so the signal becomes less dependent on the speaker. Such filters form the basis of the MFCC (Mel-Frequency Cepstral Coefficients) method, which is used in the recognition system discussed in this paper.

5. Speech signal processing

5.1 Preliminary filtration

The speech signal should be passed through a low-pass filter for spectral smoothing. The goal of this transformation is to reduce the influence of local distortions. Low-pass filtering is often implemented at a low level. Nevertheless, there are various mathematical methods that are successfully used in speech recognition problems. In the considered system no such methods were used.
It is well known that the most informative frequencies of human speech are concentrated in the 00Hz 2KHz interval. That is why, in speech recognition, only the frequencies of this interval are kept in the signal spectrogram from the initial stage onward.

5.2 Cutting the signal into overlapping segments

To extract feature vectors of the same length, the speech signal must be cut into equal frames, and each frame must then be transformed. Frames are usually selected so that they overlap by half or by 2/3 of their length. Overlapping is used to reduce information loss at the frame borders. The feature vector for the observed region of the speech signal consists of cepstral coefficients computed for each frame separately, so increasing the frame overlap increases the dimension of the feature vector for the entire region. Cepstral coefficients are the set of numbers extracted during spectral analysis of a speech signal interval. Usually the length of the observed interval is selected so that it corresponds to an interval of [...] ms.

5.3 Window signal processing

The goal of this step is to reduce the border effects that arise during segmentation. To neutralize these effects, the speech signal s(n) is usually multiplied by a window function w(n): x(n) = s(n)*w(n). The Hamming window is often used as w(n):

$$ w(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N}\right), & 0 \le n < N, \\ 0, & \text{otherwise.} \end{cases} $$

5.4 Feature vector extraction

Each input speech signal is represented by a feature vector that characterizes the signal. There are several ways to construct the feature vector. The discussed model uses the classical approach based on cepstral coefficients. There are two ways to extract them: one is based on Mel-Frequency Cepstral Coefficients (MFCC) [3], the other on Linear Predictive Cepstral Coefficients (LPCC) [4]. MFCC is the most widespread method. Let us examine its major steps.
1. The input signal is broken into frames. A Hamming window is applied to each frame.
2. Pre-emphasis (accentuation). It is performed by filtering the speech signal with an FIR (finite impulse response) filter. This is needed for spectral smoothing and makes the signal less sensitive to the various noises that arise during signal processing.
3. The spectrogram is examined. The set of frequencies present in the spectrogram is divided into numbered intervals, each with a strictly defined frequency range. The average signal intensity in each interval is then calculated to build a diagram whose abscissa is the interval number and whose ordinate is the amplified amplitude value. This process is called mel-scale filtration.
4. Feature vectors are extracted with methods based on human perception of sound: since the human ear perceives loudness on a logarithmic scale, this step compresses the signal amplitudes using the logarithm.
5. The final step applies the inverse Fourier transform to the spectrum. Its result is the set of cepstral coefficients from which the feature vector is constructed.

Cepstral coefficients can be described as follows:

$$ c_n = \sum_{k=1}^{K} \bigl(\log S(k)\bigr)\, e^{ikn}, $$

where S(k) is the averaged spectrum of the signal, with amplified amplitudes, for the frequency interval number k of the mel-scale filter, and K is the total number of intervals.
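The windowing and cepstral steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window uses the 0.54 and 0.46 Hamming constants with the denominator N, and the cepstral step uses the real DCT-II that is common in MFCC implementations instead of the complex exponential form written in the text. The function names are hypothetical, and the band energies are assumed to have been computed already by the mel-scale filtration step.

```python
import math


def hamming_window(N):
    """Hamming window values w(0), ..., w(N-1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / N) for n in range(N)]


def apply_window(frame):
    """Multiply a frame by the Hamming window: x(n) = s(n) * w(n)."""
    window = hamming_window(len(frame))
    return [s * w for s, w in zip(frame, window)]


def cepstral_coefficients(band_energies, num_coeffs):
    """Cepstral coefficients from positive band energies S(1..K).

    Takes the logarithm of each band energy and applies a DCT-II
    (an assumption of this sketch, see the lead-in).
    """
    K = len(band_energies)
    log_s = [math.log(e) for e in band_energies]
    return [
        sum(log_s[k] * math.cos(math.pi * n * (k + 0.5) / K) for k in range(K))
        for n in range(num_coeffs)
    ]


# Example: a constant frame is tapered toward its start by the window.
windowed = apply_window([1.0] * 8)
# Example: flat band energies give a cepstrum concentrated in c_0.
coeffs = cepstral_coefficients([math.e] * 4, num_coeffs=2)
```

With 24 band energies per frame, as in the experiment described later in the paper, `cepstral_coefficients` would be called once per frame and the results concatenated into the feature vector.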

6. Randomized algorithm of stochastic approximation

An exact solution of a problem can be found if the problem has a precise definition and mathematical description. In reality, however, the complexity of the connections and relationships involved makes it impossible to give an exact mathematical description of many phenomena. A simple theoretical approach is to choose a mathematical model that is close to the real process and includes various noises (disturbances). On the one hand, these noises represent the roughness of the mathematical model; on the other, they characterize external uncontrolled perturbations of the system.

Specialists in the theory of unknown parameter identification know well that if the noise is an unknown deterministic function, or the observation noise is a probabilistically dependent sequence, the solutions obtained are wrong. Some theorists then say that the observation sequence is degenerate (not rich), and problems of this kind are left unstudied.

To enrich the information in the observation channel, it is sometimes possible to introduce a new simultaneous perturbation with well-known probabilistic properties into the input channel of the system. Sometimes a measurable random process already present in the system can play the role of such a perturbation. In control systems it is natural to add the trial simultaneous perturbations (actions) through the control channel.

One of the remarkable characteristics of algorithms of this type is convergence under almost arbitrary noise. An important restriction on their use is the assumption that the measurement noise and the simultaneous perturbation added to the system are weakly correlated or independent; no other assumptions about the properties of the measurement noise are made.
This restriction is natural when the noise is generated by an unknown but bounded deterministic function (some unmodeled dynamics).

Suppose that there are l different words in the recognition system. Feature vectors of the speech signal are the input signals for the SPSA algorithm. Each feature vector is represented as a point in a multidimensional Euclidean space. Using the training sequence, the SPSA algorithm determines the centers of l classes, each of which corresponds to one of the words. The coordinates of the centers are the feature vectors of the pattern words. A word is assigned to a class according to the distance between the feature vector of the signal and the center of the class.

The algorithm considered below is used to define the pattern words (the class centers in the system). Speech commands are recognized by the traditional method of comparison with patterns followed by selection of the minimal distance. Arbitrary l vectors of the space can be taken as initial class centers.

In general, the selection of the words to be recognized is important. The more phonetic differences there are between words, the easier they are to recognize. But the words to be recognized often sound similar. That is why it is important to place the class centers as far from each other as possible.

From the mathematical point of view, the speech recognition problem can be reformulated as a problem of automatic image classification.

7. Automatic image classification problem

Suppose that the state space X is covered by a set of classes {X^1, ..., X^l} (the number of classes is bounded by l). The automatic image classification problem is to build a rule that assigns each point x of X to the corresponding class from {X^1, ..., X^l}. If several points are assigned to the same class, they share a common feature, and this feature naturally generates the class.
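The minimum-distance rule used by the recognizer (a word is assigned to the class whose center is nearest to its feature vector) can be sketched as follows. The function name `nearest_center` and the two-dimensional toy centers are assumptions made for this illustration; in the paper the vectors have a much larger dimension.

```python
import math


def nearest_center(x, centers):
    """Return the index of the class center closest to x.

    Distances are Euclidean; ties are resolved in favor of the
    center with the smallest index.
    """
    distances = [math.dist(x, c) for c in centers]
    return min(range(len(centers)), key=distances.__getitem__)


# Two toy class centers (pattern words) in a 2-dimensional feature space.
centers = [(0.0, 0.0), (10.0, 10.0)]
label_a = nearest_center((1.0, 2.0), centers)
label_b = nearest_center((9.0, 8.0), centers)
```

Here `label_a` and `label_b` are the indices of the recognized classes; mapping an index back to a word is left to the caller.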
Usually the common feature of a class is closeness to a specific center: for each point x, the simple classification rule is to compare its distance from the center of one class with its distances from the centers of the others.

To formalize the classification rule, a family of penalty functions (cost functions) is considered, and a set of authenticity degree functions is defined. Suppose that a probability distribution is assigned on X. The automatic classification problem is to find the sets of functions and vectors that minimize the mean risk functional F. The state space is divided into classes according to the rule that each point x belongs to the class with the smallest penalty. Using the indicator functions of these classes, the mean risk functional can be rewritten in terms of two l-dimensional vectors: one whose components are the indicator functions and one composed of the penalty function values. This functional characterizes the performance of the classification. A partition is optimal if its parameter θ minimizes the mean risk functional.

Geometrically, the automatic classification problem can be described as follows. Suppose X is the real space R^M and the parameters θ are the class centers. The penalty functions are distances from the point x to the centers, so each point is assigned to the class whose center is closest to it. The automatic classification problem then becomes the problem of finding a set of centers that minimizes the total dispersion. The value of F does not change if the vectors in the set of centers are permuted. The usual way to minimize a function F is to find the set of centers at which the gradient of F equals 0. But in the considered case the function F is not differentiable, which is why the solution of the automatic classification problem may not be simple.

8. Trial perturbations and estimation algorithm

Assume that the probability distribution is unknown but a training sequence is available. In [1] a way was suggested to build a sequence of estimates that converges to a good approximation of the set of centers. The proposed recursive algorithm for the classification of huge amounts of multidimensional data is based on earlier SPSA ideas, and the new algorithms perform well in a real-time environment.

The SPSA algorithm uses simultaneous trial perturbations. Its main features are the following: the unknown function is measured not at the point of the previous estimate but at a slightly perturbed position of the estimate, for all components of the unknown vector simultaneously; and the number of observations per iteration is reduced substantially in the multidimensional case. The necessary number of iterations does not increase in comparison with the classical Kiefer-Wolfowitz procedure, although the number of observations decreases significantly.

Suppose the penalty functions are not defined analytically, but their values can be measured with some noise. The measured values are collected into an l-dimensional vector, and the corresponding noise into an l-dimensional noise vector. To build the sequence of estimates of the set of centers we suggest using the SPSA algorithm. It is based on measurable, stochastically independent random vectors called trial simultaneous perturbations, whose components are independent random values. Fix an initial set of centers and choose two sequences {α_n} and {β_n} tending to zero. At each step the proposed algorithm forms l-dimensional vectors of noisy penalty measurements at perturbed positions of the current estimate, with the corresponding l-dimensional noise vectors, and applies a projector onto a set to the updated estimate.

9. Practical application

As an experiment, a simplified model of applying the SPSA algorithm to the isolated word recognition problem was implemented. A speaker-dependent self-learning system able to recognize four different words was created in Matlab 7.0.

In general, the selection of the words to be recognized is important: words with many phonetic differences are easier to recognize. For the algorithm with the penalty function q(x, θ) = ||x - θ|| to converge, a special condition must be satisfied: the distance between different classes must be greater than the maximum radius of all classes. Hence it is desirable to have the class centers as far from each other as possible.

Any points of the space R^M can be taken as initial class centers. In the considered recognition system, the feature vectors of the first four different words of the training sequence were taken as initial centers.

Consider a part of the speech signal that corresponds to a one-second time interval. It consists of several frames, each 25 msec long, so there are 40 frames in one second. During spectral processing, a feature vector of dimension 24 was extracted from each frame: the spectrum was divided into 24 ranges, and the average spectrum value was computed for each range. The set of averaged spectrum values forms the feature vector. The dimension M of the phase space is the sum of the dimensions of the feature vectors of all frames in a one-second speech interval. Frame overlapping was not used, so the phase space dimension is M = 40*24 = 960.

For each class, more than one hundred samples were recorded; they formed the training sequence. Recording was performed with an 8000 Hz sampling rate and 6 bit quantization. Optimization methods related to the characteristics of the microphone were also used during speech signal processing.
In practice, the rate of convergence of the algorithm depends on the choice of the sequences {α_n} and {β_n}. The simultaneous trial perturbations also play an important role in the algorithm. They do not have to be random values of ±1; the main requirement is that the trial perturbations are bounded and symmetrically distributed. Based on empirical considerations, the sequence 3/n was taken as {α_n} and / n as {β_n}. The simultaneous trial perturbations were selected as ±/30.

The convergence of the algorithm for one word is shown in Fig. 2, which plots the distances between the input signals and the approximated class center. The class center is approximated while the SPSA algorithm runs. One hundred signals were fed into the system; the feature vector of the pattern word corresponds to the class center at n = 100.

Some inaccuracies were allowed during feature vector extraction to simplify the implementation. In particular, the averaged spectrum values were computed only roughly during mel-scale filtration. Despite this, a recognition accuracy of 98% was achieved. To improve the results, the extraction of cepstral coefficients needs to be implemented in a different way.
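The idea of estimating one class center from noisy penalty measurements at two perturbed points can be illustrated with a toy Python sketch. It is not the algorithm of the paper: it handles a single class, omits the projector, uses the squared Euclidean distance as the penalty, draws the perturbation components from ±1, and uses gain sequences `alpha` and `beta` that were chosen for this example only. All names are hypothetical.

```python
import math
import random


def noisy_penalty(x, point, rng, noise_level):
    """Noisy measurement of the squared distance between x and point."""
    exact = sum((xi - pi) ** 2 for xi, pi in zip(x, point))
    return exact + rng.uniform(-noise_level, noise_level)


def spsa_center_estimate(samples, theta0, noise_level=0.05, seed=0):
    """Estimate a class center with two noisy measurements per sample."""
    rng = random.Random(seed)
    theta = list(theta0)
    d = len(theta)
    for n, x in enumerate(samples, start=1):
        alpha = 1.0 / (n + 2 * d)  # step size (assumed for this sketch)
        beta = n ** -0.25          # perturbation size (assumed)
        # Trial simultaneous perturbation: all components at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        plus = [t + beta * dl for t, dl in zip(theta, delta)]
        minus = [t - beta * dl for t, dl in zip(theta, delta)]
        y_plus = noisy_penalty(x, plus, rng, noise_level)
        y_minus = noisy_penalty(x, minus, rng, noise_level)
        # Only two measurements are used, whatever the dimension d.
        slope = (y_plus - y_minus) / (2.0 * beta)
        theta = [t - alpha * dl * slope for t, dl in zip(theta, delta)]
    return theta


# Toy data: 2000 noisy observations scattered around one center.
data_rng = random.Random(1)
true_center = [1.0, -2.0, 0.5, 3.0]
samples = [[c + data_rng.gauss(0.0, 0.1) for c in true_center]
           for _ in range(2000)]
estimate = spsa_center_estimate(samples, theta0=[0.0] * 4)
error = math.dist(estimate, true_center)
```

The value `error` plays the role of the distance plotted in a convergence figure: it is the distance between the current estimate and the center around which the toy data were generated.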

Fig. 2: SPSA algorithm convergence to the center of one class

10. Conclusions

This paper presents the application of the new simultaneous perturbation stochastic approximation (SPSA) algorithm to the noise-robust isolated word recognition problem. SPSA provides adequate estimates under almost arbitrary noise. One of its important features is that it remains simple and efficient as the dimension of the space grows. It also makes it possible to operate with many classes at once. The main steps of solving the isolated word recognition problem are described. Mel-Frequency Cepstral Coefficients were used to extract feature vectors. The accuracy of the recognition system was found to be 98%. The performance of the system could be improved by improving the implementation of the MFCC method.

11. References

1. Granichin O. N. and Izmakova O. A., A Randomized Stochastic Approximation Algorithm for Self-Learning. Avtomatika i Telemekhanika, No. 8, 2005, pp.
2. Granichin O. N. and Polyak B. T., Randomized Algorithms of an Estimation and Optimization Under Almost Arbitrary Noises. M.: Nauka.
3. Gold B., Morgan N., Speech and Audio Signal Processing. John Wiley and Sons, Inc.
4. Rogina I., Automatic speech recognition. Carnegie Mellon University.
5. Fomin V. N., Recursive estimation and adaptive filtration. M.: Nauka, 1984.


More information

Cepstral Deconvolution Method for Measurement of Absorption and Scattering Coefficients of Materials

Cepstral Deconvolution Method for Measurement of Absorption and Scattering Coefficients of Materials Cepstral Deconvolution Method for Measurement of Absorption and Scattering Coefficients of Materials Mehmet ÇALIŞKAN a) Middle East Technical University, Department of Mechanical Engineering, Ankara, 06800,

More information

representation of speech

representation of speech Digital Speech Processing Lectures 7-8 Time Domain Methods in Speech Processing 1 General Synthesis Model voiced sound amplitude Log Areas, Reflection Coefficients, Formants, Vocal Tract Polynomial, l

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

arxiv: v1 [cs.sd] 25 Oct 2014

arxiv: v1 [cs.sd] 25 Oct 2014 Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra

More information

2D Spectrogram Filter for Single Channel Speech Enhancement

2D Spectrogram Filter for Single Channel Speech Enhancement Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 007 89 D Spectrogram Filter for Single Channel Speech Enhancement HUIJUN DING,

More information

A New OCR System Similar to ASR System

A New OCR System Similar to ASR System A ew OCR System Similar to ASR System Abstract Optical character recognition (OCR) system is created using the concepts of automatic speech recognition where the hidden Markov Model is widely used. Results

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

Lab 9a. Linear Predictive Coding for Speech Processing

Lab 9a. Linear Predictive Coding for Speech Processing EE275Lab October 27, 2007 Lab 9a. Linear Predictive Coding for Speech Processing Pitch Period Impulse Train Generator Voiced/Unvoiced Speech Switch Vocal Tract Parameters Time-Varying Digital Filter H(z)

More information

L6: Short-time Fourier analysis and synthesis

L6: Short-time Fourier analysis and synthesis L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude

More information

[Omer* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Omer* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY TAJWEED UTOMATION SYSTEM USING HIDDEN MARKOUV MODEL AND NURAL NETWORK Safaa Omer Mohammed Nssr*, Hoida Ali Abdelgader SUDAN UNIVERSITY

More information

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,

More information

An Evolutionary Programming Based Algorithm for HMM training

An Evolutionary Programming Based Algorithm for HMM training An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,

More information

Machine Recognition of Sounds in Mixtures

Machine Recognition of Sounds in Mixtures Machine Recognition of Sounds in Mixtures Outline 1 2 3 4 Computational Auditory Scene Analysis Speech Recognition as Source Formation Sound Fragment Decoding Results & Conclusions Dan Ellis

More information

Hidden Markov Model Based Robust Speech Recognition

Hidden Markov Model Based Robust Speech Recognition Hidden Markov Model Based Robust Speech Recognition Vikas Mulik * Vikram Mane Imran Jamadar JCEM,K.M.Gad,E&Tc,&Shivaji University, ADCET,ASHTA,E&Tc&Shivaji university ADCET,ASHTA,Automobile&Shivaji Abstract

More information

Signal representations: Cepstrum

Signal representations: Cepstrum Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,

More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction: Op-amps in Negative Feedback

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction: Op-amps in Negative Feedback EECS 16A Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 18 18.1 Introduction: Op-amps in Negative Feedback In the last note, we saw that can use an op-amp as a comparator. However,

More information

Evaluation of the modified group delay feature for isolated word recognition

Evaluation of the modified group delay feature for isolated word recognition Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and

More information

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou

More information

Voiced Speech. Unvoiced Speech

Voiced Speech. Unvoiced Speech Digital Speech Processing Lecture 2 Homomorphic Speech Processing General Discrete-Time Model of Speech Production p [ n] = p[ n] h [ n] Voiced Speech L h [ n] = A g[ n] v[ n] r[ n] V V V p [ n ] = u [

More information

Frequency Domain Speech Analysis

Frequency Domain Speech Analysis Frequency Domain Speech Analysis Short Time Fourier Analysis Cepstral Analysis Windowed (short time) Fourier Transform Spectrogram of speech signals Filter bank implementation* (Real) cepstrum and complex

More information

Estimation of Cepstral Coefficients for Robust Speech Recognition

Estimation of Cepstral Coefficients for Robust Speech Recognition Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

Course content (will be adapted to the background knowledge of the class):

Course content (will be adapted to the background knowledge of the class): Biomedical Signal Processing and Signal Modeling Lucas C Parra, parra@ccny.cuny.edu Departamento the Fisica, UBA Synopsis This course introduces two fundamental concepts of signal processing: linear systems

More information

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator 1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il

More information

Exemplar-based voice conversion using non-negative spectrogram deconvolution

Exemplar-based voice conversion using non-negative spectrogram deconvolution Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore

More information

Last time: small acoustics

Last time: small acoustics Last time: small acoustics Voice, many instruments, modeled by tubes Traveling waves in both directions yield standing waves Standing waves correspond to resonances Variations from the idealization give

More information

Timbral, Scale, Pitch modifications

Timbral, Scale, Pitch modifications Introduction Timbral, Scale, Pitch modifications M2 Mathématiques / Vision / Apprentissage Audio signal analysis, indexing and transformation Page 1 / 40 Page 2 / 40 Modification of playback speed Modifications

More information

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation

More information

Introduction to Biomedical Engineering

Introduction to Biomedical Engineering Introduction to Biomedical Engineering Biosignal processing Kung-Bin Sung 6/11/2007 1 Outline Chapter 10: Biosignal processing Characteristics of biosignals Frequency domain representation and analysis

More information

Dominant Feature Vectors Based Audio Similarity Measure

Dominant Feature Vectors Based Audio Similarity Measure Dominant Feature Vectors Based Audio Similarity Measure Jing Gu 1, Lie Lu 2, Rui Cai 3, Hong-Jiang Zhang 2, and Jian Yang 1 1 Dept. of Electronic Engineering, Tsinghua Univ., Beijing, 100084, China 2 Microsoft

More information

On the relationship between intra-oral pressure and speech sonority

On the relationship between intra-oral pressure and speech sonority On the relationship between intra-oral pressure and speech sonority Anne Cros, Didier Demolin, Ana Georgina Flesia, Antonio Galves Interspeech 2005 1 We address the question of the relationship between

More information

Harmonic Structure Transform for Speaker Recognition

Harmonic Structure Transform for Speaker Recognition Harmonic Structure Transform for Speaker Recognition Kornel Laskowski & Qin Jin Carnegie Mellon University, Pittsburgh PA, USA KTH Speech Music & Hearing, Stockholm, Sweden 29 August, 2011 Laskowski &

More information

SYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS

SYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS SYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS Hans-Jürgen Winkler ABSTRACT In this paper an efficient on-line recognition system for handwritten mathematical formulas is proposed. After formula

More information

Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems

Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Chin-Hung Sit 1, Man-Wai Mak 1, and Sun-Yuan Kung 2 1 Center for Multimedia Signal Processing Dept. of

More information

Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates

Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Dima Ruinskiy Niv Dadush Yizhar Lavner Department of Computer Science, Tel-Hai College, Israel Outline Phoneme

More information

A Model for Computer Identification of Micro-organisms

A Model for Computer Identification of Micro-organisms J. gen, Microbial. (1965), 39, 401405 Printed.in Great Britain 401 A Model for Computer Identification of Micro-organisms BY H. G. GYLLENBERG Department of Microbiology, Ulziversity of Helsinki, Finland

More information

Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes

Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes Ellida M. Khazen * 13395 Coppermine Rd. Apartment 410 Herndon VA 20171 USA Abstract

More information

where =0,, 1, () is the sample at time index and is the imaginary number 1. Then, () is a vector of values at frequency index corresponding to the mag

where =0,, 1, () is the sample at time index and is the imaginary number 1. Then, () is a vector of values at frequency index corresponding to the mag Efficient Discrete Tchebichef on Spectrum Analysis of Speech Recognition Ferda Ernawan and Nur Azman Abu Abstract Speech recognition is still a growing field of importance. The growth in computing power

More information

Chapter 3. Data Analysis

Chapter 3. Data Analysis Chapter 3 Data Analysis The analysis of the measured track data is described in this chapter. First, information regarding source and content of the measured track data is discussed, followed by the evaluation

More information

Echo cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa

Echo cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa Echo cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa Abstract This study concerns the mathematical modelling of speech related

More information

Impulsive Noise Filtering In Biomedical Signals With Application of New Myriad Filter

Impulsive Noise Filtering In Biomedical Signals With Application of New Myriad Filter BIOSIGAL 21 Impulsive oise Filtering In Biomedical Signals With Application of ew Myriad Filter Tomasz Pander 1 1 Division of Biomedical Electronics, Institute of Electronics, Silesian University of Technology,

More information

EVALUATING MISCLASSIFICATION PROBABILITY USING EMPIRICAL RISK 1. Victor Nedel ko

EVALUATING MISCLASSIFICATION PROBABILITY USING EMPIRICAL RISK 1. Victor Nedel ko 94 International Journal "Information Theories & Applications" Vol13 [Raudys, 001] Raudys S, Statistical and neural classifiers, Springer, 001 [Mirenkova, 00] S V Mirenkova (edel ko) A method for prediction

More information

CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals

CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 6, 2005 1 Sound Sound waves are longitudinal

More information

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems Silvia Chiappa and Samy Bengio {chiappa,bengio}@idiap.ch IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland Abstract. We compare the use

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

ODEON APPLICATION NOTE Calibration of Impulse Response Measurements

ODEON APPLICATION NOTE Calibration of Impulse Response Measurements ODEON APPLICATION NOTE Calibration of Impulse Response Measurements Part 2 Free Field Method GK, CLC - May 2015 Scope In this application note we explain how to use the Free-field calibration tool in ODEON

More information

Verification of contribution separation technique for vehicle interior noise using only response signals

Verification of contribution separation technique for vehicle interior noise using only response signals Verification of contribution separation technique for vehicle interior noise using only response signals Tomohiro HIRANO 1 ; Junji YOSHIDA 1 1 Osaka Institute of Technology, Japan ABSTRACT In this study,

More information

AdaptiveFilters. GJRE-F Classification : FOR Code:

AdaptiveFilters. GJRE-F Classification : FOR Code: Global Journal of Researches in Engineering: F Electrical and Electronics Engineering Volume 14 Issue 7 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Text Independent Speaker Identification Using Imfcc Integrated With Ica

Text Independent Speaker Identification Using Imfcc Integrated With Ica IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 5 (Sep. - Oct. 2013), PP 22-27 ext Independent Speaker Identification Using Imfcc

More information

2.161 Signal Processing: Continuous and Discrete Fall 2008

2.161 Signal Processing: Continuous and Discrete Fall 2008 IT OpenCourseWare http://ocw.mit.edu 2.6 Signal Processing: Continuous and Discrete Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. ASSACHUSETTS

More information

GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System

GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System Snani Cherifa 1, Ramdani Messaoud 1, Zermi Narima 1, Bourouba Houcine 2 1 Laboratoire d Automatique et Signaux

More information

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function

Jorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function 890 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models Jorge Silva and Shrikanth

More information

Analysis of methods for speech signals quantization

Analysis of methods for speech signals quantization INFOTEH-JAHORINA Vol. 14, March 2015. Analysis of methods for speech signals quantization Stefan Stojkov Mihajlo Pupin Institute, University of Belgrade Belgrade, Serbia e-mail: stefan.stojkov@pupin.rs

More information

Math 350: An exploration of HMMs through doodles.

Math 350: An exploration of HMMs through doodles. Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or

More information

Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways

Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways Marsland Press Journal of American Science 2009:5(2) 1-12 Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways 1 Khalid T. Al-Sarayreh, 2 Rafa E. Al-Qutaish, 3 Basil

More information

UNIT 1. SIGNALS AND SYSTEM

UNIT 1. SIGNALS AND SYSTEM Page no: 1 UNIT 1. SIGNALS AND SYSTEM INTRODUCTION A SIGNAL is defined as any physical quantity that changes with time, distance, speed, position, pressure, temperature or some other quantity. A SIGNAL

More information

Test Sample and Size. Synonyms. Definition. Main Body Text. Michael E. Schuckers 1. Sample Size; Crew designs

Test Sample and Size. Synonyms. Definition. Main Body Text. Michael E. Schuckers 1. Sample Size; Crew designs Test Sample and Size Michael E. Schuckers 1 St. Lawrence University, Canton, NY 13617, USA schuckers@stlawu.edu Synonyms Sample Size; Crew designs Definition The testing and evaluation of biometrics is

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli

More information