Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm
EngOpt International Conference on Engineering Optimization, Rio de Janeiro, Brazil, 01-05 June

Dmitry Shalymov
Department of Mathematics & Mechanics, Saint-Petersburg State University, Saint-Petersburg, Russia, shalydim@mail.ru

1. Abstract
This paper presents the use of the simultaneous perturbation stochastic approximation (SPSA) algorithm for solving the noise robust isolated words recognition problem. The noise robust speech recognition method based on mel-frequency cepstral coefficients (MFCC) is briefly described, the main features of the SPSA algorithm are shown, and the effectiveness of the proposed method is demonstrated.

2. Keywords: Speech Recognition, Stochastic Optimization, Artificial Intelligence

3. Introduction
Problems of speech recognition are still important today. Many modern methods used to solve them are computationally resource-intensive, while the capacity of such resources is often bounded; for many algorithms it is therefore impossible to use them in portable devices. This moves researchers to look for more effective methods. This paper presents the use of the simultaneous perturbation stochastic approximation (SPSA) algorithm for solving the noise robust isolated words recognition problem. Due to SPSA's simplicity and the small number of operations per iteration, the algorithm can be used as an alternative method for real-time speech recognition. The noise robust speech recognition method based on mel-frequency cepstral coefficients (MFCC) is briefly described. Every sound wave that enters the recognition system includes some noise. In the case of noisy measurements of the loss function, the SPSA algorithm keeps reliable estimates under almost arbitrary noise.
This is very important for the speech recognition problem, where the noise often represents phase or spectrum shifts of the signal, the external environment, recording device settings, etc. The SPSA algorithm is based on trial simultaneous perturbations, which provide appropriate estimates under almost arbitrary noise. The main characteristic of the SPSA algorithm is that only two measurements of the function are needed to approximate the loss function gradient, for any dimension of the unknown feature vector. Thanks to this characteristic it is convenient to use the SPSA algorithm in the speech recognition problem, where feature vectors of large dimension are used. It is also simple to apply this kind of algorithm in optimization problems with a large number of variables, which gives an opportunity to operate with many words at once. Moreover, its implementation is simple to understand and to embed in electronic devices.

4. Isolated words recognition problem
Digital processing of an acoustic signal supposes that the analog speech signal is transformed into digital form. As a result of A/D conversion, the continuous signal is converted into a sequence of discrete time intervals; each time interval is represented by one value (a signal measurement) that characterizes the signal at a point with a defined precision. The accuracy of this representation depends on the width of the range of obtained numbers, and hence on the capacity of the A/D converter. The process of extracting numeric values from the signal is called quantization; the fragmentation of the signal into time intervals is called sampling. Digital processing of the acoustic signal is shown in Fig. 1.

Fig. 1: Steps of acoustic signal processing

The analogue acoustic signal that comes from the microphone is subjected to quantization and sampling by the A/D conversion, and word acquisition occurs: the digital record of the pronounced word is obtained as a sequence of acoustic signal measurements {s_k}.
The acquired word is divided into a frame sequence {X_i} during digital processing. A frame X (of length N) is a sequence of acoustic signal measurements s_1, s_2, ..., s_N. The length of each frame is strictly fixed in time: for example, if N = 100 and the sampling rate is 8000 Hz, then the frame length is equal to 12.5 msec. Frames are often shifted relative to each other to prevent information losses at the frame borders.
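The frame arithmetic above can be checked with a short sketch (a minimal illustration in Python rather than the paper's Matlab; the function name and the 50-sample shift step are assumptions, not the paper's code):

```python
import numpy as np

def split_into_frames(signal, frame_len=100, shift=50):
    """Split a 1-D signal into (possibly overlapping) frames of fixed length."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frames.append(signal[start:start + frame_len])
    return np.array(frames)

# With N = 100 samples per frame at an 8000 Hz sampling rate,
# each frame covers 100 / 8000 s = 12.5 ms of speech.
sampling_rate = 8000
frame_len = 100
print(frame_len / sampling_rate * 1000)   # frame duration in ms -> 12.5

signal = np.zeros(8000)                   # one second of (silent) signal
frames = split_into_frames(signal, frame_len=frame_len, shift=50)
print(frames.shape)                       # (159, 100): 50% frame overlap
```

A shift step equal to half the frame length, as here, corresponds to the 50% overlap discussed below.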
The frame shift step is the number of signal measurements between the beginnings of two consecutive frames. A shift step less than N (the frame length) means that the frames overlap. Further, in a series of tasks such as speech recognition or personal identification, each frame is mapped to several data values that characterize the sound in the best way. Such data organize a feature vector (or attribute vector). From the mathematical point of view it could be a vector of the space R^M, a group of functions, or one function. The objective of the recognition system is to identify each word that comes in with one of the predefined classes. Unfortunately, a great number of various factors can reduce the accuracy of the recognition system: for example, the mood and state of the speaker, external environment noise, the rate of phrase pronunciation, etc. The recognition system is speaker independent if it recognizes words correctly regardless of the person who pronounces them. It is hard to implement such a system in practice because acoustic signals strongly depend on the loudness and timbre of the voice and on the mood and state of the speaker. To extract information from such signals, mel-scale filters are quite often used. These filters average spectral components of the signal in specific frequency ranges, so the signal becomes less dependent on the speaker. Such filters lie at the base of the MFCC (Mel-Frequency Cepstral Coefficients) method, which is used in the recognition system discussed in this paper.

5. Speech signal processing
5.1. Preliminary filtration
The speech signal should be passed through a low-frequency filter for spectral smoothing. The goal of this transformation is to reduce the influence of local distortions. Low-frequency filtration is often implemented at a low level. Nevertheless, there are various mathematical methods that are successfully used in speech recognition problems; in the considered system no such methods were used.
It is well known that the most informative frequencies of human speech are concentrated in the 100 Hz - 2 KHz interval. That is why, when solving the speech recognition problem, only the frequencies of this interval are kept in the signal spectrogram as early as the initial stage.

5.2. Cutting the signal into overlapping segments
To extract feature vectors of the same length it is necessary to cut the speech signal into equal frames, and after that to transform each frame. Usually frames are selected so that they overlap by half or by 2/3 of their length. Overlapping is used to reduce information loss at the borders of the frames. The feature vector for an observed region of the speech signal consists of the cepstral coefficients computed for each frame separately, so increasing the frame overlap increases the dimension of the feature vector for the entire region. The set of numbers extracted during spectral analysis of a speech signal interval is called the cepstral coefficients. Usually the length of the observed interval is selected so that it corresponds to an interval on the order of tens of milliseconds.

5.3. Window signal processing
The goal of this step is to reduce border effects that take place during the segmentation process. To neutralize undesirable border effects, the speech signal s(n) is usually multiplied by a window w(n): x(n) = s(n)*w(n). As w(n), the Hamming window function is often used:

    w(n) = 0.54 - 0.46*cos(2*pi*n/N),  0 <= n < N,
    w(n) = 0,  otherwise.

5.4. Feature vector extraction
Each input speech signal is represented as a feature vector that characterizes the signal. There are several ways to construct the feature vector; in the discussed model we use a classical approach based on cepstral coefficients. There are two possibilities to extract cepstral coefficients: one is based on Mel-Frequency Cepstral Coefficients (MFCC) [3], the other on Linear Predictive Cepstral Coefficients (LPCC) [4]. MFCC is the most widespread method. Let us examine its major steps.
1. The input signal is broken into frames, and for each frame the Hamming window is applied.
2. Pre-emphasis: preliminary phrase selection (accentuation), performed by filtering the speech signal with an FIR (finite impulse response) filter. This is done for spectral smoothing; it makes the signal less sensitive to the various noises that occur during signal processing.
3. Then the spectrogram is examined. The set of frequencies present in the spectrogram is divided into numbered intervals, with a strictly defined range of possible frequencies for each interval. Then the average signal intensity in each interval is calculated to build a special diagram whose abscissa consists of interval numbers and whose ordinate axis consists of averaged amplitude values. This process is called mel-scale filtration.
4. Feature vectors are extracted with methods based on human interpretation of sound: since the human ear perceives signal loudness on a logarithmic scale, this step compresses the signal amplitudes using the logarithm.
5. The final step is the application of the inverse Fourier transform to the spectrum. The result of this step is the extraction of the cepstral coefficients and the construction of the feature vector. The cepstral coefficients can be described as follows:

    c_n = sum_{k=1..K} (log S(k)) * e^{ikn},

where S(k) is the averaged spectrum value that characterizes the frequency interval with number k of the mel-scale filter, and K is the total number of intervals.
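The five steps above can be sketched end-to-end for a single frame (a simplified illustration in Python with NumPy; the rectangular averaging over mel-spaced intervals, the DCT as the final inverse transform, and all parameter values are assumptions made for the toy example, not the paper's exact realization):

```python
import numpy as np

def mfcc_frame(frame, sample_rate=8000, n_filters=24, n_ceps=12):
    """Toy MFCC-style feature extraction for a single frame."""
    n = len(frame)
    # Steps 1-2: pre-emphasis (simple first-order FIR filter) and Hamming window.
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    windowed = emphasized * np.hamming(n)
    # Step 3: power spectrum, then averaging over mel-spaced frequency intervals.
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    mel_max = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_edges = np.linspace(0, mel_max, n_filters + 1)
    hz_edges = 700 * (10 ** (mel_edges / 2595) - 1)
    bins = np.floor(hz_edges / (sample_rate / 2) * (len(spectrum) - 1)).astype(int)
    filter_out = np.array([
        spectrum[bins[i]:max(bins[i + 1], bins[i] + 1)].mean()
        for i in range(n_filters)
    ])
    # Step 4: logarithmic compression, mimicking loudness perception.
    log_energies = np.log(filter_out + 1e-10)
    # Step 5: inverse transform of the log-spectrum (here a DCT, a common choice).
    k = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), k + 0.5) / n_filters)
    return basis @ log_energies

features = mfcc_frame(np.random.randn(100))
print(features.shape)  # (12,)
```

Each 100-sample frame is thus reduced to a short vector of cepstral-style coefficients; concatenating the per-frame vectors yields the feature vector of a longer speech interval.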
6. Randomized algorithm of stochastic approximation
The exact solution of any problem can be found if there is a precise definition and mathematical description. But in reality the complexity of connections and relationships makes it impossible to give an exact mathematical description for many phenomena. The simplest theoretical approach is to choose a mathematical model which is close to the real process and which includes different noises (disturbances). These noises represent, on the one side, some roughness of the mathematical model and, on the other, the characteristics of outside uncontrolled perturbations of the system. It is well known to specialists in the theory of unknown parameter identification that if the noise is a deterministic unknown function, or the observation noise is a probabilistically dependent sequence, then obtaining consistent decisions fails. Some theorists then say that the observation sequence is degenerate (not rich), and the solutions of such problems are not studied. For the purpose of enriching the information in the observation channel, it is sometimes possible to include a new simultaneous perturbation with well-known probabilistic properties into the input channel of the system. Sometimes a measurable random process already present in the system can play the role of such a perturbation. In control systems it is natural to add the trial simultaneous perturbations (actions) through a control channel. One of the remarkable characteristics of this type of algorithm is convergence under almost arbitrary noise. A considerable restriction on using these algorithms is the assumption of weak correlation or independence between the measurement noise and the simultaneous perturbation added into the system, while there are no other assumptions about the measurement noise properties.
This restriction is natural when the noise is generated from an unknown but bounded deterministic function (some unmodeled dynamics). Let us suppose that there are l different words in our recognition system. Feature vectors of the speech signal are the input signals of the SPSA algorithm; each is represented as a point in a multidimensional Euclidean space. The SPSA algorithm determines the centers of l classes from the classifying sequence. Each class corresponds to one of the words, and the coordinates of the centers represent the feature vectors of the pattern words. A word is matched to a class by the distance between the feature vector of the signal and the center of the class. The algorithm considered below is used to define the pattern words (the class centers of the system). To recognize speech commands, the traditional method of comparison with patterns followed by minimal distance extraction is used. As initial class centers it is possible to take arbitrary l vectors of the space. In general, the selection of the words to be recognized is important: the more phonetic differences between words, the easier their recognition. But often the words to be recognized sound similar. That is why it is important to place the centers of the classes as far from each other as possible. From the mathematical point of view, the speech recognition problem can be reformulated as a problem of automatic image classification.

7. Automatic image classification problem
Suppose that the state space X is covered by a set of classes {X_1, ..., X_l} (the number of classes is bounded by l). The automatic image classification problem is to build a rule which, for each point x from X, gives the corresponding class from {X_1, ..., X_l}. If several points are assigned to the same class, they have a common feature, and this feature naturally generates the class.
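The minimal-distance rule described above can be sketched as follows (a toy illustration in Python; the Euclidean metric, the random example centers, and the 960-dimensional vectors mirror the experiment reported later in the paper but are otherwise assumptions):

```python
import numpy as np

def classify(feature_vector, centers):
    """Assign the input to the class whose center is nearest (Euclidean distance)."""
    distances = [np.linalg.norm(feature_vector - c) for c in centers]
    return int(np.argmin(distances))

# Four class centers in a 960-dimensional feature space, as in the experiment.
rng = np.random.default_rng(0)
centers = [rng.normal(size=960) for _ in range(4)]

x = centers[2] + 0.01 * rng.normal(size=960)  # a noisy copy of pattern word 2
print(classify(x, centers))  # 2
```

Once the SPSA algorithm has estimated good class centers, recognition reduces to this single nearest-center lookup per input word.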
Usually one can take as the common feature of a class the closeness to a specific center: for each point x the simple classification rule is to compare the distances from x to the centers of the classes. To formalize the classification rule, a family of penalty functions (cost functions) q(x, theta_i) with parameter vectors theta_i is considered. Let us suppose that a probability distribution P(dx) is given on X. The automatic classification problem is to find the set of parameter vectors Theta = (theta_1, ..., theta_l) that minimizes the mean risk functional; the state space is divided into classes according to the rule that x belongs to the class with the smallest penalty. Denoting by u_i the indicator functions of these classes, the mean risk functional can be rewritten as the integral of the penalties weighted by these indicators. This functional characterizes the performance of the classification: a partition is optimal if its parameter Theta minimizes the mean risk functional. Geometrically the automatic classification problem can be described as follows. Suppose X is a subset of the space R^M and the penalty functions are q(x, theta_i) = ||x - theta_i||^2. The points located closer to the center theta_i than to the other centers correspond to the class X_i. The mean risk functional can then be written as

    F(Theta) = Integral of min_{1<=i<=l} ||x - theta_i||^2 P(dx),

and the automatic classification problem becomes the problem of finding the set of centers which minimizes
the total dispersion. The value of F remains unchanged if the vectors within the bundle Theta are transposed. The usual way to minimize some function F is to find the bundle of centers for which the equation grad F(Theta) = 0 is satisfied. But in the considered case the function F is not differentiable everywhere; that is why the solution of the automatic classification problem may be not simple.

8. Trial perturbations and estimation algorithm
Assume that the probability distribution P(dx) is unknown, but we have a qualifying (training) sequence. In [1] a way was suggested to build an estimation sequence which converges to a good approximation of the bundle of centers. The proposed recursive algorithm for the classification of huge amounts of multidimensional data is based on former SPSA ideas, and the new algorithms perform well in a real-time environment. The SPSA algorithm uses simultaneous trial perturbations. The main features of SPSA algorithms are the following: the unknown function is measured not at the point of the previous estimate but at a slightly perturbed position, for all unknown vector components simultaneously, and there is an essential reduction of observations at each iteration in the multidimensional case. This means that the necessary number of iterations does not increase in comparison with the classical Kiefer-Wolfowitz procedure, while the number of observations decreases significantly. Suppose the penalty functions are not defined analytically, but their values can be measured with some noise. To build the estimation sequence of the bundle of centers we suggest using the SPSA algorithm. It is based on measurable, stochastically independent vectors called the trial simultaneous perturbation, which consist of independent random values. Let us fix an initial bundle of centers and choose two sequences {alpha_n} and {beta_n} tending to zero. At each step the algorithm perturbs the current estimate of a center by plus and minus beta_n times the trial perturbation vector, measures the noisy penalty at the two perturbed points, and moves the estimate along the perturbation direction proportionally to alpha_n and to the difference of the two measurements, after which a projector onto the admissible set is applied.
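The estimation procedure described above can be illustrated with a generic two-measurement SPSA minimizer (a minimal sketch, not the paper's exact classification algorithm: the projector is omitted, and the step-size schedules, the quadratic penalty, and the noise model are assumptions made for this toy example):

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=2000, seed=0):
    """Two-measurement SPSA: at every iteration the loss is evaluated at just
    two randomly perturbed points, whatever the dimension of theta."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for n in range(1, n_iter + 1):
        alpha = 0.1 / n ** 0.602      # gain sequence (assumed schedule)
        beta = 0.1 / n ** 0.101       # perturbation size (assumed schedule)
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1 trial perturbation
        y_plus = loss(theta + beta * delta)    # first noisy measurement
        y_minus = loss(theta - beta * delta)   # second noisy measurement
        theta = theta - alpha * (y_plus - y_minus) / (2 * beta) * delta
    return theta

# A noisy quadratic penalty centered at (1, -2); the measurement noise is
# bounded but otherwise arbitrary, which SPSA tolerates.
noise_rng = np.random.default_rng(1)
target = np.array([1.0, -2.0])
loss = lambda t: float(np.sum((t - target) ** 2)) + 0.1 * noise_rng.uniform(-1, 1)

theta = spsa_minimize(loss, np.zeros(2))
print(np.round(theta, 1))  # close to the true center (1, -2)
```

Note that the cost per iteration is just two loss evaluations regardless of the dimension of theta, which is the property that makes the method attractive for the 960-dimensional feature space used later in the paper.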
9. Practical application
As an experiment, a simplified model of SPSA algorithm application to the isolated words recognition problem was implemented. In Matlab 7.0 a speaker-dependent self-training system able to recognize four different words was created. The selection of the words to be recognized is important in general: it is easier to recognize words that have many phonetic differences. To provide convergence of the algorithm with the penalty function q(x, theta) = ||x - theta||, a special condition needs to be satisfied: the distance between different classes should be greater than the maximum radius of all classes. Hence it is desirable to have the centers of the classes as far from each other as possible. As initial centers of the classes we could take any points of the space R^M; in the considered recognition system, the feature vectors of the first four different words from the qualifying sequence were taken as initial centers. Let us consider the part of the speech signal that corresponds to a one-second time interval. It consists of several frames, each 25 msec long, so there are 40 frames in the one-second interval in all. During spectral processing, a feature vector of dimension 24 was extracted from each frame: the spectrum was broken into 24 ranges, the average spectrum value was computed for each range, and the bundle of averaged spectrum values organizes the feature vector. The dimension M of the phase space is defined as the sum of the dimensions of the feature vectors of all frames in the one-second speech signal interval. Frame overlapping was not used, so the phase space dimension is M = 40*24 = 960. For each class, more than one hundred samples were recorded to arrange the qualifying sequence. Recording was performed with an 8000 Hz sampling rate and 16-bit quantization. During speech signal processing, optimization methods concerned with the peculiarities of the microphone were also used.
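The dimension bookkeeping above can be checked directly (frame and filter counts are from the experiment; the concatenation layout is an assumption):

```python
import numpy as np

frame_ms, second_ms = 25, 1000
n_frames = second_ms // frame_ms   # 40 frames per second, no overlap
per_frame = 24                     # averaged spectrum values per frame

# Concatenate the 40 per-frame vectors into one point of the phase space.
frames = np.random.randn(n_frames, per_frame)
feature_vector = frames.reshape(-1)
print(n_frames, feature_vector.shape)  # 40 (960,)
```

With 50% overlap the frame count, and hence M, would roughly double, which is the trade-off mentioned in Section 5.2.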
The rate of convergence of the algorithm in practice depends on the selection of the sequences {alpha_n} and {beta_n}. An important role in the considered algorithm is played by the simultaneous trial perturbations. It is not necessary to take them as random +/-1 values; the main requirement is that the trial perturbations are finite and symmetrically distributed. Based on empirical trials, the sequence 3/n was taken as {alpha_n} and 1/n as {beta_n}, and the simultaneous trial perturbations were selected as +/-1/30. The convergence of the considered algorithm for one word is shown in Fig. 2, which demonstrates the distances between the input signals and the approximated class center; the class center is approximated while the SPSA algorithm runs. One hundred signals entered the system, and the feature vector of the pattern word corresponds to the class center at n = 100. During feature vector extraction some inaccuracies were permitted to simplify the system implementation; in particular, the averaged spectrum values were computed roughly during mel-scale filtration. In spite of this, 98% recognition accuracy was achieved. To improve the statistics, the cepstral coefficient extraction needs to be implemented in another way.
Fig. 2: SPSA algorithm convergence to one class center

10. Conclusions
This paper presented the application of the simultaneous perturbation stochastic approximation (SPSA) algorithm to the noise robust isolated words recognition problem. SPSA provides appropriate estimates under almost arbitrary noise. One of its important features is the ability to retain simplicity and efficiency as the space dimension grows; it also gives an opportunity to operate with many classes at once. The main steps of solving the isolated words recognition problem were described. To extract feature vectors, Mel-Frequency Cepstral Coefficients were used. The recognition system accuracy proved to be 98%. The performance of the system could be improved by improving the realization of the MFCC method.

11. References
1. Granichin O. N. and Izmakova O. A., A Randomized Stochastic Approximation Algorithm for Self-Learning. Avtomatika i Telemekhanika, No. 8, 2005.
2. Granichin O. N. and Polyak B. T., Randomized Algorithms of Estimation and Optimization Under Almost Arbitrary Noises. M.: Nauka, 2003.
3. Gold B. and Morgan N., Speech and Audio Signal Processing. John Wiley and Sons, Inc., 1999.
4. Rogina I., Automatic Speech Recognition. Carnegie Mellon University.
5. Fomin V. N., Recursive Estimation and Adaptive Filtration. M.: Nauka, 1984.
More information[Omer* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY TAJWEED UTOMATION SYSTEM USING HIDDEN MARKOUV MODEL AND NURAL NETWORK Safaa Omer Mohammed Nssr*, Hoida Ali Abdelgader SUDAN UNIVERSITY
More informationMel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda
Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,
More informationAn Evolutionary Programming Based Algorithm for HMM training
An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,
More informationMachine Recognition of Sounds in Mixtures
Machine Recognition of Sounds in Mixtures Outline 1 2 3 4 Computational Auditory Scene Analysis Speech Recognition as Source Formation Sound Fragment Decoding Results & Conclusions Dan Ellis
More informationHidden Markov Model Based Robust Speech Recognition
Hidden Markov Model Based Robust Speech Recognition Vikas Mulik * Vikram Mane Imran Jamadar JCEM,K.M.Gad,E&Tc,&Shivaji University, ADCET,ASHTA,E&Tc&Shivaji university ADCET,ASHTA,Automobile&Shivaji Abstract
More informationSignal representations: Cepstrum
Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,
More informationCorrespondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure
Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars
More informationThe Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech
CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given
More informationDesigning Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction: Op-amps in Negative Feedback
EECS 16A Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 18 18.1 Introduction: Op-amps in Negative Feedback In the last note, we saw that can use an op-amp as a comparator. However,
More informationEvaluation of the modified group delay feature for isolated word recognition
Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and
More informationSpeaker Identification Based On Discriminative Vector Quantization And Data Fusion
University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou
More informationVoiced Speech. Unvoiced Speech
Digital Speech Processing Lecture 2 Homomorphic Speech Processing General Discrete-Time Model of Speech Production p [ n] = p[ n] h [ n] Voiced Speech L h [ n] = A g[ n] v[ n] r[ n] V V V p [ n ] = u [
More informationFrequency Domain Speech Analysis
Frequency Domain Speech Analysis Short Time Fourier Analysis Cepstral Analysis Windowed (short time) Fourier Transform Spectrogram of speech signals Filter bank implementation* (Real) cepstrum and complex
More informationEstimation of Cepstral Coefficients for Robust Speech Recognition
Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment
More informationPHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu
More informationCourse content (will be adapted to the background knowledge of the class):
Biomedical Signal Processing and Signal Modeling Lucas C Parra, parra@ccny.cuny.edu Departamento the Fisica, UBA Synopsis This course introduces two fundamental concepts of signal processing: linear systems
More informationOptimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator
1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationLast time: small acoustics
Last time: small acoustics Voice, many instruments, modeled by tubes Traveling waves in both directions yield standing waves Standing waves correspond to resonances Variations from the idealization give
More informationTimbral, Scale, Pitch modifications
Introduction Timbral, Scale, Pitch modifications M2 Mathématiques / Vision / Apprentissage Audio signal analysis, indexing and transformation Page 1 / 40 Page 2 / 40 Modification of playback speed Modifications
More informationNon-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology
Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation
More informationIntroduction to Biomedical Engineering
Introduction to Biomedical Engineering Biosignal processing Kung-Bin Sung 6/11/2007 1 Outline Chapter 10: Biosignal processing Characteristics of biosignals Frequency domain representation and analysis
More informationDominant Feature Vectors Based Audio Similarity Measure
Dominant Feature Vectors Based Audio Similarity Measure Jing Gu 1, Lie Lu 2, Rui Cai 3, Hong-Jiang Zhang 2, and Jian Yang 1 1 Dept. of Electronic Engineering, Tsinghua Univ., Beijing, 100084, China 2 Microsoft
More informationOn the relationship between intra-oral pressure and speech sonority
On the relationship between intra-oral pressure and speech sonority Anne Cros, Didier Demolin, Ana Georgina Flesia, Antonio Galves Interspeech 2005 1 We address the question of the relationship between
More informationHarmonic Structure Transform for Speaker Recognition
Harmonic Structure Transform for Speaker Recognition Kornel Laskowski & Qin Jin Carnegie Mellon University, Pittsburgh PA, USA KTH Speech Music & Hearing, Stockholm, Sweden 29 August, 2011 Laskowski &
More informationSYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS
SYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS Hans-Jürgen Winkler ABSTRACT In this paper an efficient on-line recognition system for handwritten mathematical formulas is proposed. After formula
More informationMaximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems
Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Chin-Hung Sit 1, Man-Wai Mak 1, and Sun-Yuan Kung 2 1 Center for Multimedia Signal Processing Dept. of
More informationSpectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates
Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Dima Ruinskiy Niv Dadush Yizhar Lavner Department of Computer Science, Tel-Hai College, Israel Outline Phoneme
More informationA Model for Computer Identification of Micro-organisms
J. gen, Microbial. (1965), 39, 401405 Printed.in Great Britain 401 A Model for Computer Identification of Micro-organisms BY H. G. GYLLENBERG Department of Microbiology, Ulziversity of Helsinki, Finland
More informationSequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes
Sequential Monte Carlo methods for filtering of unobservable components of multidimensional diffusion Markov processes Ellida M. Khazen * 13395 Coppermine Rd. Apartment 410 Herndon VA 20171 USA Abstract
More informationwhere =0,, 1, () is the sample at time index and is the imaginary number 1. Then, () is a vector of values at frequency index corresponding to the mag
Efficient Discrete Tchebichef on Spectrum Analysis of Speech Recognition Ferda Ernawan and Nur Azman Abu Abstract Speech recognition is still a growing field of importance. The growth in computing power
More informationChapter 3. Data Analysis
Chapter 3 Data Analysis The analysis of the measured track data is described in this chapter. First, information regarding source and content of the measured track data is discussed, followed by the evaluation
More informationEcho cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa
Echo cancellation by deforming sound waves through inverse convolution R. Ay 1 ward DeywrfmzMf o/ D/g 0001, Gauteng, South Africa Abstract This study concerns the mathematical modelling of speech related
More informationImpulsive Noise Filtering In Biomedical Signals With Application of New Myriad Filter
BIOSIGAL 21 Impulsive oise Filtering In Biomedical Signals With Application of ew Myriad Filter Tomasz Pander 1 1 Division of Biomedical Electronics, Institute of Electronics, Silesian University of Technology,
More informationEVALUATING MISCLASSIFICATION PROBABILITY USING EMPIRICAL RISK 1. Victor Nedel ko
94 International Journal "Information Theories & Applications" Vol13 [Raudys, 001] Raudys S, Statistical and neural classifiers, Springer, 001 [Mirenkova, 00] S V Mirenkova (edel ko) A method for prediction
More informationCMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals
CMPT 889: Lecture 3 Fundamentals of Digital Audio, Discrete-Time Signals Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 6, 2005 1 Sound Sound waves are longitudinal
More informationHMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems
HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems Silvia Chiappa and Samy Bengio {chiappa,bengio}@idiap.ch IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland Abstract. We compare the use
More informationConvolutional Associative Memory: FIR Filter Model of Synapse
Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,
More informationODEON APPLICATION NOTE Calibration of Impulse Response Measurements
ODEON APPLICATION NOTE Calibration of Impulse Response Measurements Part 2 Free Field Method GK, CLC - May 2015 Scope In this application note we explain how to use the Free-field calibration tool in ODEON
More informationVerification of contribution separation technique for vehicle interior noise using only response signals
Verification of contribution separation technique for vehicle interior noise using only response signals Tomohiro HIRANO 1 ; Junji YOSHIDA 1 1 Osaka Institute of Technology, Japan ABSTRACT In this study,
More informationAdaptiveFilters. GJRE-F Classification : FOR Code:
Global Journal of Researches in Engineering: F Electrical and Electronics Engineering Volume 14 Issue 7 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals
More informationImproving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer
Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationText Independent Speaker Identification Using Imfcc Integrated With Ica
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 5 (Sep. - Oct. 2013), PP 22-27 ext Independent Speaker Identification Using Imfcc
More information2.161 Signal Processing: Continuous and Discrete Fall 2008
IT OpenCourseWare http://ocw.mit.edu 2.6 Signal Processing: Continuous and Discrete Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. ASSACHUSETTS
More informationGMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System
GMM Vector Quantization on the Modeling of DHMM for Arabic Isolated Word Recognition System Snani Cherifa 1, Ramdani Messaoud 1, Zermi Narima 1, Bourouba Houcine 2 1 Laboratoire d Automatique et Signaux
More informationJorge Silva and Shrikanth Narayanan, Senior Member, IEEE. 1 is the probability measure induced by the probability density function
890 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Average Divergence Distance as a Statistical Discrimination Measure for Hidden Markov Models Jorge Silva and Shrikanth
More informationAnalysis of methods for speech signals quantization
INFOTEH-JAHORINA Vol. 14, March 2015. Analysis of methods for speech signals quantization Stefan Stojkov Mihajlo Pupin Institute, University of Belgrade Belgrade, Serbia e-mail: stefan.stojkov@pupin.rs
More informationMath 350: An exploration of HMMs through doodles.
Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or
More informationUsing the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways
Marsland Press Journal of American Science 2009:5(2) 1-12 Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways 1 Khalid T. Al-Sarayreh, 2 Rafa E. Al-Qutaish, 3 Basil
More informationUNIT 1. SIGNALS AND SYSTEM
Page no: 1 UNIT 1. SIGNALS AND SYSTEM INTRODUCTION A SIGNAL is defined as any physical quantity that changes with time, distance, speed, position, pressure, temperature or some other quantity. A SIGNAL
More informationTest Sample and Size. Synonyms. Definition. Main Body Text. Michael E. Schuckers 1. Sample Size; Crew designs
Test Sample and Size Michael E. Schuckers 1 St. Lawrence University, Canton, NY 13617, USA schuckers@stlawu.edu Synonyms Sample Size; Crew designs Definition The testing and evaluation of biometrics is
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationSINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan
SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli
More information