Wavelet Transform in Speech Segmentation

M. Ziółko (1), J. Gałka (1) and T. Drwięga (2)

(1) Department of Electronics, AGH University of Science and Technology, Kraków, Poland, ziolko@agh.edu.pl, jgalka@agh.edu.pl
(2) Faculty of Applied Mathematics, AGH University of Science and Technology, Kraków, Poland, drwiega@wms.mat.agh.edu.pl

Summary. A non-uniform speech segmentation method based on the discrete wavelet transform is used to localize phoneme boundaries. A vector of real values representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of the multiresolution analysis. The final decision on the localization of boundaries is made by analysing the energy flow among the decomposition levels. Distribution-like event functions indicate events, which are regarded as the segment boundaries.

1 Introduction

Many speech segmentation algorithms (see [1], [2]) have been used in speech technology systems, but only a few use wavelet spectra [1, 5]. Wavelet methods are known to be very useful in the time-frequency analysis of signals. The wavelet transform combines the best properties of classic frequency and time analysis in a common tool. Most segmentation methods rely on some kind of statistical modelling of the signals and use optimisation methods such as Viterbi decoding or dynamic time warping (DTW) (see [4]). These methods can only be used if proper models of the language are known. This assumption makes it necessary to prepare such models, which is usually a difficult and time-consuming task. The algorithm proposed in this paper is feature-driven and thus does not need any additional language models. Corpora 97, a phonetically annotated database of spoken Polish, was used for tuning and testing the method.

2 Wavelet Decomposition

The discrete wavelet transformation (DWT) belongs to the group of frequency transformations and is used to obtain a time-frequency spectrum (see [3, 8]) of a signal $\{s(n)\}$.
This encourages us to use the DWT as an artificial method of speech analysis. Dyadic frequency division makes the DWT much more compatible than other methods with the way the human hearing system operates, which includes a subsystem for frequency analysis that reveals the information important for speech recognition. In order to obtain the DWT, the coefficients $c_{m+1,i}$ of the series

$$s(n) = \sum_i c_{m+1,i}\,\varphi_{m+1,i}(n) \tag{1}$$

are computed for $m = M, M-1, \ldots, 1$, where

$$\varphi_{m,i}(n) = 2^{m/2}\,\varphi(2^m n \Delta t - i) \tag{2}$$

is the $i$th wavelet function at the $m$th resolution level and $\Delta t$ is the sampling period. An example of a wavelet function $\varphi(t)$ and its spectrum is presented in Fig. 1. Due to the orthogonality of the wavelet functions $\{\varphi_{m+1,i}\}_i$ we obtain

$$\begin{aligned} c_{m+1,i} &= 2^{\frac{m+1}{2}} \int_{-\infty}^{\infty} s_a(t)\,\varphi(2^{m+1} t - i)\,dt \\ &= 2^{\frac{m+1}{2}} \sum_{n=-\infty}^{\infty} s_a(n\Delta t) \int_{-\infty}^{+\infty} \varphi(2^{m+1} t - i)\,\frac{\sin(\pi (t - n\Delta t)/\Delta t)}{\pi (t - n\Delta t)/\Delta t}\,dt, \end{aligned} \tag{3}$$

where $s_a(t)$ is an analog signal whose samples form the digital signal, i.e. $s_a(n\Delta t) = s(n)$.

Fig. 1. Spectrum (left) and the corresponding Meyer scale function with N = 33 samples (right).

Formula (3) has two disadvantages that are very important from the computational point of view. Firstly, it is difficult to compute integrals numerically when the wavelet supports are unlimited. Secondly, the numerical computation of integrals is time-consuming, because the high quality standard needs series (1) for each second of the recorded speech signal. Therefore, instead of formula (3), we used the approximation

$$c_{m+1,i} = \sum_{n \in D_i} s(n)\,\varphi_{m+1,i}(n), \tag{4}$$

where $D_i$ are the compact supports of $\varphi_{m+1,i}$. The support of the scale function $\varphi(t)$ must be compact to allow fast calculations in real time. It is a common feature of scale functions that $\varphi(t) \to 0$ very fast as $t \to \pm\infty$. In practice the support can be limited to the segment $[-T, T]$, where

$$T = \max \{ t \in \mathbb{R} : |\varphi(t)| \ge h \}. \tag{5}$$

The threshold $h$ should depend on the extreme value of the scale function. We choose the condition $h = \alpha \max_t |\varphi(t)|$, where $\alpha$ can be chosen arbitrarily, e.g. $\alpha =$ . In this way the support of the scale function is bounded to obtain a reasonable compromise between fast computation in real time and relatively small errors. The number of samples should be the smallest integer $N$ that satisfies the inequality $(N - 1)\Delta t \ge 2T$, that is $N \ge 32000\,T + 1$, because the sampling frequency is $f_s = 1/\Delta t = 16000$ Hz. The sampling step in the frequency domain is $\Delta f = 0.5/T$, and $(N - 1)\Delta f \ge 16000$ Hz because the whole frequency band spans from $-8000$ to $8000$ Hz.

The coefficients of the lower level are calculated by applying the well-known (see [3, 9]) formulae

$$c_{m,n} = \sum_i h_{i-2n}\,c_{m+1,i} \tag{6}$$

$$d_{m,n} = \sum_i g_{i-2n}\,c_{m+1,i} \tag{7}$$

where $\{h_i\}$ and $\{g_i\}$ are coefficients that depend on the assumed pair of scale function $\varphi$ and wavelet $\psi$. In other words, the speech spectrum is decomposed using the digital filtering and downsampling procedures defined by (6) and (7): given the wavelet coefficients $c_{m+1,i}$ of the $(m+1)$th resolution level, (6) and (7) yield the coefficients of the $m$th resolution level. The coefficients of the next resolution levels are calculated recursively by applying (6) and (7) again. The multiresolution analysis thus gives a hierarchical and fast scheme for computing the wavelet spectrum of a given signal $s$. The experiments undertaken show that decomposing the speech signal into six levels is sufficient (see Table 1) to cover the frequency band of voice.
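The recursion defined by (6) and (7) can be illustrated with a minimal Python sketch. It is not the authors' implementation: for brevity it assumes the two-tap Haar filter pair instead of the Meyer functions used in the paper, and it assumes periodic extension at the signal border. The names `analysis_step` and `dwt` are hypothetical.

```python
import math

# Haar filter pair: an assumption made for brevity (the paper uses Meyer functions).
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass coefficients h_0, h_1
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass coefficients g_0, g_1


def analysis_step(c_next, h=H, g=G):
    """Apply (6) and (7) once: filter c_{m+1} and keep every second sample."""
    half = len(c_next) // 2
    c = [0.0] * half
    d = [0.0] * half
    for n in range(half):
        for j in range(len(h)):                  # j plays the role of i - 2n
            i = (2 * n + j) % len(c_next)        # periodic border (assumption)
            c[n] += h[j] * c_next[i]
            d[n] += g[j] * c_next[i]
    return c, d


def dwt(signal, levels=6):
    """Return ({m: d_m}, c_1), running the step for m = levels, ..., 1."""
    c = list(signal)
    details = {}
    for m in range(levels, 0, -1):
        c, details[m] = analysis_step(c)
    return details, c


signal = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
details, approx = dwt(signal)
print({m: len(d) for m, d in details.items()}, len(approx))
```

Each pass halves the number of coefficients, so six passes over 256 samples leave four approximation coefficients. With an orthonormal filter pair such as Haar, the total energy of the coefficients equals the energy of the input, which is a convenient sanity check for any implementation of (6) and (7).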
The energy of the speech signal above 8 kHz and below 125 Hz is very low and can be neglected. The wavelet decomposition presented above leads to the series

$$s(n) = \sum_i c_{1,i}\,\varphi_{1,i}(n) + \sum_{m=1}^{M} \sum_i d_{m,i}\,\psi_{m,i}(n). \tag{8}$$
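The 125 Hz and 8 kHz limits quoted above follow from repeatedly halving the band below the Nyquist frequency. The sketch below computes these nominal band edges; it assumes ideal half-band filters and numbers the levels so that m = M is the widest, highest band, following the order of the recursion (6) and (7). The function name `dyadic_bands` is hypothetical, and real filters overlap at the band edges.

```python
def dyadic_bands(fs=16000.0, levels=6):
    """Nominal pass-bands of a dyadic decomposition (ideal half-band filters assumed)."""
    bands = {}
    upper = fs / 2.0                    # Nyquist frequency
    for m in range(levels, 0, -1):      # m = levels is split off first
        lower = upper / 2.0
        bands[m] = (lower, upper)       # detail band of level m
        upper = lower
    bands["approximation"] = (0.0, upper)
    return bands


for level, (low, high) in dyadic_bands().items():
    print(level, low, high)
```

For a 16 kHz sampling frequency and six levels, the detail bands run from 4000-8000 Hz down to 125-250 Hz, and the remaining approximation covers 0-125 Hz.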

Table 1. Frequency division obtained for M = 6 levels of dyadic wavelet decomposition (columns: decomposition level m and frequency band [Hz]; the last row is the approximation). Sampling frequency $f_s = 16$ kHz.

In series (8), the scale and wavelet functions are

$$\varphi_{1,i}(n) = 2^{(1-M)/2} \begin{cases} \varphi\big((2^{1-M} n - i)\,\Delta t\big) & \text{if } 0 \le 2^{1-M} n - i \le N - 1 \\ 0 & \text{for other } 2^{1-M} n - i \end{cases} \tag{9}$$

and

$$\psi_{m,i}(n) = 2^{(m-M)/2} \begin{cases} \psi\big((2^{m-M} n - i)\,\Delta t\big) & \text{if } 0 \le 2^{m-M} n - i \le N - 1 \\ 0 & \text{for other } 2^{m-M} n - i \end{cases} \tag{10}$$

The elements of the DWT for the $m$th level may be collected into a vector $d_m = (d_{m,1}, d_{m,2}, \ldots)^T$. In this way the values of the DWT for $M + 1$ levels can be obtained. It means that the discrete wavelet spectrum

$$\mathrm{DWT}(s) = \{d_M, d_{M-1}, \ldots, d_1, c_1\} \tag{11}$$

is created from the coefficients of series (8).

3 Segmentation Scheme

The role of the segmentation algorithm is to detect significant transitions of energy among the wavelet sub-bands. When a sufficiently significant transition is found, it is marked and scored as a spectral-phonetic event. It is assumed that events occur when the energy transition changes the order of the power-sorted bands. The non-uniform segmentation algorithm consists of the following steps:

1. Decompose the signal $s$ into the six levels of the DWT, $\mathrm{DWT} = \{d_{6,n}, d_{5,n}, \ldots, d_{1,n}\}$.

2. Calculate the sum of power samples in all frequency sub-bands according to the rule

$$B_{m,k} = \sum_{n=(k-1)2^{6-m}+1}^{k\,2^{6-m}} d_{m,n}^2. \tag{12}$$

3. Calculate the power envelopes as running mean values

$$B^{env}_{m,k} = \frac{1}{K} \sum_{n=k-\frac{K}{2}}^{k+\frac{K}{2}} B_{m,n}, \tag{13}$$

where $K = 2^{-M} t_\mu f_s$ for the expected mean duration $t_\mu$ of a speech segment. For $t_\mu = 100$ ms, $f_s = 16$ kHz and $M = 6$ we obtain $K = 25$ samples.

4. Generate the importance matrix $M = [M_{m,k}] \in \mathbb{R}^{6 \times L}$ of frequency bands by sorting the envelopes at each time position $k$, i.e. $M_k = \{m_i\}_{i=1}^{6}$ : $B^{env}_{m_1,k} \ge B^{env}_{m_2,k} \ge B^{env}_{m_3,k} \ge B^{env}_{m_4,k} \ge B^{env}_{m_5,k} \ge B^{env}_{m_6,k}$, where $L$ depends on the length of the speech signal.

5. Compute the event function

$$f(k) = \sum_{m=1}^{6} \frac{|M_{m,k+1} - M_{m,k}|}{m}. \tag{14}$$

6. The locations of segment borders can now be extracted from $f(k)$ by choosing its local maxima that fulfil two conditions: (i) each chosen maximum has to be the highest value within a neighbourhood of $t_{min}$ milliseconds, which is related to the minimal assumed segment duration, and (ii) the local maximum is greater than a specified threshold $f_{tr}$. The time-range condition rejects multiple changes related to the same border and segments shorter than $t_{min}$. The threshold adjusts the sensitivity of the segmentation: increasing its value reduces the number of chosen events. It is reasonable to set its value on-line, according to

$$f_{tr}(k) = \beta \sum_{n=-P}^{P} \frac{f(k-n)}{2P}, \tag{15}$$

where $P$ is the adaptation range corresponding to 100 milliseconds.

4 Conclusions

The presented algorithm was tested using the annotated Polish speech database Corpora 97. Speech from five different speakers, comprising 1825 utterances, was used for evaluation. These utterances include all 37 phonemes of the Polish language and their natural concatenations. The reference phonetic annotation of the speech was known, since it had been prepared earlier. Various values of the detection parameters $t_{min}$ and $\beta$ were used in order to find the combination producing the fewest errors. The best results were obtained for the parameter $t_{min}$ set in the range milliseconds. In this range the phone recognition, insertion and deletion rates take their best values. The threshold adaptation factor $\beta$ does not affect these rates when it is set between 0 and 1. When $\beta$ exceeds 1, the results degrade considerably because the deletion rate increases, and deletions are the most damaging errors in speech segmentation (see [6]). It must be mentioned that the segmentation procedure uses acoustic, not phonetic, features of speech. This results in an increased insertion rate, because some phonemes are not acoustically uniform. This, however, does not affect the overall performance of speech recognition systems (see [6, 7]).

Wavelet analysis turns out to be an effective tool for finding the boundaries between phonemes. Non-uniform segmentation reduces the total number of segments to be processed by the higher-level parts of ASR systems (HMM modelling). The effect is a significant decrease in the Viterbi decoding search space and in computational cost.

5 Acknowledgments

We would like to thank Stefan Grocholewski from the Institute of Computer Science, Poznań University of Technology, for providing Corpora 97, a corpus of spoken Polish. This work was supported by grant R

References

1. A. Alani and M. Deriche, Proceedings of the Fifth International Symposium on Signal Processing and its Applications (1999)
2. S. Cheng and H. Wang, Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH (2003)
3. I. Daubechies, Ten Lectures on Wavelets (SIAM, 1992)
4. K. Demuynck and T. Laureys, Proceedings of the 5th International Conference on Text, Speech and Dialogue (2002)
5. O. Farooq and S. Datta, IEE Proceedings: Vision, Image and Signal Processing, 151(3) (2004)
6. J. Gałka and B. Ziółko, NAUN International Journal of Circuits, Systems and Signal Processing, 2(1) (2007)
7. S. Grocholewski, Proceedings of the International Conference on Language Resources and Evaluation (1998)
8. Y. Meyer, Wavelets and Applications (Masson, 1991)
9. O. Rioul and M. Vetterli, IEEE Signal Processing Magazine, 8 (1991)
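The segmentation steps 2 to 6 of Section 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: level m is assumed to supply 2^(6-m) coefficients per frame, as the summation limits of (12) imply; the running mean is shortened at the signal borders; absolute differences are assumed in (14); and all function names, the synthetic test signal and the value `min_gap=5` frames are hypothetical.

```python
import math

LEVELS = 6  # M in the paper


def band_power(details):
    """Eq. (12): sum of squared coefficients of level m inside frame k."""
    power = {}
    for m in range(1, LEVELS + 1):
        block = 2 ** (LEVELS - m)             # coefficients of level m per frame
        d = details[m]
        power[m] = [sum(x * x for x in d[k * block:(k + 1) * block])
                    for k in range(len(d) // block)]
    return power


def running_mean(values, width):
    """Envelope in the spirit of eq. (13); the window is shortened at the borders."""
    half = width // 2
    return [sum(values[max(0, k - half):k + half + 1])
            / len(values[max(0, k - half):k + half + 1])
            for k in range(len(values))]


def importance_matrix(env):
    """Column k lists band indices ordered from the strongest envelope down."""
    return [sorted(env, key=lambda m: -env[m][k]) for k in range(len(env[1]))]


def event_function(columns):
    """Eq. (14) with absolute differences (assumption), weighted by 1/row."""
    return [sum(abs(a - b) / row
                for row, (a, b) in enumerate(zip(columns[k + 1], columns[k]), start=1))
            for k in range(len(columns) - 1)]


def adaptive_threshold(f, beta, P):
    """Eq. (15): beta times the sum of f over +/- P frames, divided by 2P."""
    return [beta * sum(f[max(0, k - P):k + P + 1]) / (2 * P) for k in range(len(f))]


def pick_boundaries(f, threshold, min_gap):
    """Step 6: maxima above the threshold that dominate +/- min_gap frames."""
    return [k for k in range(len(f))
            if f[k] > threshold[k]
            and f[k] == max(f[max(0, k - min_gap):k + min_gap + 1])]


def segment(details, fs=16000, t_mu=0.1, beta=1.0, adapt=0.1, min_gap=5):
    frame_rate = fs / 2 ** LEVELS             # 250 frames per second
    K = round(t_mu * frame_rate)              # 25 frames, as in step 3
    P = round(adapt * frame_rate)             # 100 ms adaptation range
    power = band_power(details)
    env = {m: running_mean(power[m], K) for m in power}
    f = event_function(importance_matrix(env))
    return pick_boundaries(f, adaptive_threshold(f, beta, P), min_gap)


def synthetic_details(power_a, power_b, frames):
    """Hypothetical input: constant band powers that switch after `frames` frames."""
    details = {}
    for m in range(1, LEVELS + 1):
        block = 2 ** (LEVELS - m)
        amp_a = math.sqrt(power_a[m] / block)
        amp_b = math.sqrt(power_b[m] / block)
        details[m] = [amp_a] * (frames * block) + [amp_b] * (frames * block)
    return details


falling = {m: 7 - m for m in range(1, 7)}     # band 1 strongest
rising = {m: m for m in range(1, 7)}          # band 6 strongest
boundaries = segment(synthetic_details(falling, rising, frames=100))
print(boundaries)                             # frame indices of detected events
```

On this synthetic input the ordering of the band envelopes flips once, between frames 99 and 100, so a single event should be reported at that position. On real speech the coefficients would come from the wavelet decomposition of Section 2, and `beta` and `min_gap` would need tuning as described in the Conclusions.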

CS 188: Artificial Intelligence Fall 2011

CS 188: Artificial Intelligence Fall 2011 CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details

More information

Phoneme segmentation based on spectral metrics

Phoneme segmentation based on spectral metrics Phoneme segmentation based on spectral metrics Xianhua Jiang, Johan Karlsson, and Tryphon T. Georgiou Introduction We consider the classic problem of segmenting speech signals into individual phonemes.

More information

Detection-Based Speech Recognition with Sparse Point Process Models

Detection-Based Speech Recognition with Sparse Point Process Models Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,

More information

Temporal Modeling and Basic Speech Recognition

Temporal Modeling and Basic Speech Recognition UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer

Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

ECE472/572 - Lecture 13. Roadmap. Questions. Wavelets and Multiresolution Processing 11/15/11

ECE472/572 - Lecture 13. Roadmap. Questions. Wavelets and Multiresolution Processing 11/15/11 ECE472/572 - Lecture 13 Wavelets and Multiresolution Processing 11/15/11 Reference: Wavelet Tutorial http://users.rowan.edu/~polikar/wavelets/wtpart1.html Roadmap Preprocessing low level Enhancement Restoration

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

Dynamic Time-Alignment Kernel in Support Vector Machine

Dynamic Time-Alignment Kernel in Support Vector Machine Dynamic Time-Alignment Kernel in Support Vector Machine Hiroshi Shimodaira School of Information Science, Japan Advanced Institute of Science and Technology sim@jaist.ac.jp Mitsuru Nakai School of Information

More information

Modeling the creaky excitation for parametric speech synthesis.

Modeling the creaky excitation for parametric speech synthesis. Modeling the creaky excitation for parametric speech synthesis. 1 Thomas Drugman, 2 John Kane, 2 Christer Gobl September 11th, 2012 Interspeech Portland, Oregon, USA 1 University of Mons, Belgium 2 Trinity

More information

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and Vol. 46 No. 6 SCIENCE IN CHINA (Series F) December 003 Construction for a class of smooth wavelet tight frames PENG Lizhong (Λ Π) & WANG Haihui (Ξ ) LMAM, School of Mathematical Sciences, Peking University,

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

Introduction to Wavelet. Based on A. Mukherjee s lecture notes

Introduction to Wavelet. Based on A. Mukherjee s lecture notes Introduction to Wavelet Based on A. Mukherjee s lecture notes Contents History of Wavelet Problems of Fourier Transform Uncertainty Principle The Short-time Fourier Transform Continuous Wavelet Transform

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions

More information

Cochlear modeling and its role in human speech recognition

Cochlear modeling and its role in human speech recognition Allen/IPAM February 1, 2005 p. 1/3 Cochlear modeling and its role in human speech recognition Miller Nicely confusions and the articulation index Jont Allen Univ. of IL, Beckman Inst., Urbana IL Allen/IPAM

More information

Hidden Markov Modelling

Hidden Markov Modelling Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models

More information

Feature Extraction for ASR: Pitch

Feature Extraction for ASR: Pitch Feature Extraction for ASR: Pitch Wantee Wang 2015-03-14 16:55:51 +0800 Contents 1 Cross-correlation and Autocorrelation 1 2 Normalized Cross-Correlation Function 3 3 RAPT 4 4 Kaldi Pitch Tracker 5 Pitch

More information

MULTIRATE DIGITAL SIGNAL PROCESSING

MULTIRATE DIGITAL SIGNAL PROCESSING MULTIRATE DIGITAL SIGNAL PROCESSING Signal processing can be enhanced by changing sampling rate: Up-sampling before D/A conversion in order to relax requirements of analog antialiasing filter. Cf. audio

More information

FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes

FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION Xiao Li and Jeff Bilmes Department of Electrical Engineering University. of Washington, Seattle {lixiao, bilmes}@ee.washington.edu

More information

Wavelets in Pattern Recognition

Wavelets in Pattern Recognition Wavelets in Pattern Recognition Lecture Notes in Pattern Recognition by W.Dzwinel Uncertainty principle 1 Uncertainty principle Tiling 2 Windowed FT vs. WT Idea of mother wavelet 3 Scale and resolution

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

Littlewood Paley Spline Wavelets

Littlewood Paley Spline Wavelets Proceedings of the 6th WSEAS International Conference on Wavelet Analysis & Multirate Systems, Bucharest, Romania, October 6-8, 26 5 Littlewood Paley Spline Wavelets E. SERRANO and C.E. D ATTELLIS Escuela

More information

Wavelet Transform. Figure 1: Non stationary signal f(t) = sin(100 t 2 ).

Wavelet Transform. Figure 1: Non stationary signal f(t) = sin(100 t 2 ). Wavelet Transform Andreas Wichert Department of Informatics INESC-ID / IST - University of Lisboa Portugal andreas.wichert@tecnico.ulisboa.pt September 3, 0 Short Term Fourier Transform Signals whose frequency

More information

An Introduction to Wavelets and some Applications

An Introduction to Wavelets and some Applications An Introduction to Wavelets and some Applications Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France An Introduction to Wavelets and some Applications p.1/54

More information

Assignment #09 - Solution Manual

Assignment #09 - Solution Manual Assignment #09 - Solution Manual 1. Choose the correct statements about representation of a continuous signal using Haar wavelets. 1.5 points The signal is approximated using sin and cos functions. The

More information

Why DNN Works for Acoustic Modeling in Speech Recognition?

Why DNN Works for Acoustic Modeling in Speech Recognition? Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,

More information

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius

Doctoral Course in Speech Recognition. May 2007 Kjell Elenius Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Wavelets and Multiresolution Processing (Wavelet Transforms) Christophoros Nikou cnikou@cs.uoi.gr University of Ioannina - Department of Computer Science 2 Contents Image pyramids

More information

Hierarchical Multi-Stream Posterior Based Speech Recognition System

Hierarchical Multi-Stream Posterior Based Speech Recognition System Hierarchical Multi-Stream Posterior Based Speech Recognition System Hamed Ketabdar 1,2, Hervé Bourlard 1,2 and Samy Bengio 1 1 IDIAP Research Institute, Martigny, Switzerland 2 Ecole Polytechnique Fédérale

More information

arxiv: v1 [cs.sd] 25 Oct 2014

arxiv: v1 [cs.sd] 25 Oct 2014 Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra

More information

Hidden Markov Model and Speech Recognition

Hidden Markov Model and Speech Recognition 1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed

More information

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

Segmental Recurrent Neural Networks for End-to-end Speech Recognition Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Exploring the Discrete Wavelet Transform as a Tool for Hindi Speech Recognition

Exploring the Discrete Wavelet Transform as a Tool for Hindi Speech Recognition Exploring the Discrete Wavelet Transform as a Tool for Hindi Speech Recognition Shivesh Ranjan Abstract In this paper, we propose a new scheme for recognition of isolated words in Hindi Language speech,

More information

Quadrature Prefilters for the Discrete Wavelet Transform. Bruce R. Johnson. James L. Kinsey. Abstract

Quadrature Prefilters for the Discrete Wavelet Transform. Bruce R. Johnson. James L. Kinsey. Abstract Quadrature Prefilters for the Discrete Wavelet Transform Bruce R. Johnson James L. Kinsey Abstract Discrepancies between the Discrete Wavelet Transform and the coefficients of the Wavelet Series are known

More information

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets

Evolutionary Power Spectrum Estimation Using Harmonic Wavelets 6 Evolutionary Power Spectrum Estimation Using Harmonic Wavelets Jale Tezcan Graduate Student, Civil and Environmental Engineering Department, Rice University Research Supervisor: Pol. D. Spanos, L.B.

More information

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic

More information

Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition

Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden

More information

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar

More information

Artificial Intelligence Markov Chains

Artificial Intelligence Markov Chains Artificial Intelligence Markov Chains Stephan Dreiseitl FH Hagenberg Software Engineering & Interactive Media Stephan Dreiseitl (Hagenberg/SE/IM) Lecture 12: Markov Chains Artificial Intelligence SS2010

More information

2D Wavelets. Hints on advanced Concepts

2D Wavelets. Hints on advanced Concepts 2D Wavelets Hints on advanced Concepts 1 Advanced concepts Wavelet packets Laplacian pyramid Overcomplete bases Discrete wavelet frames (DWF) Algorithme à trous Discrete dyadic wavelet frames (DDWF) Overview

More information

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Jorge Silva and Shrikanth Narayanan Speech Analysis and Interpretation

More information

Traditionally a small part of a speech corpus is transcribed and segmented by hand to yield bootstrap data for ASR or basic units for concatenative sp

Traditionally a small part of a speech corpus is transcribed and segmented by hand to yield bootstrap data for ASR or basic units for concatenative sp PROBABILISTIC ANALYSIS OF PRONUNCIATION WITH 'MAUS' Florian Schiel, Andreas Kipp Institut fur Phonetik und Sprachliche Kommunikation, Ludwig-Maximilians-Universitat Munchen ABSTRACT This paper describes

More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

GMM-Based Speech Transformation Systems under Data Reduction

GMM-Based Speech Transformation Systems under Data Reduction GMM-Based Speech Transformation Systems under Data Reduction Larbi Mesbahi, Vincent Barreaud, Olivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-22305 Lannion Cedex

More information

R E S E A R C H R E P O R T Entropy-based multi-stream combination Hemant Misra a Hervé Bourlard a b Vivek Tyagi a IDIAP RR 02-24 IDIAP Dalle Molle Institute for Perceptual Artificial Intelligence ffl

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

IDIAP. Martigny - Valais - Suisse ADJUSTMENT FOR THE COMPENSATION OF MODEL MISMATCH IN SPEAKER VERIFICATION. Frederic BIMBOT + Dominique GENOUD *

IDIAP. Martigny - Valais - Suisse ADJUSTMENT FOR THE COMPENSATION OF MODEL MISMATCH IN SPEAKER VERIFICATION. Frederic BIMBOT + Dominique GENOUD * R E S E A R C H R E P O R T IDIAP IDIAP Martigny - Valais - Suisse LIKELIHOOD RATIO ADJUSTMENT FOR THE COMPENSATION OF MODEL MISMATCH IN SPEAKER VERIFICATION Frederic BIMBOT + Dominique GENOUD * IDIAP{RR

More information

Unsupervised Vocabulary Induction

Unsupervised Vocabulary Induction Infant Language Acquisition Unsupervised Vocabulary Induction MIT (Saffran et al., 1997) 8 month-old babies exposed to stream of syllables Stream composed of synthetic words (pabikumalikiwabufa) After

More information

Markov processes on curves for automatic speech recognition

Markov processes on curves for automatic speech recognition Markov processes on curves for automatic speech recognition Lawrence Saul and Mazin Rahim AT&T Labs - Research Shannon Laboratory 180 Park Ave E-171 Florham Park, NJ 07932 {lsaul,rnazin}gresearch.att.com

More information

Logarithmic quantisation of wavelet coefficients for improved texture classification performance
