Wavelet Transform in Speech Segmentation
M. Ziółko¹, J. Gałka¹ and T. Drwięga²

¹ Department of Electronics, AGH University of Science and Technology, Kraków, Poland, ziolko@agh.edu.pl, jgalka@agh.edu.pl
² Faculty of Applied Mathematics, AGH University of Science and Technology, Kraków, Poland, drwiega@wms.mat.agh.edu.pl

Summary. A non-uniform speech segmentation method based on the discrete wavelet transform is used to localize phoneme boundaries. A vector of real values representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of a multiresolution analysis. The final decision on the location of the boundaries is made by analysing the energy flow among the decomposition levels. Distribution-like event functions indicate events, which are regarded as segment boundaries.

1 Introduction

Many speech segmentation algorithms (see [1], [2]) have been used in speech technology systems, but only a few use wavelet spectra [1, 5]. Wavelet methods are known to be very useful in the time-frequency analysis of signals: the wavelet transform combines the best properties of classical frequency and time analysis in a single tool. Most segmentation methods rely on some kind of statistical modelling of the signal and use optimisation methods such as Viterbi decoding or dynamic time warping (DTW) (see [4]). These methods can be used only if proper models of the language are known, and preparing such models is usually a difficult and time-consuming task. The algorithm proposed in this paper is feature-driven and thus does not need any additional language models. Corpora 97, a phonetically annotated database of spoken Polish, was used for tuning and testing the method.

2 Wavelet Decomposition

The discrete wavelet transform (DWT) belongs to the group of frequency transformations and is used to obtain a time-frequency spectrum (see [3, 8]) of the signal $\{s(n)\}$.
This encourages us to use the DWT as an artificial method of speech analysis. Dyadic frequency division makes the DWT much more compatible than other methods with the operating principles of the human hearing system, which is equipped with a subsystem for frequency analysis that reveals the information important for speech recognition.

In order to obtain the DWT, the coefficients $c_{m+1,i}$ of the series

$$s(n) = \sum_i c_{m+1,i}\,\varphi_{m+1,i}(n) \tag{1}$$

are computed for $m = M, M-1, \ldots, 1$, where

$$\varphi_{m,i}(n) = 2^{m/2}\,\varphi\left(2^{m} n\,\Delta t - i\right) \tag{2}$$

is the $i$th wavelet function at the $m$th resolution level and $\Delta t$ is the sampling density. An example of a wavelet function $\varphi(t)$ and its spectrum is presented in Fig. 1. Due to the orthogonality of the wavelet functions $\{\varphi_{m+1,i}\}_i$ we obtain

$$c_{m+1,i} = 2^{\frac{m+1}{2}} \int_{-\infty}^{+\infty} s_a(t)\,\varphi\left(2^{m+1} t - i\right) dt = 2^{\frac{m+1}{2}} \sum_{n=-\infty}^{+\infty} s_a(n\,\Delta t) \int_{-\infty}^{+\infty} \varphi\left(2^{m+1} t - i\right) \frac{\sin\left(\pi (t - n\,\Delta t)/\Delta t\right)}{\pi (t - n\,\Delta t)/\Delta t}\,dt, \tag{3}$$

where $s_a(t)$ is an analog signal whose samples form the digital signal, i.e. $s_a(n\,\Delta t) = s(n)$.

Fig. 1. Spectrum (left figure) and its Meyer scale function with N = 33 samples (right figure)

Formula (3) has two disadvantages that are very important from the computational point of view. First, it is difficult to compute the integrals numerically when the wavelet supports are unbounded. Second, the numerical computation of the integrals is time-consuming, because the high quality standard requires series (1) for each second of the recorded speech signal. Therefore, instead of formula (3), we used the approximation
$$c_{m+1,i} = \sum_{n \in D_i} s(n)\,\varphi_{m+1,i}(n), \tag{4}$$

where $D_i$ are the compact supports of $\varphi_{m+1,i}$. The support of the scale function $\varphi(t)$ must be compact to allow fast calculations in real time. It is a common feature of scale functions that $\varphi(t) \to 0$ very fast as $t \to \pm\infty$. In practice the support can be limited to the segment $[-T, T]$, where

$$T = \max\{t \in \mathbb{R} : |\varphi(t)| \ge h\}. \tag{5}$$

The threshold $h$ should depend on the extreme value of the scale function. We choose the condition $h = \alpha \max_t |\varphi(t)|$, where $\alpha$ can be chosen arbitrarily, e.g. $\alpha =$ [value missing in the source]. In this way the support of the scale function is bounded so as to obtain a reasonable compromise between fast computation in real time and relatively small errors. The number of samples should be the smallest integer $N$ that satisfies the inequality $(N-1)\,\Delta t \ge 2T$, that is $N \ge 2 f_s T + 1$, where the sampling frequency is $f_s = 1/\Delta t = 16000$ Hz. The sampling density in the frequency domain is $\Delta f = 0.5/T$, and $(N-1)\,\Delta f \ge 16000$ Hz because the whole frequency band extends from $-8000$ to $8000$ Hz.

The coefficients of the lower level are calculated by applying the well-known (see [3, 9]) formulae

$$c_{m,n} = \sum_i h_{i-2n}\,c_{m+1,i}, \tag{6}$$

$$d_{m,n} = \sum_i g_{i-2n}\,c_{m+1,i}, \tag{7}$$

where $\{h_i\}$ and $\{g_i\}$ are coefficients that depend on the assumed pair of scale function $\varphi$ and wavelet $\psi$. In other words, the speech spectrum is decomposed using the digital filtering and downsampling procedures defined by (6) and (7): given the coefficients $c_{m+1,i}$ of the $(m+1)$th resolution level, (6) and (7) yield the coefficients of the $m$th resolution level, and the coefficients of the subsequent levels are calculated recursively in the same way. The multiresolution analysis gives a hierarchical and fast scheme for computing the wavelet spectrum of a given signal $s$. Our experiments show that decomposing the speech signal into six levels is sufficient (see Table 1) to cover the frequency band of voice.
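The recursion defined by (6) and (7) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two-tap Haar filters instead of the Meyer pair used in the paper, treats the input samples as the finest-level coefficients, and the function names `analysis_step`, `dwt` and `dyadic_bands` are hypothetical. The last helper lists the nominal band edges implied by dyadic division at an assumed sampling frequency of 16 kHz.

```python
import math

# Haar analysis filters: an assumption for illustration only
# (the paper uses a Meyer scale function, whose filters are longer).
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass coefficients {h_i}
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass coefficients {g_i}


def analysis_step(c_upper, h=H, g=G):
    """One application of (6) and (7): filter, then keep every second sample."""
    c_lower, d_lower = [], []
    for n in range(len(c_upper) // 2):
        c_val = d_val = 0.0
        # Sum over i of h[i - 2n] * c[i]; only i = 2n .. 2n + len(h) - 1 contribute.
        for j in range(len(h)):
            i = 2 * n + j
            if i < len(c_upper):
                c_val += h[j] * c_upper[i]
                d_val += g[j] * c_upper[i]
        c_lower.append(c_val)
        d_lower.append(d_val)
    return c_lower, d_lower


def dwt(signal, levels=6):
    """Apply the step recursively; returns (details from finest to coarsest, approximation)."""
    details = []
    c = list(signal)
    for _ in range(levels):
        c, d = analysis_step(c)
        details.append(d)
    return details, c


def dyadic_bands(fs=16000.0, levels=6):
    """Nominal band edges in Hz for each detail level (finest first) and for the approximation."""
    bands = []
    upper = fs / 2
    for _ in range(levels):
        bands.append((upper / 2, upper))
        upper /= 2
    return bands, (0.0, upper)
```

With these assumptions, a 64-sample input yields detail vectors of 32, 16, 8, 4, 2 and 1 coefficients, and the approximation band ends at 125 Hz. Swapping in longer filters only changes `H` and `G`, although boundary handling would then need more care than the simple truncation used here.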
The energy of the speech signal above 8 kHz and below 125 Hz is very low and can be neglected. The wavelet decomposition presented above leads to the series

$$s(n) = \sum_i c_{1,i}\,\varphi_{1,i}(n) + \sum_{m=1}^{M} \sum_i d_{m,i}\,\psi_{m,i}(n), \tag{8}$$

where

$$\varphi_{1,i}(n) = 2^{(1-M)/2} \begin{cases} \varphi\left(\left(2^{1-M} n - i\right)\Delta t\right) & \text{if } 0 \le 2^{1-M} n - i \le N-1, \\ 0 & \text{for other } 2^{1-M} n - i, \end{cases} \tag{9}$$

and

$$\psi_{m,i}(n) = 2^{(m-M)/2} \begin{cases} \psi\left(\left(2^{m-M} n - i\right)\Delta t\right) & \text{if } 0 \le 2^{m-M} n - i \le N-1, \\ 0 & \text{for other } 2^{m-M} n - i. \end{cases} \tag{10}$$

Table 1. Frequency division obtained for $M = 6$ levels of dyadic wavelet decomposition. Sampling frequency $f_s = 16$ kHz. (Columns: decomposition level $m$, frequency band [Hz]; the last row is the approximation.)

The elements of the DWT for the $m$th level may be collected into a vector $d_m = (d_{m,1}, d_{m,2}, \ldots)^T$. In this way the values of the DWT for $M + 1$ levels can be obtained, which means that the discrete wavelet spectrum

$$\mathrm{DWT}(s) = \{d_M, d_{M-1}, \ldots, d_1, c_1\} \tag{11}$$

is created from the coefficients of series (8).

3 Segmentation Scheme

The role of the segmentation algorithm is to detect significant transitions of energy among the wavelet sub-bands. When a sufficiently significant transition is found, it is marked and scored as a spectral-phonetic event. It is assumed that events occur when the energy transition changes the order of the power-sorted bands. The non-uniform segmentation algorithm consists of the following steps:

1. Decompose the signal $s$ into six levels of the DWT, $\{d_{6,n}, d_{5,n}, \ldots, d_{1,n}\}$.
2. Calculate the sum of power samples in all frequency sub-bands according to the rule

$$B_{m,k} = \sum_{n=(k-1)2^{6-m}+1}^{k\,2^{6-m}} d_{m,n}^{2}. \tag{12}$$
3. Calculate the power envelopes as running mean values

$$B^{env}_{m,k} = \frac{1}{K} \sum_{n=k-K/2}^{k+K/2} B_{m,n}, \tag{13}$$

where $K = 2^{-M} t_\mu f_s$ for the expected mean duration $t_\mu$ of a speech segment. For $t_\mu = 100$ ms, $f_s = 16$ kHz and $M = 6$ we obtain $K = 25$ samples.
4. Generate the importance matrix $\mathbf{M} = [M_{m,k}] \in \mathbb{R}^{6 \times L}$ of frequency bands by sorting the envelopes at each time position $k$, i.e.

$$\mathbf{M}_k = \{m_i\}_{i=1}^{6} : \; B^{env}_{m_1,k} \ge B^{env}_{m_2,k} \ge B^{env}_{m_3,k} \ge B^{env}_{m_4,k} \ge B^{env}_{m_5,k} \ge B^{env}_{m_6,k},$$

where $L$ depends on the length of the speech signal.
5. Compute the event function

$$f(k) = \sum_{m=1}^{6} \frac{|M_{m,k+1} - M_{m,k}|}{m}. \tag{14}$$

6. Segment border locations can now be extracted from $f(k)$ by choosing its local maxima that fulfill two conditions:
   - each chosen maximum has to be the highest value within a neighborhood of $t_{min}$ milliseconds, which is related to the minimal assumed segment duration;
   - the local maximum is greater than a specified threshold $f_{tr}$.

The time-range condition rejects multiple changes related to the same border as well as segments shorter than $t_{min}$. The threshold adjusts the sensitivity of the segmentation: increasing its value reduces the number of chosen events. It is reasonable to set its value on-line, according to

$$f_{tr}(k) = \beta \sum_{n=-P}^{P} \frac{f(k-n)}{2P}, \tag{15}$$

where $P$ is the adaptation range corresponding to 100 milliseconds.

4 Conclusions

The presented algorithm was tested using the Polish annotated speech database Corpora 97. The speech of five different persons, comprising 1825 utterances, was used for evaluation. These utterances include all 37 phonemes of the Polish language and their natural concatenations. The reference phonetic annotation of the speech was known, since it had been prepared earlier. Various values of the detection parameters $t_{min}$ and $\beta$ were used in order to find the combination producing the fewest errors. The best results were obtained for $t_{min}$ set in the range [values missing in the source] milliseconds. In this range the phone recognition, insertion and deletion rates take their best values. The threshold adaptation factor $\beta$ does not affect these rates when it is set within 0 to 1. When $\beta$ exceeds 1, the results degrade considerably because of an increased rate of deletions, which are the most damaging errors in speech segmentation (see [6]). It must be mentioned that the segmentation procedure uses acoustic, not phonetic, features of speech. This results in an increased insertion rate, because some phonemes are not acoustically uniform. This, however, does not affect the overall performance of speech recognition systems (see [6, 7]).

Wavelet analysis turns out to be an effective tool for finding the boundaries between phonemes. Non-uniform segmentation reduces the total number of segments to be processed by the higher-level parts of ASR systems (HMM modeling). The effect is a significant decrease in the Viterbi decoding search space and in computational cost.

5 Acknowledgments

We would like to thank Stefan Grocholewski from the Institute of Computer Science, Poznań University of Technology, for providing the corpus of spoken Polish, Corpora 97. This work was supported by grant R

References

1. A. Alani and M. Deriche, Proceedings of the Fifth International Symposium on Signal Processing and its Applications (1999)
2. S. Cheng and H. Wang, Proceedings of the 8th European Conference on Speech Communication and Technology - EUROSPEECH (2003)
3. I. Daubechies, Ten Lectures on Wavelets (SIAM, 1992)
4. K. Demuynck and T. Laureys, Proceedings of the 5th International Conference on Text, Speech and Dialogue (2002)
5. O. Farooq and S. Datta, IEE Proceedings: Vision, Image and Signal Processing, 151(3) (2004)
6. J. Gałka and B. Ziółko, NAUN International Journal of Circuits, Systems and Signal Processing, 2(1) (2007)
7. S. Grocholewski, Proceedings of the International Conference on Language Resources and Evaluation (1998)
8. Y. Meyer, Wavelets and Applications (Masson, 1991)
9. O. Rioul and M. Vetterli, IEEE Signal Processing Magazine, 8 (1991)
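As a supplementary illustration of the segmentation scheme in Section 3, steps 2 to 6 might be sketched as follows. This is a hedged reading of the description, not the authors' code: all function names are hypothetical, frames are indexed from 0 rather than 1, the running mean and the threshold window are truncated at the signal edges, the bands are assumed to be sorted by descending envelope, and only the first of several equal neighbouring maxima is kept.

```python
def band_power(d, samples_per_frame):
    """Step 2: sum of squared coefficients in consecutive frames of one sub-band."""
    n_frames = len(d) // samples_per_frame
    return [sum(x * x for x in d[k * samples_per_frame:(k + 1) * samples_per_frame])
            for k in range(n_frames)]


def running_mean(b, K):
    """Step 3: running mean over about K frames (window truncated at the edges, an assumption)."""
    half = K // 2
    out = []
    for k in range(len(b)):
        window = b[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out


def importance_matrix(envelopes):
    """Step 4: per frame, band numbers (1-based) sorted by descending envelope (assumed order).

    envelopes[m - 1][k] is the envelope of band m at frame k.
    """
    n_bands, n_frames = len(envelopes), len(envelopes[0])
    return [sorted(range(1, n_bands + 1), key=lambda m: -envelopes[m - 1][k])
            for k in range(n_frames)]


def event_function(columns):
    """Step 5: rank-weighted change of the band order between consecutive frames."""
    return [sum(abs(b - a) / (rank + 1)
                for rank, (a, b) in enumerate(zip(columns[k], columns[k + 1])))
            for k in range(len(columns) - 1)]


def adaptive_threshold(f, k, beta, P):
    """On-line threshold around frame k; indices outside the signal are skipped (an assumption)."""
    total = sum(f[k - n] for n in range(-P, P + 1) if 0 <= k - n < len(f))
    return beta * total / (2 * P)


def pick_borders(f, beta, P, min_gap):
    """Step 6: maxima that dominate +/- min_gap frames and exceed the adaptive threshold."""
    borders = []
    for k, value in enumerate(f):
        if borders and k - borders[-1] <= min_gap:
            continue  # too close to the previously accepted border
        lo, hi = max(0, k - min_gap), min(len(f), k + min_gap + 1)
        if value == max(f[lo:hi]) and value > adaptive_threshold(f, k, beta, P):
            borders.append(k)
    return borders
```

A returned index `k` marks a change of band order between frames `k` and `k + 1`. Converting `min_gap` and `P` from milliseconds to frames depends on the frame rate, which in turn depends on the sampling frequency and the number of decomposition levels.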
More informationA NONPARAMETRIC BAYESIAN APPROACH FOR SPOKEN TERM DETECTION BY EXAMPLE QUERY
A NONPARAMETRIC BAYESIAN APPROACH FOR SPOKEN TERM DETECTION BY EXAMPLE QUERY Amir Hossein Harati Nead Torbati and Joseph Picone College of Engineering, Temple University Philadelphia, Pennsylvania, USA
More informationFeature extraction 1
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 1 Dr Philip Jackson Cepstral analysis - Real & complex cepstra - Homomorphic decomposition Filter
More informationReducing False Alarm Rate in Anomaly Detection with Layered Filtering
Reducing False Alarm Rate in Anomaly Detection with Layered Filtering Rafa l Pokrywka 1,2 1 Institute of Computer Science AGH University of Science and Technology al. Mickiewicza 30, 30-059 Kraków, Poland
More informationEngineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics
Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Phil Woodland: pcw@eng.cam.ac.uk Lent 2013 Engineering Part IIB: Module 4F11 What is Speech Recognition?
More informationModule 7:Data Representation Lecture 35: Wavelets. The Lecture Contains: Wavelets. Discrete Wavelet Transform (DWT) Haar wavelets: Example
The Lecture Contains: Wavelets Discrete Wavelet Transform (DWT) Haar wavelets: Example Haar wavelets: Theory Matrix form Haar wavelet matrices Dimensionality reduction using Haar wavelets file:///c /Documents%20and%20Settings/iitkrana1/My%20Documents/Google%20Talk%20Received%20Files/ist_data/lecture35/35_1.htm[6/14/2012
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 12 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted for noncommercial,
More informationMaximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems
Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Chin-Hung Sit 1, Man-Wai Mak 1, and Sun-Yuan Kung 2 1 Center for Multimedia Signal Processing Dept. of
More informationNumerical Differential Protection of Power Transformer using Algorithm based on Fast Haar Wavelet Transform
IDIA ISTITUTE O TECHOLOGY, KHARAGPUR 7232, DECEMBER 27-29, 22 59 umerical Differential Protection of Power Transformer using Algorithm based on ast Haar Wavelet Transform K. K. Gupta and D.. Vishwakarma
More informationProblem with Fourier. Wavelets: a preview. Fourier Gabor Wavelet. Gabor s proposal. in the transform domain. Sinusoid with a small discontinuity
Problem with Fourier Wavelets: a preview February 6, 2003 Acknowledgements: Material compiled from the MATLAB Wavelet Toolbox UG. Fourier analysis -- breaks down a signal into constituent sinusoids of
More informationWavelets: a preview. February 6, 2003 Acknowledgements: Material compiled from the MATLAB Wavelet Toolbox UG.
Wavelets: a preview February 6, 2003 Acknowledgements: Material compiled from the MATLAB Wavelet Toolbox UG. Problem with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of
More informationEstimation of Cepstral Coefficients for Robust Speech Recognition
Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment
More informationSegment boundary detection via class entropy measurements in connectionist phoneme recognition q
Speech Communication 48 (2006) 1666 1676 www.elsevier.com/locate/specom Segment boundary detection via class entropy measurements in connectionist phoneme recognition q Giampiero Salvi * KTH, Royal Institute
More informationMultimedia Networking ECE 599
Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationSpeech Signal Representations
Speech Signal Representations Berlin Chen 2003 References: 1. X. Huang et. al., Spoken Language Processing, Chapters 5, 6 2. J. R. Deller et. al., Discrete-Time Processing of Speech Signals, Chapters 4-6
More informationWavelets in Scattering Calculations
Wavelets in Scattering Calculations W. P., Brian M. Kessler, Gerald L. Payne polyzou@uiowa.edu The University of Iowa Wavelets in Scattering Calculations p.1/43 What are Wavelets? Orthonormal basis functions.
More informationISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM
ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM Mayukh Bhaowal and Kunal Chawla (Students)Indian Institute of Information Technology, Allahabad, India Abstract: Key words: Speech recognition
More informationAn Evolutionary Programming Based Algorithm for HMM training
An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,
More informationSYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS
SYMBOL RECOGNITION IN HANDWRITTEN MATHEMATI- CAL FORMULAS Hans-Jürgen Winkler ABSTRACT In this paper an efficient on-line recognition system for handwritten mathematical formulas is proposed. After formula
More informationThe effect of speaking rate and vowel context on the perception of consonants. in babble noise
The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu
More informationHIDDEN MARKOV MODELS IN SPEECH RECOGNITION
HIDDEN MARKOV MODELS IN SPEECH RECOGNITION Wayne Ward Carnegie Mellon University Pittsburgh, PA 1 Acknowledgements Much of this talk is derived from the paper "An Introduction to Hidden Markov Models",
More information