Lecture 7: Feature Extraction
1 Lecture 7: Feature Extraction. Kai Yu, SpeechLab, Department of Computer Science & Engineering, Shanghai Jiao Tong University, Autumn 2014. Kai Yu Lecture 7: Feature Extraction SJTU Speech Lab 1 / 28
2 Table of Contents: Acoustic features for speech recognition; Dynamic features; Feature projection: LDA & HLDA.
3 Why Do We Need Feature Extraction? The raw spectrum has: rich information, so the computational cost is high; redundant (irrelevant or disturbing) information for ASR, so it is not effective. We need to derive more compact and effective features from the spectrum for ASR.
4 Commonly Used Features for Speech Recognition. Short-term spectra are used to describe speech signals. Useful features extracted from short-term spectra include: Linear Prediction Coefficients (LPC); Mel Frequency Cepstral Coefficients (MFCC); Perceptual Linear Prediction (PLP) coefficients.
5 Linear Prediction Coefficients (LPC). Given the signal $x = [x_1, \dots, x_T]$, a linear predictor of order $n$ predicts the sample at time $t$ as a weighted linear combination of its $n$ preceding samples:

$$\hat{x}_t = \sum_{i=1}^{n} a_i x_{t-i}, \qquad \hat{x} = M a$$

where

$$M = \begin{bmatrix} x_0 & x_{-1} & x_{-2} & \cdots & x_{-n+1} \\ x_1 & x_0 & x_{-1} & \cdots & x_{-n+2} \\ \vdots & & & & \vdots \\ x_{T-1} & x_{T-2} & x_{T-3} & \cdots & x_{T-n} \end{bmatrix}, \qquad a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

The $a_i$, $1 \le i \le n$, are known as the linear prediction coefficients, and $M$ is known as a Toeplitz matrix.
6 Linear Prediction Coefficients (LPC): Finding the Coefficients. The mean square prediction error is

$$L_{\text{mse}} = \frac{1}{T} \sum_{t=1}^{T} (\hat{x}_t - x_t)^2 = \frac{1}{T} (\hat{x} - x)^\top (\hat{x} - x)$$

with $\hat{x}_t = \sum_{i=1}^{n} a_i x_{t-i}$ for $1 \le t \le T$, where the $a_i$, $1 \le i \le n$, are the linear prediction coefficients. Do we want the direct least-squares solution

$$a = (M^\top M)^{-1} M^\top x$$

Not really. The Levinson-Durbin algorithm is an autocorrelation-based algorithm that exploits the Toeplitz property of $M$ to solve the normal equations much more cheaply.
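The Levinson-Durbin recursion mentioned above can be sketched as follows (a minimal NumPy implementation; variable names are mine, not from the slides):

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the LP coefficients a
    (so that x_hat[t] = sum_i a[i] * x[t-i]) in O(order^2) time,
    instead of O(order^3) direct matrix inversion.
    r: autocorrelation sequence r[0], r[1], ..., r[order]."""
    a = np.zeros(order)
    err = r[0]                                # prediction error energy E^(0)
    for i in range(order):
        # reflection coefficient for this stage
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_prev = a[:i].copy()
        a[:i] = a_prev - k * a_prev[::-1]     # update a_1 .. a_i
        a[i] = k                              # new coefficient a_{i+1}
        err *= (1.0 - k * k)                  # E^(i+1) = (1 - k^2) E^(i)
    return a, err
```

The result matches a direct solve of the Toeplitz system `R a = [r(1), ..., r(n)]`, but reuses the solution of each lower-order problem rather than inverting the full matrix.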
7 Linear Prediction Coefficients (LPC): Spectral Envelopes from LPC. TOP: waveform of the sound aa. MIDDLE & BOTTOM: spectral magnitude plotted on a log scale; the spectral envelope is plotted as a smooth red line. MIDDLE: 10th-order LP. BOTTOM: 25th-order LP. A higher-order LP tracks the spectral magnitude more precisely. Envelope peaks can be used to determine formant locations.
8 Filter. A filter is a device or process that removes some unwanted component or feature from a signal. It is usually (though not necessarily) described in the frequency domain: $Y(\omega) = H(\omega) X(\omega)$.
9 Filter Bank Coefficients. The spectral magnitude from the STFT contains too much information. A filter bank is a series (bank) of bandpass filters (typically triangular filters). Each bandpass filter produces one coefficient, corresponding to the sum of the band-passed signal.
10 Mel Scale. The Mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another. The Mel function is a non-linear mapping between the frequency and Mel scales:

$$\text{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

Note that points equally spaced on the Mel scale give higher resolution at the lower frequencies.
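The warping on this slide and its inverse can be written directly as a pair of small functions (a sketch; function names are mine):

```python
import math

def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700), the mapping on this slide."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used e.g. to place filter bank centre frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

A quick sanity check of the "higher resolution at low frequencies" claim: stepping by a fixed number of mels covers a much wider Hz range at 1 kHz than at 500 Hz.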
11 Mel Scale Filter Bank Coefficients.

$$m_i = \sum_{k=f_i}^{F_i} s(k)\, T_i(k)$$

where $m_i$ is the $i$th filter bank coefficient, $f_i$ and $F_i$ are the start and end frequencies of the triangular filter, $s(k)$ is the spectral power (sometimes magnitude) at frequency bin $k$, and $T_i(k)$ is the triangular filter bank value.
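A common way to build the triangular filters $T_i(k)$, equally spaced on the Mel scale, is sketched below (a minimal NumPy version under the usual conventions; the bin-placement formula is one standard choice, not taken from the slides):

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel-spaced filters. Returns an
    (n_filters, n_fft//2 + 1) weight matrix; m_i is then the dot
    product of row i with the power spectrum s(k)."""
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_filters + 2 points: each filter spans three consecutive points
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sample_rate / 2.0),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, mid):          # rising edge of the triangle
            fb[i - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge of the triangle
            fb[i - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb
```

Given a power spectrum `s` of a frame, the slide's coefficients are then simply `m = mel_filterbank(26, 512, 16000) @ s`.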
12 Mel Frequency Cepstral Coefficients (MFCC). MFCCs are widely used in many speech processing techniques. They are derived from the Mel filter bank coefficients by: 1. Taking the logarithm of the $N_{fb}$ filter bank coefficients. 2. Computing cepstral coefficients using the Discrete Cosine Transform (DCT):

$$c_n = \sqrt{\frac{2}{N_{fb}}} \sum_{j=1}^{N_{fb}} \log(m_j) \cos\left(\frac{\pi n}{N_{fb}} (j - 0.5)\right), \qquad n = 1, 2, \dots, N_{mfcc}$$

where $c_n$ is the $n$th MFCC, $m_j$ is the $j$th Mel-scale filter bank coefficient, and $N_{fb}$ and $N_{mfcc}$ are the numbers of filter bank and final MFCC coefficients respectively (usually $N_{mfcc} = 12$; $N_{fb}$ varies from 20 to 30).
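The two steps above translate almost line-for-line into code (a sketch; the function name is mine):

```python
import numpy as np

def mfcc_from_fbank(m, n_mfcc=12):
    """Cepstral coefficients c_n from filter bank outputs m_j via the
    DCT on this slide:
    c_n = sqrt(2/N_fb) * sum_j log(m_j) * cos(pi*n*(j-0.5)/N_fb)."""
    n_fb = len(m)
    j = np.arange(1, n_fb + 1)          # filter bank index j = 1..N_fb
    log_m = np.log(m)                   # step 1: logarithm
    return np.array([                   # step 2: DCT, n = 1..N_mfcc
        np.sqrt(2.0 / n_fb)
        * np.sum(log_m * np.cos(np.pi * n * (j - 0.5) / n_fb))
        for n in range(1, n_mfcc + 1)
    ])
```

One useful property falls out immediately: a perfectly flat filter bank output (constant $m_j$) yields all-zero cepstral coefficients for $n \ge 1$, since the cosine basis functions each sum to zero.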
13 From Spectrum to Cepstrum. The vector of spectral energies is not used directly because: speech power spectra are not Gaussian; all coefficients are sensitive to loudness; neighbouring coefficients are highly correlated. The Discrete Cosine Transform can effectively remove the dependency between coefficients.
14 Discrete Cosine Transform (DCT). The DCT is a linear transform:

$$\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{N_{mfcc}} \end{bmatrix} = \sqrt{\frac{2}{N_{fb}}} \begin{bmatrix} \cos\left(\frac{\pi (0.5)}{N_{fb}}\right) & \cdots & \cos\left(\frac{\pi (N_{fb} - 0.5)}{N_{fb}}\right) \\ \vdots & \ddots & \vdots \\ \cos\left(\frac{\pi N_{mfcc} (0.5)}{N_{fb}}\right) & \cdots & \cos\left(\frac{\pi N_{mfcc} (N_{fb} - 0.5)}{N_{fb}}\right) \end{bmatrix} \begin{bmatrix} \log(m_1) \\ \log(m_2) \\ \vdots \\ \log(m_{N_{fb}}) \end{bmatrix}$$
15 Basis Functions of the DCT. [Figure: cosine basis functions of the DCT.]
16 Perceptual Linear Prediction (PLP). Perceptual linear prediction (PLP) is a well-known and widely used feature extraction technique that incorporates a perceptual model into the front-end. It is believed to be more robust to noise. The PLP coefficients can be derived from the filter bank coefficients by: Applying an equal-loudness pre-emphasis curve and amplitude compression,

$$\hat{m}_k = (L_k m_k)^{\beta}, \qquad L_k = \left(\frac{f_k^2}{f_k^2 + 1.6 \times 10^5}\right)^2 \cdot \frac{f_k^2 + 1.44 \times 10^6}{f_k^2 + 9.61 \times 10^6}$$

Applying the inverse DFT to the pre-emphasised filter bank outputs to yield auto-correlation coefficients. Applying the Levinson-Durbin algorithm to get the LP coefficients. Converting the LP coefficients to cepstral coefficients.
17 Different Energy Terms. Energy information is very important for speech recognition. Spectral energy:

$$E = \frac{1}{N} \sum_{n=1}^{N} x_n^2 \qquad \text{or} \qquad E = \frac{1}{N} \sum_{n=1}^{N} |x_n|$$

0th cepstral coefficient $c_0$: for MFCC, $c_0 = \sqrt{2/N_{fb}} \sum_{k=1}^{N_{fb}} \log(m_k)$; for PLP, $c_0 = \log(\text{LPC gain})$. Energy normalisation is important due to energy variations over different channels.
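The two most common of these terms can be sketched as follows (an illustration, assuming the MFCC $c_0$ is the $n = 0$ term of the DCT on slide 12; function names are mine):

```python
import numpy as np

def log_energy(frame):
    """Log frame energy, floored so that a silent frame does not
    produce log(0)."""
    return float(np.log(max(np.sum(frame ** 2), 1e-10)))

def mfcc_c0(m):
    """0th cepstral coefficient for MFCC: the n = 0 term of the DCT,
    sqrt(2/N_fb) * sum_k log(m_k)."""
    n_fb = len(m)
    return float(np.sqrt(2.0 / n_fb) * np.sum(np.log(m)))
```

Both behave like overall loudness measures, which is why per-utterance or per-channel normalisation of them matters in practice.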
18 Dynamic Features in Speech Recognition: Concept. MFCC or PLP features describe the instantaneous speech signal spectrum, but cannot describe signal dynamics. This can be improved by including differential coefficients in the feature vector:

$$o = \begin{bmatrix} c \\ \Delta c \\ \Delta^2 c \end{bmatrix}$$
19 Dynamic Features in Speech Recognition: Calculation. Simple differential coefficients can be calculated as

$$\Delta_n = \frac{c_{n+\delta} - c_{n-\delta}}{2\delta}$$

A more robust estimate uses regression coefficients to fit the best straight line through a number of frames (here $2\delta + 1$):

$$\Delta_n = \frac{\sum_{i=1}^{\delta} i (c_{n+i} - c_{n-i})}{2 \sum_{i=1}^{\delta} i^2}$$

Higher-order differential coefficients can be obtained by applying the above recursively:

$$\Delta^2_n = \frac{\sum_{i=1}^{\delta} i (\Delta_{n+i} - \Delta_{n-i})}{2 \sum_{i=1}^{\delta} i^2}$$

Typically 2nd- or 3rd-order differentials are used.
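The regression formula above can be sketched for a whole utterance at once (a minimal NumPy version; edge frames are handled by repetition, which is one common convention rather than something the slides specify):

```python
import numpy as np

def delta(c, delta_win=2):
    """Regression-based differentials:
    d[t] = sum_i i*(c[t+i] - c[t-i]) / (2 * sum_i i^2).
    c: (T, D) array of static coefficients, one row per frame."""
    T = len(c)
    denom = 2.0 * sum(i * i for i in range(1, delta_win + 1))
    # replicate the first/last frame so t-i and t+i always exist
    padded = np.pad(c, ((delta_win, delta_win), (0, 0)), mode='edge')
    d = np.zeros_like(c, dtype=float)
    for i in range(1, delta_win + 1):
        d += i * (padded[delta_win + i:delta_win + i + T]
                  - padded[delta_win - i:delta_win - i + T])
    return d / denom
```

The full observation vector of the previous slide is then `o = np.hstack([c, delta(c), delta(delta(c))])`.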
20 Linear Discriminant Analysis (LDA). Linear Discriminant Analysis (LDA) is a linear projection scheme that finds a matrix of dimensions $p \times n$, where $n$ is the original vector size and $p < n$, mapping onto the feature space that is best for discrimination.
21 Fisher's Linear Discriminant Recap. Criterion:

$$L(w) = \frac{w^\top S_B w}{w^\top S_W w}$$

where the between-class and within-class covariances are

$$S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top, \qquad S_W = \Sigma_1 + \Sigma_2$$
22 Criterion for LDA.

$$L(A_{[p]}) = \frac{\mathrm{diag}\left(A_{[p]} B A_{[p]}^\top\right)}{\mathrm{diag}\left(A_{[p]} W A_{[p]}^\top\right)}$$

Between-class matrix:

$$B = \frac{\sum_{m,t} \gamma_m(t) (\mu^{(m)} - \mu^{(g)})(\mu^{(m)} - \mu^{(g)})^\top}{\sum_{m,t} \gamma_m(t)}$$

Global within-class matrix:

$$W = \frac{\sum_{m,t} \gamma_m(t) (o(t) - \mu^{(m)})(o(t) - \mu^{(m)})^\top}{\sum_{m,t} \gamma_m(t)}$$
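For intuition, a standard way to obtain the LDA projection is to solve the generalized eigenproblem $B v = \lambda W v$ and keep the $p$ leading eigenvectors. The sketch below simplifies the slide's posteriors $\gamma_m(t)$ to hard class labels (an assumption for illustration; function and variable names are mine):

```python
import numpy as np

def lda_projection(X, y, p):
    """Estimate a p x n LDA transform from data.
    X: (N, n) feature rows; y: (N,) class labels.
    Uses hard labels in place of the posteriors gamma_m(t)."""
    mu_g = X.mean(axis=0)                 # global mean mu^(g)
    n = X.shape[1]
    B = np.zeros((n, n))                  # between-class matrix
    W = np.zeros((n, n))                  # global within-class matrix
    for cls in np.unique(y):
        Xc = X[y == cls]
        mu = Xc.mean(axis=0)              # class mean mu^(m)
        B += len(Xc) * np.outer(mu - mu_g, mu - mu_g)
        W += (Xc - mu).T @ (Xc - mu)
    B /= len(X)
    W /= len(X)
    # eigenvectors of W^{-1} B, sorted by decreasing eigenvalue
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(-vals.real)
    return vecs[:, order[:p]].real.T      # rows = discriminant directions
```

With well-separated classes, the leading row aligns with the direction that best separates the class means relative to the within-class scatter.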
23 Linear Discriminant Analysis (LDA). A simple two-dimensional example: the two classes have the same covariance matrix.
24 Heteroscedastic LDA (HLDA). An extended version of LDA in which each class has its own covariance matrix.
25 Comparison Between LDA and HLDA. LDA: global within-class matrix

$$W_{\text{LDA}} = \frac{\sum_{m,t} \gamma_m(t) (o(t) - \mu^{(m)})(o(t) - \mu^{(m)})^\top}{\sum_{m,t} \gamma_m(t)}$$

HLDA: local (per-class) within-class matrix

$$W^{(m)}_{\text{HLDA}} = \frac{\sum_{t} \gamma_m(t) (o(t) - \mu^{(m)})(o(t) - \mu^{(m)})^\top}{\sum_{t} \gamma_m(t)}$$
26 HTK - Feature Extraction (HCopy). HCopy -C mfcc.cfg -S digit.wav2mfcc.scp
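A configuration file along the lines of the `mfcc.cfg` named on the slide might look like this (parameter values here are illustrative defaults, not taken from the lecture):

```
# mfcc.cfg -- example HCopy configuration (values are illustrative)
SOURCEKIND   = WAVEFORM
SOURCEFORMAT = WAV
TARGETKIND   = MFCC_0_D_A   # 12 MFCCs + C0, plus deltas and accelerations
TARGETRATE   = 100000.0     # 10 ms frame shift (HTK units of 100 ns)
WINDOWSIZE   = 250000.0     # 25 ms analysis window
USEHAMMING   = T
PREEMCOEF    = 0.97
NUMCHANS     = 26           # number of Mel filter bank channels
NUMCEPS      = 12
CEPLIFTER    = 22
```

The script file passed with `-S` (here `digit.wav2mfcc.scp`) lists, one per line, each source waveform and the target feature file to write.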
27 HTK - Supported Feature Types. LPC: Linear Prediction Coefficients; MELSPEC: Mel-frequency spectral magnitudes; FBANK: log filter bank coefficients; MFCC: Mel Frequency Cepstral Coefficients; PLP: Perceptual Linear Prediction coefficients. Additional qualifiers:
28 HTK - View Feature Files (HList). HList -h data.mfcc Actual format: Note that HTK stores data in big-endian format. NATURALREAD and NATURALWRITE can be specified to override the defaults.
ENEE630 ADSP Part II w/ solution. Determine if each of the following are valid autocorrelation matrices of WSS processes. (Correlation Matrix) R a = 4 4 4,R b = 0 0,R c = j 0 j 0 j 0 j 0 j,r d = 0 0 0
More informationWaveform-Based Coding: Outline
Waveform-Based Coding: Transform and Predictive Coding Yao Wang Polytechnic University, Brooklyn, NY11201 http://eeweb.poly.edu/~yao Based on: Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and
More informationASPEAKER independent speech recognition system has to
930 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 5, SEPTEMBER 2005 Vocal Tract Normalization Equals Linear Transformation in Cepstral Space Michael Pitz and Hermann Ney, Member, IEEE
More informationLecture 4 - Spectral Estimation
Lecture 4 - Spectral Estimation The Discrete Fourier Transform The Discrete Fourier Transform (DFT) is the equivalent of the continuous Fourier Transform for signals known only at N instants separated
More informationSinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,
Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech
More informationAdvanced Digital Signal Processing -Introduction
Advanced Digital Signal Processing -Introduction LECTURE-2 1 AP9211- ADVANCED DIGITAL SIGNAL PROCESSING UNIT I DISCRETE RANDOM SIGNAL PROCESSING Discrete Random Processes- Ensemble Averages, Stationary
More informationDeep Learning for Automatic Speech Recognition Part I
Deep Learning for Automatic Speech Recognition Part I Xiaodong Cui IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Fall, 2018 Outline A brief history of automatic speech recognition Speech
More informationSPEECH enhancement has been studied extensively as a
JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2017 1 Phase-Aware Speech Enhancement Based on Deep Neural Networks Naijun Zheng and Xiao-Lei Zhang Abstract Short-time frequency transform STFT)
More informationReal Sound Synthesis for Interactive Applications
Real Sound Synthesis for Interactive Applications Perry R. Cook я А К Peters Natick, Massachusetts Contents Introduction xi 1. Digital Audio Signals 1 1.0 Introduction 1 1.1 Digital Audio Signals 1 1.2
More informationSINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan
SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli
More informationStatistical and Adaptive Signal Processing
r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory
More informationEfficient Block Quantisation for Image and Speech Coding
Efficient Block Quantisation for Image and Speech Coding Stephen So, BEng (Hons) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith University, Brisbane, Australia
More informationOLA and FBS Duality Review
MUS421/EE367B Lecture 10A Review of OverLap-Add (OLA) and Filter-Bank Summation (FBS) Interpretations of Short-Time Fourier Analysis, Modification, and Resynthesis Julius O. Smith III (jos@ccrma.stanford.edu)
More informationM. Hasegawa-Johnson. DRAFT COPY.
Lecture Notes in Speech Production, Speech Coding, and Speech Recognition Mark Hasegawa-Johnson University of Illinois at Urbana-Champaign February 7, 000 M. Hasegawa-Johnson. DRAFT COPY. Chapter Linear
More information