Stress detection through emotional speech analysis

INMA MOHINO, ROBERTO GIL-PITA, LORENA ÁLVAREZ PÉREZ

Abstract: Stress is a reaction of a subject when facing daily mental, emotional or physical challenges. Continuous monitoring of a subject's stress level is a key point in understanding and controlling personal stress. Stress is expressed through physiological changes, emotional reactions and behavioral changes. Physiological changes include the increase of adrenaline produced to intensify concentration, the rise in heart rate and the acceleration of reflexes. Emotional reactions, in turn, can be expressed through changes in the prosody of speech. In this paper we study the design of a classification system for stress levels based on emotional speech analysis. For this purpose, linear discriminants combined with bootstrapping techniques are used to implement the classifiers. Results demonstrate the feasibility of the proposed system, with error rates lower than 33%.

Key Words: emotional speech, speech processing, stress detection.

1 Introduction

In recent years a tremendous amount of work has been devoted to studying the parameters of emotion in the human voice, fundamentally divided into two research lines: the artificial production of emotional sounds [1, 2], and the classification of emotional states [3, 4, 5, 6, 7]. In the first line, researchers focus on the study of the characteristics of speech signals produced under different emotional states of the subject, and on their relationship with the language. Several features and algorithms have been proposed in the literature [1], and a review of the state of the art can be found in [2]. Concerning the classification of emotional states, the objective is to determine the emotional state of the subject from a speech signal, given a limited set of possible states. According to the results presented in the literature, the features with the greatest classification capability are those related to the pitch, and they are studied extensively in [3] and [4]. Furthermore, a comprehensive study of the features most used in the recognition of emotions can be found in [5], where those based on the pitch are again shown to be the most discriminative. Other papers focus on selecting a suitable reduced set of features in order to improve the generalization capability of the classifiers, like [6], in which automatic classification is used to select a minimum feature set, or [7], which includes a detailed study of a huge number of parameters with the purpose of selecting linearly independent features. This last paper concludes that a high classification success rate can be achieved with only 6 features.

In this paper, the objective is not simply to classify the different emotional states, but to distinguish the level of excitation of the emotions, with the aim of predicting stress levels. For this study, we use the public database The Berlin Database of Emotional Speech, described in [3], and we carry out a set of experiments aimed at studying the combination of features extracted from the literature and their effect on the classification performance.

2 Materials and Methods

This section includes a brief description of the classification method (the least-squares linear classifier) and a description of the database.

2.1 Standard MSE minimization of a diagonal linear discriminant

Linear classifiers are characterized by the use of linear decision boundaries, which implies that they cannot discriminate classes arranged in very complex shapes.
Let us consider a set of training patterns $\mathbf{x} = [x_1, x_2, \ldots, x_L]^T$, where each of these patterns is assigned to one of the possible classes, denoted $C_i$, $i = 1, \ldots, K$. In a linear classifier, the decision rule is obtained using a set of $K$ linear combinations of the training patterns, as shown in equation (1).

$y_k = w_{k0} + \sum_{n=1}^{L} w_{kn} x_n$   (1)

where $w_{kn}$ are the weighting values and $w_{k0}$ is the threshold. Equation (1) can also be expressed in matrix notation, as in equation (2):

$\mathbf{y} = \mathbf{w}_0 + \mathbf{W}^T \mathbf{x}$   (2)

where $\mathbf{W}$ is the weight matrix that contains the values $w_{kn}$. The design of the classifier consists of finding the values of $\mathbf{W}$ and $\mathbf{w}_0$ that minimize the classification error. The output $\mathbf{y}$ of the linear combinations is used to determine the decision rule: for instance, if the component $y_k$ gives the maximum value of the vector, then the $k$-th class is assigned to the pattern.

In order to determine the values of the weights, the mean squared error must be minimized. Let us define the matrix $\mathbf{V} = [\mathbf{w}_0, \mathbf{W}]^T$, containing the weight matrix $\mathbf{W}$ and the threshold vector $\mathbf{w}_0$, and the pattern matrix $\mathbf{Q}$, which contains the inputs for classification augmented with a row of ones that accounts for the threshold, as expressed in (3):

$\mathbf{Q} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{12} & \cdots & x_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{L1} & x_{L2} & \cdots & x_{LN} \end{bmatrix}$   (3)

The output of the linear classifier is then obtained as a linear combination of the inputs, according to (4):

$\mathbf{Y} = \mathbf{V}\mathbf{Q}$   (4)

Let us now define the target matrix containing the labels of each pattern as:

$\mathbf{T} = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ t_{K1} & t_{K2} & \cdots & t_{KN} \end{bmatrix}$   (5)

where $N$ is the number of data samples and $t_{kn} = 1$ if the $n$-th pattern belongs to class $C_k$, and $0$ otherwise. The error is the difference between the outputs of the classifier and the true values contained in the target matrix:

$\mathbf{E} = \mathbf{Y} - \mathbf{T} = \mathbf{V}\mathbf{Q} - \mathbf{T}$   (6)

Consequently, the mean squared error (MSE) is computed according to equation (7):

$\mathrm{MSE} = \frac{1}{N}\,\|\mathbf{Y} - \mathbf{T}\|^2 = \frac{1}{N}\,\|\mathbf{V}\mathbf{Q} - \mathbf{T}\|^2$   (7)

In the least-squares approach, the weights are adjusted to minimize this MSE. Differentiating expression (7) with respect to $\mathbf{V}$ and using the Wiener-Hopf equations [9], the following expression for the weight values is obtained:

$\mathbf{V} = \mathbf{T}\mathbf{Q}^T (\mathbf{Q}\mathbf{Q}^T)^{-1}$   (8)

This expression determines the values of the coefficients that minimize the mean squared error for a given set of features.
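To make equations (3)-(8) concrete, the following Python sketch implements the least-squares design and the maximum-output decision rule. It is our illustration, not the authors' code: the function names are ours, and the small ridge term added to keep $\mathbf{Q}\mathbf{Q}^T$ invertible is an assumption, not part of the paper.

```python
import numpy as np

def train_ls_classifier(X, labels, n_classes, ridge=1e-8):
    """Least-squares weight design of eq. (8): V = T Q^T (Q Q^T)^(-1).

    X: (L, N) matrix with one pattern per column.
    labels: length-N integer array of class indices in [0, n_classes).
    """
    L, N = X.shape
    Q = np.vstack([np.ones((1, N)), X])       # eq. (3): row of ones models w_0
    T = np.zeros((n_classes, N))
    T[labels, np.arange(N)] = 1.0             # eq. (5): one-hot targets
    # eq. (8); the ridge term is our addition, for numerical stability
    return T @ Q.T @ np.linalg.inv(Q @ Q.T + ridge * np.eye(L + 1))

def classify(V, X):
    """Decision rule: assign the class whose output y_k is maximum (eq. (4))."""
    Q = np.vstack([np.ones((1, X.shape[1])), X])
    return np.argmax(V @ Q, axis=0)
```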
2.2 Database description

For this study, we use the public database The Berlin Database of Emotional Speech, described in [3]. This database consists of 535 sound files (patterns) produced by 10 speakers: 5 males and 5 females. A key point to note is that, since the sound database is not excessively large, and with the aim of investigating the robustness, the generalization capability and the significance of the results, we have used several different subdivisions of the database into design and test subsets by means of bootstrapping [10]. Bootstrapping is a method for estimating error probabilities in cases where few data are available. It consists in iteratively selecting the design data and the test data from the available data, implementing as many classification systems as there are bootstrap iterations [11]. In our case, with the aim of maximizing the generalization capability of the results, we have selected all the possible configurations of the test set, choosing one male and one female speaker each time.
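The speaker-wise resampling described above can be sketched as follows: every (male, female) pair of speakers is held out once as the test set, a classifier is designed on the remaining eight speakers, and the error rates are averaged. This reuses train_ls_classifier and classify from the previous listing; the speaker annotation arrays are hypothetical, not part of the database.

```python
from itertools import product
import numpy as np

def speaker_pair_bootstrap(X, labels, speaker, males, females, n_classes):
    """Leave one male and one female speaker out: 5 x 5 = 25 configurations.

    X: (L, N) feature matrix; labels: (N,) class indices;
    speaker: (N,) speaker id per pattern; males, females: lists of ids.
    """
    errors = []
    for m, f in product(males, females):      # every possible test pair
        test = np.isin(speaker, [m, f])
        V = train_ls_classifier(X[:, ~test], labels[~test], n_classes)
        pred = classify(V, X[:, test])
        errors.append(np.mean(pred != labels[test]))
    return float(np.mean(errors))             # error averaged over the splits
```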

3 Features containing emotional information

The measurements selected for the study of the emotion detection problem are the Mel-Frequency Cepstral Coefficients (MFCCs), the Short Term Energy (STE), the pitch, the jitter, the Harmonics to Noise Ratio (HNR), the Aperture Perturbation Quotient (APQ) and the Pitch Perturbation Quotient (PPQ). We have also evaluated a novel measurement that has proven to be very useful for determining the emotion. Once these measurements are determined, different statistics (mean, variance, kurtosis, etc.) are evaluated over each measurement sequence in order to obtain the features. Table 1 describes the features and the statistics determined from each measurement.

Table 1: Description of the set of features used in the paper

Measurement                      Features        Index    Total
MFCCs (5 coef.)                  mean            1-5      15
                                 std             6-10
                                 delta MFCC      11-15
Energy (e)                       mean(e)         16       5
                                 std(e)          17
                                 kurtosis(e)     18
                                 skewness(e)     19
                                 median(e)       20
Pitch (p)                        mean(p)         21       5
                                 std(p)          22
                                 kurtosis(p)     23
                                 skewness(p)     24
                                 median(p)       25
Energy and pitch (proposed)      mean(p·e)       26       2
                                 std(p·e)        27
Jitter (j)                       mean(j)         28       3
                                 mean(log(j))    29
                                 median(j)       30
HNR                              mean            31       7
                                 std             32
                                 geomean         33
                                 var             34
                                 kurtosis        35
                                 skewness        36
                                 median          37
APQ                              APQ             38       1
PPQ                              PPQ             39       1
Proposed feature family (x)      mean(x)         40       8
                                 std(x)          41
                                 var(x)          42
                                 geomean(x)      43
                                 mean(log(x))    44
                                 kurtosis(x)     45
                                 skewness(x)     46
                                 median(x)       47

3.1 MFCCs

The MFCCs are a set of perceptual parameters calculated from the STFT [8] that have been widely used in speech recognition. They provide a compact representation of the spectral envelope, such that most of the signal energy is concentrated in the first coefficients. Perceptual analysis emulates the nonlinear frequency response of the human ear by creating a set of filters over non-linearly spaced frequency bands. Mel cepstral analysis uses the Mel scale and cepstral smoothing to obtain the final smoothed spectrum. The process used for obtaining the MFCCs is as follows. First, the short-term spectrum of the vocal segment is evaluated. This spectrum is then integrated over gradually widening frequency intervals on the Mel scale; the bandwidth of each band in the frequency domain depends on the central frequency of the filter (the higher the frequency, the wider the bandwidth). Next, a vector of log energies, one per filter, is evaluated. Finally, a cosine transform converts the log energies into a set of uncorrelated cepstral coefficients, the MFCCs. The first cepstral coefficient describes the shape of the log spectrum independently of its overall level, the second coefficient measures the balance between the upper and lower halves of the spectrum, and higher-order coefficients capture increasingly fine detail in the spectrum. In this paper, 5 MFCCs have been evaluated for each file pattern, and several statistics of these coefficients were used as features.
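As an illustration of the feature extraction, the sketch below computes 5 MFCCs per frame and summarizes them as in Table 1 (features 1-15); sequence_stats applies the same kind of summarization to a measurement sequence such as the energy or the pitch. The use of librosa, the 20 ms non-overlapping frames and the mean of the delta coefficients for features 11-15 are our assumptions, not details given in the paper.

```python
import numpy as np
import librosa
from scipy import stats

def mfcc_features(y, sr):
    """Features 1-15 of Table 1: statistics of 5 MFCCs per 20 ms frame."""
    n = int(0.020 * sr)                       # 20 ms frames, no overlap
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=5,
                                n_fft=n, hop_length=n, n_mels=40)
    delta = librosa.feature.delta(mfcc)       # frame-to-frame variation
    return np.concatenate([mfcc.mean(axis=1),    # features 1-5: mean
                           mfcc.std(axis=1),     # features 6-10: std
                           delta.mean(axis=1)])  # features 11-15 (assumed mean)

def sequence_stats(x):
    """Statistics applied to a measurement sequence (energy, pitch, ...)."""
    return np.array([np.mean(x), np.std(x), stats.kurtosis(x),
                     stats.skew(x), np.median(x)])
```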

3.2 Short term energy

The Short Term Energy (STE) is obtained by evaluating the energy of the signal in 20 ms time frames.

3.3 Pitch

The pitch, or fundamental frequency, carries information about the vibration rate of the vocal folds when a sound is produced, generated by their rapid opening and closing.

3.4 Jitter

Jitter, or frequency perturbation, is defined as the small cycle-to-cycle changes of period that occur during phonation and are not accounted for by voluntary changes in frequency. The more jitter deviates from zero, the more it correlates with erratic vibratory patterns of the vocal folds. It depends on the voice, the sex of the speaker and the voluntary intonation.

3.5 HNR

The HNR (Harmonics to Noise Ratio) is a measurement of voice pureness. It is based on the ratio between the energy of the harmonics and the noise energy present in the voice (both measured in dB). The measurement is carried out on the speech spectrum: the energy present at the harmonics is removed by filtering, and the resulting filtered spectrum provides a noise spectrum, which is subtracted from the original log spectrum to yield what is termed here a source-related spectrum. After performing a baseline correction procedure on this spectrum, the modified noise spectrum is subtracted from the original log spectrum in order to provide the HNR estimate.

3.6 PPQ

The PPQ (Pitch Perturbation Quotient) computes the relative period-to-period variability of the fundamental frequency, with a smoothing factor of M periods. Specifically, it averages the differences between each period and the Z previous and Z next periods, so the whole signal is analyzed with a moving window containing M periods.

3.7 APQ

The APQ (Aperture Perturbation Quotient) is useful when there are voluntary amplitude changes in the voice, since it measures the period-to-period amplitude variability, averaged over M periods.
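The perturbation measures of Sections 3.4, 3.6 and 3.7 can be illustrated on a sequence of estimated pitch periods (jitter, PPQ) or peak amplitudes (APQ). The paper gives no closed formulas, so the sketch below follows the usual textbook definitions, with M = 2Z + 1 values per smoothing window as an assumption.

```python
import numpy as np

def jitter(periods):
    """Mean absolute cycle-to-cycle period change, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def perturbation_quotient(values, Z=2):
    """PPQ/APQ-style quotient over moving windows of M = 2Z + 1 values.

    For PPQ, `values` are pitch periods; for APQ, peak amplitudes.
    Measures the deviation of each value from its local M-point average,
    relative to the overall mean.
    """
    v = np.asarray(values, dtype=float)
    M = 2 * Z + 1
    local = np.convolve(v, np.ones(M) / M, mode="valid")  # moving average
    center = v[Z:len(v) - Z]                  # values aligned with the windows
    return np.mean(np.abs(center - local)) / np.mean(v)
```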
3.8 Proposed measurement: normalized harmonic energy variation

The proposed measurement is the variation of the harmonic energy between 0.1 Hz and 4 Hz, normalized by the high frequency energy. First, the harmonic energy is evaluated in time frames of 20 ms. This sequence of values is then filtered in order to keep the variations between 0.1 Hz and 4 Hz. Finally, the resulting value is normalized by the high frequency energy, measured between 2.5 kHz and 5.8 kHz.
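A possible implementation of the proposed measurement is sketched below. The band edges (0.1-4 Hz and 2.5-5.8 kHz) and the 20 ms frames follow the text; the estimator of the per-frame harmonic energy (here simply the frame energy below 2.5 kHz), the Butterworth band-pass filter and the final averaging are our assumptions, since the paper does not specify them. A sampling rate of at least 11.6 kHz is assumed so that the 5.8 kHz band exists.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def normalized_harmonic_energy_variation(y, sr, frame_ms=20):
    """Sketch of the Section 3.8 measurement (assumptions noted above)."""
    n = int(frame_ms / 1000 * sr)
    frames = y[: len(y) // n * n].reshape(-1, n)

    # per-frame energies: low band as harmonic-energy proxy, high band 2.5-5.8 kHz
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    e_harm = spec[:, freqs < 2500].sum(axis=1)
    e_high = spec[:, (freqs >= 2500) & (freqs <= 5800)].sum(axis=1)

    # keep only the 0.1-4 Hz variation of the harmonic-energy sequence
    frame_rate = sr / n                       # 50 frames per second
    sos = butter(2, [0.1, 4.0], btype="bandpass", fs=frame_rate, output="sos")
    variation = sosfiltfilt(sos, e_harm)

    # normalize the variation energy by the high-frequency energy
    return float(np.mean(variation ** 2) / (np.mean(e_high) + 1e-12))
```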

4 Results

This section presents the results of the experiments carried out in this paper. Table 2 shows the confusion matrix for the proposed set of features. The emotions listed are Neutral (N), Boredom (B), Sadness (S), Disgust (D), Fear (F), Happiness (H) and Anger (A). Each element represents the probability of assigning a given emotion; each column shows the distribution of the decisions for the patterns of one real emotion.

Table 2: Confusion matrix (%) for the proposed set of features. Columns (real emotion) are grouped into low stress (N, B, S), medium stress (D, F) and high stress (H, A).

                          Real emotion
              Low stress       Medium stress   High stress
Assigned     N      B      S      D      F      H      A
N           53%    15%     0%     0%     6%     8%     0%
B            5%    69%    17%     0%     0%     0%     0%
S            3%     5%    70%     0%     0%     0%     0%
D            2%     5%     7%    63%     0%     0%     1%
F           24%     0%     7%    25%    78%     0%     6%
H            9%     0%     0%    13%    17%    50%    18%
A            3%     5%     0%     0%     0%    42%    75%

As we can see, there exist three groups of emotions that tend to be confused with one another. Due to this result, we propose to group the original emotions into three sets:

Low stressed emotions. This set includes the neutral, boredom and sadness patterns.
Medium stressed emotions. This set includes disgust and fear.
High stressed emotions. This set includes happiness and anger.

Table 3 shows the classification errors (%) for the standard and the proposed sets of features, as a function of the number of classes. As we can see, the use of the proposed set of features improves the performance of the classifier, both in the 7-class problem and in the 3-class problem.

Table 3: Classification errors (%) for the standard and proposed sets of features, as a function of the number of classes

Number of classes    Standard set of features    Standard and proposed features
7 classes                   37.54%                        32.61%
3 classes                   19.05%                        15.93%

5 Conclusion

Stress is a reaction of a subject when facing daily mental, emotional or physical challenges. Continuous monitoring of a subject's stress level is a key point in understanding and controlling personal stress. Stress is expressed through physiological changes, emotional reactions and behavioral changes. Physiological changes include the increase of adrenaline produced to intensify concentration, the rise in heart rate and the acceleration of reflexes. Emotional reactions, in turn, can be expressed through changes in the prosody of speech. It is therefore important for society to find solutions for determining the instantaneous stress level, since stress can undermine both mental and physical health.

In this work, a set of features has been proposed that reduces the error of emotion detectors and improves the performance of current systems. Results demonstrate the feasibility of the proposed system, with error rates lower than 33% for seven types of emotion. It is important to highlight that subjective tests with native listeners have been carried out, yielding error rates of around 40%. This makes the proposed family of measurements useful for implementing emotion classification systems, even in cases where there is no prior knowledge about the speaker.

Acknowledgments

This work has been funded by the Spanish Ministry of Education and Science (TEC C03-03) and under project UAH2011/EXP-028.

References:

[1] P.Y. Oudeyer, The production and recognition of emotions in speech: features and algorithms, International Journal of Human-Computer Studies 59 (2003).
[2] R. Barra, J. Macias-Guarasa, J.M. Montero, C. Rincon, F. Fernandez and R. Cordoba, In search of primary rubrics for language independent emotional speech identification, International Symposium on Intelligent Signal Processing (2007).
[3] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier and B. Weiss, A database of German emotional speech, Proceedings of Interspeech (2005).
[4] A. Paeschke, Global trend of fundamental frequency in emotional speech, Proceedings of Speech Prosody (2004).
[5] D. Ververidis and C. Kotropoulos, Emotional speech recognition: Resources, features, and methods, Speech Communication 48 (2006), no. 9.
[6] D. Ververidis, C. Kotropoulos and I. Pitas, Automatic emotional speech classification, Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2004), vol. 1.
[7] K. Hammerschmidt and U. Jürgens, Acoustical correlates of affective prosody, Journal of Voice 21 (2007).
[8] S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing 28 (1980).
[9] H.L. Van Trees, Detection, Estimation, and Modulation Theory, vol. 1, Wiley.
[10] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, Wiley-Interscience.
[11] A.C. Davison and D.V. Hinkley, Bootstrap Methods and their Application, Cambridge University Press.
