Harmonic Structure Transform for Speaker Recognition


1 Harmonic Structure Transform for Speaker Recognition
Kornel Laskowski & Qin Jin
Carnegie Mellon University, Pittsburgh PA, USA
KTH Speech, Music & Hearing, Stockholm, Sweden
29 August 2011
Laskowski & Jin, INTERSPEECH 2011, Firenze, Italy

2-3 Spectral Transforms in General
Given x, the energy spectrum of a speech frame,

    y = F⁻¹ log( Mᵀ x )

where the argument of the logarithm carries a normalization term. The matrix M is a filterbank, whose columns are bandpass filters; M defines the number of filters, and their central frequencies, widths, and general shapes. Importantly here, the filters of all such filterbanks integrate energy across frequencies related by adjacency.
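The generic transform above can be checked in a few lines. A minimal sketch, assuming a toy triangular filterbank for M and an orthonormal DCT-II as the decorrelating transform F⁻¹; the filter count, spacing, and shapes here are illustrative, not the ones used in the paper:

```python
import numpy as np

def triangular_filterbank(n_filters, n_bins):
    """Toy filterbank M: triangular filters with linearly spaced centers.

    Each column integrates energy across ADJACENT frequency bins
    (real mel filterbanks space the centers on the mel scale instead).
    """
    centers = np.linspace(0, n_bins - 1, n_filters + 2)
    M = np.zeros((n_bins, n_filters))
    bins = np.arange(n_bins)
    for j in range(n_filters):
        left, mid, right = centers[j], centers[j + 1], centers[j + 2]
        up = (bins - left) / max(mid - left, 1e-9)
        down = (right - bins) / max(right - mid, 1e-9)
        M[:, j] = np.clip(np.minimum(up, down), 0.0, None)
    return M

def dct_ii(v):
    """Orthonormal DCT-II, standing in for the decorrelating transform F^-1."""
    n = len(v)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return basis @ v

# y = F^-1 log(M^T x): filterbank energies, log-compression, decorrelation
rng = np.random.default_rng(0)
x = rng.random(257) + 1e-3           # energy spectrum of one frame (positive)
M = triangular_filterbank(20, 257)
y = dct_ii(np.log(M.T @ x))          # 20 cepstral-style coefficients
```

Swapping M for a matrix whose columns are comb filters, as on the next slide, changes only the filterbank, not this pipeline.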

4 The Harmonic Structure Transform (HST)
In contrast, the HST is implemented by a matrix H whose columns are comb filters. Each filter integrates energy across frequencies related by harmonicity (not adjacency).
- this is novel (Laskowski & Jin, 2010) for speaker recognition
- related to (Liénard, Barras & Signol, 2008) for pitch detection
- unknown: the number of filters, and their fundamental frequencies, tooth widths, and individual tooth shapes
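A column of H can be sketched as follows. The tooth shape, tooth half-width, harmonic count, and FFT geometry below are illustrative assumptions, not the values tuned in the paper:

```python
import numpy as np

def comb_column(f0, n_bins, sr=16000, n_harmonics=20, tooth_halfwidth_hz=20.0):
    """One column of H: triangular teeth at integer multiples of f0.

    Integrates energy across frequencies related by HARMONICITY, i.e. the
    bins near k*f0, rather than across adjacent bins.
    """
    freqs = np.arange(n_bins) * (sr / 2.0) / (n_bins - 1)  # bin center freqs
    col = np.zeros(n_bins)
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        if fk > freqs[-1]:
            break
        tooth = 1.0 - np.abs(freqs - fk) / tooth_halfwidth_hz
        col = np.maximum(col, np.clip(tooth, 0.0, None))
    return col

# H: one comb column per candidate fundamental frequency,
# linearly spaced as in the baseline configuration
f0s = np.linspace(50, 450, 40)
H = np.stack([comb_column(f0, 257) for f0 in f0s], axis=1)
```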

5 Outline of this Talk
1. Baseline Performance: what is known?
2. Experiments in HSCC Filterbank Design
   - linear spacing in fundamental frequency
   - piecewise linear spacing in fundamental frequency
   - logarithmic spacing in fundamental frequency
   - fundamental frequency range and density
3. Score-level Fusion with Standard MFCCs
4. Generalization
5. Conclusions

6 HST Processing
[Diagram: frame FFT x_t versus an idealized FFT (comb filter h), with teeth at f_h[i−1], f_h[i], f_h[i+1]]
- analysis every 8 ms; frames 32 ms wide
- comb filter teeth triangular (global width parameter)
- filters linearly spanning from 50 Hz to 450 Hz
- logarithm at each filter output, then normalization
- decorrelation using LDA yields harmonic structure cepstral coefficients (HSCCs)
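The framing policy above (8 ms hop, 32 ms windows) can be sketched as follows; the 16 kHz sample rate and Hann window are assumptions for illustration:

```python
import numpy as np

SR = 16000                     # assumed sample rate
HOP = int(0.008 * SR)          # analysis every 8 ms -> 128 samples
WIN = int(0.032 * SR)          # frames 32 ms wide   -> 512 samples

def energy_spectra(signal):
    """Slice a signal into overlapping frames; return per-frame energy spectra x_t."""
    n_frames = 1 + (len(signal) - WIN) // HOP
    window = np.hanning(WIN)
    frames = np.stack([signal[t * HOP: t * HOP + WIN] * window
                       for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, WIN // 2 + 1)

sig = np.sin(2 * np.pi * 220.0 * np.arange(SR) / SR)  # 1 s of a 220 Hz tone
X = energy_spectra(sig)
```

Each row of X is one frame's energy spectrum x, ready for the filterbank, logarithm, normalization, and LDA steps listed above.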

7 HSCC Modeling for Classification
As simple as possible: one GMM per speaker.
1. assume one Gaussian component
2. determine the optimal number N_D of LDA dimensions
3. hold N_D fixed
4. determine the optimal number N_G of Gaussians
Maximum-likelihood closed-set classification (MAP under a uniform prior).
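The first step of the recipe (one Gaussian per speaker, ML classification under a uniform prior) can be sketched as follows; the feature dimensionality and the synthetic training data are placeholders, not the tuned N_D and real HSCC features:

```python
import numpy as np

class GaussianSpeakerModel:
    """One diagonal-covariance Gaussian per speaker (step 1 of the recipe)."""
    def fit(self, feats):
        self.mu = feats.mean(axis=0)
        self.var = feats.var(axis=0) + 1e-6
        return self

    def loglik(self, feats):
        z = (feats - self.mu) ** 2 / self.var
        return -0.5 * (z + np.log(2 * np.pi * self.var)).sum()

def classify(trial, models):
    """ML closed-set classification = MAP under a uniform speaker prior."""
    scores = {spk: m.loglik(trial) for spk, m in models.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(1)
train = {"spk_a": rng.normal(0.0, 1.0, (500, 8)),
         "spk_b": rng.normal(2.0, 1.0, (500, 8))}
models = {spk: GaussianSpeakerModel().fit(f) for spk, f in train.items()}
print(classify(rng.normal(2.0, 1.0, (100, 8)), models))  # prints "spk_b"
```

Replacing the single Gaussian with an N_G-component mixture (step 4) changes only the per-speaker density, not the decision rule.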

8 Available Results (Laskowski & Jin, ODYSSEY 2010)
- Wall Street Journal data, mostly read speech
- 100-way closed-set classification, per gender
- trials per gender and dataset
- matched channel and matched multi-session conditions
[Table: accuracy of the HST/LDA, MEL/DCT, and MEL/LDA systems, per gender (Female, Male), on Dev and Test]

9 Session Mismatch
- MIXER5 data, various speaking styles
- 66-way closed-set classification
- trials per dataset
- matched channel and matched session: accuracies of 100%
- matched channel but mismatched session:
[Table: accuracy of the HST/LDA, MEL/DCT, and MEL/LDA systems on Dev and Test]

10-22 Linear Spacing of Fundamental Frequencies
[Animation: comb-filter fundamental frequencies spaced linearly along an axis from 50 Hz to 850 Hz; how far should the range extend?]

23-27 Piecewise Linear Spacing of Fundamental Frequencies
[Animation: comb-filter fundamental frequencies spaced piecewise-linearly along an axis from 50 Hz to 850 Hz]

28-31 Logarithmic Spacing of Fundamental Frequencies
In the linear case, the N_h fundamental frequencies f_h between f_h^min and f_h^max are

    f_h[i] = f_h^min + ((i − 1) / (N_h − 1)) · (f_h^max − f_h^min),   1 ≤ i ≤ N_h

In the logarithmic case,

    f_h[i] = f_h^min · (f_h^max / f_h^min)^((i − 1) / (N_h − 1)),   1 ≤ i ≤ N_h

[Figure: frequency f_h (Hz) versus filter index i, comparing the Baseline, Linear, PW Linear, and Logarithmic spacings, with dev-set accuracies]
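Both spacing formulas can be checked numerically; a minimal sketch (the 50–450 Hz range and nine filters are just for illustration):

```python
import numpy as np

def linear_spacing(f_min, f_max, n):
    """f_h[i] = f_min + ((i - 1) / (n - 1)) * (f_max - f_min), 1 <= i <= n."""
    i = np.arange(1, n + 1)
    return f_min + (i - 1) / (n - 1) * (f_max - f_min)

def log_spacing(f_min, f_max, n):
    """f_h[i] = f_min * (f_max / f_min) ** ((i - 1) / (n - 1)), 1 <= i <= n."""
    i = np.arange(1, n + 1)
    return f_min * (f_max / f_min) ** ((i - 1) / (n - 1))

lin = linear_spacing(50.0, 450.0, 9)
log_ = log_spacing(50.0, 450.0, 9)
# Both hit the endpoints; log spacing packs more filters at low f0,
# fewer at high f0, since consecutive gaps grow geometrically.
```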

32-35 Range and Density of Fundamental Frequencies
Three manipulations:
- f_h^min: raise to 62.5 Hz
- f_h^max: raise to maximize accuracy
- N_h: lower to maximize accuracy (to 1129)
[Figure: frequency f_h (Hz) versus filter index i for the Baseline, Linear, PW Linear, Logarithmic, Raise LB, Raise UB, and Lower N configurations]

36 Interpolation with Standard MFCCs
- linear interpolation with DCT-decorrelated log-Mel energies
- 20 coefficients; 25 coefficients (always worse)
[Figure: accuracy versus interpolation parameter for MEL/DCT 20, MEL/DCT 25, HSCC, and MEL/DCT + HSCC]

37 Interpolation with LDA-Rotated MFCCs
- linear interpolation with LDA-decorrelated log-Mel energies
- 20 coefficients (better in combination); 25 coefficients (better alone)
[Figure: accuracy versus interpolation parameter for MEL/LDA 20, MEL/LDA 25, HSCC, and MEL/LDA + HSCC; gain of 10.6%abs]
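The score-level fusion on these slides is plain linear interpolation of the two systems' per-speaker scores; a minimal sketch, where the score values and the interpolation parameter are hypothetical:

```python
def fuse(scores_mfcc, scores_hscc, alpha):
    """Linearly interpolate two systems' per-speaker scores.

    alpha = 0 keeps the MFCC system alone; alpha = 1 keeps HSCC alone.
    """
    return {spk: (1.0 - alpha) * scores_mfcc[spk] + alpha * scores_hscc[spk]
            for spk in scores_mfcc}

mfcc = {"spk_a": -110.0, "spk_b": -120.0}   # hypothetical log-likelihoods
hscc = {"spk_a": -95.0,  "spk_b": -90.0}
fused = fuse(mfcc, hscc, alpha=0.4)
best = max(fused, key=fused.get)            # "spk_a": -104.0 beats -108.0
```

Sweeping alpha on the dev set, as in the figures, picks the operating point of the combined system.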

38 Summary of Accuracy on DevSet
[Bar chart, DEVSET and EVALSET: filterbank variants Baseline, Linear, PW Linear, Logarithmic, Raise LB, Raise UB, Lower N, for a gain of 10.7%abs (33.5%rel); fusion MEL/DCT + HSCC: ??.?%abs (27.6%rel); fusion MEL/LDA + HSCC: 31.8%rel]

39-41 Accuracy on EvalSet
[Same bar chart, highlighting EVALSET: 10.7%abs (33.5%rel) for the improved filterbank; fusion MEL/DCT + HSCC: 27.6%rel; fusion MEL/LDA + HSCC: 31.8%rel]

42 Conclusions
1. evaluated the baseline transform in session mismatch; viable, with twice as many errors as an equivalent MFCC system
2. errors can be reduced by a third (33.5%rel) by optimizing: the number of filters in the filterbank, and the fundamental frequency corresponding to each filter
3. logarithmic spacing of fundamental frequencies is better than linear spacing: more filters for low fundamental frequencies, fewer filters for high fundamental frequencies
4. in an equivalent MFCC system, errors can be reduced by almost a third (31.8%rel) via score-level fusion with the improved HST system

43 Future Directions
1. change the framing policy from 8 ms/32 ms to something longer: intonation and voice quality are supra-segmental, and larger temporal support brings greater spectral resolution
2. optimize the tooth shape of the comb filters
3. find a data-independent decorrelation transform leading to a compact (< 25 coefficients) representation
4. explore adaptation from a universal background model
5. generalize to binary speaker verification (and NIST SREs)

44 Potential Impact
1. a new general representation of the spectrum
2. deliberately orthogonal to spectral envelope features (MFCCs, LPCCs, etc.), but computed in an identical manner
3. likely beneficial not only for speaker recognition, but also for: online speaker diarization, classification of emotional speech, and clinical voice quality assessment

45 Some Interesting Insights
1. currently in the HST, spectral energy below 300 Hz is zeroed out: this improves closed-set speaker classification, but the fundamental (zeroth harmonic) is ignored, and the fundamental is thought to play a role in emotional expression
2. that the optimal filter spacing is logarithmic is curious: it is independent of the logarithmic tonotopicity of the basilar membrane, and suggests greater acuity in discriminating among harmonic sounds (not just pure tones)

46 THANK YOU


More information

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT. CEPSTRAL ANALYSIS SYNTHESIS ON THE EL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITH FOR IT. Summarized overview of the IEEE-publicated papers Cepstral analysis synthesis on the mel frequency scale by Satochi

More information

Fusion of Spectral Feature Sets for Accurate Speaker Identification

Fusion of Spectral Feature Sets for Accurate Speaker Identification Fusion of Spectral Feature Sets for Accurate Speaker Identification Tomi Kinnunen, Ville Hautamäki, and Pasi Fränti Department of Computer Science University of Joensuu, Finland {tkinnu,villeh,franti}@cs.joensuu.fi

More information

ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley

ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA. Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley ORTHOGONALITY-REGULARIZED MASKED NMF FOR LEARNING ON WEAKLY LABELED AUDIO DATA Iwona Sobieraj, Lucas Rencker, Mark D. Plumbley University of Surrey Centre for Vision Speech and Signal Processing Guildford,

More information

Minimax i-vector extractor for short duration speaker verification

Minimax i-vector extractor for short duration speaker verification Minimax i-vector extractor for short duration speaker verification Ville Hautamäki 1,2, You-Chi Cheng 2, Padmanabhan Rajan 1, Chin-Hui Lee 2 1 School of Computing, University of Eastern Finl, Finl 2 ECE,

More information

Low-dimensional speech representation based on Factor Analysis and its applications!

Low-dimensional speech representation based on Factor Analysis and its applications! Low-dimensional speech representation based on Factor Analysis and its applications! Najim Dehak and Stephen Shum! Spoken Language System Group! MIT Computer Science and Artificial Intelligence Laboratory!

More information

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization Analysis of polyphonic audio using source-filter model and non-negative matrix factorization Tuomas Virtanen and Anssi Klapuri Tampere University of Technology, Institute of Signal Processing Korkeakoulunkatu

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

Modeling Norms of Turn-Taking in Multi-Party Conversation

Modeling Norms of Turn-Taking in Multi-Party Conversation Modeling Norms of Turn-Taking in Multi-Party Conversation Kornel Laskowski Carnegie Mellon University Pittsburgh PA, USA 13 July, 2010 Laskowski ACL 1010, Uppsala, Sweden 1/29 Comparing Written Documents

More information

A Low-Cost Robust Front-end for Embedded ASR System

A Low-Cost Robust Front-end for Embedded ASR System A Low-Cost Robust Front-end for Embedded ASR System Lihui Guo 1, Xin He 2, Yue Lu 1, and Yaxin Zhang 2 1 Department of Computer Science and Technology, East China Normal University, Shanghai 200062 2 Motorola

More information

Allpass Modeling of LP Residual for Speaker Recognition

Allpass Modeling of LP Residual for Speaker Recognition Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:

More information

Frog Sound Identification System for Frog Species Recognition

Frog Sound Identification System for Frog Species Recognition Frog Sound Identification System for Frog Species Recognition Clifford Loh Ting Yuan and Dzati Athiar Ramli Intelligent Biometric Research Group (IBG), School of Electrical and Electronic Engineering,

More information

Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions

Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions Ville Vestman 1, Dhananjaya Gowda, Md Sahidullah 1, Paavo Alku 3, Tomi

More information

Intelligent Rapid Voice Recognition using Neural Tensor Network, SVM and Reinforcement Learning

Intelligent Rapid Voice Recognition using Neural Tensor Network, SVM and Reinforcement Learning CS229/CS221 PROJECT REPORT, DECEMBER 2015, STANFORD UNIVERSITY 1 Intelligent Rapid Voice Recognition using Neural Tensor Network, SVM and Reinforcement Learning Davis Wertheimer, Aashna Garg, James Cranston

More information

A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information

A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information Sādhanā Vol. 38, Part 4, August 23, pp. 59 62. c Indian Academy of Sciences A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information DEBADATTA

More information

Session 1: Pattern Recognition

Session 1: Pattern Recognition Proc. Digital del Continguts Musicals Session 1: Pattern Recognition 1 2 3 4 5 Music Content Analysis Pattern Classification The Statistical Approach Distribution Models Singing Detection Dan Ellis

More information

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Distribution A: Public Release Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data Bengt J. Borgström Elliot Singer Douglas Reynolds and Omid Sadjadi 2

More information

Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre

Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre Around the Speaker De-Identification (Speaker diarization for de-identification ++) Itshak Lapidot Moez Ajili Jean-Francois Bonastre The 2 Parts HDM based diarization System The homogeneity measure 2 Outline

More information

L8: Source estimation

L8: Source estimation L8: Source estimation Glottal and lip radiation models Closed-phase residual analysis Voicing/unvoicing detection Pitch detection Epoch detection This lecture is based on [Taylor, 2009, ch. 11-12] Introduction

More information

Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems

Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Maximum Likelihood and Maximum A Posteriori Adaptation for Distributed Speaker Recognition Systems Chin-Hung Sit 1, Man-Wai Mak 1, and Sun-Yuan Kung 2 1 Center for Multimedia Signal Processing Dept. of

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

Gain Compensation for Fast I-Vector Extraction over Short Duration

Gain Compensation for Fast I-Vector Extraction over Short Duration INTERSPEECH 27 August 2 24, 27, Stockholm, Sweden Gain Compensation for Fast I-Vector Extraction over Short Duration Kong Aik Lee and Haizhou Li 2 Institute for Infocomm Research I 2 R), A STAR, Singapore

More information

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation

More information

Environmental Sound Classification in Realistic Situations

Environmental Sound Classification in Realistic Situations Environmental Sound Classification in Realistic Situations K. Haddad, W. Song Brüel & Kjær Sound and Vibration Measurement A/S, Skodsborgvej 307, 2850 Nærum, Denmark. X. Valero La Salle, Universistat Ramon

More information

Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition

Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam and Marcel Kockmann Odyssey Speaker and Language Recognition Workshop

More information

SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS

SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS by Jinjin Ye, B.S. A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

Robust Sound Event Detection in Continuous Audio Environments

Robust Sound Event Detection in Continuous Audio Environments Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University

More information

AUDIO-VISUAL RELIABILITY ESTIMATES USING STREAM

AUDIO-VISUAL RELIABILITY ESTIMATES USING STREAM SCHOOL OF ENGINEERING - STI ELECTRICAL ENGINEERING INSTITUTE SIGNAL PROCESSING LABORATORY Mihai Gurban ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE EPFL - FSTI - IEL - LTS Station Switzerland-5 LAUSANNE Phone:

More information

Voice Activity Detection Using Pitch Feature

Voice Activity Detection Using Pitch Feature Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech

More information

Pattern Recognition Applied to Music Signals

Pattern Recognition Applied to Music Signals JHU CLSP Summer School Pattern Recognition Applied to Music Signals 2 3 4 5 Music Content Analysis Classification and Features Statistical Pattern Recognition Gaussian Mixtures and Neural Nets Singing

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication

Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication Aleksandr Sizov 1, Kong Aik Lee, Tomi Kinnunen 1 1 School of Computing, University of Eastern Finland, Finland Institute

More information

Fuzzy Support Vector Machines for Automatic Infant Cry Recognition

Fuzzy Support Vector Machines for Automatic Infant Cry Recognition Fuzzy Support Vector Machines for Automatic Infant Cry Recognition Sandra E. Barajas-Montiel and Carlos A. Reyes-García Instituto Nacional de Astrofisica Optica y Electronica, Luis Enrique Erro #1, Tonantzintla,

More information

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Mikkel N. Schmidt and Morten Mørup Technical University of Denmark Informatics and Mathematical Modelling Richard

More information

Deep Learning for Speech Recognition. Hung-yi Lee

Deep Learning for Speech Recognition. Hung-yi Lee Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New

More information

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch

More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

Monaural speech separation using source-adapted models

Monaural speech separation using source-adapted models Monaural speech separation using source-adapted models Ron Weiss, Dan Ellis {ronw,dpwe}@ee.columbia.edu LabROSA Department of Electrical Enginering Columbia University 007 IEEE Workshop on Applications

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions

More information