University of Birmingham Research Archive


e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation.

Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.


[Figure: Thesis structure. Chapter 1: Introduction. Chapter 2: Literature and review of techniques. Chapter 3: Corpora descriptions (microphone-recorded and telephony-recorded speech signals). Chapter 4: Baseline systems and system evaluations. Chapter 5: Accent ID vs Speaker ID (adult speech; sub-band and full-band experiments). Chapter 6: Speaker recognition (adult speech and child speech, microphone-recorded; sub-band and full-band experiments). Chapter 7: Gender ID and Age-group ID (child speech, microphone-recorded; sub-band, full-band and human experiments). Chapter 8: Conclusion.]


[Figure: gain plotted against frequency (Hz).]


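[Figure: linear SVM in the (x, y) plane, showing the separating hyperplane w·x + b = 0, the two margin hyperplanes on either side of it, and the margin width 2/‖w‖.]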


[Figure: supervector extraction. The features for a given speaker S are used to MAP-adapt the UBM, giving a speaker/session-dependent GMM whose means form the speaker/session-dependent mean supervector.]


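[Figure: i-vector system. Unlabelled data are used to train the UBM and the matrix T. A test utterance passes through front-end processing, supervector extraction and i-vector extraction, followed by optional normalisation and compensation techniques such as LDA and length normalisation, and is then scored against the model i-vector.]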
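The final stages named in the diagram above (length normalisation, then scoring a test i-vector against model i-vectors) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes the i-vectors have already been extracted as fixed-length NumPy arrays, and it assumes cosine similarity as the scoring function, which the surviving text does not confirm. The function names are hypothetical.

```python
import numpy as np

def length_normalise(ivectors):
    """Scale each row (one i-vector per row) to unit Euclidean length."""
    ivectors = np.atleast_2d(np.asarray(ivectors, dtype=float))
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero vector.
    return ivectors / np.maximum(norms, 1e-12)

def cosine_scores(model_ivectors, test_ivectors):
    """Return a (models x tests) matrix of cosine similarities.

    Assumption: cosine scoring stands in for whatever scorer the
    original system used.
    """
    m = length_normalise(model_ivectors)
    t = length_normalise(test_ivectors)
    return m @ t.T

def identify(model_ivectors, test_ivectors):
    """For each test i-vector, return the index of the best-scoring model."""
    return np.argmax(cosine_scores(model_ivectors, test_ivectors), axis=0)
```

For identification, each test utterance is assigned to the model with the highest score; for verification, the same score would instead be compared with a threshold.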


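[Figure: classification with i-vectors. Training utterances X1, X2, ..., Xn with labels Y1, ..., Ym pass through i-vector extraction, then LDA and length normalisation, to build Model 1, Model 2, ..., Model m. The test utterance is processed in the same way and compared with the models to reach a decision.]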


Table: CSLU Kids Corpus (total number of speakers: 1118)

Task | Train | Test | Evaluation
Age-Group ID | 352 spk. | 766 spk. | 766 spk. (n-1 file per spk.)*
Gender ID | 430 spk. | 687 spk. | 687 spk. (n-1 file per spk.)*
Speaker ID (identifying a child in school) | 918 spk. (train and evaluation) | 100 spk. | 100 spk.

Gender balance, in the order given in the table: 50% male and 50% female; 54.2% male and 45.8% female; 50% male and 50% female; 55.7% male and 44.3% female; 50% male and 50% female.


Table: TIMIT Corpus (total number of speakers: 630; 438 male + 192 female)

Task | Train and Evaluation | Test
Speaker Identification | 530 spk. | 100 spk.


[Figure: Speaker detection performance. Miss probability (in %) against false alarm probability (in %).]

[Figure: i-vector system pipeline. Development features, UBM training, computing statistics, T-matrix training, extracting i-vectors and LDA; training i-vectors and test i-vectors are then scored to reach a decision.]

[Figure: false negative rate (FNR) [%] against false positive rate (FPR) [%], and scores by model index and test index.]
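The detection plots above are drawn from miss (false negative) and false alarm (false positive) rates, and later figures report the equal error rate (EER). The sketch below shows one common way to estimate the EER from two lists of trial scores. It is an illustration under stated assumptions rather than the evaluation code used in the thesis: higher scores are assumed to mean "more likely the target", thresholds are taken at the observed scores, and the function name is hypothetical.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER as the point where miss and false-alarm rates meet.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials rejected at each threshold.
    miss = np.array([(target < th).mean() for th in thresholds])
    # False-alarm rate: fraction of non-target trials accepted.
    false_alarm = np.array([(nontarget >= th).mean() for th in thresholds])
    # Take the threshold where the two rates are closest and average them.
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0
```

With few trials the two rates rarely cross exactly, so averaging them at the closest threshold is a simple approximation; interpolating along the detection curve would give a smoother estimate.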


[Figure: classification with i-vectors and an SVM. Training utterances X1, X2, ..., Xn with labels Y1, ..., Ym and the test utterance each pass through i-vector extraction; the SVM then makes the decision.]

[Figure: speaker identification (SID) rate (%) per sub-band for 30-second, 10-second and 3-second test segments, with regions labelled A, B, C and D.]

[Figure: identification rate (%) per sub-band with regions labelled A, B, C and D; NSID and NAID per sub-band.]

[Figure: identification rate (%) per sub-band with regions labelled A, B, C and D.]

[Figure: NSID and NAID per sub-band with regions labelled A, B, C and D.]


[Figure: EER (%) with 90% confidence interval against number of mixture components.]

[Figure: EER (%) with 90% confidence interval against frequency (Hz).]

[Figure: EER (%) with 90% confidence interval against frequency (Hz).]

[Figure: EER (%) per sub-band and identification rate (%) per sub-band for the GMM-UBM and GMM-SVM systems (64 mixture components each), with labelled sub-band regions.]

[Figure: correlation matrices for the GMM-UBM and GMM-SVM systems.]

[Figure: identification rate (%) per sub-band for kindergarten to 2nd grade, 3rd to 6th grade, and 7th to 10th grade speakers.]


[Figure: identification rate (%) per sub-band (S1 to S21) with full-bandwidth performance marked; identification rate (%) per sub-band (S1 to S21) for age groups AG1 (5 to 9 years old), AG2 (9 to 13 years old) and AG3 (13 to 16 years old), with the corresponding full-band results (AG1 FB, AG2 FB, AG3 FB).]

[Figure: identification rate (%) per sub-band (S1 to S21).]

[Figure: normalised gender ID and normalised age ID per sub-band.]



More information

Unsupervised Vocabulary Induction

Unsupervised Vocabulary Induction Infant Language Acquisition Unsupervised Vocabulary Induction MIT (Saffran et al., 1997) 8 month-old babies exposed to stream of syllables Stream composed of synthetic words (pabikumalikiwabufa) After

More information

A Generative Model for Score Normalization in Speaker Recognition

A Generative Model for Score Normalization in Speaker Recognition INTERSPEECH 017 August 0 4, 017, Stockholm, Sweden A Generative Model for Score Normalization in Speaker Recognition Albert Swart and Niko Brümmer Nuance Communications, Inc. (South Africa) albert.swart@nuance.com,

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

Approximating the Covariance Matrix with Low-rank Perturbations

Approximating the Covariance Matrix with Low-rank Perturbations Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu

More information

Gain Compensation for Fast I-Vector Extraction over Short Duration

Gain Compensation for Fast I-Vector Extraction over Short Duration INTERSPEECH 27 August 2 24, 27, Stockholm, Sweden Gain Compensation for Fast I-Vector Extraction over Short Duration Kong Aik Lee and Haizhou Li 2 Institute for Infocomm Research I 2 R), A STAR, Singapore

More information

Speaker Recognition Using Artificial Neural Networks: RBFNNs vs. EBFNNs

Speaker Recognition Using Artificial Neural Networks: RBFNNs vs. EBFNNs Speaer Recognition Using Artificial Neural Networs: s vs. s BALASKA Nawel ember of the Sstems & Control Research Group within the LRES Lab., Universit 20 Août 55 of Sida, BP: 26, Sida, 21000, Algeria E-mail

More information

Bayesian Estimation of Bipartite Matchings for Record Linkage

Bayesian Estimation of Bipartite Matchings for Record Linkage Bayesian Estimation of Bipartite Matchings for Record Linkage Mauricio Sadinle msadinle@stat.duke.edu Duke University Supported by NSF grants SES-11-30706 to Carnegie Mellon University and SES-11-31897

More information

Improved Method for Epoch Extraction in High Pass Filtered Speech

Improved Method for Epoch Extraction in High Pass Filtered Speech Improved Method for Epoch Extraction in High Pass Filtered Speech D. Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham (University) Coimbatore, Tamilnadu 642 Email: d

More information

Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates

Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Dima Ruinskiy Niv Dadush Yizhar Lavner Department of Computer Science, Tel-Hai College, Israel Outline Phoneme

More information

Automatic Phoneme Recognition. Segmental Hidden Markov Models

Automatic Phoneme Recognition. Segmental Hidden Markov Models Automatic Phoneme Recognition with Segmental Hidden Markov Models Areg G. Baghdasaryan Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment

More information

July 6, Applause Identification and its relevance to Archival of Carnatic Music. Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A.

July 6, Applause Identification and its relevance to Archival of Carnatic Music. Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A. Applause Identification and its relevance to Archival of Carnatic Music Padi Sarala 1 Vignesh Ishwar 1 Ashwin Bellur 1 Hema A.Murthy 1 1 Computer Science Dept, IIT Madras, India. July 6, 2012 Outline of

More information

Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring

Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,

More information

INVESTIGATION OF MICROWAVE TRI-RESONATOR STRUCTURES

INVESTIGATION OF MICROWAVE TRI-RESONATOR STRUCTURES SCHOOL OF ELECTRONIC, ELECTRICAL AND COMPUER ENGINEERING THE UNIVERSITY OF BIRMINGHAM INVESTIGATION OF MICROWAVE TRI-RESONATOR STRUCTURES Negassa Sori Gerba A thesis submitted to the University of Birmingham

More information

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction "Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:

More information

A Confidence-Based Late Fusion Framework For Audio-Visual Biometric Identification

A Confidence-Based Late Fusion Framework For Audio-Visual Biometric Identification Pattern Recognition Letters journal homepage: www.elsevier.com A Confidence-Based Late Fusion Framework For Audio-Visual Biometric Identification Mohammad Rafiqul Alam a,, Mohammed Bennamoun a, Roberto

More information

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr.

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr. Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing this chapter, you should be able

More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed

More information

Augmented Statistical Models for Classifying Sequence Data

Augmented Statistical Models for Classifying Sequence Data Augmented Statistical Models for Classifying Sequence Data Martin Layton Corpus Christi College University of Cambridge September 2006 Dissertation submitted to the University of Cambridge for the degree

More information

1. Use Scenario 3-1. In this study, the response variable is

1. Use Scenario 3-1. In this study, the response variable is Chapter 8 Bell Work Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable lumber of 32 cherry trees are measured by a researcher. The goal is to determine if volume of usable lumber can

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization

More information

Norm Referenced Test (NRT)

Norm Referenced Test (NRT) 22 Norm Referenced Test (NRT) NRT Test Design In 2005, the MSA Mathematics tests included the TerraNova Mathematics Survey (TN) Form C at Grades 3, 4, 5, 7, and 8 and Form D at Grade 6. The MSA Grade 10

More information

Achieving Reliable Energy Production During Winter Months

Achieving Reliable Energy Production During Winter Months Achieving Reliable Energy Production During Winter Months Monelle Comeau CanWEA 2017 2017-10-05 1 AGENDA 1 Overview of ENERCON s Icing and Cold Climate Innovations 2 Ice Detection System Evaluations Technology

More information

A NONPARAMETRIC BAYESIAN APPROACH FOR SPOKEN TERM DETECTION BY EXAMPLE QUERY

A NONPARAMETRIC BAYESIAN APPROACH FOR SPOKEN TERM DETECTION BY EXAMPLE QUERY A NONPARAMETRIC BAYESIAN APPROACH FOR SPOKEN TERM DETECTION BY EXAMPLE QUERY Amir Hossein Harati Nead Torbati and Joseph Picone College of Engineering, Temple University Philadelphia, Pennsylvania, USA

More information

Model-based unsupervised segmentation of birdcalls from field recordings

Model-based unsupervised segmentation of birdcalls from field recordings Model-based unsupervised segmentation of birdcalls from field recordings Anshul Thakur School of Computing and Electrical Engineering Indian Institute of Technology Mandi Himachal Pradesh, India Email:

More information

FoCal Multi-class: Toolkit for Evaluation, Fusion and Calibration of Multi-class Recognition Scores Tutorial and User Manual

FoCal Multi-class: Toolkit for Evaluation, Fusion and Calibration of Multi-class Recognition Scores Tutorial and User Manual FoCal Multi-class: Toolkit for Evaluation, Fusion and Calibration of Multi-class Recognition Scores Tutorial and User Manual Niko Brümmer Spescom DataVoice niko.brummer@gmail.com June 2007 Contents 1 Introduction

More information