University of Birmingham Research Archive

University of Birmingham Research Archive e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation.

Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.

[Figure: structure of the thesis. Chapter 1: Introduction. Chapter 2: Literature and review of techniques. Chapter 3: Corpora descriptions (microphone-recorded and telephony-recorded speech signals). Chapter 4: Base-line systems and system evaluations. Chapter 5: Accent ID vs Speaker ID. Chapter 6: Speaker recognition. Chapter 7: Gender ID and Age-group ID. Chapter 8: Conclusion. The experiment boxes in the diagram are labelled adult speech or child speech, microphone-recorded speech signals, sub-band experiments, full-band experiments and human experiments.]

[Figure: gain plotted against frequency (Hz).]

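[Figure: linear SVM geometry, showing the separating hyperplane $w \cdot x + b = 0$, the margin hyperplanes $w \cdot x + b = \pm 1$, and the margin width $2 / \lVert w \rVert$.]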

[Figure: features for a given speaker S are used to MAP-adapt the UBM into a speaker/session-dependent GMM, from which the speaker/session-dependent mean supervector is formed.]

[Figure: i-vector system. Unlabelled data are used to train the UBM and the matrix T. A test utterance passes through front-end processing, supervector extraction and i-vector extraction, followed by optional normalization and compensation techniques such as LDA and length normalization, and is then scored against the model i-vector to produce a score.]
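The scoring stage of the block diagram above compares a model i-vector with a test i-vector after optional length normalization. The following is a minimal sketch of that step only. It assumes cosine similarity as the scoring function, which is an assumption: the scoring formula did not survive in this text. The function names (`length_normalize`, `cosine_score`, `identify`) and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def length_normalize(vec):
    """Scale a vector to unit Euclidean length (length normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in vec]


def cosine_score(model_ivector, test_ivector):
    """Cosine similarity between two i-vectors (assumed scoring function)."""
    m = length_normalize(model_ivector)
    t = length_normalize(test_ivector)
    return sum(a * b for a, b in zip(m, t))


def identify(models, test_ivector):
    """Return the label of the best-scoring model and all scores."""
    scores = {label: cosine_score(vec, test_ivector)
              for label, vec in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-dimensional vectors; real i-vectors have hundreds of dimensions.
models = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]}
best_label, all_scores = identify(models, [0.9, 0.1])
print(best_label)
```

Because both vectors are scaled to unit length before the dot product, the score depends only on the angle between them and not on their magnitudes.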

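[Figure: i-vector classification system. Training utterances X1 ... Xn with labels Y1 ... Ym each pass through i-vector extraction, then LDA and length normalization, to form Model 1 ... Model m. The test utterance passes through i-vector extraction, LDA and length normalization, and a decision is made against the models.]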

CSLU Kids Corpus (total number of speakers: 1118)

Column headings as extracted: Train, Test, Evaluation (twice); Train and evaluation, Test, Evaluation.

Age-Group ID: 352 spk.; 766 spk.; 766 spk. (n-1 file per spk.)*
Gender ID: 430 spk.; 687 spk.; 687 spk. (n-1 file per spk.)*
Speaker ID (identifying a child in school): 918 spk.; 100 spk.; 100 spk.

Gender balance entries, in extracted order: 50% male and 50% female; 54.2% male and 45.8% female; 50% male and 50% female; 55.7% male and 44.3% female; 50% male and 50% female.

TIMIT Corpus (total number of speakers: 630; 438 male + 192 female)

Speaker Identification: Train and Evaluation, 530 spk.; Test, 100 spk.

[Figure: Speaker Detection Performance. Miss probability (in %) against false alarm probability (in %).]
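The detection plot above trades miss probability against false alarm probability. As a rough illustration, the sketch below computes those two rates at a given threshold and estimates the equal error rate (EER), the operating point where they coincide, from lists of target and impostor scores. The function names and the toy scores are hypothetical, and the simple threshold sweep is an approximation chosen for clarity; it is not necessarily the evaluation procedure used in the thesis.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (miss rate, false alarm rate) when accepting scores >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in impostor_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(impostor_scores)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    The threshold where the miss and false alarm rates are closest is
    selected, and the mean of the two rates at that point is returned.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        p_miss, p_fa = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer


# Toy scores: one target trial scores below one impostor trial.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)
print(f"EER = {100.0 * eer:.1f}%")
```

With these toy scores, a threshold of 0.6 rejects one of four target trials and accepts one of four impostor trials, so both rates equal 25%.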

[Figure: stages of the i-vector system: development features, UBM training, computing statistics, T-matrix training, extracting i-vectors, LDA, training and test i-vectors, scoring, decision.]

[Figure: false negative rate (FNR, %) against false positive rate (FPR, %), and a score matrix indexed by model index and test index.]

[Figure: i-vector SVM system. Training utterances X1 ... Xn with labels Y1 ... Ym and the test utterance each pass through i-vector extraction; an SVM makes the decision.]

[Figure: identification rate (%) by sub-band for SID with 30-second, 10-second and 3-second test segments, with regions A to D marked.]

[Figures: identification rate (%) by sub-band with regions A to D marked, each paired with a plot of NSID and NAID by sub-band.]

[Figure: EER (%) with 90% confidence interval against number of mixture components.]

[Figures: EER (%) with 90% confidence interval against frequency (Hz).]

[Figure: EER (%) by sub-band and identification rate (%) by sub-band for GMM-UBM and GMM-SVM systems with 64 mixture components; sub-band labels include B1, B2 and B3.]

[Figure: correlation matrices for GMM-UBM and GMM-SVM.]

[Figure: identification rate (%) by sub-band for speakers in K to 2nd grade, 3rd to 6th grade, and 7th to 10th grade.]

[Figure: full bandwidth performance; identification rate (%) by sub-band (S1 to S21).]

[Figure: identification rate (%) by sub-band (S1 to S21) for age groups AG1 (5 to 9 years old), AG2 (9 to 13 years old) and AG3 (13 to 16 years old), with the corresponding AG1 FB, AG2 FB and AG3 FB levels.]

[Figure: identification rate (%) by sub-band (S1 to S21).]

[Figure: normalised GenderID and normalised AgeID by sub-band.]
