Machine Learning for Signal Processing Applications of Linear Gaussian Models

Judith White
5 years ago

1 Machine Learning for Signal Processing: Applications of Linear Gaussian Models. Class 8. 3 Nov 2017. Instructor: Najim Dehak, in collaboration with Prof. Bhiksha Raj

2 Recap: MAP Estimators. MAP (Maximum A Posteriori): find the most probable value of $y$ given $x$: $\hat{y} = \arg\max_Y P(Y \mid x)$

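3 MAP estimation: $x$ and $y$ are jointly Gaussian. Let $z = \begin{bmatrix} x \\ y \end{bmatrix}$; $z$ is Gaussian with $\mathbb{E}[z] = \mu_z = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}$ and $\mathrm{Var}(z) = C_{zz} = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}$, where $C_{xy} = \mathbb{E}[(x - \mu_x)(y - \mu_y)^T]$. $P(z) = N(\mu_z, C_{zz}) = \frac{1}{\sqrt{|2\pi C_{zz}|}} \exp\left(-0.5\,(z - \mu_z)^T C_{zz}^{-1} (z - \mu_z)\right)$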

4 MAP estimation: Gaussian PDF [figure: joint density F over axes X and Y]

5 MAP estimation: The Gaussian at a particular value of X [figure]

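6 Conditional probability of $y \mid x$: $P(y \mid x) = N\left(\mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}\right)$, so $\mathbb{E}[y \mid x] = \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x)$ and $\mathrm{Var}(y \mid x) = C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}$. The conditional probability of $y$ given $x$ is also Gaussian: the slice in the figure is Gaussian. The mean of this Gaussian is a function of $x$. The variance of $y$ reduces if $x$ is known: uncertainty is reduced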
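The conditional mean and variance on this slide can be checked numerically. The following is a minimal sketch in NumPy, assuming the block covariances are available as separate arrays; the function name `conditional_gaussian` and the toy values are illustrative and not part of the lecture.

```python
import numpy as np

def conditional_gaussian(mu_x, mu_y, C_xx, C_xy, C_yy, x):
    """Return E[y|x] and Var(y|x) for jointly Gaussian x and y."""
    # gain = C_yx C_xx^{-1}, obtained from a linear solve rather than an explicit inverse
    gain = np.linalg.solve(C_xx, C_xy).T
    mean = mu_y + gain @ (x - mu_x)
    cov = C_yy - gain @ C_xy
    return mean, cov

# Toy 1-D example (values chosen only for illustration)
mu_x, mu_y = np.array([0.0]), np.array([1.0])
C_xx, C_xy, C_yy = np.array([[2.0]]), np.array([[1.0]]), np.array([[1.5]])
mean, cov = conditional_gaussian(mu_x, mu_y, C_xx, C_xy, C_yy, x=np.array([2.0]))
print(mean, cov)
```

In this toy case the conditional variance is smaller than the prior variance `C_yy`, which is the "uncertainty is reduced" point of the slide.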

7 MAP estimation: The Gaussian at a particular value of X. Most likely value [figure]

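8 It's also a minimum-mean-squared error estimate. Minimize the error: $Err = \mathbb{E}[\|y - \hat{y}\|^2 \mid x] = \mathbb{E}[(y - \hat{y})^T (y - \hat{y}) \mid x] = \mathbb{E}[y^T y \mid x] + \hat{y}^T \hat{y} - 2\hat{y}^T \mathbb{E}[y \mid x]$. Differentiating and equating to 0: $d\,Err = 2\hat{y}^T d\hat{y} - 2\,\mathbb{E}[y \mid x]^T d\hat{y} = 0$, so $\hat{y} = \mathbb{E}[y \mid x]$. The MMSE estimate is the mean of the distribution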

9 For the Gaussian: MAP = MMSE. The most likely value is also the MEAN value. This would be true of any symmetric unimodal distribution

10 Gaussians and more Gaussians.. Linear Gaussian Models.. PCA to develop the idea of LGM

11 A Brief Recap: $D \approx BC$. Principal component analysis: find the K bases that best explain the given data. Find B and C such that the difference between D and BC is minimum, while constraining that the columns of B are orthonormal

12 Learning PCA. For the given data: find the K-dimensional subspace such that it captures most of the variance in the data. Variance in the remaining subspace is minimal

13 A Statistical Formulation of PCA. The error is at 90° to the eigenface. $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. $x$ is a random variable generated according to a linear relation. $w$ is drawn from a K-dimensional Gaussian with diagonal covariance. $e$ is drawn from a 0-mean (D-K)-rank D-dimensional Gaussian. Estimate $V$ (and $B$) given examples of $x$

14 Linear Gaussian Models!! $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. $x$ is a random variable generated according to a linear relation. $w$ is drawn from a Gaussian. $e$ is drawn from a 0-mean Gaussian. Estimate $V$ given examples of $x$. In the process also estimate $B$ and $E$

15 Estimating the variables of the model. $x = \mu + Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(\mu, VV^T + E)$. Estimating the variables of the LGM is equivalent to estimating $P(x)$. The variables are $\mu$, $V$, and $E$

16 The Maximum Likelihood Estimate. $x \sim N(\mu, VV^T + E)$. Given training set $x_1, x_2, \ldots, x_N$, find $\mu$, $V$, $E$. The ML estimate of $\mu$ does not depend on the covariance of the Gaussian: $\mu = \frac{1}{N}\sum_i x_i$

17 Simplified Model. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(0, VV^T + E)$. Estimating the variables of the LGM is equivalent to estimating $P(x)$. The variables are $V$ and $E$

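18 LGM: The complete EM algorithm. Initialize $V$ and $E$. E step: $\beta = V^T(VV^T + E)^{-1}$; $\mathbb{E}[w_i \mid x_i] = \beta x_i$; $\mathbb{E}[w_i w_i^T \mid x_i] = I - \beta V + \beta x_i x_i^T \beta^T$. M step: $V = \left(\sum_i x_i\,\mathbb{E}[w_i \mid x_i]^T\right)\left(\sum_i \mathbb{E}[w_i w_i^T \mid x_i]\right)^{-1}$; $E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N} V \sum_i \mathbb{E}[w_i \mid x_i]\, x_i^T$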

19 So what have we achieved? Employed a complicated EM algorithm to learn a Gaussian PDF for a variable $x$. What have we gained??? Example uses: PCA (sensible PCA, EM algorithms for PCA); Factor Analysis (FA for feature extraction)

20 LGMs: Application 1: Learning principal components. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Find directions that capture most of the variation in the data. The error is orthogonal to the principal directions: $V^T e = 0$; $V^T E = 0$

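21 Some Observations: 1. $V^T e = 0$ and $e \sim N(0, E)$ with $E = \mathbb{E}[ee^T]$, so $V^T E = V^T \mathbb{E}[ee^T] = \mathbb{E}[V^T e e^T] = \mathbb{E}[0 \cdot e^T] = 0$. The covariance of $e$ is orthogonal to $V$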

22 Observation 2: $(V^T V)^{-1} V^T (VV^T + E) = V^T$. Proof: $V^T(VV^T + E) = V^T V V^T + V^T E = V^T V V^T + 0$, and $(V^T V)^{-1} V^T V = I$

23 Observation 3: when $V^T E = 0$, $\beta = V^T (VV^T + E)^{-1} = (V^T V)^{-1} V^T = \mathrm{pinv}(V)$

24 LGM: The complete EM algorithm, specialized to $X = VW + E$ with $V^T E = 0$ (slides 24 to 29 are successive builds of the same derivation). Initialize $V$ and $E$. E step: $\beta = V^T(VV^T + E)^{-1} = \mathrm{pinv}(V)$, so $\mathbb{E}[w_i \mid x_i] = \mathrm{pinv}(V)\, x_i$, i.e. $W = \mathrm{pinv}(V)\, X$; $\mathbb{E}[w_i w_i^T \mid x_i] = I - \mathrm{pinv}(V) V + w_i w_i^T$ with $w_i = \mathrm{pinv}(V)\, x_i$. M step: $V = \left(\sum_i x_i w_i^T\right)\left(\sum_i \mathbb{E}[w_i w_i^T \mid x_i]\right)^{-1}$; $E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N} V \sum_i w_i x_i^T$

30 EM for PCA (slides 30 to 36 are successive builds of the same derivation). Initialize $V$ and $E$. E step: $W = \mathrm{pinv}(V)\, X$; since $\mathrm{pinv}(V) V = I$, the term $I - \mathrm{pinv}(V) V$ vanishes and $\sum_i \mathbb{E}[w_i w_i^T \mid x_i] = WW^T$. M step: $V = XW^T (WW^T)^{-1} = X\,\mathrm{pinv}(W)$. The update of $E$ is irrelevant to the estimate of $V$

37 EM for PCA. Initialize $V$. Iterate: $W = \mathrm{pinv}(V)\, X$; $V = X\,\mathrm{pinv}(W)$. Note: $V$ will not be the actual eigenvectors, but a set of bases in the space spanned by the principal eigenvectors. Additional decorrelation within the PC space may be needed
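The two-line iteration on this slide can be sketched directly in NumPy. This is a minimal sketch, assuming zero-mean data stored one instance per column; the function name `em_pca`, the random initialization, the fixed iteration count, and the synthetic data are illustrative choices, not taken from the lecture.

```python
import numpy as np

def em_pca(X, K, n_iter=50, seed=0):
    """EM for PCA. X is D x N (zero-mean, one instance per column)."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((X.shape[0], K))  # random initial bases (assumption)
    for _ in range(n_iter):
        W = np.linalg.pinv(V) @ X   # E step: W = pinv(V) X
        V = X @ np.linalg.pinv(W)   # M step: V = X pinv(W)
    return V, W

# Synthetic data: a 2-D subspace of a 20-D space plus a little noise
rng = np.random.default_rng(1)
D, N, K = 20, 500, 2
basis, _ = np.linalg.qr(rng.standard_normal((D, K)))
weights = rng.standard_normal((K, N)) * np.array([[5.0], [3.0]])
X = basis @ weights + 0.1 * rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)

V, W = em_pca(X, K)

# Compare the subspace spanned by V with the top-K eigenvectors of X X^T
P_em = V @ np.linalg.pinv(V)
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
U = eigvecs[:, -K:]
P_eig = U @ U.T
print(np.abs(P_em - P_eig).max())
```

The comparison uses projection matrices rather than the columns of `V` themselves because, as the slide notes, `V` only spans the principal subspace and need not equal the eigenvectors.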

38 Why EM PCA? Example: computing eigenfaces. Each face is 100x100: 10000 dimensional. But there are only 300 examples, so $X$ is 10000 x 300. What is the size of the covariance matrix $XX^T$? What is its rank?

39 PCA on ill-conditioned data. Few instances of high-dimensional data: no. of instances < dimensionality. The covariance matrix is very large and eigendecomposition is expensive, e.g. for $10^6$-dimensional data the covariance has $10^{12}$ elements. But the rank of the covariance is low: only the no. of instances of data

40 Why EM PCA? $X = VW$; $V = X\,\mathrm{pinv}(W)$, $W = \mathrm{pinv}(V)\, X$. Consequence of low-rank $X$: the actual number of bases is limited to the rank of $X$. Note the actual size of $V$: max number of columns = min(dimension, no. of data points); no. of columns = rank of $XX^T$. Note the size of $W$: max number of rows = min(dimension, no. of data points)

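41 Why EM PCA? $X = VW$; $V = X\,\mathrm{pinv}(W)$, $W = \mathrm{pinv}(V)\, X$. If $X$ is high dimensional, particularly if the number of vectors in $X$ is smaller than the dimensionality, $\mathrm{pinv}(V)$ and $\mathrm{pinv}(W)$ are efficient to compute. $V$ will have a max of 300 columns in the example; $W$ will have a max of 300 rows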

42 PCA as an instance of LGM. Viewing PCA as an instance of linear Gaussian models leads to the EM solution. Very effective in dealing with high-dimensional and/or data-poor situations. An aside: another, simpler solution for the same situation..

43 An Aside: The GRAM trick. For $X$ and $XX^T$, the number of non-zero eigenvalues is no more than the length of the smallest edge of $X$: 300 in this case. This leads to the Gram trick.. Assumption: $X^T X$ is invertible, i.e. the instances are linearly independent

44 An Aside: The GRAM trick. If $X$ is 10000 x 300, $XX^T$ is 10000 x 10000: $XX^T$ is large but $X^T X$ is not. If $X$ is 10000 x 300, $X^T X$ is 300 x 300. It is difficult to compute the eigenvectors of $XX^T$, but easy to compute the eigenvectors of $X^T X$

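45 The Gram Trick. To compute principal vectors we eigendecompose $XX^T$: $XX^T = U \Lambda U^T$. Let us find the eigenvectors of $X^T X$ instead: $X^T X = \hat{U} \hat{\Lambda} \hat{U}^T$. Manipulating it slightly: $X^T X \hat{U} = \hat{U} \hat{\Lambda}$. Note that for a diagonal matrix: $\hat{\Lambda}\hat{\Lambda}^{-0.5} = \hat{\Lambda}^{-0.5}\hat{\Lambda}$, so $X^T X \hat{U} \hat{\Lambda}^{-0.5} = \hat{U} \hat{\Lambda}^{-0.5} \hat{\Lambda}$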

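46 The Gram Trick. Eigendecompose $X^T X$ instead of $XX^T$: $X^T X = \hat{U} \hat{\Lambda} \hat{U}^T$, so $X^T X \hat{U} \hat{\Lambda}^{-0.5} = \hat{U} \hat{\Lambda}^{-0.5} \hat{\Lambda}$. Multiplying by $X$ on the left: $XX^T (X \hat{U} \hat{\Lambda}^{-0.5}) = (X \hat{U} \hat{\Lambda}^{-0.5}) \hat{\Lambda}$. Letting $U = X \hat{U} \hat{\Lambda}^{-0.5}$: $XX^T U = U \hat{\Lambda}$. $U$ is the matrix of eigenvectors of $XX^T$!!!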

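47 The Gram Trick. When $X$ is low rank or $XX^T$ is too large: compute $X^T X$ instead (it will be of manageable size). Perform eigendecomposition of $X^T X$: $X^T X = \hat{U} \hat{\Lambda} \hat{U}^T$. Compute the eigenvectors of $XX^T$ as $U = X \hat{U} \hat{\Lambda}^{-0.5}$. These are the principal components of $X$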
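The Gram trick summarized above can be sketched as follows. This is a minimal sketch, assuming `X` holds one instance per column with far fewer instances than dimensions, and that `K` does not exceed the rank of `X` (so the selected eigenvalues are strictly positive); the names `gram_trick_pca`, `U_hat`, and `lam_hat` are illustrative.

```python
import numpy as np

def gram_trick_pca(X, K):
    """Top-K eigenvectors of X X^T computed from the small N x N matrix X^T X."""
    G = X.T @ X                               # N x N Gram matrix
    lam_hat, U_hat = np.linalg.eigh(G)        # eigenvalues in ascending order
    order = np.argsort(lam_hat)[::-1][:K]     # keep the K largest
    lam_hat, U_hat = lam_hat[order], U_hat[:, order]
    # U = X U_hat Lambda_hat^{-0.5}; broadcasting rescales each column
    U = (X @ U_hat) / np.sqrt(lam_hat)
    return U, lam_hat

# Toy example: 6 instances of 200-dimensional data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
U, lam = gram_trick_pca(X, K=3)
print(U.shape, lam)
```

The division by the square roots of the eigenvalues is what makes the columns of `U` unit length; without it they would still be eigenvectors of `X @ X.T`, just not normalized.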

48 Why EM PCA? Dimensionality / rank has an alternate potential solution: the Gram Trick. Other uses? Noise. Incomplete data

49 PCA with noisy data. $x = Vw + e + n$, $w \sim N(0, I)$, $e \sim N(0, E)$, $n \sim N(0, B)$. The error is orthogonal to the principal directions: $V^T e = 0$; $V^T E = 0$. Noise is isotropic: $B$ is diagonal. Noise is not orthogonal to either $V$ or $e$

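50 LGM: The complete EM algorithm. Initialize $V$ and $E$. E step: $\beta = V^T(VV^T + E)^{-1}$; $\mathbb{E}[w_i \mid x_i] = \beta x_i$; $\mathbb{E}[w_i w_i^T \mid x_i] = I - \beta V + \beta x_i x_i^T \beta^T$. M step: $V = \left(\sum_i x_i\,\mathbb{E}[w_i \mid x_i]^T\right)\left(\sum_i \mathbb{E}[w_i w_i^T \mid x_i]\right)^{-1}$; $E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N} V \sum_i \mathbb{E}[w_i \mid x_i]\, x_i^T$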

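51 PCA with Noisy Data. Initialize $V$ and $B$. E step: $\beta = V^T(VV^T + B)^{-1}$; $W = \beta X$; $C = NI - N\beta V + WW^T$. M step: $V = XW^T C^{-1}$; $B = \frac{1}{N}\,\mathrm{diag}\left(XX^T - VWX^T\right)$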

52 PCA with Incomplete Data. How to compute principal directions when some components in your training data are missing? Eigendecomposition is not possible: cannot compute the correlation matrix with missing data

53 PCA with missing data: how it goes. Given: $X = \{X_c, X_m\}$, where $X_m$ are the missing components. 1. Initialize: initialize $X_m$. 2. Build complete data $X = \{X_c, X_m\}$. 3. PCA ($X = VW$): estimate $V$; $V$ must have fewer bases than the dimensions of $X$. 4. $W = V^T X$. 5. $\hat{X} = VW$. 6. Select $X_m$ from $\hat{X}$. 7. Return to 2
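The loop on this slide can be sketched as below. This is a minimal sketch under several assumptions that the slide leaves open: missing entries are initialized to zero, the PCA step is done with an SVD of the completed data, and the loop runs for a fixed number of iterations. The names `pca_impute` and `observed` are illustrative.

```python
import numpy as np

def pca_impute(X, observed, K, n_iter=200):
    """Iteratively fill missing entries of X (D x N) with a rank-K PCA reconstruction.

    observed is a boolean mask, True where the entry of X is known.
    """
    X_filled = np.where(observed, X, 0.0)             # step 1: initialize X_m (zeros here)
    for _ in range(n_iter):
        # step 3: PCA of the completed data; keep K < D bases
        U, _, _ = np.linalg.svd(X_filled, full_matrices=False)
        V = U[:, :K]
        W = V.T @ X_filled                            # step 4: W = V^T X
        X_hat = V @ W                                 # step 5: reconstruction
        X_filled = np.where(observed, X, X_hat)       # steps 6 and 2: take X_m from X_hat
    return X_filled, V

# Toy example: exactly rank-2 data with roughly 10% of the entries hidden
rng = np.random.default_rng(0)
X_true = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 40))
observed = rng.random(X_true.shape) > 0.1
X_obs = np.where(observed, X_true, np.nan)

X_filled, V = pca_impute(X_obs, observed, K=2)
err_init = np.linalg.norm(X_true[~observed])                       # error of the zero fill
err_final = np.linalg.norm(X_filled[~observed] - X_true[~observed])
print(err_init, err_final)
```

Observed entries are never overwritten; only the missing positions are taken from the reconstruction at each pass.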

54 LGM for PCA. Obviously many uses: ill-conditioned data, noise, missing data, any combination of the above..

55 LGMs: Application 2: Learning with insufficient data. The full covariance matrix of a Gaussian has $D^2$ terms. It fully captures the relationships between variables. Problem: needs a lot of data to estimate robustly

56 An Approximation. Assume the covariance is diagonal. The Gaussian is aligned to the axes: no correlation between dimensions. The covariance has only D terms. Needs less data. Problem: the model loses all information about correlation between dimensions

57 Is There an Intermediate? Capture the most important correlations, but require less data. Solution: find the key subspaces in the data. Capture the complete correlations in these subspaces. Assume the data is otherwise uncorrelated

58 Factor Analysis. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(0, VV^T + E)$. $E$ is a full-rank diagonal matrix. $V$ has K columns: a K-dimensional subspace. We will capture all the correlations in the subspace represented by $V$. Estimated covariance: the diagonal covariance $E$ plus the covariance between dimensions in $V$

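59 Factor Analysis. Initialize $V$ and $E$. E step: $\beta = V^T(VV^T + E)^{-1}$; $\mathbb{E}[w_i \mid x_i] = \beta x_i$; $\mathbb{E}[w_i w_i^T \mid x_i] = I - \beta V + \beta x_i x_i^T \beta^T$. M step: $V = \left(\sum_i x_i\,\mathbb{E}[w_i \mid x_i]^T\right)\left(\sum_i \mathbb{E}[w_i w_i^T \mid x_i]\right)^{-1}$; $E = \frac{1}{N}\,\mathrm{diag}\left(\sum_i x_i x_i^T - V \sum_i \mathbb{E}[w_i \mid x_i]\, x_i^T\right)$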
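The EM recursion for factor analysis can be written compactly in matrix form. The sketch below uses the standard EM updates for a factor analyzer with diagonal noise covariance, stated here as an assumption about what the slide's equations contain; the function name `factor_analysis_em`, the initialization, the iteration count, and the synthetic data are illustrative.

```python
import numpy as np

def factor_analysis_em(X, K, n_iter=500, seed=0):
    """EM for factor analysis. X is D x N, zero-mean, one instance per column.

    Returns the loadings V (D x K) and the diagonal of the noise covariance (length D).
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    V = rng.standard_normal((D, K))          # random initial loadings (assumption)
    E_diag = X.var(axis=1)                   # initial diagonal noise variances
    for _ in range(n_iter):
        # E step: beta = V^T (V V^T + E)^{-1}
        beta = V.T @ np.linalg.inv(V @ V.T + np.diag(E_diag))
        W = beta @ X                                      # columns are E[w_i | x_i]
        C = N * (np.eye(K) - beta @ V) + W @ W.T          # sum_i E[w_i w_i^T | x_i]
        # M step
        V = (X @ W.T) @ np.linalg.inv(C)
        E_diag = np.diag(X @ X.T - V @ (W @ X.T)) / N
    return V, E_diag

# Synthetic data drawn from a 2-factor model in 8 dimensions
rng = np.random.default_rng(1)
D, N, K = 8, 5000, 2
V_true = rng.standard_normal((D, K))
noise_var = rng.uniform(0.5, 1.5, size=D)
X = V_true @ rng.standard_normal((K, N)) + np.sqrt(noise_var)[:, None] * rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)

V, E_diag = factor_analysis_em(X, K)
S = X @ X.T / N                                   # sample covariance
model_cov = V @ V.T + np.diag(E_diag)             # covariance implied by the fitted model
rel_err = np.linalg.norm(model_cov - S) / np.linalg.norm(S)
print(rel_err)
```

The fitted `V` is only determined up to a rotation of the factors, so the check compares the implied covariance `V V^T + E` with the sample covariance rather than comparing `V` with `V_true`.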

60 FA Gaussian. Will get a full covariance matrix, but only estimate DK terms. Data insufficiency is less of a problem

61 The Factor Analysis Model. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$; $V$: LOADINGS, $w$: FACTORS. Often used to learn the distribution of data when we have insufficient data. Often used in psychometrics. Underlying model: the actual systematic variations in the data are totally explained by a small number of factors. FA uncovers these factors

62 FA, PCA etc. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Note: the distinction between PCA and FA is only in the assumptions about $e$. FA looks a lot like PCA with noise. FA can also be performed with incomplete data

63 FA, PCA etc. PCA: the error is always at 90 degrees to the bases in $V$. FA: the error may be at any angle. PCA is used mainly to find principal directions that capture most of the variance; the bases in $V$ will be orthogonal to one another. FA tries to capture most of the covariance

64 FA: A very successful use. Voice biometrics: speaker recognition. Given: only a small amount of training data from a speaker to learn its model. Use it to verify the speaker later. Problem: immense variation in the ways people can speak. Less than a minute of training data is totally insufficient!

65 Speaker Recognition. Speaker Identification: whose voice is this? Speaker Verification: is this Bob's voice? Speaker Diarization: segmentation and clustering. Where are the speaker changes? Which segments are from the same speaker? [figure: segments labelled Speaker A and Speaker B]

66 Modeling Sequence of Features: Gaussian Mixture Models. For most recognition tasks, we need to model the distribution of feature vector sequences. In practice, we often use Gaussian Mixture Models (GMMs). [Figure: signal space (frequency in Hz against time in sec) converted to feature space in vec/sec; a GMM trained on MANY training utterances]

67 Why GMMs. Vowel classification, PCA [figure]

68 Speaker Verification. A model represents the distribution of cepstral vectors for the speaker. A second model represents everyone else (potential imposters). The cepstra computed from a test recording are scored against both models. Accept the speaker if the speaker model scores higher

69 GMM for speaker verification. We enroll a given speaker by adapting the UBM using the speaker's input speech. Reynolds 2000. [Figure: speaker "Jim" model and UBM scored against a test utterance: Yes / No?]

70 Speaker Verification. Problem: one typically has only a few seconds or minutes of training data from the speaker, so it is hard to estimate the speaker model. Test data may be spoken differently, or come over a different channel, or in noise, and won't really match

71 Modeling Sequence of Features: Gaussian Mixture Models. For most recognition tasks, we need to model the distribution of feature vector sequences. In practice, we often use Gaussian Mixture Models (GMMs). [Figure: signal space (frequency in Hz against time in sec) converted to feature space in vec/sec; a GMM trained on MANY training utterances; GMM components 1, 2, ..., k stacked into a supervector] This supervector is the feature that represents the recording

72 Training. Supervectors are obtained for each training speaker by adapting a Universal Background Model trained from large amounts of data. There is too little data from each speaker to train a GMM based on maximum likelihood [figure: one supervector with components 1, 2, ..., k per speaker]

73 Training the Factor Analyzer. The supervectors are assumed to be the output of a linear Gaussian process: $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Use FA to estimate $V$. $w$ are the factors that cause the variations. The real information is in the factor $w$

74 I-vector: Total variability space

75 I-Vector. Factor analysis as feature extractor. Speaker- and channel-dependent supervector: $M = m + Tw$. $T$ is rectangular, low rank (total variability matrix). $w$ is standard Normal random (total factors: intermediate vector or i-vector). [Figure: MFCC features of each recording pass through factor analysis to give an I-vector]

76 Training models for a speaker. Supervector model: $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Use Linear Discriminant Analysis to maximize the discrimination between the speakers

77 Data Visualization based on Graph. Nice performance of the cosine similarity for speaker recognition. Data visualization using the Graph Exploration System (GUESS). Represent each segment as a node with connections (edges) to its nearest neighbors (3 NN used). NN computed using a blind system (with and without channel normalization). Applied to 5438 utterances from the NIST SRE10 core. Multiple telephone and microphone channels. Absolute locations of nodes are not important. Relative locations of nodes to one another are important: the visualization clusters nodes that are highly connected together. Meta data (speaker ID, channel info) is not used in the layout. Colors and shapes of nodes are used to highlight interesting phenomena
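The nearest-neighbor graph described on this slide can be sketched as follows. This is a minimal sketch that assumes cosine similarity is the score used to pick neighbors (the slide mentions cosine similarity and 3 nearest neighbors but does not spell out the scoring); the function name `knn_graph` and the toy vectors are illustrative, and the graph layout itself is left to a separate tool.

```python
import numpy as np

def knn_graph(vectors, k=3):
    """Indices of the k nearest neighbours of each row under cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T                  # cosine similarity between all pairs
    np.fill_diagonal(sim, -np.inf)       # a node is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

# Toy example: two well-separated groups of five vectors each
rng = np.random.default_rng(0)
group_a = np.array([10.0, 0.0, 0.0]) + 0.1 * rng.standard_normal((5, 3))
group_b = np.array([0.0, 10.0, 0.0]) + 0.1 * rng.standard_normal((5, 3))
vectors = np.vstack([group_a, group_b])

neighbours = knn_graph(vectors, k=3)
edges = [(i, int(j)) for i in range(len(vectors)) for j in neighbours[i]]
print(edges[:6])
```

Each row of `neighbours` lists the nodes that would receive an edge from that segment; with real i-vectors the rows would be recordings rather than synthetic points.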

78 Females data with intersession compensation. Colors represent speakers

79 Females data with no intersession compensation. Colors represent speakers

80 Females data with no intersession compensation [figure; legend: Cell phone, Landline, 25573qqn, 25573no, Mic_CH08, Mic_CH04, Mic_CH2, Mic_CH3, Mic_CH02, Mic_CH07, Mic_CH05; markers for high, low, normal; t = room LDC, * = room HIVE]

81 Females data with no intersession compensation [figure; same legend as slide 80; region labelled TEL]

82 Females data with no intersession compensation [figure; same legend as slide 80; region labelled MIC]

83 Females data with no intersession compensation [figure; microphone channels only, same markers as slide 80; region labelled MIC]

84 Females data with intersession compensation [figure; same legend as slide 80]

85 Males data with intersession compensation. Colors represent speakers

86 Males data with no intersession compensation. Colors represent speakers

87 Males data with no intersession compensation [figure; same legend as slide 80]

88 Males data with no intersession compensation [figure; same legend as slide 80; region labelled TEL]

89 Males data with no intersession compensation [figure; same legend as slide 80; region labelled MIC]

90 Males data with no intersession compensation [figure; same legend as slide 80]

91 Speaker representation: one i-vector per recording, followed by clustering [figure]

92 Speaker clustering [figure]

93 PCA Visualization [figure]

94 520-42/
