Machine Learning for Signal Processing Linear Gaussian Models
1 Machine Learning for Signal Processing: Linear Gaussian Models. Class, Oct 2014. Instructor: Bhiksha Raj. 11-755/18-797.
2 Recap: MAP Estimators. MAP (Maximum A Posteriori): find the best guess for y (statistically), given known x: ŷ = argmax_y P(y|x).
3 Recap: MAP estimation. x and y are jointly Gaussian. Stack them as z = [x; y], with mean μ_z = [μ_x; μ_y] and covariance C_zz = Var(z) = [[C_xx, C_xy], [C_yx, C_yy]], where C_xy = E[(x − μ_x)(y − μ_y)^T]. Then P(z) = N(z; μ_z, C_zz) ∝ exp(−0.5 (z − μ_z)^T C_zz^{-1} (z − μ_z)): z is Gaussian.
4 MAP estimation: Gaussian PDF. [Figure: the joint Gaussian PDF over x and y.]
5 MAP estimation: the Gaussian at a particular value of x. [Figure: slice of the joint density at x = x_0.]
6 Conditional probability of y given x: P(y|x) = N(y; μ_y + C_yx C_xx^{-1}(x − μ_x), C_yy − C_yx C_xx^{-1} C_xy). The conditional probability of y given x is also Gaussian: the slice in the figure is Gaussian. The mean of this Gaussian is a function of x. The variance of y is reduced if x is known: uncertainty is reduced.
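A scalar numerical sketch of the conditioning formulas above; the particular numbers and variable names are our own illustrative choices, not from the slides.

```python
import numpy as np

# Joint Gaussian over z = [x; y], with x and y scalar for simplicity.
mu = np.array([1.0, 2.0])          # [mu_x, mu_y]
C = np.array([[2.0, 1.2],
              [1.2, 1.5]])         # [[C_xx, C_xy], [C_yx, C_yy]]
x0 = 3.0                           # the observed value of x

# Conditional mean and variance of y given x = x0:
mu_y_x = mu[1] + C[1, 0] / C[0, 0] * (x0 - mu[0])    # 2 + 0.6*2 = 3.2
var_y_x = C[1, 1] - C[1, 0] * C[0, 1] / C[0, 0]      # 1.5 - 0.72 = 0.78
print(mu_y_x, var_y_x)
```

Note that var_y_x is always at most C_yy: observing x can only reduce the uncertainty about y.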
7 MAP estimation: the Gaussian at a particular value of x. Most likely value: ŷ = μ_y + C_yx C_xx^{-1}(x_0 − μ_x). [Figure: the conditional slice at x = x_0, with its peak marked.]
8 MAP Estimation of a Gaussian RV: ŷ = argmax_y P(y|x_0), which for a Gaussian is the conditional mean E[y|x_0].
9 It's also a minimum-mean-squared-error estimate. Minimize the error: Err = E[(y − ŷ)^T(y − ŷ)] = E[y^T y − 2ŷ^T y + ŷ^T ŷ] = E[y^T y] − 2ŷ^T E[y] + ŷ^T ŷ. Differentiating and equating to 0: d Err/dŷ = −2E[y] + 2ŷ = 0, so ŷ = E[y]. The MMSE estimate is the mean of the distribution.
10 For the Gaussian: MAP = MMSE. The most likely value is also the MEAN value. This would be true of any symmetric distribution.
11 A Likelihood Perspective: y = ax + e. y is a noisy reading of ax. The error e is Gaussian: e ~ N(0, σ²I). Estimate a from the data Y = [y_1 y_2 ... y_N], X = [x_1 x_2 ... x_N].
12 The Likelihood of the data: y = ax + e, e ~ N(0, σ²I). Probability of observing a specific y, given x, for a particular matrix a: P(y|x; a) = N(y; ax, σ²I). Probability of the collection: P(Y|X; a) = ∏_i N(y_i; a x_i, σ²I), assuming IID for convenience (not necessary). Y = [y_1 y_2 ... y_N], X = [x_1 x_2 ... x_N].
13 A Maximum Likelihood Estimate: Y = aX + e, e ~ N(0, σ²I). P(Y|X; a) ∝ exp(−(1/(2σ²)) trace[(Y − aX)^T (Y − aX)]), so log P(Y|X; a) = C − (1/(2σ²)) trace[(Y − aX)^T (Y − aX)]. Maximizing the log probability is identical to minimizing the least squared error.
14 A problem with regressions: the ML fit â = YX^T (XX^T)^{-1} is sensitive. The error is squared, so small variations in the data produce large variations in the weights; outliers affect it adversely; it is unstable. If the dimension of X ≥ the number of instances, XX^T is not invertible.
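A toy numerical sketch (our own example, not from the slides) of both failure modes above: a rank-deficient correlation matrix when dimension exceeds instance count, and exploding weights under near-collinearity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: dimension (5) exceeds the number of instances (3), so the 5x5
# correlation matrix X X^T has rank 3 and cannot be inverted.
X_wide = rng.standard_normal((5, 3))
rank = np.linalg.matrix_rank(X_wide @ X_wide.T)

# Case 2: nearly collinear data -> tiny perturbations, huge weights.
X = np.array([[1.0, 1.0,    1.0,    1.0],
              [1.0, 1.0001, 0.9999, 1.0]])
Y = np.array([[1.0, 2.0, 3.0, 4.0]])
a = Y @ X.T @ np.linalg.inv(X @ X.T)
print(rank, np.abs(a).max())   # rank 3 < 5; weight magnitudes in the thousands
```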
15 MAP estimation of weights: y = ax + e. Assume the weights are drawn from a Gaussian: P(a) = N(0, σ_a²I). Maximum likelihood estimate: â = argmax_a log P(Y|X; a). Maximum a posteriori estimate: â = argmax_a log P(a|Y, X) = argmax_a [log P(Y|X, a) + log P(a)].
16 MAP estimation of weights: with P(a) = N(0, σ_a²I), log P(a) = C − ||a||²/(2σ_a²), so â = argmax_a [C′ − (1/(2σ²)) trace((Y − aX)^T(Y − aX)) − ||a||²/(2σ_a²)]. Similar to the ML estimate, with an additional term.
17 MAP estimate of weights: dL/da = 2aXX^T − 2YX^T + 2λa = 0 gives â = YX^T (XX^T + λI)^{-1}. This is equivalent to diagonal loading of the correlation matrix: it improves the condition number of the correlation matrix, which can then be inverted with greater stability, and it will not affect the estimation from well-conditioned data. Also called Tikhonov regularization. Dual form: ridge regression. (This is a MAP estimate of the weights, not to be confused with a MAP estimate of Y.)
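A small sketch of the closed form above, â = YX^T (XX^T + λI)^{-1}; the data, noise level, and the choice λ = 0.1 are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
a_true = np.array([[1.0, -2.0, 0.5]])
X = rng.standard_normal((3, 50))
Y = a_true @ X + 0.1 * rng.standard_normal((1, 50))

lam = 0.1
R = X @ X.T                                     # correlation matrix
a_ridge = Y @ X.T @ np.linalg.inv(R + lam * np.eye(3))

# Diagonal loading never worsens the condition number of R:
better = np.linalg.cond(R + lam * np.eye(3)) <= np.linalg.cond(R)
print(better)
```

For well-conditioned data such as this, the small diagonal load barely changes the estimate, which stays close to a_true; its benefit shows up when R is near-singular.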
18 MAP estimate priors. [Figure. Left: Gaussian prior on the weights. Right: Laplacian prior.]
19 MAP estimation of weights with a Laplacian prior: assume the weights are drawn from a Laplacian, P(a) ∝ exp(−λ||a||_1). Maximum a posteriori estimate: â = argmax_a [C′ − trace((Y − aX)^T(Y − aX)) − λ||a||_1]. No closed-form solution; a quadratic-programming solution is required. Non-trivial.
20 MAP estimation of weights with a Laplacian prior: assume the weights are drawn from a Laplacian, P(a) ∝ exp(−λ||a||_1). Maximum a posteriori estimate: â = argmax_a [C′ − trace((Y − aX)^T(Y − aX)) − λ||a||_1]. This is identical to L1-regularized least-squares estimation.
21 L1-regularized LS: â = argmax_a [C′ − trace((Y − aX)^T(Y − aX)) − λ||a||_1]. No closed-form solution; quadratic-programming solutions are required. Dual formulation: â = argmin_a trace((Y − aX)^T(Y − aX)) subject to ||a||_1 ≤ t. This is the LASSO: Least Absolute Shrinkage and Selection Operator.
22 LASSO Algorithms: various convex optimization algorithms exist. LARS: least angle regression. Pathwise coordinate descent, etc. Matlab code is available from the web.
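A minimal coordinate-descent sketch for the scalar-output LASSO, min_a 0.5||y − aX||² + λ||a||_1. This is our own toy implementation of the soft-thresholding update (not the LARS algorithm mentioned above); the data and λ are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink v toward zero by t: the scalar LASSO update."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent: cycle over weights, solving each 1-D problem."""
    D, N = X.shape
    a = np.zeros(D)
    for _ in range(iters):
        for j in range(D):
            r = y - a @ X + a[j] * X[j]        # residual excluding feature j
            a[j] = soft_threshold(X[j] @ r, lam) / (X[j] @ X[j])
    return a

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 100))
a_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])  # sparse weights
y = a_true @ X + 0.05 * rng.standard_normal(100)

a_hat = lasso_cd(X, y, lam=5.0)
print(np.abs(a_hat) > 0.1)    # the recovered sparsity pattern
```

The L1 penalty zeroes out the inactive weights exactly, which is the "selection" in the LASSO acronym.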
23 Regularized least squares. [Figure; image credit: Tibshirani.] Regularization results in selection of a suboptimal (in the least-squares sense) solution, one of the loci outside the center. Tikhonov regularization selects the shortest solution; L1 regularization selects the sparsest solution.
24 LASSO and Compressive Sensing: Y = aX. Given Y and X, estimate the sparse a. In LASSO terms: X = explanatory variable, Y = dependent variable, a = weights of regression. In CS terms: X = measurement matrix, Y = measurement, a = data.
25 MAP / ML / MMSE: general statistical estimators, all used to predict a variable based on other parameters related to it. Most common assumption: the data are Gaussian; all RVs are Gaussian. Other probability densities may also be used. For Gaussians the relationships are linear, as we saw.
26 Gaussians and more Gaussians: Linear Gaussian Models. But first, a recap.
27 A Brief Recap: D ≈ BC. Principal component analysis: find the K bases that best explain the given data, i.e., find B and C such that the difference between D and BC is minimum, while constraining the columns of B to be orthonormal.
28 Remember Eigenfaces: approximate every face f as f = w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + ... + w_{f,K} V_K. Estimate V to minimize the squared error. The error is unexplained by V_1 .. V_K; the error is orthogonal to the eigenfaces.
29–32 Karhunen-Loève vs. PCA (built up over several slides). Eigenvectors of the correlation matrix: principal directions of the tightest ellipse centered on the origin; directions that retain maximum energy. Eigenvectors of the covariance matrix: principal directions of the tightest ellipse centered on the data; directions that retain maximum variance.
33 Karhunen-Loève vs. PCA: if the data are naturally centered at the origin, KL == PCA. The following slides refer to PCA! Assume data centered at the origin for simplicity; this is not essential, as we'll see.
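A quick numerical sketch of the distinction above: the second-moment (correlation) matrix and the covariance matrix coincide for zero-mean data and diverge once the data are shifted off the origin. The data are our own synthetic example.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((2, 1000)) * np.array([[3.0], [1.0]])  # zero-mean

corr = X @ X.T / X.shape[1]                    # second-moment matrix (KL)
cov = np.cov(X)                                # centered covariance (PCA)
same = np.allclose(corr, cov, atol=0.1)        # nearly identical here

X_shift = X + np.array([[10.0], [0.0]])        # move the data off the origin
corr_s = X_shift @ X_shift.T / X_shift.shape[1]
differ = not np.allclose(corr_s, np.cov(X_shift), atol=0.1)
print(same, differ)
```

After the shift, the correlation matrix is dominated by the mean term (its eigenvectors point toward the data cloud), while the covariance and its principal directions are unchanged.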
34 Remember Eigenfaces: approximate every face f as f = w_{f,1} V_1 + w_{f,2} V_2 + ... + w_{f,K} V_K. Estimate V to minimize the squared error. The error is unexplained by V_1 .. V_K; the error is orthogonal to the eigenfaces.
35 Eigen Representation: x = wV + e. [Illustration assuming 3D space.] K-dimensional representation; the error is orthogonal to the representation; weight and error are specific to the data instance.
36 Representation: the error is at 90° to the eigenface. x_2 = w_2 V + e_2. [Illustration assuming 3D space.] K-dimensional representation; the error is orthogonal to the representation; weight and error are specific to the data instance.
37 Representation: all data with the same representation wV lie on a plane orthogonal to V. K-dimensional representation; the error is orthogonal to the representation.
38–39 With 2 bases: the error is at 90° to the eigenfaces. x = w_1 V_1 + w_2 V_2 + e. [Illustration assuming 3D space.] K-dimensional representation; the error is orthogonal to the representation; weight and error are specific to the data instance.
40 In vector form: the error is at 90° to the eigenfaces. x = w_1 V_1 + w_2 V_2 + e, i.e., x = [V_1 V_2][w_1; w_2] + e = Vw + e, with V a D×2 matrix. K-dimensional representation; the error is orthogonal to the representation; weight and error are specific to the data instance.
41 In vector form: x = Vw + e, where x is a D-dimensional vector, V is a D×K matrix, w is a K-dimensional vector, and e is a D-dimensional vector. The error is at 90° to the eigenfaces.
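A numerical check of x = Vw + e with the error orthogonal to the bases, using an arbitrary orthonormal V of our own making (D = 3, K = 2).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(3)                 # a D-dimensional data point

# Orthonormal basis V (D x K), obtained via QR on a random matrix.
V, _ = np.linalg.qr(rng.standard_normal((3, 2)))

w = V.T @ x                                # K-dimensional weights
e = x - V @ w                              # D-dimensional error
print(np.allclose(V.T @ e, 0.0))           # error orthogonal to the bases
```

The projection residual of any x onto the column space of an orthonormal V is always orthogonal to that space, which is exactly the 90° property in the figures.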
42 Learning PCA: for the given data, find the K-dimensional subspace that captures most of the variance in the data; the variance in the remaining subspace is minimal.
43 Constraints: the error is at 90° to the eigenfaces. V^T V = I: the eigenvectors are orthogonal to each other. For every vector, the error is orthogonal to the eigenvectors: e^T V = 0. Over the collection of data: the average of w w^T is diagonal (the eigen representations are uncorrelated); E[e^T e] is minimum (the error variance is minimum); the mean of the error is 0.
44 A Statistical Formulation of PCA: x = Vw + e, w ~ N(0, B), e ~ N(0, E). x is a random variable generated according to a linear relation; w is drawn from a K-dimensional Gaussian with diagonal covariance; e is drawn from a 0-mean, (D−K)-rank, D-dimensional Gaussian. Estimate V (and B, E) given examples of x.
45–46 Linear Gaussian Models!! x = Vw + e, w ~ N(0, B), e ~ N(0, E). x is a random variable generated according to a linear relation; w is drawn from a Gaussian; e is drawn from a 0-mean Gaussian. Estimate V given examples of x; in the process, also estimate B and E.
47 Linear Gaussian Models: x = μ + Vw + e, w ~ N(0, B), e ~ N(0, E). Observations are linear functions of two uncorrelated Gaussian random variables: a weight variable w and an error variable e. The error is not correlated with the weight: E[e^T w] = 0. Learning LGMs: estimate the parameters of the model given instances of x. This is the problem of learning the distribution of a Gaussian RV.
48 LGMs: Probability Density. x = μ + Vw + e, w ~ N(0, B), e ~ N(0, E). The mean of x: E[x] = μ + V E[w] + E[e] = μ. The covariance of x: E[(x − μ)(x − μ)^T] = VBV^T + E.
49 The probability of x: x = μ + Vw + e, w ~ N(0, B), e ~ N(0, E), so x ~ N(μ, VBV^T + E): P(x) = (2π)^{-D/2} |VBV^T + E|^{-1/2} exp(−0.5 (x − μ)^T (VBV^T + E)^{-1} (x − μ)). x is a linear function of Gaussians, so x is also Gaussian; its mean and variance are as given.
50 Estimating the variables of the model: x = μ + Vw + e, w ~ N(0, B), e ~ N(0, E), x ~ N(μ, VBV^T + E). Estimating the variables of the LGM is equivalent to estimating P(x). The variables are μ, V, B and E.
51 Estimating the model: the model is indeterminate: Vw = VCC^{-1}w = (VC)(C^{-1}w) for any invertible C. We need extra constraints to make the solution unique. Usual constraint: B = I, i.e., the variance of w is an identity matrix.
52 Estimating the variables of the model: x = μ + Vw + e, w ~ N(0, I), e ~ N(0, E), x ~ N(μ, VV^T + E). Estimating the variables of the LGM is equivalent to estimating P(x). The variables are μ, V, and E.
53 The Maximum Likelihood Estimate: x ~ N(μ, VV^T + E). Given a training set x_1, x_2, .., x_N, find μ, V, E. The ML estimate of μ does not depend on the covariance of the Gaussian: μ = (1/N) Σ_i x_i.
54 Centered Data: we can safely assume centered data, μ = 0. If the data are not centered, center them: estimate the mean of the data (which is the maximum likelihood estimate) and subtract it from the data.
55 Simplified Model: x = Vw + e, w ~ N(0, I), e ~ N(0, E), x ~ N(0, VV^T + E). Estimating the variables of the LGM is equivalent to estimating P(x). The variables are V and E.
56 Estimating the model: x = Vw + e, x ~ N(0, VV^T + E). Given a collection of terms x_1, x_2, .., x_N, estimate V and E. w is unknown for each x. But if we assume we know w for each x, what do we get?
57 Estimating the Parameters: x = Vw + e, P(e) = N(0, E), so P(x|w) = N(Vw, E) = (2π)^{-D/2} |E|^{-1/2} exp(−0.5 (x − Vw)^T E^{-1} (x − Vw)). We'll use a maximum-likelihood estimate. The log-likelihood of x_1 .. x_N, knowing their w: log P(x_1 .. x_N | w_1 .. w_N) = C − 0.5 N log|E| − 0.5 Σ_i (x_i − Vw_i)^T E^{-1} (x_i − Vw_i).
58 Maximizing the log-likelihood. Differentiating w.r.t. V and setting to 0: Σ_i E^{-1}(x_i − Vw_i) w_i^T = 0, so V = (Σ_i x_i w_i^T)(Σ_i w_i w_i^T)^{-1}. Differentiating w.r.t. E^{-1} and setting to 0: E = (1/N) Σ_i (x_i − Vw_i)(x_i − Vw_i)^T.
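A sketch of the closed-form estimates above for the case where the w_i are known, on synthetic data of our own choosing so recovery can be checked.

```python
import numpy as np

rng = np.random.default_rng(6)
D, K, N = 4, 2, 5000
V_true = rng.standard_normal((D, K))
W = rng.standard_normal((K, N))            # the "known" w_i, as columns
X = V_true @ W + 0.1 * rng.standard_normal((D, N))   # noise cov 0.01*I

# V = (sum_i x_i w_i^T)(sum_i w_i w_i^T)^{-1}
V_hat = (X @ W.T) @ np.linalg.inv(W @ W.T)
# E = (1/N) sum_i (x_i - V w_i)(x_i - V w_i)^T
E_hat = (X - V_hat @ W) @ (X - V_hat @ W).T / N
print(np.abs(V_hat - V_true).max())        # small: V is recovered
```

With the weights observed, the problem reduces to ordinary multivariate least squares, which is why both estimates are closed-form.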
59 Estimating LGMs: if we know w, then V = (Σ_i x_i w_i^T)(Σ_i w_i w_i^T)^{-1}. But in reality we don't know the w for each x. So how do we deal with this? EM.
60 Recall EM: instance from blue dice, instance from red dice, dice unknown; collections of blue and red numbers. We figured out how to compute the parameters if we knew the missing information. We then fragmented the observations according to the posterior probability P(z|x) and counted as usual. In effect, we took the expectation with respect to the a posteriori probability of the missing data, P(z|x).
61–62 EM for LGMs: replace the unseen data terms with expectations taken w.r.t. P(w|x): V = (Σ_i x_i E[w_i|x_i]^T)(Σ_i E[w_i w_i^T | x_i])^{-1}, E = (1/N) Σ_i E[(x_i − Vw_i)(x_i − Vw_i)^T | x_i].
63 Expected Value of w given x: x = Vw + e, P(e) = N(0, E), P(w) = N(0, I), P(x) = N(0, VV^T + E). x and w are jointly Gaussian! x is Gaussian, w is Gaussian, and they are linearly related: z = [w; x], P(z) = N(μ_z, C_zz).
64 Expected Value of w given x: C_xw = E[x w^T] = E[(Vw + e) w^T] = V. With z = [w; x]: C_zz = [[I, V^T], [V, VV^T + E]]. x and w are jointly Gaussian.
65 The conditional expectation of w given x: P(w|x) is a Gaussian. Using the conditional-Gaussian formulas: P(w|x) = N(V^T(VV^T + E)^{-1} x, I − V^T(VV^T + E)^{-1} V). So E[w|x] = V^T(VV^T + E)^{-1} x, and E[w w^T|x] = Var(w|x) + E[w|x] E[w|x]^T = I − V^T(VV^T + E)^{-1} V + E[w|x] E[w|x]^T.
66 LGM: the complete EM algorithm. Initialize V and E. E step: E[w_i|x_i] = V^T(VV^T + E)^{-1} x_i; E[w_i w_i^T|x_i] = I − V^T(VV^T + E)^{-1} V + E[w_i|x_i] E[w_i|x_i]^T. M step: V = (Σ_i x_i E[w_i|x_i]^T)(Σ_i E[w_i w_i^T|x_i])^{-1}; E = (1/N) Σ_i E[(x_i − Vw_i)(x_i − Vw_i)^T | x_i].
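A compact sketch of the EM recursion above. To keep this toy example identifiable we additionally constrain E to be diagonal (the factor-analysis convention, our choice); initialization and data are also our own.

```python
import numpy as np

def lgm_em(X, K, n_iter=100):
    """Fit x = Vw + e, w ~ N(0, I), e ~ N(0, E diagonal), to columns of X."""
    D, N = X.shape
    rng = np.random.default_rng(7)
    V = rng.standard_normal((D, K))
    E = np.eye(D)
    for _ in range(n_iter):
        # E step: posterior moments of w for every column of X.
        G = np.linalg.inv(V @ V.T + E)                  # (V V^T + E)^{-1}
        Ew = V.T @ G @ X                                 # K x N: E[w_i|x_i]
        S = N * (np.eye(K) - V.T @ G @ V) + Ew @ Ew.T    # sum E[w_i w_i^T|x_i]
        # M step: closed-form re-estimates.
        V = (X @ Ew.T) @ np.linalg.inv(S)
        E = np.diag(np.diag(X @ X.T - V @ Ew @ X.T)) / N
    return V, E

rng = np.random.default_rng(8)
V_true = np.array([[2.0], [1.0], [0.0]])
X = V_true @ rng.standard_normal((1, 2000)) + 0.1 * rng.standard_normal((3, 2000))

V_hat, _ = lgm_em(X, K=1)
d = V_hat[:, 0] / np.linalg.norm(V_hat)
print(d)   # should align, up to sign, with V_true / ||V_true||
```

V is identifiable only up to a rotation (and sign), so we compare the spanned direction rather than the matrix entries.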
67 So what have we achieved? We employed a complicated EM algorithm to learn a Gaussian PDF for a variable. What have we gained? Next class: PCA, Sensible PCA, EM algorithms for PCA, Factor Analysis, FA for feature extraction.
68 LGMs: Application 1. Learning principal components: x = Vw + e, w ~ N(0, I), e ~ N(0, E). Find the directions that capture most of the variation in the data; the error is orthogonal to these variations.
69 LGMs: Application 2. Learning with insufficient data. [Figure: a full covariance matrix.] The full covariance matrix of a Gaussian has D² terms. Full covariance captures the relationships between variables. Problem: it needs a lot of data to estimate robustly.
70 To be continued: other applications, next class.
More informationLecture 3 Specification
Lecture 3 Specfcaton 1 OLS Estmaton - Assumptons CLM Assumptons (A1) DGP: y = X + s correctly specfed. (A) E[ X] = 0 (A3) Var[ X] = σ I T (A4) X has full column rank rank(x)=k-, where T k. Q: What happens
More informationβ0 + β1xi. You are interested in estimating the unknown parameters β
Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Analytcs The SLR Setup Sample Statstcs Ordnary Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals) wth OLS
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationProbabilistic Classification: Bayes Classifiers 2
CSC Machne Learnng Lecture : Classfcaton II September, Sam Rowes Probablstc Classfcaton: Baes Classfers Generatve model: p(, ) = p()p( ). p() are called class prors. p( ) are called class-condtonal feature
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationRockefeller College University at Albany
Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationMachine Learning & Data Mining CS/CNS/EE 155. Lecture 4: Regularization, Sparsity & Lasso
Machne Learnng Data Mnng CS/CS/EE 155 Lecture 4: Regularzaton, Sparsty Lasso 1 Recap: Complete Ppelne S = {(x, y )} Tranng Data f (x, b) = T x b Model Class(es) L(a, b) = (a b) 2 Loss Functon,b L( y, f
More informationMachine learning: Density estimation
CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of
More informationLogistic Regression Maximum Likelihood Estimation
Harvard-MIT Dvson of Health Scences and Technology HST.951J: Medcal Decson Support, Fall 2005 Instructors: Professor Lucla Ohno-Machado and Professor Staal Vnterbo 6.873/HST.951 Medcal Decson Support Fall
More informationThe Ordinary Least Squares (OLS) Estimator
The Ordnary Least Squares (OLS) Estmator 1 Regresson Analyss Regresson Analyss: a statstcal technque for nvestgatng and modelng the relatonshp between varables. Applcatons: Engneerng, the physcal and chemcal
More information6 Supplementary Materials
6 Supplementar Materals 61 Proof of Theorem 31 Proof Let m Xt z 1:T : l m Xt X,z 1:t Wethenhave mxt z1:t ˆm HX Xt z 1:T mxt z1:t m HX Xt z 1:T + mxt z 1:T HX We consder each of the two terms n equaton
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models
More informationRegression Analysis. Regression Analysis
Regresson Analyss Smple Regresson Multvarate Regresson Stepwse Regresson Replcaton and Predcton Error 1 Regresson Analyss In general, we "ft" a model by mnmzng a metrc that represents the error. n mn (y
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationLaboratory 3: Method of Least Squares
Laboratory 3: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly they are correlated wth
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationChapter 7 Generalized and Weighted Least Squares Estimation. In this method, the deviation between the observed and expected values of
Chapter 7 Generalzed and Weghted Least Squares Estmaton The usual lnear regresson model assumes that all the random error components are dentcally and ndependently dstrbuted wth constant varance. When
More informationb ), which stands for uniform distribution on the interval a x< b. = 0 elsewhere
Fall Analyss of Epermental Measurements B. Esensten/rev. S. Errede Some mportant probablty dstrbutons: Unform Bnomal Posson Gaussan/ormal The Unform dstrbuton s often called U( a, b ), hch stands for unform
More informationMean Field / Variational Approximations
Mean Feld / Varatonal Appromatons resented by Jose Nuñez 0/24/05 Outlne Introducton Mean Feld Appromaton Structured Mean Feld Weghted Mean Feld Varatonal Methods Introducton roblem: We have dstrbuton but
More informationTHE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for
More informationCell Biology. Lecture 1: 10-Oct-12. Marco Grzegorczyk. (Gen-)Regulatory Network. Microarray Chips. (Gen-)Regulatory Network. (Gen-)Regulatory Network
5.0.202 Genetsche Netzwerke Wntersemester 202/203 ell ology Lecture : 0-Oct-2 Marco Grzegorczyk Gen-Regulatory Network Mcroarray hps G G 2 G 3 2 3 metabolte metabolte Gen-Regulatory Network Gen-Regulatory
More informationFall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede
Fall 0 Analyss of Expermental easurements B. Esensten/rev. S. Errede We now reformulate the lnear Least Squares ethod n more general terms, sutable for (eventually extendng to the non-lnear case, and also
More information