Machine Learning for Signal Processing Regression and Prediction


1 Machine Learning for Signal Processing: Regression and Prediction. Class 14. Instructor: Bhiksha Raj. 11-755/18797

2 Matrix Identities

For a scalar function $f(\mathbf{x})$ of a vector argument $\mathbf{x} = [x_1\ x_2\ \cdots\ x_D]^\top$:

$$\frac{df}{d\mathbf{x}} = \left[\frac{\partial f}{\partial x_1}\ \ \frac{\partial f}{\partial x_2}\ \ \cdots\ \ \frac{\partial f}{\partial x_D}\right]$$

The derivative of a scalar function w.r.t. a vector is a vector.

3 Matrix Identities

For a scalar function $f(X)$ of a matrix argument $X$:

$$\frac{df}{dX} = \begin{bmatrix} \partial f/\partial x_{11} & \cdots & \partial f/\partial x_{1D} \\ \vdots & \ddots & \vdots \\ \partial f/\partial x_{D1} & \cdots & \partial f/\partial x_{DD} \end{bmatrix}$$

The derivative of a scalar function w.r.t. a vector is a vector. The derivative w.r.t. a matrix is a matrix.

4 Matrix Identities

For a vector function $\mathbf{F}(\mathbf{x}) = [f_1(\mathbf{x})\ \cdots\ f_N(\mathbf{x})]^\top$ of a vector argument $\mathbf{x} \in \mathbb{R}^D$:

$$\frac{d\mathbf{F}}{d\mathbf{x}} = \begin{bmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_N/\partial x_1 \\ \vdots & \ddots & \vdots \\ \partial f_1/\partial x_D & \cdots & \partial f_N/\partial x_D \end{bmatrix}$$

The derivative of a vector function w.r.t. a vector is a matrix. Note the transposition of order.

5 Derivatives

In general: differentiating an $M \times N$ function by a $U \times V$ argument results in an $M \times N \times U \times V$ tensor derivative.

6 Matrix derivative identities

$d(X\mathbf{a}) = X\,d\mathbf{a}$, $\ d(\mathbf{a}^\top X) = d\mathbf{a}^\top\,X$ ($X$ is a matrix, $\mathbf{a}$ is a vector; depending on the layout convention the solution may also be transposed)

$d(AX) = A\,dX$, $\ d(XA) = (dX)\,A$ ($A$ is a matrix)

$d(\mathbf{a}^\top X \mathbf{a}) = \mathbf{a}^\top(X + X^\top)\,d\mathbf{a}$

$d\,\mathrm{trace}(XA) = \mathrm{trace}(A\,dX)$

Some basic linear and quadratic identities.
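As a quick sanity check of the quadratic identity above, here is a minimal numpy sketch that compares the analytic gradient of $f(\mathbf{a}) = \mathbf{a}^\top X \mathbf{a}$ against finite differences; the variable names and tolerances are ours, chosen for illustration:

```python
import numpy as np

# Numeric check of d(a' X a)/da = (X + X') a via finite differences.
rng = np.random.default_rng(0)
D = 4
X = rng.standard_normal((D, D))
a = rng.standard_normal(D)

analytic = (X + X.T) @ a

eps = 1e-6
numeric = np.array([
    ((a + eps * np.eye(D)[i]) @ X @ (a + eps * np.eye(D)[i]) - a @ X @ a) / eps
    for i in range(D)
])
print(np.allclose(analytic, numeric, atol=1e-3))  # True
```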

7 A Common Problem

Can you spot the glitches?

8 How to fix this problem?

Glitches in audio must be detected. How? Then what? Glitches must be fixed: delete the glitch. This results in a hole. Fill in the hole. How?

9 Interpolation...

Extend the curve on the left to predict the values in the blank region: forward prediction. Extend the blue curve on the right leftwards to predict the blank region: backward prediction. How? Regression analysis...

10 Detecting the Glitch

OK / NOT OK. Regression-based reconstruction can be done anywhere. The reconstructed value will not match the actual value; a large reconstruction error identifies glitches.

11 What is a regression?

Analyzing the relationship between variables. Expressed in many forms. Wikipedia: Linear regression, Simple regression, Ordinary least squares, Polynomial regression, General linear model, Generalized linear model, Discrete choice, Logistic regression, Multinomial logit, Mixed logit, Probit, Multinomial probit, ... Generally a tool to predict variables.

12 Regressions for prediction

$y = f(x; \Theta) + e$. Different possibilities: $y$ is a scalar ($y$ is real, or $y$ is categorical: classification); $y$ is a vector; $x$ is a vector ($x$ is a set of real-valued variables, a set of categorical variables, or a combination of the two); $f(\cdot)$ is a linear or affine function, a non-linear function, or a time-series model.

13 A linear regression

Assumption: the relationship between the variables is linear. A linear trend may be found relating $x$ and $y$: $y$ is the dependent variable, $x$ the explanatory variable. Given $x$, $y$ can be predicted as an affine function of $x$.

14 An imaginary regression...

"Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered."

15 Linear Regressions

$\mathbf{y} = A\mathbf{x} + \mathbf{b} + \mathbf{e}$, where $\mathbf{e}$ is the prediction error. Given a training set of $\{\mathbf{x}, \mathbf{y}\}$ values, estimate $A$ and $\mathbf{b}$:

$$\mathbf{y}_1 = A\mathbf{x}_1 + \mathbf{b} + \mathbf{e}_1, \quad \mathbf{y}_2 = A\mathbf{x}_2 + \mathbf{b} + \mathbf{e}_2, \quad \mathbf{y}_3 = A\mathbf{x}_3 + \mathbf{b} + \mathbf{e}_3, \ \ldots$$

If $A$ and $\mathbf{b}$ are well estimated, the prediction error will be small.

16 Linear Regression to a scalar

$y_1 = \mathbf{a}^\top\mathbf{x}_1 + b + e_1$, $\ y_2 = \mathbf{a}^\top\mathbf{x}_2 + b + e_2$, $\ y_3 = \mathbf{a}^\top\mathbf{x}_3 + b + e_3$. Define:

$$\mathbf{y} = [y_1\ y_2\ y_3\ \cdots], \quad \mathbf{e} = [e_1\ e_2\ e_3\ \cdots], \quad X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots \\ 1 & 1 & 1 & \cdots \end{bmatrix}, \quad A = \begin{bmatrix} \mathbf{a} \\ b \end{bmatrix}$$

Rewrite: $\mathbf{y} = A^\top X + \mathbf{e}$.

17 Learning the parameters

$\mathbf{y} = A^\top X + \mathbf{e}$; assuming no error, $\hat{\mathbf{y}} = A^\top X$. Given training data (several $(\mathbf{x}, y)$ pairs), we can define a divergence $D(\mathbf{y}, \hat{\mathbf{y}})$ that measures how much $\hat{\mathbf{y}}$ differs from $\mathbf{y}$. Ideally, if the model is accurate, this should be small. Estimate $A$ (i.e., $\mathbf{a}$ and $b$) to minimize $D(\mathbf{y}, \hat{\mathbf{y}})$.

18 The prediction error as divergence

$y_1 = \mathbf{a}^\top\mathbf{x}_1 + b + e_1$, $\ y_2 = \mathbf{a}^\top\mathbf{x}_2 + b + e_2$, $\ y_3 = \mathbf{a}^\top\mathbf{x}_3 + b + e_3$, ... Define the divergence as the sum of the squared errors in predicting $\mathbf{y}$:

$$D(\mathbf{y}, \hat{\mathbf{y}}) = E = \sum_i e_i^2 = \sum_i \big(y_i - \mathbf{a}^\top\mathbf{x}_i - b\big)^2 = \|\mathbf{y} - A^\top X\|^2$$

19 Prediction error as divergence

$y = ax + e$, where $e$ is the prediction error. Find the slope $a$ such that the total squared length of the error lines is minimized.

20 Solving a linear regression

$\mathbf{y} = A^\top X + \mathbf{e}$. Minimize the squared error:

$$E = \|\mathbf{y} - A^\top X\|^2 = (\mathbf{y} - A^\top X)(\mathbf{y} - A^\top X)^\top$$

Differentiating w.r.t. $A$ and equating to 0:

$$\frac{dE}{dA} = 0 \;\Rightarrow\; A = (XX^\top)^{-1}X\mathbf{y}^\top, \quad \text{equivalently}\ A^\top = \mathbf{y}\,\mathrm{pinv}(X)$$
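As a concrete illustration of the closed-form solution, here is a minimal numpy sketch on synthetic data; the data, names, and noise level are ours, with a row of ones appended to absorb the bias $b$ as on the previous slides:

```python
import numpy as np

# Synthetic check of A = (X X')^{-1} X y', with x vectors as columns of X.
rng = np.random.default_rng(1)
D, N = 3, 200
X = rng.standard_normal((D, N))
a_true, b_true = np.array([2.0, -1.0, 0.5]), 0.7
y = a_true @ X + b_true + 0.01 * rng.standard_normal(N)

Xa = np.vstack([X, np.ones(N)])          # augmented data matrix (bias row)
A = np.linalg.inv(Xa @ Xa.T) @ Xa @ y    # stacked estimate [a; b]
# Equivalent and numerically safer: A = np.linalg.pinv(Xa.T) @ y
print(A)                                 # approximately [2, -1, 0.5, 0.7]
```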

21 Regression in multiple dimensions

$\mathbf{y}_i = A^\top\mathbf{x}_i + \mathbf{b} + \mathbf{e}_i$, where $\mathbf{y}_i$ is a vector. Notation: $y_{ij}$ is the $j$-th component of vector $\mathbf{y}_i$; $\mathbf{a}_j$ is the $j$-th column of $A$; $b_j$ is the $j$-th component of $\mathbf{b}$. Also called multiple regression. Equivalent of saying: $y_{ij} = \mathbf{a}_j^\top\mathbf{x}_i + b_j + e_{ij}$. Fundamentally no different from $N$ separate single regressions, but we can use the relationship between the $\mathbf{y}$s to our benefit.

22 Multiple Regression

Define $Y = [\mathbf{y}_1\ \mathbf{y}_2\ \mathbf{y}_3\ \cdots]$, $E = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3\ \cdots]$, and $X = \begin{bmatrix}\mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots \\ 1 & 1 & 1 & \cdots\end{bmatrix}$ (the bottom row is a vector of ones, folding $\mathbf{b}$ into the parameter matrix $\hat{A}$). Then

$$Y = \hat{A}^\top X + E, \qquad \mathrm{DIV} = \mathrm{trace}\big((Y - \hat{A}^\top X)(Y - \hat{A}^\top X)^\top\big)$$

Differentiating and equating to 0:

$$\frac{d\,\mathrm{DIV}}{d\hat{A}} = 0 \;\Rightarrow\; \hat{A} = (XX^\top)^{-1}XY^\top, \quad \text{i.e.}\ \hat{A}^\top = YX^\top(XX^\top)^{-1} = Y\,\mathrm{pinv}(X)$$

23 A Different Perspective

$\mathbf{y} = A\mathbf{x} + \mathbf{e}$: $\mathbf{y}$ is a noisy reading of $A\mathbf{x}$. The error $\mathbf{e}$ is Gaussian: $\mathbf{e} \sim N(0, \sigma^2 I)$. Estimate $A$ from $Y = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_N]$ and $X = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N]$.

24 The Likelihood of the data

With $\mathbf{e} \sim N(0, \sigma^2 I)$, the probability of observing a specific $\mathbf{y}$, given its $\mathbf{x}$, for a particular matrix $A$ is

$$P(\mathbf{y} \mid \mathbf{x};\, A) = N(\mathbf{y};\ A\mathbf{x},\ \sigma^2 I)$$

Probability of the collection $Y = [\mathbf{y}_1 \cdots \mathbf{y}_N]$, $X = [\mathbf{x}_1 \cdots \mathbf{x}_N]$, assuming IID for convenience (not necessary):

$$P(Y \mid X;\, A) = \prod_i N(\mathbf{y}_i;\ A\mathbf{x}_i,\ \sigma^2 I)$$

25 A Maximum Likelihood Estimate

$\mathbf{y} = A\mathbf{x} + \mathbf{e}$, $\mathbf{e} \sim N(0, \sigma^2 I)$:

$$\log P(Y \mid X;\, A) = C - \frac{1}{2\sigma^2}\,\mathrm{trace}\big((Y - AX)(Y - AX)^\top\big)$$

Maximizing the log probability is identical to minimizing the trace, and hence identical to the least squares solution:

$$A = YX^\top(XX^\top)^{-1} = Y\,\mathrm{pinv}(X)$$

26 Predicting an output

From a collection of training data, we have learned $A$. Given $\mathbf{x}$ for a new instance, but not $\mathbf{y}$, what is $\mathbf{y}$? Simple solution: $\hat{\mathbf{y}} = A\mathbf{x}$.

27 Applying it to our problem

Prediction by regression. Forward regression: $x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_K x_{t-K} + e_t$. Backward regression: $x_t = b_1 x_{t+1} + b_2 x_{t+2} + \cdots + b_K x_{t+K} + e_t$.

28 Applying it to our problem

Forward prediction: stack the prediction equations for $t = K+1, \ldots, T$ in matrix form,

$$\begin{bmatrix} x_{K+1} \\ x_{K+2} \\ \vdots \\ x_T \end{bmatrix} = \begin{bmatrix} x_K & x_{K-1} & \cdots & x_1 \\ x_{K+1} & x_K & \cdots & x_2 \\ \vdots & & & \vdots \\ x_{T-1} & x_{T-2} & \cdots & x_{T-K} \end{bmatrix}\mathbf{a} + \mathbf{e}, \quad \text{i.e.}\ \bar{\mathbf{x}} = X\mathbf{a} + \mathbf{e}$$

and solve $\mathbf{a} = \mathrm{pinv}(X)\,\bar{\mathbf{x}}$.

29 Applying it to our problem

Backward prediction: stack the equations $x_t = \sum_k b_k x_{t+k} + e_t$ in matrix form, $\bar{\mathbf{x}} = X\mathbf{b} + \mathbf{e}$, where each row of $X$ now holds the $K$ samples following the target sample, and solve $\mathbf{b} = \mathrm{pinv}(X)\,\bar{\mathbf{x}}$.
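A sketch of how both predictors can be estimated by least squares, assuming a fixed predictor order K; the windowing conventions and the test signal here are illustrative choices of ours, not the slides' exact setup:

```python
import numpy as np

def fit_predictor(x, K, forward=True):
    # Backward prediction is forward prediction on the time-reversed signal.
    if not forward:
        x = x[::-1]
    # Each row holds the K samples preceding one target sample, nearest first.
    X = np.array([x[t - K:t][::-1] for t in range(K, len(x))])
    y = x[K:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares predictor
    return a

rng = np.random.default_rng(2)
sig = np.sin(0.1 * np.arange(500)) + 0.01 * rng.standard_normal(500)
a_fwd = fit_predictor(sig, K=8)
a_bwd = fit_predictor(sig, K=8, forward=False)
```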

30 Finding the burst

At each time, learn a forward predictor $\mathbf{a}_t$. At each time, predict the next sample: $x_t^{\text{est}} = \sum_k a_{t,k}\,x_{t-k}$. Compute the error: $\mathrm{ferr}_t = |x_t - x_t^{\text{est}}|^2$. Learn a backward predictor and compute the backward error $\mathrm{berr}_t$ analogously. Compute the average prediction error over a window and threshold it.
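A minimal sketch of the detection step, assuming for simplicity a single fixed predictor for the whole signal (the slides re-learn one at each time) and a threshold rule of our own choosing:

```python
import numpy as np

def prediction_error(x, a):
    # Squared forward prediction error at each sample (first K left at zero).
    K = len(a)
    err = np.zeros(len(x))
    for t in range(K, len(x)):
        err[t] = (x[t] - a @ x[t - K:t][::-1]) ** 2
    return err

def detect_glitch(x, a, win=64, factor=10.0):
    # Average the error over a sliding window, then flag windows whose
    # average exceeds a multiple of the median (threshold rule is ours).
    err = prediction_error(x, a)
    avg = np.convolve(err, np.ones(win) / win, mode="same")
    return avg > factor * np.median(avg)
```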

31 Filling the hole

Learn a forward predictor at the left edge of the hole. For each missing sample, predict the next sample: $x_t^{\text{est}} = \sum_k a_{t,k}\,x_{t-k}$, using estimated samples where real samples are not available. Learn a backward predictor at the right edge of the hole. For each missing sample, predict the previous sample: $x_t^{\text{est}} = \sum_k b_{t,k}\,x_{t+k}$, again using estimated samples where real samples are not available. Average the forward and backward predictions.
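A sketch of the hole-filling step, assuming predictors `a_fwd` and `a_bwd` trained as in the earlier sketch; the function and variable names are ours:

```python
import numpy as np

def fill_hole(x, lo, hi, a_fwd, a_bwd):
    # Forward pass: predict left-to-right, feeding estimates back in.
    K = len(a_fwd)
    fwd = x.copy()
    for t in range(lo, hi):
        fwd[t] = a_fwd @ fwd[t - K:t][::-1]
    # Backward pass: predict right-to-left from the right edge of the hole.
    bwd = x.copy()
    for t in range(hi - 1, lo - 1, -1):
        bwd[t] = a_bwd @ bwd[t + 1:t + 1 + K]
    out = x.copy()
    out[lo:hi] = 0.5 * (fwd[lo:hi] + bwd[lo:hi])   # average the two passes
    return out
```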

32 Reconstruction (zoom-in)

[Figure: reconstruction area and the next glitch; distorted signal vs. recovered signal; interpolation result vs. actual data.]

33 Incrementally learning the regression

$A = (XX^\top)^{-1}XY^\top$ requires knowledge of all $(\mathbf{x}, y)$ pairs. Can we learn $A$ incrementally instead, as data comes in? The Widrow-Hoff rule (scalar prediction version):

$$\hat{\mathbf{a}}_{t+1} = \hat{\mathbf{a}}_t + \eta\,\mathbf{x}_t\,(y_t - \hat{y}_t), \qquad \hat{y}_t = \hat{\mathbf{a}}_t^\top\mathbf{x}_t$$

Note the structure: the update is driven by the prediction error. Can also be done in batch mode!
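A minimal sketch of the Widrow-Hoff (LMS) update; the step size `eta` is an illustrative choice, and in practice it must be small enough for the iteration to converge:

```python
import numpy as np

def widrow_hoff(X, y, eta=0.01):
    # Incremental (LMS) weight update: a <- a + eta * x_t * (y_t - a' x_t).
    a = np.zeros(X.shape[0])
    for t in range(X.shape[1]):
        x_t = X[:, t]
        a = a + eta * x_t * (y[t] - a @ x_t)   # error-driven correction
    return a
```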

34 Predicting a value

$A = YX^\top(XX^\top)^{-1}$, $\hat{y} = A\mathbf{x}$. What are we doing exactly? For the explanation we assume no $b$ ($X$ is zero mean); the explanation generalizes easily even otherwise. Let $C = XX^\top$. Whitening: $\hat{X} = C^{-1/2}X$ and $\hat{\mathbf{x}} = C^{-1/2}\mathbf{x}$, where $C^{-1/2}$ is the whitening matrix for $X$. Then

$$\hat{y} = YX^\top C^{-1}\mathbf{x} = Y\hat{X}^\top\hat{\mathbf{x}}$$

35 Predicting a value

What are we doing exactly?

$$\hat{y} = Y\hat{X}^\top\hat{\mathbf{x}} = \sum_{i=1}^{N} y_i\,\big(\hat{\mathbf{x}}_i^\top\hat{\mathbf{x}}\big)$$

36 Predicting a value

$\hat{y} = \sum_i y_i\,(\hat{\mathbf{x}}_i^\top\hat{\mathbf{x}})$. Given training instances $(\mathbf{x}_i, y_i)$ for $i = 1 \ldots N$, estimate $y$ for a new test instance of $\mathbf{x}$ with unknown $y$: $y$ is simply a weighted sum of the $y_i$ instances from the training data. The weight of any $y_i$ is simply the inner product between its corresponding $\mathbf{x}_i$ and the new $\mathbf{x}$, with due whitening and scaling.
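A small numeric check of this weighted-sum view, a sketch under the slide's assumptions (zero-mean $X$, no bias term); the whitening matrix is built by eigendecomposition, and all names are ours:

```python
import numpy as np

# Check: closed-form prediction equals the whitened-inner-product weighted sum.
rng = np.random.default_rng(3)
D, N = 3, 50
X = rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)               # zero-mean, as assumed
y = rng.standard_normal(N)
x_new = rng.standard_normal(D)

C = X @ X.T
vals, vecs = np.linalg.eigh(C)                   # C is symmetric PD here
C_mhalf = vecs @ np.diag(vals ** -0.5) @ vecs.T  # whitening matrix C^{-1/2}

Xw, xw = C_mhalf @ X, C_mhalf @ x_new
y_weighted = sum(y[i] * (Xw[:, i] @ xw) for i in range(N))
y_closed = y @ X.T @ np.linalg.inv(C) @ x_new
print(np.isclose(y_weighted, y_closed))          # True
```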

37 What are we doing: A different perspective

$A = YX^\top(XX^\top)^{-1}$ assumes $XX^\top$ is invertible. What if it is not, e.g., the dimensionality of $X$ is greater than the number of observations (underdetermined)? In this case $X^\top X$ will generally be invertible:

$$\hat{A} = Y(X^\top X)^{-1}X^\top$$

38 High-dimensional regression

$X^\top X$ is the Gram matrix $G$, with $G_{ij} = \mathbf{x}_i^\top\mathbf{x}_j$:

$$\hat{A} = YG^{-1}X^\top, \qquad \hat{y} = YG^{-1}X^\top\mathbf{x}$$

39 High-dimensional regression

$\hat{y} = YG^{-1}X^\top\mathbf{x}$. Normalize $Y$ by the inverse of the Gram matrix: $\bar{Y} = YG^{-1}$. Working our way down:

$$\hat{y} = \bar{Y}X^\top\mathbf{x} = \sum_i \bar{y}_i\,\big(\mathbf{x}_i^\top\mathbf{x}\big)$$

40 Linear Regression in High-dimensional Spaces

$\hat{y} = \sum_i \bar{y}_i\,(\mathbf{x}_i^\top\mathbf{x})$, with $\bar{Y} = YG^{-1}$. Given training instances $(\mathbf{x}_i, y_i)$ for $i = 1 \ldots N$, estimate $y$ for a new test instance of $\mathbf{x}$ with unknown $y$: $y$ is simply a weighted sum of the normalized $\bar{y}_i$ instances from the training data; the normalization is done via the Gram matrix. The weight of any $\bar{y}_i$ is simply the inner product between its corresponding $\mathbf{x}_i$ and the new $\mathbf{x}$.
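A sketch of the Gram-matrix route in the underdetermined case; the dimensions and data are illustrative choices of ours:

```python
import numpy as np

# Underdetermined case: more dimensions than training points, so XX' is
# singular but the N x N Gram matrix G = X'X is (generically) invertible.
rng = np.random.default_rng(4)
D, N = 100, 20
X = rng.standard_normal((D, N))
y = rng.standard_normal(N)            # one scalar target per column of X
x_new = rng.standard_normal(D)

G = X.T @ X                           # Gram matrix of inner products
y_bar = np.linalg.solve(G, y)         # targets normalized by G
y_hat = y_bar @ (X.T @ x_new)         # weighted sum: sum_i ybar_i (x_i' x)
```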

41 Relationships are not always linear

How do we model these? Multiple solutions.

42 Non-linear regression

$\mathbf{y}_i = A\varphi(\mathbf{x}_i) + \mathbf{e}_i$. Define $\Phi(X) = [\varphi(\mathbf{x}_1)\ \varphi(\mathbf{x}_2)\ \cdots\ \varphi(\mathbf{x}_N)]$, so $Y = A\Phi(X) + E$. Replace $X$ with $\Phi(X)$ in the earlier equations for the solution:

$$A = Y\Phi(X)^\top\big(\Phi(X)\Phi(X)^\top\big)^{-1}$$

43 Problem

$Y = A\Phi(X) + E$; replace $X$ with $\Phi(X)$ in the earlier equations for the solution. But $\Phi(X)$ may be in a very high-dimensional space, and the high-dimensional space (or the transform $\varphi(\cdot)$) may be unknown...

44 The regression is in high dimensions

Linear regression: $\hat{y} = \sum_i \bar{y}_i\,(\mathbf{x}_i^\top\mathbf{x})$, $\bar{Y} = YG^{-1}$ with $G_{ij} = \mathbf{x}_i^\top\mathbf{x}_j$.

High-dimensional regression: $\hat{y} = \sum_i \bar{y}_i\,\big(\varphi(\mathbf{x}_i)^\top\varphi(\mathbf{x})\big)$, $\bar{Y} = YG^{-1}$ with $G_{ij} = \varphi(\mathbf{x}_i)^\top\varphi(\mathbf{x}_j)$.

45 Doing it with Kernels

High-dimensional regression with kernels: let $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^\top\varphi(\mathbf{x}_j)$. Then

$$G = \begin{bmatrix} K(\mathbf{x}_1,\mathbf{x}_1) & \cdots & K(\mathbf{x}_1,\mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N,\mathbf{x}_1) & \cdots & K(\mathbf{x}_N,\mathbf{x}_N) \end{bmatrix}, \qquad \bar{Y} = YG^{-1}, \qquad \hat{y} = \sum_i \bar{y}_i\,K(\mathbf{x}_i, \mathbf{x})$$

Regression in (reproducing) kernel Hilbert space...
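A minimal sketch of the kernelized form with a Gaussian (RBF) kernel; the kernel choice, bandwidth `gamma`, and the small ridge term added to keep $G$ invertible are all assumptions of ours:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # K(a, b) = exp(-gamma * ||a - b||^2), with points as columns of A and B.
    d2 = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(5)
D, N = 2, 40
X = rng.standard_normal((D, N))
y = np.sin(2 * X[0]) + 0.1 * rng.standard_normal(N)   # a nonlinear target

G = rbf_kernel(X, X)                                  # Gram matrix in kernel space
y_bar = np.linalg.solve(G + 1e-8 * np.eye(N), y)      # tiny ridge for stability

x_new = rng.standard_normal((D, 1))
y_hat = y_bar @ rbf_kernel(X, x_new)                  # sum_i ybar_i K(x_i, x)
```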

46 A different way of finding nonlinear relationships: Locally linear regression

Previous discussion: the regression parameters are optimized over the entire training set, minimizing $E = \sum_{\text{all}\ i} \|\mathbf{y}_i - A\mathbf{x}_i - \mathbf{b}\|^2$. A single global regression is estimated and applied to all future $\mathbf{x}$. Alternative: local regression. Learn a regression that is specific to $\mathbf{x}$.

47 Being non-committal: Local Regression

Estimate the regression to be applied to any $\mathbf{x}$ using only training instances near $\mathbf{x}$:

$$E = \sum_{\mathbf{x}_j \in \mathrm{neighborhood}(\mathbf{x})} \|\mathbf{y}_j - A\mathbf{x}_j - \mathbf{b}\|^2$$

The resultant regression has the form $\hat{\mathbf{y}} = \sum_{\mathbf{x}_j \in \mathrm{neighborhood}(\mathbf{x})} d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j$. Note: this regression is specific to $\mathbf{x}$; a separate regression must be learned for every $\mathbf{x}$.

48 Local Regression

$\hat{\mathbf{y}} = \sum_j d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j$. But what is $d(\cdot)$? For linear regression, $d(\cdot)$ is an inner product. More generic form: choose $d(\cdot)$ as a function of the distance between $\mathbf{x}$ and $\mathbf{x}_j$. If $d(\cdot)$ falls off rapidly with the distance between $\mathbf{x}$ and $\mathbf{x}_j$, the neighborhood requirement can be relaxed:

$$\hat{\mathbf{y}} = \sum_{\text{all}\ j} d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j$$

49 Kernel Regression: $d(\cdot) = K(\cdot)$

$$\hat{y} = \frac{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)\,y_i}{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)}$$

Typical kernel functions: Gaussian, Laplacian, other density functions. They must fall off rapidly with increasing distance between $\mathbf{x}$ and $\mathbf{x}_j$. The regression is local to every $\mathbf{x}$: local regression. Actually a non-parametric MAP estimator of $y$. But first... MAP estimators.
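A minimal sketch of this (Nadaraya-Watson style) kernel regression estimate with a Gaussian kernel; the bandwidth `h` is an illustrative choice:

```python
import numpy as np

def kernel_regression(X_train, y_train, x, h=0.5):
    # Kernel-weighted average of training targets; points as columns of X_train.
    d2 = ((X_train - x[:, None]) ** 2).sum(axis=0)   # squared distances to x
    w = np.exp(-0.5 * d2 / h ** 2)                   # Gaussian kernel weights
    return (w @ y_train) / w.sum()
```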

50 MAP Estimators

MAP (Maximum A Posteriori): find a "best guess" for $y$ statistically, given known $x$: $\hat{y} = \operatorname{argmax}_y P(y \mid x)$. ML (Maximum Likelihood): find that value of $y$ for which the statistical best guess of $x$ would have been the observed $x$: $\hat{y} = \operatorname{argmax}_y P(x \mid y)$. MAP is simpler to visualize.

51 MAP estimation: Gaussian PDF

[Figure: joint density of $X$ and $Y$.] Assume $X$ and $Y$ are jointly Gaussian. The parameters of the Gaussian are learned from training data.

52 Learning the parameters of the Gaussian

Stack $\mathbf{z} = \begin{bmatrix}\mathbf{x} \\ \mathbf{y}\end{bmatrix}$ and estimate

$$\boldsymbol{\mu}_z = \frac{1}{N}\sum_i \mathbf{z}_i, \qquad C_z = \frac{1}{N}\sum_i (\mathbf{z}_i - \boldsymbol{\mu}_z)(\mathbf{z}_i - \boldsymbol{\mu}_z)^\top = \begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}$$

53 Learning the parameters of the Gaussian

The mean $\boldsymbol{\mu}_z = [\boldsymbol{\mu}_x^\top\ \boldsymbol{\mu}_y^\top]^\top$ and covariance $C_z$ partition into the component means $\boldsymbol{\mu}_x$, $\boldsymbol{\mu}_y$ and the blocks $C_{XX}$, $C_{XY}$, $C_{YX}$, $C_{YY}$.

54 MAP estimation: Gaussian PDF

[Figure: joint density of $X$ and $Y$.] Assume $X$ and $Y$ are jointly Gaussian; the parameters of the Gaussian are learned from training data.

55 MAP Estimator for Gaussian RV

Assume $X$ and $Y$ are jointly Gaussian. [Figure: level set of the Gaussian.] The parameters of the Gaussian are learned from training data. Now we are given an $X$, but no $Y$. What is $Y$?

56 MAP estimator for Gaussian RV. [Figure.]

57 MAP estimation: Gaussian PDF. [Figure.]

58 MAP estimation: the Gaussian at a particular value of X. [Figure.]

59 MAP estimation: the Gaussian at a particular value of X. [Figure: the most likely value is marked.]

60 MAP Estimation of a Gaussian RV: $\hat{y} = \operatorname{argmax}_y P(y \mid x)$??? [Figure.]

61 MAP Estimation of a Gaussian RV. [Figure.]

62 MAP Estimation of a Gaussian RV: $\hat{y} = \operatorname{argmax}_y P(y \mid x)$. [Figure.]

63 So what is this value?

Clearly a line. Equation of the line:

$$\hat{y} = \mu_Y + C_{YX}C_{XX}^{-1}(x - \mu_X)$$

The scalar version is given; the vector version is identical. Derivation? Later in the program, a bit. Note the similarity to regression.

64 This is a multiple regression

$\hat{\mathbf{y}} = \boldsymbol{\mu}_Y + C_{YX}C_{XX}^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)$. This is the MAP estimate of $y$: $\hat{y} = \operatorname{argmax}_y P(y \mid x)$. What about the ML estimate of $y$: $\operatorname{argmax}_y P(x \mid y)$? Note: neither of these may be the regression line! MAP estimation of $y$ is the regression on $Y$ for Gaussian RVs, but this is not MAP estimation of the regression parameters.
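A minimal sketch of this estimator end to end: fit the joint Gaussian from paired samples, then predict $\mathbf{y}$ for a new $\mathbf{x}$; the synthetic data and names are ours:

```python
import numpy as np

# Estimate a joint Gaussian over z = [x; y] from paired samples, then
# predict y from a new x with y_hat = mu_y + C_yx C_xx^{-1} (x - mu_x).
rng = np.random.default_rng(6)
N, Dx, Dy = 1000, 3, 2
X = rng.standard_normal((Dx, N))
Y = rng.standard_normal((Dy, Dx)) @ X + 0.1 * rng.standard_normal((Dy, N))

Z = np.vstack([X, Y])
mu = Z.mean(axis=1)
C = np.cov(Z, bias=True)                     # (Dx+Dy) x (Dx+Dy) covariance
C_xx, C_yx = C[:Dx, :Dx], C[Dx:, :Dx]

x_new = rng.standard_normal(Dx)
y_hat = mu[Dx:] + C_yx @ np.linalg.solve(C_xx, x_new - mu[:Dx])
```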

65 It's also a minimum-mean-squared-error estimate

General principle of MMSE estimation: $y$ is unknown, $x$ is known. Estimate $\hat{y}$ such that the expected squared error is minimized:

$$\mathrm{Err} = E\big[\|y - \hat{y}\|^2 \mid x\big]$$

Minimize the above term.

66 It's also a minimum-mean-squared-error estimate

Minimize the error:

$$\mathrm{Err} = E\big[\|y - \hat{y}\|^2 \mid x\big] = E\big[\|y\|^2 \mid x\big] - 2\hat{y}^\top E[y \mid x] + \|\hat{y}\|^2$$

Differentiating and equating to 0:

$$\frac{d\,\mathrm{Err}}{d\hat{y}} = -2E[y \mid x] + 2\hat{y} = 0 \;\Rightarrow\; \hat{y} = E[y \mid x]$$

The MMSE estimate is the mean of the distribution.

67 For the Gaussian: MAP = MMSE

The most likely value is also the MEAN value. This would be true of any symmetric distribution.

68 MMSE estimates for mixture distributions

Let $P(y \mid x)$ be a mixture density: $P(y \mid x) = \sum_k P(k \mid x)\,P(y \mid k, x)$. The MMSE estimate of $y$ is given by

$$E[y \mid x] = \int y\,P(y \mid x)\,dy = \sum_k P(k \mid x)\int y\,P(y \mid k, x)\,dy = \sum_k P(k \mid x)\,E[y \mid k, x]$$

Just a weighted combination of the MMSE estimates from the component distributions.

69 MMSE estimates from a Gaussian mixture

Let $P(\mathbf{x}, \mathbf{y}) = P(\mathbf{z})$ be a Gaussian mixture, with $\mathbf{z} = [\mathbf{x}^\top\ \mathbf{y}^\top]^\top$:

$$P(\mathbf{z}) = \sum_k P(k)\,N(\mathbf{z};\ \boldsymbol{\mu}_k, \Sigma_k)$$

$P(\mathbf{y} \mid \mathbf{x})$ is also a Gaussian mixture:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{P(\mathbf{x}, \mathbf{y})}{P(\mathbf{x})} = \sum_k P(k \mid \mathbf{x})\,P(\mathbf{y} \mid k, \mathbf{x})$$

70 MMSE estimates from a Gaussian mixture

$P(\mathbf{y} \mid \mathbf{x}, k)$ is a Gaussian:

$$P(\mathbf{y} \mid \mathbf{x}, k) = N\big(\mathbf{y};\ \boldsymbol{\mu}_{y,k} + C_{yx,k}C_{xx,k}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{x,k}),\ Q_k\big)$$

so $P(\mathbf{y} \mid \mathbf{x})$ is a Gaussian mixture density.

71 MMSE estimates from a Gaussian mixture

$P(\mathbf{y} \mid \mathbf{x})$ is a mixture Gaussian density, so $E[\mathbf{y} \mid \mathbf{x}]$ is also a mixture:

$$E[\mathbf{y} \mid \mathbf{x}] = \sum_k P(k \mid \mathbf{x})\,E[\mathbf{y} \mid k, \mathbf{x}] = \sum_k P(k \mid \mathbf{x})\big(\boldsymbol{\mu}_{y,k} + C_{yx,k}C_{xx,k}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{x,k})\big)$$

72 MMSE estimates from a Gaussian mixture

A mixture of estimates from the individual Gaussians.
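A sketch of the GMM-based MMSE estimator, assuming the joint GMM over $\mathbf{z} = [\mathbf{x};\,\mathbf{y}]$ has already been trained (e.g., by EM); the function name and parameter layout are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_mmse(x, weights, means, covs, Dx):
    # P(k|x)-weighted mix of per-component Gaussian regressions.
    post, est = [], []
    for w, mu, C in zip(weights, means, covs):
        mu_x, mu_y = mu[:Dx], mu[Dx:]
        C_xx, C_yx = C[:Dx, :Dx], C[Dx:, :Dx]
        post.append(w * multivariate_normal.pdf(x, mu_x, C_xx))  # P(k) P(x|k)
        est.append(mu_y + C_yx @ np.linalg.solve(C_xx, x - mu_x))
    post = np.array(post) / np.sum(post)                         # P(k|x)
    return sum(p * e for p, e in zip(post, est))
```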

73 Voice Morphing

Align training recordings from both speakers (cepstral vector sequences). Learn a GMM on the joint vectors. Given speech from one speaker, find the MMSE estimate of the other. Synthesize from the cepstra.

74 MMSE with GMM: Voice Transformation

Festvox GMM transformation suite (Toda), demonstrated across the speakers awb, bdl, jmk, and slt.

75 A problem with regressions

$\hat{A} = (XX^\top)^{-1}XY^\top$. The ML fit is sensitive: the error is squared, so small variations in the data cause large variations in the weights, and outliers affect it adversely. It is also unstable: if the dimension of $X$ is greater than or equal to the number of instances, $XX^\top$ is not invertible.

76 MAP estimation of weights

$\mathbf{y} = \mathbf{a}^\top X + \mathbf{e}$. Assume the weights are drawn from a Gaussian: $P(\mathbf{a}) = N(0, \sigma^2 I)$. Maximum likelihood estimate: $\hat{\mathbf{a}} = \operatorname{argmax}_{\mathbf{a}} \log P(\mathbf{y} \mid X;\, \mathbf{a})$. Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \operatorname{argmax}_{\mathbf{a}} \log P(\mathbf{a} \mid \mathbf{y}, X) = \operatorname{argmax}_{\mathbf{a}}\big[\log P(\mathbf{y} \mid X, \mathbf{a}) + \log P(\mathbf{a})\big]$$

77 MAP estimation of weights

$P(\mathbf{a}) = N(0, \sigma^2 I)$, so $\log P(\mathbf{a}) = C - \dfrac{0.5\,\|\mathbf{a}\|^2}{\sigma^2}$. Then

$$\hat{\mathbf{a}} = \operatorname{argmax}_{\mathbf{a}}\Big[-C'\,\|\mathbf{y} - \mathbf{a}^\top X\|^2 - \frac{0.5\,\|\mathbf{a}\|^2}{\sigma^2}\Big]$$

Similar to the ML estimate, with an additional term.

78 MAP estimate of weights

Setting $\dfrac{dL}{d\mathbf{a}} = 0$ gives

$$\mathbf{a} = (XX^\top + \lambda I)^{-1}X\mathbf{y}^\top$$

Equivalent to diagonal loading of the correlation matrix: it improves the condition number of the correlation matrix, which can then be inverted with greater stability, and it will not affect the estimate from well-conditioned data. Also called Tikhonov regularization. Dual form: ridge regression. This is a MAP estimate of the weights, not to be confused with the MAP estimate of $Y$.
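A minimal sketch of the diagonally loaded solution; `lam` is the regularization strength, which must be chosen per problem:

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    # Tikhonov-regularized solution: diagonal loading of X X'.
    D = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)
```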

79 MAP estimate priors

[Figure. Left: Gaussian prior on $\mathbf{w}$. Right: Laplacian prior.]

80 MAP estimation of weights with Laplacian prior

Assume the weights are drawn from a Laplacian: $P(\mathbf{a}) = \lambda^{-1}\exp\big(-\lambda^{-1}\|\mathbf{a}\|_1\big)$. Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \operatorname{argmax}_{\mathbf{a}}\big[-C'\,\|\mathbf{y} - \mathbf{a}^\top X\|^2 - \lambda^{-1}\|\mathbf{a}\|_1\big]$$

No closed-form solution; a quadratic programming solution is required. Non-trivial.

81 MAP estimation of weights with Laplacian prior

Assume the weights are drawn from a Laplacian: $P(\mathbf{a}) = \lambda^{-1}\exp\big(-\lambda^{-1}\|\mathbf{a}\|_1\big)$. The maximum a posteriori estimate

$$\hat{\mathbf{a}} = \operatorname{argmax}_{\mathbf{a}}\big[-C'\,\|\mathbf{y} - \mathbf{a}^\top X\|^2 - \lambda^{-1}\|\mathbf{a}\|_1\big]$$

is identical to $L_1$-regularized least-squares estimation.

82 $L_1$-regularized LSE

$$\hat{\mathbf{a}} = \operatorname{argmin}_{\mathbf{a}}\big[C'\,\|\mathbf{y} - \mathbf{a}^\top X\|^2 + \lambda\|\mathbf{a}\|_1\big]$$

No closed-form solution; quadratic programming solutions are required. Dual formulation:

$$\hat{\mathbf{a}} = \operatorname{argmin}_{\mathbf{a}}\|\mathbf{y} - \mathbf{a}^\top X\|^2 \quad \text{subject to} \quad \|\mathbf{a}\|_1 \le t$$

LASSO: Least Absolute Shrinkage and Selection Operator.

83 LASSO Algorithms

Various convex optimization algorithms: LARS (least angle regression), pathwise coordinate descent, ... Matlab code is available from the web.
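The slides do not fix a particular solver; as one illustrative option, here is a sketch of iterative soft-thresholding (ISTA) for the $L_1$-regularized objective, an alternative to the LARS and coordinate-descent methods named above:

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, iters=1000):
    # Minimizes ||y - a'X||^2 + lam * ||a||_1 by iterative soft-thresholding.
    a = np.zeros(X.shape[0])
    L = np.linalg.eigvalsh(X @ X.T).max()      # step size from the spectrum
    for _ in range(iters):
        grad = -X @ (y - a @ X)                # half the gradient of the LS term
        z = a - grad / L                       # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # shrink
    return a
```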

84 Regularized least squares

[Image credit: Tibshirani.] Regularization results in selection of a suboptimal (in the least-squares sense) solution: one of the loci outside the center. Tikhonov regularization selects the shortest solution; $L_1$ regularization selects the sparsest solution.

85 LASSO and Compressive Sensing

$\mathbf{y} = X\mathbf{a}$. Given $\mathbf{y}$ and $X$, estimate sparse $\mathbf{a}$. LASSO: $X$ = explanatory variables, $\mathbf{y}$ = dependent variable, $\mathbf{a}$ = regression weights. CS: $X$ = measurement matrix, $\mathbf{y}$ = measurement, $\mathbf{a}$ = data.

86 An interesting problem: Predicting War!

Economists measure a number of social indicators for countries weekly: happiness index, hunger index, freedom index, Twitter records. Question: will there be a revolution or war next week?

87 An interesting problem: Predicting War!

Issues: dissatisfaction builds up; it is not an instantaneous phenomenon. Usually. War / rebellion builds up much faster, often in hours. Important to predict: preparedness for security, economic impact.

88 Predicting War

Weekly observations $O_1\ O_2\ \cdots\ O_8$ with war/stability markers $W_1 \cdots W_8$ and $S_1 \cdots S_8$ for weeks $w_1 \ldots w_8$. Given: the sequence of economic indicators for each week and the sequence of unrest markers for each week; at the end of each week we know whether war happened that week. Predict the probability of unrest next week. This could be a new unrest or the persistence of a current one.

89 Predicting Time Series

Need time-series models. HMMs: later in the course.
