Overview. Basic concepts of Bayesian learning. Most probable model given data: coin tosses, linear regression, logistic regression.


Overview
Basic concepts of Bayesian learning.
Most probable model given data: coin tosses, linear regression, logistic regression.
Bayesian predictions: coin tosses, linear regression.

Recap: regression problems
Input to the learning problem: training data $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
Instances given by feature vector $x_i = (x_{i1}, \dots, x_{im})^T$ and label $y_i$.
Training data in matrix form:
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Output: model $f : X \to Y$.

Recap: linear regression
Linear regression: the prediction is a weighted sum of features; the model is given by the weights:
$$f_\theta(x) = \theta_0 + \sum_{i=1}^{m} \theta_i x_i = \theta^T x$$
The model is defined by the parameter vector $\theta$ (the "weight vector"); the constant term $\theta_0$ is integrated into the weight vector by adding a constant attribute $x_0 = 1$:
$$x = (1, x_1, \dots, x_m)^T, \qquad \theta = (\theta_0, \theta_1, \dots, \theta_m)^T$$

Recap: ridge regression
Training data $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
Approach: minimize a regularized loss function
$$\theta^* = \arg\min_\theta \sum_{i=1}^{n} \ell(f_\theta(x_i), y_i) + \lambda\, \Omega(\theta)$$
with quadratic loss function $\ell(f_\theta(x), y) = (f_\theta(x) - y)^2$ and L2-regularizer $\Omega(\theta) = \|\theta\|^2$, where $\|\theta\| = (\theta^T \theta)^{1/2}$.
Solution: $\theta^* = (X^T X + \lambda I)^{-1} X^T y$.
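
The closed-form solution is easy to compute numerically. Below is a minimal NumPy sketch (not part of the slides; the function name and `lam` are chosen here for illustration):

```python
import numpy as np

def ridge_regression(X, y, lam):
    """Closed-form ridge solution theta* = (X^T X + lambda I)^{-1} X^T y."""
    m = X.shape[1]
    # Solving the linear system is cheaper and more stable than forming the inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
```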

Insertion: univariate normal distribution
Distribution over $x \in \mathbb{R}$. Given by a density function with parameters $\mu$ (mean) and $\sigma^2$ (variance).
[Figure: density of the normal distribution]

Insertion: multivariate normal distribution
Distribution over vectors $x \in \mathbb{R}^D$. Given by the density function
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{Z} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$
with parameters mean vector $\mu \in \mathbb{R}^D$ and covariance matrix $\Sigma \in \mathbb{R}^{D \times D}$, and normalizer $Z = (2\pi)^{D/2} |\Sigma|^{1/2}$.
[Figure, example $D = 2$: density and samples from the distribution]
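
As an illustration of the density formula (not part of the slides; `mvn_density` is a name chosen here), the density can be evaluated directly in NumPy:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Density N(x | mu, Sigma) with normalizer Z = (2 pi)^{D/2} |Sigma|^{1/2}."""
    D = len(mu)
    diff = x - mu
    Z = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / Z)
```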

Probabilistic linear regression
Linear regression as a probabilistic model:
$$p(y \mid x, \theta) = \mathcal{N}(y \mid f_\theta(x), \sigma^2) = \mathcal{N}(y \mid \theta^T x, \sigma^2)$$
[Figure: density $p(y \mid x, \theta)$ around the regression line $f_\theta(x) = \theta^T x$]

Probabilistic linear regression
Linear regression as a probabilistic model:
$$p(y \mid x, \theta) = \mathcal{N}(y \mid f_\theta(x), \sigma^2) = \mathcal{N}(y \mid \theta^T x, \sigma^2)$$
The label $y$ is generated by the linear model $f_{\theta^*}(x) = \theta^{*T} x$ plus normally distributed noise:
$$y = \theta^{*T} x + \varepsilon \quad \text{with} \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$
[Figure: density $p(y \mid x, \theta)$ around the regression line]
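
To make the generative view concrete, here is a small sketch that samples data from this model (the particular $\theta^*$, noise level, and sample size are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([1.0, 2.0])  # assumed "true" weights, constant term first
sigma = 0.5                        # assumed noise standard deviation

# Instances with constant attribute x_0 = 1 plus one feature.
X = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, 50)])
y = X @ theta_star + rng.normal(0.0, sigma, 50)  # y = theta*^T x + noise
```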

Most probable model given the data
Goal: the most probable model given the data,
$$\theta^* = \arg\max_\theta p(\theta \mid L)$$
Approach: derive the a-posteriori distribution via Bayes' rule,
$$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$$
with likelihood $p(L \mid \theta)$ (the probability of the data given the model) and prior distribution $p(\theta)$ over the parameters.

Bayesian linear regression: likelihood
Likelihood of the data (instances independent, and instances $x_i$ independent of $\theta$):
$$p(L \mid \theta) = p(y_1, \dots, y_n \mid x_1, \dots, x_n, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid \theta^T x_i, \sigma^2) = \mathcal{N}(y \mid X\theta, \sigma^2 I)$$
a multivariate normal distribution with covariance matrix $\sigma^2 I$, where $X = (x_1, \dots, x_n)^T$ is the data matrix, $y = (y_1, \dots, y_n)^T$, and $X\theta$ is the vector of predictions $f_\theta(x_i) = \theta^T x_i$.

Bayesian linear regression: prior
Prior distribution over weight vectors. An appropriate prior distribution is the normal distribution:
$$p(\theta) = \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = \frac{1}{(2\pi)^{m/2} \sigma_p^m} \exp\left( -\frac{\theta^T \theta}{2 \sigma_p^2} \right)$$
where $\sigma_p^2$ controls the strength of the prior.
The normal distribution is conjugate to itself: a normally distributed prior and a normal likelihood result in a normally distributed posterior.
[Figure: prior density $p(\theta)$ centered at $0$]

Bayesian linear regression: posterior
Posterior distribution over models given the data:
$$p(\theta \mid L) = \frac{1}{Z}\, p(L \mid \theta)\, p(\theta) \qquad \text{(Bayes' rule)}$$
$$= \frac{1}{Z}\, \mathcal{N}(y \mid X\theta, \sigma^2 I)\, \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = \mathcal{N}(\theta \mid \bar{\mu}, A^{-1}) \qquad \text{(theorem describing properties of normal distributions)}$$
with
$$\bar{\mu} = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y, \qquad A = \frac{1}{\sigma^2} X^T X + \frac{1}{\sigma_p^2} I$$
where $X$ is the data matrix, $y$ the label vector, $\sigma^2$ the variance of the i.i.d. noise, and $\sigma_p^2$ the variance of the prior.
The posterior distribution over parameter vectors is again normally distributed, with new mean $\bar{\mu}$ and covariance matrix $A^{-1}$.
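
A minimal sketch of this computation (not from the slides; names chosen here), returning the posterior mean and covariance:

```python
import numpy as np

def posterior_params(X, y, sigma2, sigma2_p):
    """Posterior p(theta | L) = N(theta | mean, A^{-1}) for Bayesian linear regression."""
    m = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(m) / sigma2_p  # precision matrix A
    mean = np.linalg.solve(X.T @ X + (sigma2 / sigma2_p) * np.eye(m), X.T @ y)
    return mean, np.linalg.inv(A)                # covariance is A^{-1}
```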

Summary: derivation of the posterior
Approach: Bayes' rule,
$$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$$
Derive the likelihood, choose an appropriate prior distribution (normal distribution).
Posterior $p(\theta \mid L)$: how probable is a linear regression model after having seen the data $L$?
Computation of the posterior is relatively simple: prior $\mathcal{N}(\theta \mid 0, \sigma_p^2 I)$, observation of the data $L$, resulting posterior $\mathcal{N}(\theta \mid \bar{\mu}, A^{-1})$.

Bayesian linear regression: MAP model
The posterior over parameter vectors is again normally distributed, with new mean $\bar{\mu}$ and covariance matrix $A^{-1}$.
Most probable model given the data:
$$\theta^* = \arg\max_\theta p(\theta \mid L) = \arg\max_\theta \mathcal{N}(\theta \mid \bar{\mu}, A^{-1}) = \bar{\mu}$$
with
$$\bar{\mu} = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y \qquad \text{and} \qquad A = \frac{1}{\sigma^2} X^T X + \frac{1}{\sigma_p^2} I$$

Example: MAP solution for regression
Training data:
$$x_1 = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 4 \\ 3 \\ 2 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad y_1 = 2, \quad y_2 = 3, \quad y_3 = 4$$
Matrix notation (adding the constant attribute):
$$X = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 1 & 4 & 3 & 2 \\ 1 & 0 & 1 & 2 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}$$

Example: MAP solution for regression
Choose variance of the prior $\sigma_p^2 = 1$ and noise parameter $\sigma^2 = 0.25$.
Compute
$$\theta^* = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y = \begin{pmatrix} 0.7975 \\ -0.5598 \\ 0.7543 \\ 1.1217 \end{pmatrix}$$

Example: MAP solution for regression
Predictions of the model on the training data:
$$\hat{y} = X \theta^* = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 1 & 4 & 3 & 2 \\ 1 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 0.7975 \\ -0.5598 \\ 0.7543 \\ 1.1217 \end{pmatrix} = \begin{pmatrix} 1.9408 \\ 3.0646 \\ 3.7952 \end{pmatrix}$$
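
A few lines of NumPy reproduce these numbers (a sketch for checking the example; values rounded to four decimals):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 0.0],
              [1.0, 4.0, 3.0, 2.0],
              [1.0, 0.0, 1.0, 2.0]])
y = np.array([2.0, 3.0, 4.0])
sigma2, sigma2_p = 0.25, 1.0

# MAP solution theta* = (X^T X + (sigma^2/sigma_p^2) I)^{-1} X^T y
theta = np.linalg.solve(X.T @ X + (sigma2 / sigma2_p) * np.eye(4), X.T @ y)
print(theta.round(4))        # [ 0.7975 -0.5598  0.7543  1.1217]
print((X @ theta).round(4))  # [1.9408 3.0646 3.7952]
```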

Connection to ridge regression
MAP parameter of linear regression:
$$\theta^* = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y$$
Recall ridge regression:
$$\theta^* = \arg\min_\theta \sum_{i=1}^{n} (y_i - \theta^T x_i)^2 + \lambda \|\theta\|^2 = (X^T X + \lambda I)^{-1} X^T y$$
The MAP solution of Bayesian regression is identical to the solution of ridge regression for $\lambda = \sigma^2 / \sigma_p^2$.
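
A quick numeric check of this equivalence on the example data (a sketch: the gradient of the ridge objective vanishes at the MAP solution):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 0.0],
              [1.0, 4.0, 3.0, 2.0],
              [1.0, 0.0, 1.0, 2.0]])
y = np.array([2.0, 3.0, 4.0])
sigma2, sigma2_p = 0.25, 1.0
lam = sigma2 / sigma2_p  # ridge parameter induced by the Bayesian model

theta_map = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Gradient of sum_i (theta^T x_i - y_i)^2 + lam * ||theta||^2 at theta_map:
grad = 2 * X.T @ (X @ theta_map - y) + 2 * lam * theta_map
assert np.allclose(grad, 0.0)  # theta_map minimizes the regularized loss
```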

Connection to ridge regression
Connection between loss function and likelihood, regularizer and prior. MAP model:
$$\theta^* = \arg\max_\theta p(\theta \mid L) = \arg\max_\theta p(L \mid \theta)\, p(\theta) = \arg\max_\theta \left[ \log p(L \mid \theta) + \log p(\theta) \right] = \arg\min_\theta \left[ -\log p(L \mid \theta) - \log p(\theta) \right]$$
with negative log-likelihood $-\log p(L \mid \theta)$ and negative log-prior $-\log p(\theta)$.

Connection to ridge regression
The negative log-likelihood corresponds to the squared loss:
$$-\log p(L \mid \theta) = -\log \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = -\sum_{i=1}^{n} \log \mathcal{N}(y_i \mid \theta^T x_i, \sigma^2)$$
$$= -\sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right) \right] = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta^T x_i)^2 + \text{const}$$

Connection to ridge regression
The negative log-prior corresponds to the regularizer:
$$-\log p(\theta) = -\log \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = -\log \left[ \frac{1}{(2\pi)^{m/2} \sigma_p^m} \exp\left( -\frac{\theta^T \theta}{2\sigma_p^2} \right) \right] = \frac{1}{2\sigma_p^2} \|\theta\|^2 + \text{const}$$
The MAP solution therefore corresponds to the minimization of a regularized loss function.

Visualization: sequential update of the posterior
Computation of the posterior by sequential updating: multiply in the likelihoods of the individual instances one by one (instances independent):
$$p(\theta \mid L) \propto p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = p(\theta)\, p(y_1 \mid x_1, \theta)\, p(y_2 \mid x_2, \theta) \cdots p(y_n \mid x_n, \theta)$$
Let $p_0(\theta) = p(\theta)$, and let $p_k(\theta)$ be the posterior if only the first $k$ instances in $L$ are used; then $p_k(\theta) \propto p_{k-1}(\theta)\, p(y_k \mid x_k, \theta)$.
[Figure: sequence of densities $p_0(\theta) \to p_1(\theta) \to p_2(\theta) \to p_3(\theta)$]
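
Because prior and likelihood are normal, each sequential step has a closed form. A minimal sketch of one update (function name chosen here; the update equations are standard Gaussian algebra, not spelled out on the slide):

```python
import numpy as np

def sequential_update(mean, cov, x, y, sigma2):
    """One step: p_k(theta) proportional to p_{k-1}(theta) * N(y | theta^T x, sigma2)."""
    prec = np.linalg.inv(cov) + np.outer(x, x) / sigma2  # updated precision matrix
    cov_new = np.linalg.inv(prec)
    mean_new = cov_new @ (np.linalg.solve(cov, mean) + y * x / sigma2)
    return mean_new, cov_new

# Starting from the prior N(0, sigma_p^2 I) and folding in the n instances one by
# one yields the same posterior as the batch formula on the earlier slides.
```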

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_0(\theta) = p(\theta)$.
[Figure: prior density $p_0(\theta)$ over $(\theta_0, \theta_1)$; lines sampled from $p_0(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_1(\theta) \propto p_0(\theta)\, p(y_1 \mid x_1, \theta)$ with likelihood $p(y_1 \mid x_1, \theta)$.
Instance $(x_1, y_1)$ with $y_1 = f(x_1) + \varepsilon$.
[Figure: likelihood $p(y_1 \mid x_1, \theta)$; lines sampled from $p_1(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_1(\theta) \propto p_0(\theta)\, p(y_1 \mid x_1, \theta)$ with likelihood $p(y_1 \mid x_1, \theta)$.
[Figure: posterior $p_1(\theta)$; lines sampled from $p_1(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_2(\theta) \propto p_1(\theta)\, p(y_2 \mid x_2, \theta)$ with likelihood $p(y_2 \mid x_2, \theta)$.
[Figure: posterior $p_2(\theta)$; lines sampled from $p_2(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_3(\theta) \propto p_2(\theta)\, p(y_3 \mid x_3, \theta)$ with likelihood $p(y_3 \mid x_3, \theta)$.
[Figure: posterior $p_3(\theta)$; lines sampled from $p_3(\theta)$]

Overview
Basic concepts of Bayesian learning.
Most probable model given data: coin tosses, linear regression, logistic regression.
Bayesian predictions: coin tosses, linear regression.