UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 12: Bayes Classifiers. Dr. Yanjun Qi. University of Virginia

1 UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 12: Generative Bayes Classifiers. Dr. Yanjun Qi. University of Virginia, Department of Computer Science.

2 Where are we? Five major sections of this course: regression (supervised), classification (supervised), unsupervised models, learning theory, graphical models.

3 Where are we? Three major types of classification. We can divide the large variety of classification approaches into roughly three major types: 1. Discriminative: directly estimate a decision rule/boundary, e.g., support vector machine, decision tree. 2. Generative: build a generative statistical model, e.g., naïve Bayes classifier, Bayesian networks. 3. Instance-based classifiers: use observations directly (no models), e.g., K nearest neighbors.

4 Last Lecture Recap: Probability Review. The big picture: data <-> probabilistic model. Sample space, events, and event spaces. Random variables. Joint probability, marginal probability, conditional probability, chain rule, Bayes rule, law of total probability, etc. Structural properties: independence, conditional independence.

5 Today: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian Bayes classifiers: Gaussian distribution, Gaussian NBC, LDA, QDA.

6 A Dataset for Classification. Output as discrete class label $C_1, C_2, \dots, C_L$. Data points/instances/examples/samples/records: [rows]. Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]. Target/outcome/response/label/dependent variable: special column to be predicted [last column].

7 Bayes Classifiers. Treat each feature attribute and the class label as random variables. Given a sample $x$ with attributes $(x_1, x_2, \dots, x_p)$, the goal is to predict its class $C$. Specifically, we want to find the value of $C$ that maximizes $p(C \mid x_1, x_2, \dots, x_p)$. Can we estimate $p(C \mid x) = p(C \mid x_1, x_2, \dots, x_p)$ directly from data?

10 Bayes Classifiers → MAP classification rule. Establishing a probabilistic model for classification → MAP classification rule. MAP: Maximum A Posteriori. Assign $x$ to $c^*$ if $P(C = c^* \mid X = x) > P(C = c \mid X = x)$ for $c \neq c^*$, $c = c_1, \dots, c_L$. (Adapted from Prof. Ke Chen's NB slides.)

13 Bayes Classifiers → MAP classification rule. Establishing a probabilistic model for classification: (1) Discriminative, (2) Generative.

14 (1) Discriminative: model $P(C \mid X)$ directly, with $C = c_1, \dots, c_L$ and $X = (X_1, \dots, X_p)$. A single discriminative probabilistic classifier takes the input $x = (x_1, x_2, \dots, x_p)$ and outputs $P(c_1 \mid x), P(c_2 \mid x), \dots, P(c_L \mid x)$. (Adapted from Prof. Ke Chen's NB slides.)

15 (2) Generative: model $P(X \mid C)$, with $C = c_1, \dots, c_L$ and $X = (X_1, \dots, X_p)$. One generative probabilistic model per class: $P(x \mid c_1)$ (model for class 1), $P(x \mid c_2)$ (model for class 2), ..., $P(x \mid c_L)$ (model for class L), each over $x = (x_1, x_2, \dots, x_p)$. (Adapted from Prof. Ke Chen's NB slides.)

17 Review: Bayes Rule for Generative Bayes Classifiers. $P(C, X) = P(C \mid X)\,P(X) = P(X \mid C)\,P(C)$, so $P(C \mid X) = P(X \mid C)\,P(C) / P(X)$ (the posterior). Priors: $P(C_1), P(C_2), \dots, P(C_L)$. Posteriors: $P(C_1 \mid x), P(C_2 \mid x), \dots, P(C_L \mid x)$.

20 Summary: Generative Classification with the MAP Rule. MAP classification rule (MAP: Maximum A Posteriori): assign $x$ to $c^*$ if $P(C = c^* \mid X = x) > P(C = c \mid X = x)$ for $c \neq c^*$, $c = c_1, \dots, c_L$. Generative classification with the MAP rule: apply Bayes rule to convert the class-conditional models into posterior probabilities, $P(C = c_i \mid X = x) = \frac{P(X = x \mid C = c_i)\,P(C = c_i)}{P(X = x)} \propto P(X = x \mid C = c_i)\,P(C = c_i)$ for $i = 1, 2, \dots, L$; then apply the MAP rule. (Adapted from Prof. Ke Chen's NB slides.)

22 Summary: Generative Bayes Classifier with the MAP Rule. Task: classify a new instance $X = (X_1, X_2, \dots, X_p)$, a tuple of attribute values, into one of the classes. $c_{MAP} = \arg\max_{c_j \in C} P(c_j \mid x_1, x_2, \dots, x_p) = \arg\max_{c_j \in C} \frac{P(x_1, \dots, x_p \mid c_j)\,P(c_j)}{P(x_1, \dots, x_p)} = \arg\max_{c_j \in C} P(x_1, \dots, x_p \mid c_j)\,P(c_j)$. MAP = Maximum A Posteriori. (Adapted from Prof. Carlos Guestrin's probability tutorial.)
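
The rule above reduces to "score each class by likelihood times prior, return the best." A minimal sketch of that rule (names are ours, not the slides'; any model of $P(x \mid c)$ plugs in):

```python
# Generative MAP rule: assign x to the class c maximizing P(x|c) * P(c).
# `priors` maps class -> P(C=c); `likelihood(x, c)` is any model of P(x|c).
def map_classify(x, priors, likelihood):
    # P(x) is the same for every class, so it is dropped from the argmax.
    return max(priors, key=lambda c: likelihood(x, c) * priors[c])
```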

23 Example: Play Tennis. [The PlayTennis training dataset table is shown on the slide.]

26 Maximum likelihood estimates (explained later): simply use the frequencies in the data.

27 Generative Bayes Classifier: Learning Phase. Class priors $P(C_1), P(C_2), \dots, P(C_L)$: P(Play=Yes) = 9/14, P(Play=No) = 5/14. Class-conditional tables $P(X_1, X_2, \dots, X_p \mid C_1)$, $P(X_1, X_2, \dots, X_p \mid C_2)$ over Outlook (3 values), Temperature (3 values), Humidity (2 values), Wind (2 values):

Outlook  Temperature  Humidity  Wind    | Play=Yes | Play=No
sunny    hot          high      weak    | 0/9      | 1/5
sunny    hot          high      strong  | …/9      | …/5
sunny    hot          normal    weak    | …/9      | …/5
sunny    hot          normal    strong  | …/9      | …/5
...

$3 \cdot 3 \cdot 2 \cdot 2$ [conjunctions of attributes] $\times$ 2 [two classes] = 72 parameters.

28 Generative Bayes Classifier: Test Phase. Given an unknown instance $X_{test} = (a_1, \dots, a_p)$, look up the tables to assign the label $c^*$ to $X_{test}$ if $\hat P(a_1, \dots, a_p \mid c^*)\,\hat P(c^*) > \hat P(a_1, \dots, a_p \mid c)\,\hat P(c)$ for $c \neq c^*$, $c = c_1, \dots, c_L$. Given a new instance, x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).

29 Today: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian Bayes classifiers: Gaussian distribution, Gaussian NBC, LDA, QDA.

30 Naïve Bayes Classifier. Bayes classification: $\arg\max_{c_j \in C} P(x_1, x_2, \dots, x_p \mid c_j)\,P(c_j)$. Difficulty: learning the joint probability $P(x_1, \dots, x_p \mid c_j)$. Naïve Bayes classification: assume that all input attributes are conditionally independent!

31 Naïve Bayes Classifier. Assumption: all input attributes are conditionally independent given the class. Then $P(X_1, X_2, \dots, X_p \mid C) = P(X_1 \mid X_2, \dots, X_p, C)\,P(X_2, \dots, X_p \mid C) = P(X_1 \mid C)\,P(X_2, \dots, X_p \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C)$ (chain rule, then conditional independence, applied recursively).

32 Naïve Bayes Classifier. With the naïve assumption $P(X_1, X_2, \dots, X_p \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C)$, the MAP classification rule for a sample $x = (x_1, x_2, \dots, x_p)$ becomes: $[P(x_1 \mid c^*)\cdots P(x_p \mid c^*)]\,P(c^*) > [P(x_1 \mid c)\cdots P(x_p \mid c)]\,P(c)$ for $c \neq c^*$, $c = c_1, \dots, c_L$.

34 Naïve Bayes Classifier (for discrete input attributes): training. Learning Phase: given a training set S, for each target value $c_i$ ($c_i = c_1, \dots, c_L$): $\hat P(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with the examples in S; for every attribute value $x_{jk}$ of each attribute $X_j$ ($j = 1, \dots, p$; $k = 1, \dots, K_j$): $\hat P(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with the examples in S. Output: conditional probability tables; for each $X_j$, $K_j \times L$ elements.

37 Naïve Bayes (for discrete input attributes): testing. Test Phase: given an unknown instance $X' = (a'_1, \dots, a'_p)$, look up the tables to assign the label $c^*$ to $X'$ if $[\hat P(a'_1 \mid c^*)\cdots \hat P(a'_p \mid c^*)]\,\hat P(c^*) > [\hat P(a'_1 \mid c)\cdots \hat P(a'_p \mid c)]\,\hat P(c)$ for $c \neq c^*$, $c = c_1, \dots, c_L$.
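
Both phases amount to a few lines of counting and a product; a minimal sketch of the algorithm above (function and variable names are ours, not the slides'):

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Learning phase: estimate P(C=c) and P(X_j = x_jk | C=c) by frequency counts."""
    n = len(labels)
    priors = {c: cnt / n for c, cnt in Counter(labels).items()}
    counts = defaultdict(Counter)                 # (j, c) -> Counter over values of X_j
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            counts[(j, c)][v] += 1
    n_per_class = Counter(labels)
    tables = {jc: {v: cnt / n_per_class[jc[1]] for v, cnt in ctr.items()}
              for jc, ctr in counts.items()}
    return priors, tables

def classify_nb(x, priors, tables):
    """Test phase: assign the label c* maximizing [prod_j P_hat(x_j|c)] * P_hat(c)."""
    def score(c):
        s = priors[c]
        for j, v in enumerate(x):
            s *= tables.get((j, c), {}).get(v, 0.0)   # unseen value -> probability 0
        return s
    return max(priors, key=score)
```

Note that an unseen attribute value gets probability 0 here; the smoothing slides later in the lecture address exactly that problem.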

38 Example: Play Tennis. [The PlayTennis training dataset table is shown again.]

39 Learning (training) the NBC Model. Maximum likelihood estimates (explained later): simply use the frequencies in the data: $\hat P(x_i \mid c_j) = \frac{N(X_i = x_i,\, C = c_j)}{N(C = c_j)}$, $\hat P(c_j) = \frac{N(C = c_j)}{N}$. [Graphical model on the slide: class node C with child feature nodes $X_1, X_2, \dots, X_6$.]

41 Learning Phase. Estimate $P(X_j = x_{jk} \mid C = c_i)$ with the training examples:

Outlook    Play=Yes  Play=No      Temperature  Play=Yes  Play=No
Sunny      2/9       3/5          Hot          2/9       2/5
Overcast   4/9       0/5          Mild         4/9       2/5
Rain       3/9       2/5          Cool         3/9       1/5

Humidity   Play=Yes  Play=No      Wind         Play=Yes  Play=No
High       3/9       4/5          Strong       3/9       3/5
Normal     6/9       1/5          Weak         6/9       2/5

$(3+3+2+2)$ [naïve assumption] $\times$ 2 [two classes] = 20 parameters. P(Play=Yes) = 9/14, P(Play=No) = 5/14.

42 Testing the NBC Model. Test Phase: given a new instance x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), apply the rule $[\hat P(a_1 \mid c^*)\cdots \hat P(a_p \mid c^*)]\,\hat P(c^*) > [\hat P(a_1 \mid c)\cdots \hat P(a_p \mid c)]\,\hat P(c)$.

44 Testing the NBC Model. Test Phase: given the new instance x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), look up the conditional-probability tables: P(Outlook=Sunny|Play=Yes) = 2/9, P(Temperature=Cool|Play=Yes) = 3/9, P(Humidity=High|Play=Yes) = 3/9, P(Wind=Strong|Play=Yes) = 3/9, P(Play=Yes) = 9/14; P(Outlook=Sunny|Play=No) = 3/5, P(Temperature=Cool|Play=No) = 1/5, P(Humidity=High|Play=No) = 4/5, P(Wind=Strong|Play=No) = 3/5, P(Play=No) = 5/14. MAP rule: P(Yes|x) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]·P(Play=Yes) ≈ 0.0053; P(No|x) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]·P(Play=No) ≈ 0.0206. Given the fact P(Yes|x) < P(No|x), we label x to be No.
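
The slide's numbers check out directly (a quick verification of the arithmetic above):

```python
# MAP scores for x = (Sunny, Cool, High, Strong), using the tables from slide 41.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206
print("Yes" if p_yes > p_no else "No")           # prints "No"
```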

45 WHY the Naïve Bayes Assumption? $P(c_j)$ can be estimated from the frequency of classes in the training examples. Not naïve: $P(x_1, x_2, \dots, x_p \mid c_j)$ has $O(|X_1| \cdot |X_2| \cdots |X_p| \cdot |C|)$ parameters, which could only be estimated if a very, very large number of training examples were available. Naïve: with the conditional independence assumption only the $P(x_k \mid c_j)$ are needed, i.e., $O((|X_1| + |X_2| + \cdots + |X_p|) \cdot |C|)$ parameters. The assumption: the probability of observing the conjunction of attributes equals the product of the individual probabilities $P(x_i \mid c_j)$. (Adapted from Manning's text-categorization tutorial.)

48 DETOUR: Course Schedule. WED / in class / 70 minutes. Open to your notes + the printed lectures + the four HWs we have had so far; nothing else is allowed. Please turn off your phone at the beginning. No electronic devices (other than a basic calculator). The final exam will be closed-note!

50 For instance: C = Flu, features $X_1, \dots, X_5$, $X_6$ = Muscle-ache. What if we have seen no training cases where a patient had no flu but muscle aches? Then $\hat P(X_6 = t \mid C = \text{not\_flu}) = \frac{N(X_6 = t,\, C = \text{nf})}{N(C = \text{nf})} = 0$, and the prediction $\arg\max_c \hat P(c) \prod_i \hat P(x_i \mid c)$ is forced to zero for that class: zero probabilities cannot be conditioned away, no matter the other evidence!

52 Smoothing to Avoid Overfitting. Add-one (Laplace) smoothing: $\hat P(x_i \mid c_j) = \frac{N(X_i = x_i,\, C = c_j) + 1}{N(C = c_j) + k}$, where $k$ = # of values of feature $X_i$; the $+k$ in the denominator makes $\sum_{x_i} \hat P(x_i \mid c_j) = 1$. (Adapted from Manning's text-categorization tutorial.)

53 Smoothing to Avoid Overfitting. $\hat P(x_i \mid c_j) = \frac{N(X_i = x_i,\, C = c_j) + 1}{N(C = c_j) + k}$ ($k$ = # of values of $X_i$). A somewhat more subtle version (the m-estimate): $\hat P(x_{i,k} \mid c_j) = \frac{N(X_i = x_{i,k},\, C = c_j) + m\,p_{i,k}}{N(C = c_j) + m}$, where $p_{i,k}$ is the overall fraction of the data with $X_i = x_{i,k}$, and $m$ is the extent of smoothing.
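
A sketch of both estimators on this slide (names are illustrative):

```python
def laplace_estimate(n_xc, n_c, k):
    """Add-one smoothing: (N(X_i=x, C=c) + 1) / (N(C=c) + k), k = # of values of X_i."""
    return (n_xc + 1) / (n_c + k)

def m_estimate(n_xc, n_c, p_overall, m):
    """Subtler version: (N(X_i=x, C=c) + m*p) / (N(C=c) + m),
    p = overall fraction of the data with X_i = x, m = extent of smoothing."""
    return (n_xc + m * p_overall) / (n_c + m)
```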

54 Today: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian Bayes classifiers: Gaussian distribution, Gaussian NBC, LDA, QDA.

55 Review: Continuous Random Variables. Probability density function (pdf) instead of probability mass function (pmf). For a discrete RV, the pmf gives $P(X = x_i)$. A pdf is any function $f(x)$ that describes the probability density in terms of the input variable $x$.

56 Review: Probability of a Continuous RV. Properties of a pdf: $f(x) \geq 0$ and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$. Actual probabilities are obtained by taking the integral of the pdf; e.g., the probability of $X$ being between 5 and 6 is $P(5 \leq X \leq 6) = \int_5^6 f(x)\,dx$.

57 Review: Mean and Variance of a RV. Mean (expectation): for discrete RVs, $E(X) = \sum_v v\,P(X = v)$ and $E(g(X)) = \sum_v g(v)\,P(X = v)$; for continuous RVs, $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$ and $E(g(X)) = \int_{-\infty}^{+\infty} g(x) f(x)\,dx$. (Adapted from Prof. Carlos Guestrin's probability tutorial.)

58 Review: Mean and Variance of a RV. Variance: $Var(X) = E((X - \mu)^2)$; for discrete RVs, $Var(X) = \sum_v (v - \mu)^2 P(X = v)$; for continuous RVs, $Var(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx$. Covariance: $Cov(X, Y) = E((X - \mu_x)(Y - \mu_y)) = E(XY) - \mu_x \mu_y$. (Adapted from Prof. Carlos Guestrin's probability tutorial.)

59 Gaussian Distribution. $X \sim N(\mu, \sigma^2)$. [Figures: univariate Gaussian with its mean; multivariate Gaussian with its covariance matrix.] Courtesy: http://research.microsoft.com/~cmbishop/PRML/index.htm

60 Multivariate Normal (Gaussian) PDFs. The only widely used continuous joint PDF is the multivariate normal (or Gaussian): $f(x) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\!\big(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\big)$, where $|\cdot|$ denotes the determinant. For the bivariate normal PDF: the mean of the normal PDF is at the peak value; contours of equal PDF form ellipses. The covariance matrix captures linear dependences among the variables.

61 Example: the Bivariate Normal Distribution. $f(x_1, x_2) = (2\pi)^{-1}\,|\Sigma|^{-1/2} \exp\!\big(-\tfrac{1}{2} (\vec x - \vec\mu)^T \Sigma^{-1} (\vec x - \vec\mu)\big)$ with $\vec\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, $|\Sigma| = \sigma_1^2\sigma_2^2 - \sigma_{12}^2 = \sigma_1^2\sigma_2^2(1 - \rho^2)$.
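
Evaluating the reconstructed density numerically (a sketch with NumPy; the parameter values are arbitrary illustrative choices):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """(2*pi)^(-p/2) * |Sigma|^(-1/2) * exp(-0.5 * (x-mu)^T Sigma^{-1} (x-mu))."""
    p = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)          # (x-mu)^T Sigma^{-1} (x-mu)
    return (2 * np.pi) ** (-p / 2) / np.sqrt(np.linalg.det(Sigma)) * np.exp(-0.5 * quad)

# Bivariate case with rho = 0.5 and unit variances:
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([0.0, 0.0]), np.zeros(2), Sigma))   # density at the mean
```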

62 Surface plots of the bivariate Normal distribution. [Figures only.]

63 Contour plots of the bivariate Normal distribution. [Figures only.]

64 Scatter plots of data from the bivariate Normal distribution. [Figures only.]

65 Trivariate Normal distribution. [Figure only; axes $x_1, x_2, x_3$.]

66 How to Fit a Gaussian: MLE (more later). We can fit statistical models by maximizing the probability/likelihood of generating the observed samples: $L(x_1, \dots, x_n \mid \theta) = p(x_1 \mid \theta) \cdots p(x_n \mid \theta)$ (the samples are assumed to be IID). In the 1D Gaussian case, we simply set the mean and the variance to the sample mean and the sample variance: $\mu = \frac{1}{n}\sum_{i=1}^n x_i$, $\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$.
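
A minimal sketch of that 1D fit (NumPy; names are ours):

```python
import numpy as np

def fit_gaussian_mle(x):
    """MLE for a 1D Gaussian: sample mean and (biased, divide-by-n) sample variance."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()   # note: /n, not /(n-1); this is the MLE
    return mu, sigma2

samples = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=10_000)
print(fit_gaussian_mle(samples))      # close to (2.0, 1.5**2)
```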

67 The p-Multivariate Normal Distribution: $(X_1, X_2, \dots, X_p) \sim N(\vec\mu, \Sigma)$.

68 DETOUR: Probabilistic Interpretation of Linear Regression. Let us assume that the target variable and the inputs are related by the equation $y_i = \theta^T x_i + \varepsilon_i$, where $\varepsilon$ is an error term of unmodeled effects or random noise. Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma^2)$; then we have $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\big(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\big)$. By the IID (independent and identically distributed) assumption: $L(\theta) = \prod_{i=1}^n p(y_i \mid x_i; \theta) = \big(\frac{1}{\sqrt{2\pi}\,\sigma}\big)^n \exp\!\big(-\frac{\sum_{i=1}^n (y_i - \theta^T x_i)^2}{2\sigma^2}\big)$.

71 We can learn $\theta$ by maximizing the probability/likelihood of generating the observed samples: $l(\theta) = \log L(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^n (y_i - \theta^T x_i)^2$. Note the last term: $J(\theta) = \frac{1}{2}\sum_{i=1}^n (y_i - \theta^T x_i)^2$.

74 Maximum Likelihood Estimation: a general statement. Consider a sample set $T = (X_1, \dots, X_n)$ drawn from a probability distribution $P(X \mid \theta)$, where $\theta$ are parameters. If the $X_i$ are independent with probability density function $P(X_i \mid \theta)$, the joint probability of the whole set is $P(X_1, \dots, X_n \mid \theta) = \prod_{i=1}^n P(X_i \mid \theta)$. This may be maximized with respect to $\theta$ to give the maximum likelihood estimate: $\hat\theta = \arg\max_\theta P(X_1, \dots, X_n \mid \theta)$.

76 The idea is to: assume a particular model with unknown parameters; we can then define the probability of observing a given event conditional on a particular set of parameters, $P(X \mid \theta)$. We have observed a set of outcomes in the real world. It is then possible to choose the set of parameters which is most likely to have produced the observed results: $\hat\theta = \arg\max_\theta P(X_1, \dots, X_n \mid \theta)$. This is maximum likelihood. In most cases it is both consistent and efficient, and it provides a standard against which to compare other estimation techniques. It is often convenient to work with the log of the likelihood function: $\log L(\theta) = \sum_{i=1}^n \log P(X_i \mid \theta)$.

79 DETOUR: Probabilistic Interpretation of Linear Regression. Hence the log-likelihood is $l(\theta) = \log L(\theta) = n \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2}\sum_{i=1}^n (y_i - \theta^T x_i)^2$. Recognize the last term? Yes, it is $J(\theta) = \frac{1}{2}\sum_{i=1}^n (y_i - \theta^T x_i)^2$. Thus, under the independent Gaussian residual assumption, minimizing the residual squared error is equivalent to MLE of $\theta$!
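
A small numeric check of that equivalence (synthetic data; names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=100)   # Gaussian noise, as assumed

# Minimizing J(theta) = 0.5 * sum_i (y_i - theta^T x_i)^2 is ordinary least squares,
# which per the slide is also the MLE of theta under Gaussian residuals.
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_hat)   # close to theta_true
```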

81 Today: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian Bayes classifiers: Gaussian distribution, Gaussian NBC, not-naïve Gaussian BC → LDA, QDA.

82 Gaussian Naïve Bayes Classifier. $\arg\max_C P(C \mid X) = \arg\max_C P(X, C) = \arg\max_C P(X \mid C)\,P(C)$. Naïve Bayes: $P(X \mid C) = P(X_1, X_2, \dots, X_p \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C)$. Model each factor as a univariate normal: $\hat P(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\big(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\big)$, where $\mu_{ji}$ is the mean (average) of the attribute values $X_j$ of the examples for which $C = c_i$, and $\sigma_{ji}$ is the standard deviation of the attribute values $X_j$ of the examples for which $C = c_i$.

83 Gaussian Naïve Bayes Classifier: continuous-valued input attributes. Conditional probability modeled with the normal distribution: $\hat P(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\big(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\big)$. Learning phase: for $X = (X_1, \dots, X_p)$ and $C = c_1, \dots, c_L$, output $p \times L$ normal distributions and $P(C = c_i)$, $i = 1, \dots, L$. Test phase: for $X' = (X'_1, \dots, X'_p)$, calculate the conditional probabilities with all the normal distributions and apply the MAP rule to make a decision.
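
A minimal sketch of those two phases (NumPy arrays assumed; per-class, per-feature means and standard deviations; names are ours):

```python
import numpy as np

def train_gnb(X, y):
    """Learning phase: per class c, the mean/std of each feature plus the prior P(C=c)."""
    return {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0), np.mean(y == c))
            for c in np.unique(y)}

def classify_gnb(x, params):
    """Test phase: MAP rule with one univariate normal per (feature, class);
    log-space products for numerical stability."""
    def log_score(c):
        mu, sigma, prior = params[c]
        log_pdf = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
        return log_pdf.sum() + np.log(prior)
    return max(params, key=log_score)
```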

85 What does "naïve" mean for Gaussians? Not naïve: $P(X_1, X_2, \dots, X_p \mid C)$ is a full multivariate Gaussian. Naïve: $P(X_1, X_2, \dots, X_p \mid C = c_j) = P(X_1 \mid C)\,P(X_2 \mid C)\cdots P(X_p \mid C) = \prod_j \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\big(-\frac{(X_j - \mu_j)^2}{2\sigma_j^2}\big)$, i.e., each class covariance matrix is diagonal: $\Sigma_{c_k} = \Lambda_{c_k}$.

86 Today: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian Bayes classifiers: Gaussian distribution, Gaussian NBC, not-naïve Gaussian BC → LDA, QDA.

87 (1) If the covariance matrices are the same across classes → LDA (Linear Discriminant Analysis). Each class covariance matrix is the same. [Figures: class k and class l with equally shaped Gaussian contours.]

88 Optimal Classification. $\arg\max_k P(C_k \mid X) = \arg\max_k P(X, C_k) = \arg\max_k P(X \mid C_k)\,P(C_k)$.

89 $\arg\max_k P(C_k \mid X) = \arg\max_k P(X, C_k) = \arg\max_k P(X \mid C_k)\,P(C_k) = \arg\max_k \log\{P(X \mid C_k)\,P(C_k)\}$.

91 $\log \frac{P(C_k \mid X)}{P(C_l \mid X)} = \log \frac{P(X \mid C_k)}{P(X \mid C_l)} + \log \frac{P(C_k)}{P(C_l)}$.

92 → The decision boundary between classes k and l, $\{x : \delta_k(x) = \delta_l(x)\}$, is linear. Boundary points $x$: where $P(c_k \mid x) = P(c_l \mid x)$, the linear equation on the left equals zero, giving a line/plane.
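
For concreteness, the standard LDA discriminant implied by these slides (not written out explicitly here) is $\delta_k(x) = x^T \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log \pi_k$, linear in $x$; a sketch with a shared covariance $\Sigma$ (names are ours):

```python
import numpy as np

def lda_discriminant(x, mu_k, Sigma_inv, log_prior_k):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + log_prior_k

# The boundary between classes k and l is {x : delta_k(x) == delta_l(x)}, a hyperplane.
```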

93 Visualization (three classes). [Figure only.]

94 (2) If the covariance matrices are not the same across classes → QDA (Quadratic Discriminant Analysis).
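
The corresponding standard quadratic discriminant, with a per-class covariance $\Sigma_k$ (not written out on the slide; names are ours), is $\delta_k(x) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k) + \log \pi_k$:

```python
import numpy as np

def qda_discriminant(x, mu_k, Sigma_k, log_prior_k):
    """delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k;
    quadratic in x, so class boundaries are quadrics rather than hyperplanes."""
    d = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma_k, d) + log_prior_k
```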

95 LDA on an Expanded Basis. LDA with a quadratic basis versus QDA. [Figure comparing the two decision boundaries.]

96 (3) Regularized Discriminant Analysis. [Details on the slide not transcribed.]

97 An Example: Gaussian Bayes Classifier. [Figure only.]

98 Gaussian Bayes Classifier. [Figure only.]

99 Today Recap: Generative Bayes Classifiers. Bayes classifier: MAP classification rule; generative Bayes classifier. Naïve Bayes classifier. Gaussian naïve Bayes classifiers: Gaussian distribution, Gaussian NBC, not-naïve Gaussian BC → LDA, QDA.

100 Generative Bayes Classifier. $\arg\max_k P(C_k \mid X) = \arg\max_k P(X, C_k) = \arg\max_k P(X \mid C_k)\,P(C_k)$.

Task: classification. Representation: probabilistic models $p(X \mid C)$. Score function: EPE with 0-1 loss → likelihood $P(X_1, \dots, X_p \mid C)$. Search/optimization: many options. Models/parameters: the probabilistic models' parameters.

Probabilistic model / parameterization:
Bernoulli naïve: $p(w_i = \text{true} \mid c_k) = p_{i,k}$.
Gaussian naïve: $\hat P(X_j \mid C = c_k) = \frac{1}{\sqrt{2\pi}\,\sigma_{jk}} \exp\!\big(-\frac{(X_j - \mu_{jk})^2}{2\sigma_{jk}^2}\big)$.
Multinomial: $P(W_1 = n_1, \dots, W_v = n_v \mid c_k) = \frac{N!}{n_{1k}!\,n_{2k}!\cdots n_{vk}!}\,\theta_{1k}^{n_{1k}}\,\theta_{2k}^{n_{2k}}\cdots\theta_{vk}^{n_{vk}}$.

101 References: Prof. Andrew Moore's review tutorial; Prof. Ke Chen's NB slides; Prof. Carlos Guestrin's recitation slides.


More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

Stat 543 Exam 2 Spring 2016

Stat 543 Exam 2 Spring 2016 Stat 543 Exam 2 Sprng 2016 I have nether gven nor receved unauthorzed assstance on ths exam. Name Sgned Date Name Prnted Ths Exam conssts of 11 questons. Do at least 10 of the 11 parts of the man exam.

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Probability Theory (revisited)

Probability Theory (revisited) Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted

More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 10: Classifica8on with Support Vector Machine (cont.

UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 10: Classifica8on with Support Vector Machine (cont. UVA CS 4501-001 / 6501 007 Introduc8on to Machne Learnng and Data Mnng Lecture 10: Classfca8on wth Support Vector Machne (cont. ) Yanjun Q / Jane Unversty of Vrgna Department of Computer Scence 9/6/14

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n

More information

Introduction to Regression

Introduction to Regression Introducton to Regresson Dr Tom Ilvento Department of Food and Resource Economcs Overvew The last part of the course wll focus on Regresson Analyss Ths s one of the more powerful statstcal technques Provdes

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Classification Bayesian Classifiers

Classification Bayesian Classifiers lassfcaton Bayesan lassfers Jeff Howbert Introducton to Machne Learnng Wnter 2014 1 Bayesan classfcaton A robablstc framework for solvng classfcaton roblems. Used where class assgnment s not determnstc,.e.

More information