MACHINE LEARNING
Department of Computer Science, Artificial Intelligence Research Laboratory, Iowa State University


1 MACHINE LEARNING
Vasant Honavar
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, & Discovery
Iowa State University

2 Recall the Bayesian Recipe for Classification
The Bayesian recipe is simple, optimal, and, in principle, straightforward to apply.
To use this recipe in practice, we need to know P(X | ω_i), the generative model for the data for each class, and P(ω_i), the prior probabilities of the classes.
Because these probabilities are unknown, we need to estimate them from data, or learn them!
X is typically high-dimensional, so we need to estimate P(X | ω_i) from limited data.

3 Naïve Bayes Classifier
We can classify if we know P(X | ω_i). How do we learn P(X | ω_i)?
One solution: assume that the random variables in X are conditionally independent given the class.
Result: the Naïve Bayes classifier, which performs optimally under certain assumptions; a simple, practical learning algorithm grounded in probability theory.
When to use:
- Attributes that describe instances are likely to be conditionally independent given the classification
- The data is insufficient to estimate all the probabilities reliably if we do not assume independence

4 Naïve Bayes Classifier
Successful applications:
- Diagnosis
- Document classification
- Protein function classification
- Prediction of protein-protein interfaces
and many others.

5 Conditional Independence
Let Z_1, Z_2, ..., Z_n and W be random variables on a given event space. Z_1, Z_2, ..., Z_n are mutually independent given W if

P(Z_1, Z_2, ..., Z_n | W) = ∏_i P(Z_i | W)

Note that these represent sets of equations, one for each possible assignment of values to the random variables.

6 Implications of Independence
Suppose we have 5 binary attributes and a binary class label.
Without independence, in order to specify the joint distribution, we need to specify a probability for each possible assignment of values to the variables, resulting in a table of size 2^6 = 64.
If the features are independent given the class label, we only need 5 × 2 = 10 entries.
The reduction in the number of probabilities to be estimated is even more striking when N, the number of attributes, is large: from O(2^N) to O(N).

7 Naïve Bayes Classifier
Consider a discrete-valued target function f : X → Ω, where an instance x ∈ X is described in terms of attribute values (x_1, x_2, ..., x_n).

ω_MAP = argmax_{ω_j ∈ Ω} P(ω_j | x_1, x_2, ..., x_n)
      = argmax_{ω_j ∈ Ω} P(x_1, x_2, ..., x_n | ω_j) P(ω_j) / P(x_1, x_2, ..., x_n)
      = argmax_{ω_j ∈ Ω} P(x_1, x_2, ..., x_n | ω_j) P(ω_j)

ω_MAP is called the maximum a posteriori classification.

8 Naïve Bayes Classifier

ω_MAP = argmax_{ω_j ∈ Ω} P(ω_j | x_1, x_2, ..., x_n)
      = argmax_{ω_j ∈ Ω} P(x_1, x_2, ..., x_n | ω_j) P(ω_j)

If the attributes are independent given the class, we have

ω_NB = argmax_{ω_j ∈ Ω} P(ω_j) ∏_i P(x_i | ω_j)
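A minimal Python sketch of this decision rule, assuming the priors and class-conditional probabilities have already been estimated; the dictionary layout is hypothetical, and the log-space computation is a standard trick to avoid underflow when many attribute probabilities are multiplied:

```python
import math

def naive_bayes_classify(x, priors, cond_probs):
    """Return argmax over classes of P(w) * prod_i P(x_i | w), in log space.

    x:          tuple of attribute values (x_1, ..., x_n)
    priors:     dict mapping class label w -> P(w)
    cond_probs: dict mapping (w, i, value) -> P(x_i = value | w)
    Assumes all probabilities are strictly positive (see the discussion
    of zero probabilities and smoothing later in these notes).
    """
    best_label, best_score = None, -math.inf
    for w, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            score += math.log(cond_probs[(w, i, value)])
        if score > best_score:
            best_label, best_score = w, score
    return best_label
```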

9 Naïve Bayes Learner
For each possible value ω_j of Ω:
  Estimate P̂(Ω = ω_j) from the training set D
  For each possible value a_k of each attribute X_i:
    Estimate P̂(X_i = a_k | Ω = ω_j) from D
Classify a new instance x = (x_1, ..., x_n):
  ω_NB = argmax_{ω_j ∈ Ω} P̂(ω_j) ∏_i P̂(x_i | ω_j)
Estimate(·) is a procedure for estimating the relevant probabilities from a set of training examples.

10 Estimation of Probabilities from Small Samples

P̂(a_k | ω_j) = (n_k + m p) / (n + m)

where
  n is the number of training examples of class ω_j
  n_k is the number of training examples of class ω_j which have attribute value a_k
  p is the prior estimate for P̂(a_k | ω_j)
  m is the weight given to the prior
As n → ∞, P̂(a_k | ω_j) → n_k / n.
This is effectively the same as using Dirichlet priors, as we shall see later.
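A small sketch of this m-estimate; the numbers in the usage example are illustrative only:

```python
def m_estimate(n_k, n, p, m):
    """Smoothed estimate of P(a_k | w): (n_k + m*p) / (n + m).

    n_k: count of class-w examples with attribute value a_k
    n:   total count of class-w examples
    p:   prior estimate for the probability (e.g., 1/#values for a uniform prior)
    m:   equivalent sample size, the weight given to the prior
    """
    return (n_k + m * p) / (n + m)

# With a uniform prior over 3 attribute values (p = 1/3) and m = 3,
# an unobserved value (n_k = 0, n = 5) gets probability 1/8 instead of 0.
print(m_estimate(0, 5, 1/3, 3))  # 0.125
```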

11 Sample Applications of Naïve Bayes Classifier
- Learning dating preferences
- Learning which news articles are of interest
- Learning to classify web pages by topic
- Learning to classify SPAM
- Learning to assign proteins to functional families based on amino acid composition
Naïve Bayes is among the most useful algorithms.
What attributes shall we use to represent text?

12 Learning Dating Preferences
Instances: ordered 3-tuples of attribute values corresponding to
  Height: tall (t), short (s)
  Hair: dark (d), blonde (b), red (r)
  Eye: blue (l), brown (w)
Classes: +, -
Training Data:
  Instance   Attributes   Class label
  I1         (t, d, l)    +
  I2         (s, d, l)    +
  I3         (t, b, l)    -
  I4         (t, r, l)    -
  I5         (s, b, l)    -
  I6         (t, b, w)    +
  I7         (t, d, w)    +
  I8         (s, b, w)    +

13 Probabilities to Estimate
P(+) = 5/8, P(-) = 3/8

P(Height | c):      t      s
  c = +            3/5    2/5
  c = -            2/3    1/3

P(Hair | c):        d      b      r
  c = +            3/5    2/5    0
  c = -             0     2/3   1/3

P(Eye | c):         l      w
  c = +            2/5    3/5
  c = -             1      0

Classify (Height = t, Hair = b, Eye = l):
  P(+) P(t | +) P(b | +) P(l | +) = (5/8)(3/5)(2/5)(2/5) = 0.06
  P(-) P(t | -) P(b | -) P(l | -) = (3/8)(2/3)(2/3)(1) ≈ 0.167
Classification?

Classify (Height = t, Hair = r, Eye = w): both products are 0.
Note the problem with zero probabilities. Solution: use Laplacian estimates.
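A short Python sketch reproducing this worked example with relative-frequency (maximum likelihood) estimates; the tuple encoding and names are ours, not the slides':

```python
from collections import Counter, defaultdict

# Training data from the slide: (height, hair, eye) -> class
data = [
    (("t", "d", "l"), "+"), (("s", "d", "l"), "+"),
    (("t", "b", "l"), "-"), (("t", "r", "l"), "-"), (("s", "b", "l"), "-"),
    (("t", "b", "w"), "+"), (("t", "d", "w"), "+"), (("s", "b", "w"), "+"),
]

class_counts = Counter(c for _, c in data)
value_counts = defaultdict(Counter)       # value_counts[(class, attr_index)][value]
for x, c in data:
    for i, v in enumerate(x):
        value_counts[(c, i)][v] += 1

def score(x, c):
    """Unnormalized posterior: P(c) * prod_i P(x_i | c), by relative frequencies."""
    s = class_counts[c] / len(data)
    for i, v in enumerate(x):
        s *= value_counts[(c, i)][v] / class_counts[c]
    return s

print(score(("t", "b", "l"), "+"), score(("t", "b", "l"), "-"))  # 0.06 vs ~0.167
print(score(("t", "r", "w"), "+"), score(("t", "r", "w"), "-"))  # both 0.0
```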

14 Learning to Classify Text
Target concept Interesting? : Documents → {+, -}
Learning: use training examples to estimate P(+), P(-), P(d | +), P(d | -)
Alternative generative models for documents:
- Represent each document by a sequence of words. In the most general case, we need a probability for each word occurrence in each position in the document, for each possible document length. Too many probabilities to estimate!
- Represent each document by tuples of word counts

15 Learning to Classify Text

P(d | ω_j) = P(length(d)) P(d | ω_j, length(d))

This would require estimating |Vocabulary|^length(d) probabilities for each possible document length!
To simplify matters, assume that the probability of encountering a specific word in a particular position is independent of the position, and of the document length. Treat each document as a bag of words!

16 Bag of Words Representation
So we estimate one position-independent class-conditional probability P(w_k | ω_j) for each word w_k, instead of the set of position-specific probabilities. The number of probabilities to be estimated drops to |Vocabulary| × |Ω|.
The result is a generative model for documents that treats each document as an ordered tuple of word frequencies.
More sophisticated models can consider dependencies between adjacent word positions (Markov models; we will come back to these later).

17 Learning to Classify Text
With the bag of words representation, we have

P(d | ω_j) ∝ ∏_k P(w_k | ω_j)^{n_kd} / n_kd!

where n_kd is the number of occurrences of w_k in document d (ignoring the dependence on the length of the document).
We can estimate P(w_k | ω_j) from the labeled bags of words we have.
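A sketch of the resulting class score in log space, assuming smoothed word probabilities are already available; class-independent constants (the multinomial coefficient) are dropped, since they do not affect the argmax over classes:

```python
import math
from collections import Counter

def log_multinomial_score(doc_words, log_prior, log_word_probs):
    """log P(w) + sum_k n_kd * log P(w_k | w): the bag-of-words class score.

    doc_words:      list of word tokens in the document
    log_prior:      log P(class)
    log_word_probs: dict word -> log P(word | class), assumed smoothed (no zeros)
    """
    counts = Counter(doc_words)
    return log_prior + sum(n * log_word_probs[w] for w, n in counts.items())
```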

18 Naïve Bayes Text Classifier
Given 1000 training documents from each group, learn to classify new documents according to the newsgroup where it belongs. Naïve Bayes achieves 89% classification accuracy.
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.mideast, talk.politics.misc, talk.politics.guns, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, sci.space, sci.crypt, sci.electronics, sci.med

19 Naïve Bayes Text Classifier
Representative article from rec.sport.hockey:

Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu
From: John Doe
Subject: Re: This year's biggest and worst (opinion)...
Date: 5 Apr 93 09:53:39 GMT

I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided.

20 Sequence Classification
Need a generative model for sequences.
Simplest alternative: a sequence-length independent multinomial (bag of letters) model!
More sophisticated alternatives are possible, for example, Markov models that capture dependencies among small windows of neighboring letters.

21 Naïve Bayes Learner Summary
Produces a minimum error classifier if attributes are conditionally independent given the class.
When to use:
- Attributes that describe instances are likely to be conditionally independent given the classification
- There is not enough data to estimate all the probabilities reliably if we do not assume independence
Often works well even when the independence assumption is violated (Domingos and Pazzani, 1996).
Can be used iteratively (Kang et al., 2006).

22 Estimating Probabilities from Data (discrete case)
- Maximum likelihood estimation
- Bayesian estimation
- Maximum a posteriori estimation

23 Example: Binomial Experiment
When tossed, a thumbtack can land in one of two positions: Head or Tail.
We denote by θ the unknown probability P(H).
Estimation task: given a sequence of toss samples x[1], x[2], ..., x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.

24 Statistical Parameter Fitting
Consider samples x[1], x[2], ..., x[M] such that
- The set of values that each x[i] can take is known
- Each x[i] is sampled from the same distribution
- Each x[i] is sampled independently of the rest (i.i.d. samples)
The task is to find a parameter Θ so that the data can be summarized by a probability P(x[i] | Θ).
The parameters depend on the given family of probability distributions: multinomial, Gaussian, Poisson, etc. We will focus first on binomial and then on multinomial distributions; the main ideas generalize to other distribution families.

25 The Likelihood Function
How good is a particular θ? It depends on how likely it is to generate the observed data D:

L(θ : D) = P(D | θ) = ∏_m P(x[m] | θ)

The likelihood for the sequence H, T, T, H, H is

L(θ : D) = θ (1 − θ)(1 − θ) θ θ = θ³ (1 − θ)²

26 Likelihood Function
The likelihood function L(θ : D) provides a measure of relative preferences for various values of the parameter θ, given a collection of observations D drawn from a distribution that is parameterized by a fixed but unknown θ.
L(θ : D) is the probability of the observed data D, considered as a function of θ.
Suppose the data D is 5 heads out of 8 tosses. What is the likelihood function, assuming that the observations were generated by a binomial distribution with an unknown but fixed parameter θ?

27 Sufficient Statistics
To compute the likelihood in the thumbtack example we only require N_H and N_T, the number of heads and the number of tails.
N_H and N_T are sufficient statistics for the parameter θ that specifies the binomial distribution.
A statistic is simply a function of the data. A sufficient statistic for a parameter θ is a function s(D) that summarizes from the data D the relevant information needed to compute the likelihood L(θ : D): if s(D) = s(D'), then L(θ : D) = L(θ : D').

L(θ : D) = θ^{N_H} (1 − θ)^{N_T}
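A tiny sketch making the point concrete: the likelihood depends on the data only through the counts N_H and N_T, so any two sequences with the same counts yield the same likelihood function:

```python
def binomial_likelihood(theta, n_heads, n_tails):
    """L(theta : D) = theta^N_H * (1 - theta)^N_T; depends on D only via counts."""
    return theta**n_heads * (1 - theta)**n_tails

# The sequences HTTHH and HHHTT share the sufficient statistics (3 heads,
# 2 tails), so they give identical likelihood functions.
print(binomial_likelihood(0.6, 3, 2))  # 0.03456
```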

28 Maximum Likelihood Estimation
Main idea: learn the parameters that maximize the likelihood function.
Maximum likelihood estimation is
- Intuitively appealing
- One of the most commonly used estimators in statistics
- Based on the assumption that the parameter to be estimated is fixed, but unknown

29 Example: MLE for Binomial Data
Applying the MLE principle we get (why?)

θ̂ = N_H / (N_H + N_T)

Example: (N_H, N_T) = (3, 2); the ML estimate is 3/5 = 0.6.

30 MLE for Binomial Data

L(θ : D) = θ^{N_H} (1 − θ)^{N_T}
log L(θ : D) = N_H log θ + N_T log(1 − θ)

The likelihood is positive for all legitimate values of θ, so maximizing the likelihood is equivalent to maximizing its logarithm, i.e., the log likelihood.
Setting the derivative of log L(θ : D) to 0 at the extrema of L(θ : D):

N_H / θ − N_T / (1 − θ) = 0, which gives θ_ML = N_H / (N_H + N_T)

Note that the likelihood is indeed maximized at θ_ML, because in the neighborhood of θ_ML the value of the likelihood is smaller than it is at θ_ML.
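A small sketch, under the stated binomial model, comparing the closed-form estimate against a brute-force grid search over the log likelihood (using NumPy):

```python
import numpy as np

n_heads, n_tails = 3, 2

def log_likelihood(theta):
    # log L(theta : D) = N_H log(theta) + N_T log(1 - theta)
    return n_heads * np.log(theta) + n_tails * np.log(1 - theta)

# Closed form: theta_ML = N_H / (N_H + N_T)
theta_ml = n_heads / (n_heads + n_tails)

# Numerical check: evaluate the log likelihood on a grid and take the argmax
grid = np.linspace(0.001, 0.999, 999)
print(theta_ml, grid[np.argmax(log_likelihood(grid))])  # both ~0.6
```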

31 Maximum and Curvature of the Likelihood Around the Maximum
At the maximum, the derivative of the log likelihood is zero.
At the maximum, the second derivative is negative. The curvature of the log likelihood is defined as

I(θ) = − d²/dθ² log L(θ : D)

A large observed curvature I(θ_ML) at θ_ML is associated with a sharp peak, intuitively indicating less uncertainty about the maximum likelihood estimate.
I(θ_ML) is called the Fisher information.

32 Maximum Likelihood Estimate
The ML estimate can be shown to be
- Asymptotically unbiased: lim_{N→∞} E[θ_ML] = θ_True
- Asymptotically consistent: it converges to the true value as the number of examples approaches infinity, lim_{N→∞} Pr(|θ_ML − θ_True| ≤ ε) = 1
- Asymptotically efficient: it achieves the lowest variance that any estimate can achieve for a training set of a certain size (it satisfies the Cramér-Rao bound)

33 Maximum Likelihood Estimate
The ML estimate can be shown to be representationally invariant: if θ_ML is an ML estimate of θ, and g(θ) is a function of θ, then g(θ_ML) is an ML estimate of g(θ).
When the number of samples is large, the probability distribution of θ_ML is Gaussian with mean θ_True, the actual value of the parameter. This is a consequence of the central limit theorem (a random variable which is a sum of a large number of random variables has a Gaussian distribution), and the ML estimate is related to a sum of random variables.
We can use the likelihood ratio to reject the null hypothesis corresponding to θ_0 as unsupported by the data if the ratio of the likelihoods evaluated at θ_0 and at θ_ML is small. The ratio can be calibrated when the likelihood function is approximately quadratic.

34 Naïve Bayes Classifier
We can define the likelihood for a Naïve Bayes classifier.
Let Θ_i be the class-conditional parameters for the i-th attribute, and let L_i be the corresponding likelihood. The likelihood factorizes over the i.i.d. samples:

L(Θ : D) = ∏_p P(x_1[p], ..., x_n[p] : Θ)
         = ∏_i ∏_p P(x_i[p] : Θ_i)   (independence factorization)

Each Θ_i specifies, for each class, a binomial distribution associated with the i-th attribute.

35 Naïve Bayes Classifier
Decomposition into independent estimation problems: if the parameters for each family are decoupled via independence, then they can be estimated independently of each other.

36 From Binomial to Multinomial
Suppose a random variable X can take the values 1, 2, ..., K.
We want to learn the parameters θ_1, ..., θ_K.
Sufficient statistics: N_1, N_2, ..., N_K, the number of times each outcome is observed.

Likelihood function: L(θ : D) = ∏_{k=1}^{K} θ_k^{N_k}

ML estimate: θ̂_k = N_k / Σ_l N_l
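A minimal sketch of this closed-form estimate, computing relative frequencies from raw samples:

```python
from collections import Counter

def multinomial_mle(samples):
    """theta_k = N_k / sum_l N_l: the relative frequency of each outcome."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

print(multinomial_mle(["a", "b", "a", "c", "a"]))  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```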

37 MLE Estimates for Naïve Bayes Classifiers
When we assume that P(x | c) is multinomial, we get the decomposition:

L(Θ : D) = ∏_m P(x[m] | c[m] : Θ) = ∏_c ∏_x θ_{x|c}^{N(x, c)}

For each class we get an independent multinomial estimation problem. The MLE is

θ̂_{x|c} = N(x, c) / N(c)

38 Summary of Maximum Likelihood Estimation
- Define a likelihood function, which is a measure of how likely it is that the observed data were generated from a probability distribution with a particular choice of parameters
- Select the parameters that maximize the likelihood
- In simple cases, the ML estimate has a closed-form solution; in other cases, ML estimation may require numerical optimization
Problem with the ML estimate: it assigns zero probability to unobserved values, which can lead to difficulties when estimating from small samples.
Question: how would a Naïve Bayes classifier behave if some of the class-conditional probability estimates are zero?

39 Bayesian Estimation
MLE commits to a specific value of the unknown parameter(s).
The MLE is the same in both cases shown (a small sample and a large sample with the same proportion of outcomes, whose likelihood functions peak at the same θ but with different sharpness).
Of course, in general, one cannot summarize a function by a single number!
Intuitively, the confidence in the estimates should be different.

40 Bayesian Estimation
The maximum likelihood approach is frequentist at its core:
- Assumes there is an unknown but fixed parameter θ
- Estimates θ with some confidence
- Predicts probabilities using the estimated parameter value
The Bayesian approach:
- Represents uncertainty about the unknown parameter
- Uses probability to quantify this uncertainty: unknown parameters are treated as random variables
- Prediction follows from the rules of probability: expectation over the unknown parameters

41 Example: Binomial Data Revisited
Suppose that we choose a uniform prior p(θ) = 1 for θ in [0, 1].
In this case, p(θ | D) is proportional to the likelihood L(θ : D):

p(θ | D) = p(D | θ) p(θ) / p(D) ∝ p(D | θ)

The Bayesian prediction is

P(x[M+1] = H | D) = ∫ θ p(θ | D) dθ

42 Example: Binomial Data Revisited
(N_H, N_T) = (4, 1); the MLE for P(H) is 4/5 = 0.8.
The Bayesian estimate is

P(x[M+1] = H | D) = ∫ θ p(θ | D) dθ = 5/7 ≈ 0.714

In this example, MLE and Bayesian prediction differ.
It can be proved that if the prior is well-behaved, i.e., does not assign 0 density to any feasible parameter value, then both the MLE and the Bayesian estimate converge to the same value in the limit; both almost surely converge to the underlying distribution P.
But the ML and Bayesian approaches behave differently when the number of samples is small.

43 All Relative Frequencies Are Not Equi-Probable
In practice we might want priors that express our beliefs regarding the parameter to be estimated.
For example, we might want a prior that assigns a higher probability to parameter values that describe a fair coin than it does to an unfair coin.
The beta distribution allows us to capture such prior beliefs.

44 Beta Distribution
Gamma function:

Γ(x) = ∫₀^∞ t^{x−1} e^{−t} dt

The integral converges if and only if x > 0. If x is an integer greater than 0, it can be shown that Γ(x) = (x − 1)!, so Γ(x + 1) = x Γ(x).
The beta density function with parameters (a, b), N = a + b, where a, b are real numbers > 0, is

p(θ) = beta(θ; a, b) = [Γ(N) / (Γ(a) Γ(b))] θ^{a−1} (1 − θ)^{b−1},  where 0 ≤ θ ≤ 1

45 Beta Distribution
If a, b are real numbers > 0, then

∫₀¹ θ^{a−1} (1 − θ)^{b−1} dθ = Γ(a) Γ(b) / Γ(a + b)

If θ has distribution beta(θ; a, b), then E[θ] = a / (a + b).
Let D = {x[1], ..., x[M]} be a sequence of i.i.d. samples from a binomial distribution, with N_H = s and N_T = t. Then we can show that if p(θ) = beta(θ; a, b),

p(θ | D) = beta(θ; a + s, b + t)

Update of the parameter θ with a beta prior based on data yields a beta posterior.
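A minimal sketch of this conjugate update; it reproduces the earlier worked example, where a uniform beta(1, 1) prior and 4 heads out of 5 tosses give a posterior mean of 5/7:

```python
def beta_posterior(a, b, n_heads, n_tails):
    """Conjugate update: beta(a, b) prior + binomial data -> beta(a + N_H, b + N_T)."""
    return a + n_heads, b + n_tails

def beta_mean(a, b):
    """E[theta] = a / (a + b) for a beta(a, b) distribution."""
    return a / (a + b)

# Uniform prior beta(1, 1), then observe 4 heads and 1 tail:
a_post, b_post = beta_posterior(1, 1, 4, 1)
print(a_post, b_post, beta_mean(a_post, b_post))  # 5 2 0.714...
```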

46 Conjugate Families
The property that the posterior distribution follows the same parametric form as the prior distribution is called conjugacy.
Conjugate families are useful because:
- For many distributions we can represent them with hyperparameters
- They allow for sequential updates to obtain the posterior
- In many cases we have a closed-form solution for prediction
The beta prior is a conjugate family for the binomial likelihood.

47 Bayesian Prediction
Prior: beta(θ; a, b)
Data: D = {x[1], ..., x[M]}
Posterior: p(θ | D) = beta(θ; a + N_H, b + N_T)
Prediction:

P(x[M+1] = H | D) = (a + N_H) / (a + b + N_H + N_T) = (a + N_H) / (N + M)

48 Dirichlet Priors
Recall that the likelihood function is

L(Θ : D) = ∏_{k=1}^{K} θ_k^{N_k}

A Dirichlet prior with hyperparameters α_1, ..., α_K is defined as

P(Θ) = [Γ(Σ_k α_k) / ∏_k Γ(α_k)] ∏_{k=1}^{K} θ_k^{α_k − 1},  where θ_k ≥ 0 and Σ_k θ_k = 1

Then the posterior has the same form, with hyperparameters α_1 + N_1, ..., α_K + N_K:

P(Θ | D) ∝ P(Θ) P(D | Θ) ∝ ∏_k θ_k^{α_k − 1} ∏_k θ_k^{N_k} = ∏_k θ_k^{α_k + N_k − 1}

49 Dirichlet Priors
Dirichlet priors enable closed-form prediction based on multinomial samples.
If P(Θ) is Dirichlet with hyperparameters α_1, ..., α_K, then

P(x[1] = k) = ∫ θ_k P(Θ) dΘ = α_k / Σ_l α_l

Since the posterior is also Dirichlet, we get

P(x[M+1] = k | D) = ∫ θ_k P(Θ | D) dΘ = (α_k + N_k) / Σ_l (α_l + N_l)
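A small sketch of this closed-form predictive distribution; the outcome labels in the usage example are arbitrary:

```python
def dirichlet_predictive(alphas, counts):
    """P(next outcome = k | D) = (alpha_k + N_k) / sum_l (alpha_l + N_l).

    alphas: dict outcome -> hyperparameter alpha_k
    counts: dict outcome -> observed count N_k
    """
    total = sum(alphas[k] + counts.get(k, 0) for k in alphas)
    return {k: (alphas[k] + counts.get(k, 0)) / total for k in alphas}

# Uniform Dirichlet(1, 1, 1) prior over three outcomes; 'c' is unseen but
# still gets nonzero predictive probability (no zero-probability problem).
print(dirichlet_predictive({"a": 1, "b": 1, "c": 1}, {"a": 3, "b": 2}))
```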

50 Intuition Behind Priors
The hyperparameters α_1, ..., α_K can be thought of as imaginary counts from our prior experience.
Equivalent sample size = α_1 + ... + α_K.
The larger the equivalent sample size, the more confident we are in our prior.

51 Effect of Priors
(Plots: prediction of P(H) after seeing data with N_H = 0.5 N_T, for different sample sizes.)
- Different strength α_H + α_T, fixed ratio α_H / α_T
- Fixed strength α_H + α_T, different ratio α_H / α_T

52 Effect of Priors
In real data, Bayesian estimates are less sensitive to noise in the data.
(Plot: P(X = H | D) against the number of tosses N, comparing the MLE with Dirichlet(0.5, 0.5), Dirichlet(1, 1), Dirichlet(5, 5), and Dirichlet(10, 10) priors, together with the toss results.)

53 Conjugate Families
The property that the posterior distribution follows the same parametric form as the prior distribution is called conjugacy. The Dirichlet prior is a conjugate family for the multinomial likelihood.
Conjugate families are useful because:
- For many distributions we can represent them with hyperparameters
- They allow for sequential updates within the same representation
- In many cases we have a closed-form solution for prediction

54 Bayesian Estimation

P(x[M+1] | x[1], ..., x[M]) = P(x[M+1], x[1], ..., x[M]) / P(x[1], ..., x[M])
                            = ∫ P(x[M+1] | θ) P(θ | x[1], ..., x[M]) dθ

where the posterior is

P(θ | x[1], ..., x[M]) = P(x[1], ..., x[M] | θ) P(θ) / P(x[1], ..., x[M])
(Posterior = Likelihood × Prior / Probability of the data)

55 Summary of Bayesian Estimation
- Treat the unknown parameters as random variables
- Assume a prior distribution for the unknown parameters
- Update the distribution of the parameters based on the data
- Use Bayes rule to make predictions

56 Maximum a Posteriori (MAP) Estimates
Reconciling the ML and Bayesian approaches:

Θ_MAP = argmax_Θ P(Θ | D)
      = argmax_Θ P(D | Θ) P(Θ) / P(D)
      = argmax_Θ P(D | Θ) P(Θ)
      = argmax_Θ L(Θ : D) P(Θ)

57 Maximum a Posteriori (MAP) Estimates

Θ_MAP = argmax_Θ L(Θ : D) P(Θ)

Like in Bayesian estimation, we treat the unknown parameters as random variables. But we estimate a single value for the parameter: the maximum a posteriori estimate, which corresponds to the most probable value of the parameter given the data, for a given choice of the prior.

58 Back to the Naïve Bayes Classifier
If P̂(a_l | ω_j) = 0 for some attribute value a_l, then P̂(ω_j) ∏_k P̂(a_k | ω_j) = 0.
If one of the attribute values has an estimated class-conditional probability of 0, it dominates all the other attribute values. When we have few examples, this is more likely.
Solution: use priors; e.g., assume each value to be equally likely unless the data indicates otherwise.

59 Decision Tree Classifiers
- Decision tree representation for modeling dependencies among input variables, using elements of information theory
- How to learn decision trees from data
- Over-fitting and how to minimize it
- How to deal with missing values in the data
- Learning decision trees from distributed data
- Learning decision trees at multiple levels of abstraction

60 Decision Tree Representation
In the simplest case:
- each internal node tests on an attribute
- each branch corresponds to an attribute value
- each leaf node corresponds to a class label
In general:
- each internal node corresponds to a test on input instances, with mutually exclusive and exhaustive outcomes; tests may be univariate or multivariate
- each branch corresponds to an outcome of a test
- each leaf node corresponds to a class label

61 Decision Tree Representation
Data set: four examples described by binary attributes x and y, with class label c ∈ {A, B}. Two decision trees are consistent with the data: Tree 1, which classifies by testing x alone, and Tree 2, which tests y first and then x along each branch.
Should we choose Tree 1 or Tree 2? Why?

62 Decision Tree Representation
Any Boolean function can be represented by a decision tree.
Any function f : A_1 × A_2 × ... × A_n → C, where each A_i is the domain of the i-th attribute and C is a discrete set of values (class labels), can be represented by a decision tree.
In general, the inputs need not be discrete-valued.

63 Learning Decision Tree Classifiers
Decision trees are especially well suited for representing simple rules for classifying instances that are described by discrete attribute values.
Decision tree learning algorithms:
- Implement Ockham's razor as a preference bias: simpler decision trees are preferred over more complex trees
- Are relatively efficient: linear in the size of the decision tree and the size of the data set
- Produce comprehensible results
- Are often among the first to be tried on a new data set

64 Learning Decision Tree Classifiers
Ockham's razor recommends that we pick the simplest decision tree that is consistent with the training set.
The simplest tree is one that takes the fewest bits to encode (why? information theory).
There are far too many trees that are consistent with a training set, and searching for the simplest tree that is consistent with the training set is not typically computationally feasible.
Solution:
- Use a greedy algorithm: not guaranteed to find the simplest tree, but works well in practice
- Or restrict the space of hypotheses to a subset of simple trees

65 Information: Some Intuitions
- Information reduces uncertainty
- Information is relative to what you already know
- The information content of a message is related to how surprising the message is
- Information depends on context

66 Digression: Information and Uncertainty
Sender → Message → Receiver
You are stuck inside. You send me out to report back to you on what the weather is like. I do not lie, so you trust me. You and I are both generally familiar with the weather in Iowa.
- On a July afternoon in Iowa, I walk into the room and tell you it is hot outside.
- On a January afternoon in Iowa, I walk into the room and tell you it is hot outside.

67 Digression: Information and Uncertainty
Sender → Message → Receiver
How much information does a message contain?
If my message to you describes a scenario that you expect with certainty, the information content of the message for you is zero.
The more surprising the message to the receiver, the greater the amount of information conveyed by the message.
What does it mean for a message to be surprising?

68 Digression: Information and Uncertainty
Suppose I have a coin with heads on both sides, and you know that I have a coin with heads on both sides. I toss the coin, and without showing you the outcome, tell you that it came up heads. How much information did I give you?
Suppose I have a fair coin, and you know that I have a fair coin. I toss the coin, and without showing you the outcome, tell you that it came up heads. How much information did I give you?

69 Information
Without loss of generality, assume that messages are binary: made of 0s and 1s.
Conveying the outcome of a fair coin toss requires 1 bit of information: we need to identify one out of two equally likely outcomes.
Conveying the outcome of an experiment with 8 equally likely outcomes requires 3 bits, and so on.
Conveying an outcome that is certain takes 0 bits.
In general, if an outcome has probability p, the information content of the corresponding message is

I(p) = −log₂ p,  so I(1) = 0

70 Information is Subjective
Suppose there are 3 agents, Adrian, Oksana, and Jun, in a world where a die has been tossed. Adrian observes that the outcome is a 6 and whispers to Oksana that the outcome is even, but Jun knows nothing about the outcome.
The probability assigned by Oksana to the event "6" is a subjective measure of Oksana's belief about the state of the world.
- Information gained by Adrian by looking at the outcome of the die: log₂ 6 bits.
- Information conveyed by Adrian to Oksana: log₂ 6 − log₂ 3 = 1 bit.
- Information conveyed by Adrian to Jun: 0 bits.

71 Information and Shannon Entropy
Suppose we have a message that conveys the result of a random experiment with m possible discrete outcomes, with probabilities p_1, p_2, ..., p_m.
The expected information content of such a message is called the entropy of the probability distribution:

H(p_1, ..., p_m) = Σ_i p_i I(p_i)

where I(p_i) = −log₂ p_i provided p_i ≠ 0, and p_i I(p_i) = 0 otherwise.

72 Shannon's Entropy as a Measure of Information
Let P = (p_1, ..., p_n) be a discrete probability distribution. The entropy of the distribution P is given by

H(P) = Σ_{i=1}^{n} p_i log₂ (1 / p_i)

Examples:
  H(1/2, 1/2) = (1/2) log₂ 2 + (1/2) log₂ 2 = 1 bit
  H(1, 0) = 1 · I(1) + 0 · I(0) = 0 bits
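A minimal sketch of the entropy computation, using the 0 · log 0 = 0 convention from the previous slide:

```python
import math

def entropy(probs):
    """H(P) = -sum_i p_i log2 p_i, with 0 * log 0 taken to be 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))   # 0.0 bits: a certain outcome
print(entropy([1/6] * 6))    # ~2.585 bits: a fair die, log2(6)
```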

73 Properties of Shannon's Entropy
- H(P) ≥ 0
- If there are N possible outcomes, H(P) ≤ log₂ N, with equality iff p_i = 1/N for all i
- If there exists i such that p_i = 1, then H(P) = 0
- H(P) is a continuous function of P

74 Shannon's Entropy as a Measure of Information
For any distribution P, H(P) is the optimal number of binary questions required, on average, to determine an outcome drawn from P.
We can extend these ideas to talk about how much information is conveyed by the observation of the outcome of one experiment about the possible outcomes of another (mutual information).
We can also quantify the difference between two probability distributions (Kullback-Leibler divergence, or relative entropy).

75 Coding Theory Perspective
Suppose you and I both know the distribution P, and I choose an outcome according to P.
Suppose I want to send you a message about the outcome. You and I could agree in advance on the questions, and I can simply send you the answers.
The optimal message length, on average, is H(P).
This generalizes to noisy communication.

76 Entropy of Random Variables and Sets of Random Variables
For a random variable X taking values a_1, ..., a_n:

H(X) = −Σ_i P(X = a_i) log₂ P(X = a_i)

If X is a set of random variables, then

H(X) = −Σ_x P(x) log₂ P(x)

where the sum is over all joint value assignments x.

77 Joint Entropy and Conditional Entropy
For random variables X and Y, the joint entropy is

H(X, Y) = −Σ_{x,y} P(x, y) log₂ P(x, y)

The conditional entropy of X given Y is

H(X | Y) = Σ_a P(Y = a) H(X | Y = a) = −Σ_{x,y} P(x, y) log₂ P(x | y)

78 Joint Entropy and Conditional Entropy
Some useful results:

H(X, Y) ≤ H(X) + H(Y)
H(Y | X) ≤ H(Y)

When do we have equality?
Chain rule for entropy:

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)

79 Example of Entropy Calculations
P(X = H, Y = H) = 0.2    P(X = H, Y = T) = 0.4
P(X = T, Y = H) = 0.3    P(X = T, Y = T) = 0.1

H(X, Y) = −0.2 log₂ 0.2 − 0.4 log₂ 0.4 − 0.3 log₂ 0.3 − 0.1 log₂ 0.1 ≈ 1.85
P(X = H) = 0.6, so H(X) ≈ 0.97
P(Y = H) = 0.5, so H(Y) = 1.0
P(Y = H | X = H) = 0.2/0.6    P(Y = T | X = H) = 0.4/0.6
P(Y = H | X = T) = 0.3/0.4    P(Y = T | X = T) = 0.1/0.4
H(Y | X) ≈ 0.88
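A short sketch verifying these numbers, using the chain rule H(Y | X) = H(X, Y) − H(X):

```python
import math

def H(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution P(X, Y) from the slide
joint = {("H", "H"): 0.2, ("H", "T"): 0.4, ("T", "H"): 0.3, ("T", "T"): 0.1}

H_XY = H(joint.values())          # ~1.85
H_X = H([0.6, 0.4])               # marginal of X, ~0.97
# Conditional entropy via the chain rule: H(Y | X) = H(X, Y) - H(X)
print(H_XY, H_X, H_XY - H_X)      # ~1.85 0.97 0.88
```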

80 Mutual Information
For random variables X and Y, the average mutual information between X and Y is

I(X, Y) = H(X) − H(X | Y) = H(Y) − H(Y | X)

Or, by using the chain rule, I(X, Y) = H(X) + H(Y) − H(X, Y).
In terms of the probability distributions,

I(X, Y) = Σ_{a,b} P(X = a, Y = b) log₂ [P(X = a, Y = b) / (P(X = a) P(Y = b))]

Question: when is I(X, Y) = 0?
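A small sketch of the probability-based formula, reusing the joint distribution from the previous slide; it agrees with H(Y) − H(Y | X) = 1.0 − 0.88:

```python
import math

def mutual_information(joint):
    """I(X, Y) = sum_{a,b} P(a, b) * log2( P(a, b) / (P(a) P(b)) ).

    joint: dict (a, b) -> P(X = a, Y = b)
    """
    p_x, p_y = {}, {}
    for (a, b), p in joint.items():
        p_x[a] = p_x.get(a, 0) + p
        p_y[b] = p_y.get(b, 0) + p
    return sum(p * math.log2(p / (p_x[a] * p_y[b]))
               for (a, b), p in joint.items() if p > 0)

joint = {("H", "H"): 0.2, ("H", "T"): 0.4, ("T", "H"): 0.3, ("T", "T"): 0.1}
print(mutual_information(joint))  # ~0.125 bits; I(X, Y) = 0 iff X, Y independent
```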

81 Relative Entropy
Let P and Q be two distributions over a random variable X.
The relative entropy (Kullback-Leibler distance) is a measure of the "distance" from P to Q:

D(P ‖ Q) = Σ_x P(x) log₂ [P(x) / Q(x)]

Note that D(P ‖ Q) ≠ D(Q ‖ P) in general, D(P ‖ Q) ≥ 0, and D(P ‖ P) = 0.
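A minimal sketch of the KL divergence; the two example distributions are arbitrary, and the usage line illustrates the asymmetry noted above:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) log2( P(x) / Q(x) ).

    p, q: dicts mapping outcomes to probabilities; assumes Q(x) > 0
    wherever P(x) > 0.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric: ~0.74 vs ~0.53
```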
