9.2 Maximum A Posteriori and Maximum Likelihood

In the above,

p(θ < 0.5|V) = ∫_0^{0.5} p(θ|V) dθ   (9.1.29)
             = 1/B(α + N_H, β + N_T) ∫_0^{0.5} θ^{α + N_H - 1} (1 - θ)^{β + N_T - 1} dθ   (9.1.30)
             = I_{0.5}(α + N_H, β + N_T)   (9.1.31)

where I_x(a, b) is the regularised incomplete Beta function. For the former case of N_H = 2, N_T = 8, under a flat prior,

p(θ < 0.5|V) = I_{0.5}(N_H + 1, N_T + 1) ≈ 0.967   (9.1.32)

Since the two events are exclusive, p(θ > 0.5|V) = 1 - p(θ < 0.5|V) ≈ 0.033. Hence the expected utility of saying heads is more likely is

U(say heads) = U(say heads, heads) p(θ > 0.5|V) + U(say heads, tails) p(θ < 0.5|V)   (9.1.33)

Similarly, the utility of saying tails is more likely is

U(say tails) = U(say tails, tails) p(θ < 0.5|V) + U(say tails, heads) p(θ > 0.5|V)   (9.1.34)

With the gains and losses assumed above, the second of these is the larger, so we are better off taking the decision that the coin is more likely to come up tails. If we modify the above so that we lose 100 million dollars if we guess tails when in fact it was heads, the expected utility of saying tails would be heavily penalised, in which case we would be better off saying heads. In this case, even though we are more confident that the coin is likely to come up tails, the penalty for making a mistake in saying tails is so large that it is in fact better to say heads.

9.2 Maximum A Posteriori and Maximum Likelihood

9.2.1 Summarising the posterior

Definition 86 (Maximum Likelihood and Maximum A Posteriori). Maximum Likelihood sets the parameter θ, given data V, using

θ_ML = argmax_θ p(V|θ)   (9.2.1)

Maximum A Posteriori uses the setting that maximises the posterior distribution of the parameter,

θ_MAP = argmax_θ p(V|θ) p(θ)   (9.2.2)

where p(θ) is the prior distribution.

A crude summary of the posterior is given by a distribution with all its mass in the single most likely state, θ_MAP. In making such an approximation, potentially useful information concerning the reliability of the parameter estimate is lost. In contrast, the full posterior reflects our beliefs about the range of possibilities and their associated credibilities.
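As a concrete illustration, the following minimal sketch (not from the text) computes the ML and MAP settings and the posterior probability p(θ < 0.5|V) for the Beta-Bernoulli coin model above, assuming a Beta(α, β) prior so that the posterior is Beta(α + N_H, β + N_T); the function and variable names are illustrative only.

```python
# Minimal sketch (not from the text): Beta-Bernoulli coin of section 9.1,
# comparing the ML and MAP point estimates with the posterior mass below 0.5.
import numpy as np
from scipy.special import betainc  # regularised incomplete Beta function I_x(a, b)

def coin_summary(n_heads, n_tails, alpha=1.0, beta=1.0):
    """Posterior is Beta(alpha + n_heads, beta + n_tails)."""
    a, b = alpha + n_heads, beta + n_tails
    theta_ml = n_heads / (n_heads + n_tails)                 # argmax_theta p(V|theta)
    theta_map = (a - 1) / (a + b - 2) if a + b > 2 else 0.5  # mode of the Beta posterior
    p_tail_biased = betainc(a, b, 0.5)                       # p(theta < 0.5 | V) = I_0.5(a, b)
    return theta_ml, theta_map, p_tail_biased

for n_h, n_t in [(2, 8), (20, 80)]:
    ml, mp, p_lt_half = coin_summary(n_h, n_t)   # flat prior: alpha = beta = 1
    print(f"N_H={n_h:2d}, N_T={n_t:2d}: theta_ML={ml:.2f} theta_MAP={mp:.2f} "
          f"p(theta<0.5|V)={p_lt_half:.4f}")
```

Under the flat prior the MAP and ML estimates coincide at 0.2, while the posterior mass below 0.5 is what the decision above is based on.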

Figure 9.4: (a) A model for the relationship between lung Cancer, Asbestos exposure and Smoking. (b) Plate notation replicating the observed n datapoints and placing priors over the CPTs, tied across all datapoints.

One can motivate MAP from a decision theoretic perspective. If we assume a utility that is zero for all but the correct θ,

U(θ_true, θ) = I[θ_true = θ]   (9.2.3)

then the expected utility of θ is

U(θ) = Σ_{θ_true} I[θ_true = θ] p(θ_true|V) = p(θ|V)   (9.2.4)

This means that the maximum utility decision is to return the θ with the highest posterior value. When a flat prior p(θ) = const. is used, the MAP parameter assignment is equivalent to the Maximum Likelihood setting

θ_ML = argmax_θ p(V|θ)   (9.2.5)

The term Maximum Likelihood refers to the parameter θ for which the observed data is most likely to be generated by the model. Since the logarithm is a strictly increasing function, for a positive function f(θ)

θ_opt = argmax_θ f(θ)  ⇔  θ_opt = argmax_θ log f(θ)   (9.2.6)

so that the MAP parameters can be found either by optimising the MAP objective or, equivalently, its logarithm,

log p(θ|V) = log p(V|θ) + log p(θ) - log p(V)   (9.2.7)

where the normalisation constant p(V) is not a function of θ. The log likelihood is convenient since, under the i.i.d. assumption, it is a summation of data terms,

log p(θ|V) = Σ_n log p(v^n|θ) + log p(θ) - log p(V)   (9.2.8)

so that quantities such as derivatives of the log-likelihood w.r.t. θ are straightforward to compute.

Example 36. In the coin-tossing experiment of section(9.1.1) the ML setting is θ = 0.2 in both the N_H = 2, N_T = 8 and the N_H = 20, N_T = 80 cases.

9.2.2 Maximum likelihood and the empirical distribution

Given a dataset of discrete variables X = {x^1, ..., x^N} we define the empirical distribution as

q(x) = (1/N) Σ_{n=1}^N I[x = x^n]   (9.2.9)
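As a small illustration of equation (9.2.9), the sketch below builds the empirical distribution by normalised counting; the data values are made up.

```python
# Illustrative sketch (made-up data): the empirical distribution q(x) of (9.2.9)
# is the normalised count of each observed state.
from collections import Counter

X = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]            # N = 10 discrete observations x^1..x^N
N = len(X)
q = {x: count / N for x, count in Counter(X).items()}
print(q)   # {0: 0.3, 1: 0.7}; for an unconstrained model this is also the ML distribution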

Figure 9.5: A database containing information about Asbestos exposure (1 signifies exposure), being a Smoker (1 signifies the individual is a smoker) and lung Cancer (1 signifies the individual has lung cancer). Each row contains the information for an individual, so that there are 7 individuals in the database.

In the case that x is a vector of variables,

I[x = x^n] = ∏_i I[x_i = x_i^n]   (9.2.10)

The Kullback-Leibler divergence between the empirical distribution q(x) and a distribution p(x) is

KL(q|p) = ⟨log q(x)⟩_{q(x)} - ⟨log p(x)⟩_{q(x)}   (9.2.11)

Our interest is the functional dependence of KL(q|p) on p. Since the entropic term ⟨log q(x)⟩_{q(x)} is independent of p(x), we may consider it constant and focus on the second term alone. Hence

KL(q|p) = -⟨log p(x)⟩_{q(x)} + const. = -(1/N) Σ_{n=1}^N log p(x^n) + const.   (9.2.12)

We recognise Σ_{n=1}^N log p(x^n) as the log likelihood under the model p(x), assuming that the data is i.i.d. This means that setting parameters by maximum likelihood is equivalent to setting parameters by minimising the Kullback-Leibler divergence between the empirical distribution and the parameterised distribution. In the case that p(x) is unconstrained, the optimal choice is to set p(x) = q(x): the maximum likelihood optimal distribution corresponds to the empirical distribution.

9.2.3 Maximum likelihood training of belief networks

Consider the following model of the relationship between exposure to asbestos (a), being a smoker (s) and the incidence of lung cancer (c),

p(a, s, c) = p(c|a, s) p(a) p(s)   (9.2.13)

which is depicted in fig(9.4a). Each variable is binary, dom(a) = {0, 1}, dom(s) = {0, 1}, dom(c) = {0, 1}. We assume that there is no direct relationship between Smoking and exposure to Asbestos. This is the kind of assumption that we may be able to elicit from medical experts. Furthermore, we assume that we have a list of patient records, fig(9.5), where each row represents a patient's data. To learn the table entries p(c|a, s) we count the number of times c is in state 1 for each of the 4 parental states of a and s:

p(c=1|a=0, s=0) = 0,    p(c=1|a=0, s=1) = 0.5
p(c=1|a=1, s=0) = 0.5,  p(c=1|a=1, s=1) = 1   (9.2.14)

Similarly, based on counting, p(a=1) = 4/7 and p(s=1) = 4/7. These three CPTs then complete the full distribution specification. Setting the CPT entries in this way, by counting the relative number of occurrences, corresponds mathematically to maximum likelihood learning under the i.i.d. assumption, as we show below.

Maximum likelihood corresponds to counting

For a BN there is a constraint on the form of p(x), namely

p(x) = ∏_{i=1}^K p(x_i | pa(x_i))   (9.2.15)
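The counting estimates of equation (9.2.14) can be reproduced in a few lines. Since the individual rows of fig(9.5) are not legible in this transcription, the patient records below are an assumption, chosen only so that their counts match the quoted values p(a=1) = p(s=1) = 4/7 and the table (9.2.14).

```python
# Sketch of ML-by-counting for the belief network p(a,s,c) = p(c|a,s)p(a)p(s).
# The rows are assumed, chosen to reproduce the counts quoted in the text.
import numpy as np

# columns: a (asbestos), s (smoker), c (cancer)
data = np.array([[0, 0, 0],
                 [0, 1, 0], [0, 1, 1],
                 [1, 0, 0], [1, 0, 1],
                 [1, 1, 1], [1, 1, 1]])
a, s, c = data[:, 0], data[:, 1], data[:, 2]

p_a1 = a.mean()                      # ML estimate of p(a=1): relative frequency
p_s1 = s.mean()                      # ML estimate of p(s=1)
print(f"p(a=1) = {p_a1:.3f},  p(s=1) = {p_s1:.3f}")

for av in (0, 1):
    for sv in (0, 1):
        mask = (a == av) & (s == sv)
        p_c1 = c[mask].mean()        # fraction of c=1 among rows with this parental state
        print(f"p(c=1 | a={av}, s={sv}) = {p_c1:.2f}")
```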

CHAPTER 10  Naive Bayes

10.1 Naive Bayes and Conditional Independence

Naive Bayes (NB) is a popular classification method and aids our discussion of conditional independence, overfitting and Bayesian methods. In NB, we form a joint model of observations x and the corresponding class label c using a Belief Network of the form

p(x, c) = p(c) ∏_{i=1}^D p(x_i|c)   (10.1.1)

whose Belief Network is depicted in fig(10.1a). Coupled with a suitable choice for each conditional distribution p(x_i|c), we can then use Bayes' rule to form a classifier for a novel attribute vector x*:

p(c|x*) = p(x*|c) p(c) / p(x*) = p(x*|c) p(c) / Σ_c p(x*|c) p(c)   (10.1.2)

In practice it is common to consider only two classes, dom(c) = {0, 1}. The theory we describe below is valid for any number of classes c, though our examples are restricted to the binary class case. Also, the attributes x_i are often taken to be binary, as we shall do initially below as well. The extension to more than two attribute states, or to continuous attributes, is straightforward.

Example 47. EZsurvey.org considers that Radio station listeners conveniently fall into two groups, the young and the old. They assume that, given the knowledge that a customer is either young or old, this is sufficient to determine whether or not a customer will like a particular Radio station, independent of their likes or dislikes for any other stations:

p(R1, R2, R3, R4 | age) = p(R1|age) p(R2|age) p(R3|age) p(R4|age)   (10.1.3)

where each of the variables R1, R2, R3, R4 can take the states either like or dislike, and the age variable can take the value either young or old. Thus the information about the age of the customer determines the individual product preferences without needing to know anything else. To complete the specification, given that a customer is young, she has a 95% chance to like Radio1, a 5% chance to like Radio2, a 2% chance to like Radio3 and a 20% chance to like Radio4. Similarly, an old listener has a 3% chance to like Radio1, an 82% chance to like Radio2, a 34% chance to like Radio3 and a 92% chance to like Radio4. They know that 90% of the listeners are old.

Figure 10.1: Naive Bayes classifier. (a) The central assumption is that, given the class c, the attributes x_i are independent. (b) Assuming the data is i.i.d., Maximum Likelihood learns the optimal parameters of the distribution p(c) and the class-dependent attribute distributions p(x_i|c).

Given this model, and a new customer that likes Radio1 and Radio3, but dislikes Radio2 and Radio4, what is the probability that they are young? This is given by

p(age = young | R1 = like, R2 = dislike, R3 = like, R4 = dislike)
  = p(R1 = like, R2 = dislike, R3 = like, R4 = dislike | age = young) p(age = young) / Σ_{age} p(R1 = like, R2 = dislike, R3 = like, R4 = dislike | age) p(age)   (10.1.4)

Using the Naive Bayes structure, the numerator above is given by

p(R1 = like|age = young) p(R2 = dislike|age = young) p(R3 = like|age = young) p(R4 = dislike|age = young) p(age = young)   (10.1.5)

Plugging in the values we obtain 0.95 × 0.95 × 0.02 × 0.8 × 0.1 ≈ 0.00144. The denominator is given by this value plus the corresponding term evaluated under the assumption that the customer is old, 0.03 × 0.18 × 0.34 × 0.08 × 0.9 ≈ 0.00013, which gives

p(age = young | R1 = like, R2 = dislike, R3 = like, R4 = dislike) ≈ 0.916   (10.1.6)
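The arithmetic of Example 47 is summarised in the following sketch, which uses only the probabilities quoted in the text; the variable names are illustrative.

```python
# Sketch of the Example 47 calculation: posterior over age for a listener who
# likes Radio1 and Radio3 but dislikes Radio2 and Radio4, under the NB model.
p_like = {'young': [0.95, 0.05, 0.02, 0.20],   # p(Ri = like | age) from the text
          'old':   [0.03, 0.82, 0.34, 0.92]}
p_age = {'young': 0.10, 'old': 0.90}

x = [1, 0, 1, 0]                                 # 1 = like, 0 = dislike, for R1..R4

joint = {}
for age in ('young', 'old'):
    lik = 1.0
    for xi, pi in zip(x, p_like[age]):
        lik *= pi if xi == 1 else (1.0 - pi)     # product of class-conditional factors
    joint[age] = lik * p_age[age]                # numerator of Bayes' rule

z = sum(joint.values())                          # the normalising constant p(R1,...,R4)
print({age: v / z for age, v in joint.items()})  # posterior; young comes out around 0.916
```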

10.2 Estimation using Maximum Likelihood

Learning the table entries for NB is a straightforward application of the more general BN learning discussed in section(9.2.3). For a fully observed dataset, Maximum Likelihood learning of the table entries corresponds to counting the number of occurrences in the training data, as we show below.

10.2.1 Binary attributes

Consider a dataset {x^n, n = 1, ..., N} of binary attributes, x_i^n ∈ {0, 1}, i = 1, ..., D. Each datapoint x^n has an associated class label c^n. The number of datapoints from class c = 0 is n_0 and the number from class c = 1 is denoted n_1. For each attribute i of the two classes, we need to estimate the values p(x_i = 1|c) ≡ θ_i^c. The other probability, p(x_i = 0|c), is given by the normalisation requirement, p(x_i = 0|c) = 1 - p(x_i = 1|c) = 1 - θ_i^c.

Based on the NB conditional independence assumption, the probability of observing a vector x can be compactly written

p(x|c) = ∏_{i=1}^D p(x_i|c) = ∏_{i=1}^D (θ_i^c)^{x_i} (1 - θ_i^c)^{1 - x_i}   (10.2.1)

In the above expression, x_i is either 0 or 1, and hence each term i contributes a factor θ_i^c if x_i = 1 or 1 - θ_i^c if x_i = 0. Together with the assumption that the training data is i.i.d. generated, the log likelihood of the attributes and class labels is

L = Σ_n log p(x^n, c^n) = Σ_n log p(c^n) ∏_i p(x_i^n|c^n)   (10.2.2)
  = Σ_{i,n} [ x_i^n log θ_i^{c^n} + (1 - x_i^n) log(1 - θ_i^{c^n}) ] + n_0 log p(c=0) + n_1 log p(c=1)   (10.2.3)

This can be written more explicitly in terms of the parameters as

L = Σ_{i,n} { I[x_i^n = 1, c^n = 0] log θ_i^0 + I[x_i^n = 0, c^n = 0] log(1 - θ_i^0) + I[x_i^n = 1, c^n = 1] log θ_i^1 + I[x_i^n = 0, c^n = 1] log(1 - θ_i^1) } + n_0 log p(c=0) + n_1 log p(c=1)   (10.2.4)

We can find the Maximum Likelihood optimal θ_i^c by differentiating w.r.t. θ_i^c and equating to zero, giving

θ_i^c = p(x_i = 1|c) = Σ_n I[x_i^n = 1, c^n = c] / Σ_n ( I[x_i^n = 0, c^n = c] + I[x_i^n = 1, c^n = c] )   (10.2.5)
      = (number of times x_i = 1 for class c) / (number of datapoints in class c)   (10.2.6)

Similarly, optimising equation (10.2.3) with respect to p(c) gives

p(c) = (number of times class c occurs) / (total number of datapoints)   (10.2.7)

Classification boundary

We classify a novel input x* as class 1 if

p(c = 1|x*) > p(c = 0|x*)   (10.2.8)

Using Bayes' rule and taking the log of the above expression, this is equivalent to

log p(x*|c = 1) + log p(c = 1) - log p(x*) > log p(x*|c = 0) + log p(c = 0) - log p(x*)   (10.2.9)

From the definition of the classifier, this is equivalent to (the normalisation constant log p(x*) can be dropped from both sides)

Σ_i log p(x_i*|c = 1) + log p(c = 1) > Σ_i log p(x_i*|c = 0) + log p(c = 0)   (10.2.10)

Using the binary encoding x_i ∈ {0, 1}, we classify x* as class 1 if

Σ_i [ x_i* log θ_i^1 + (1 - x_i*) log(1 - θ_i^1) ] + log p(c = 1) > Σ_i [ x_i* log θ_i^0 + (1 - x_i*) log(1 - θ_i^0) ] + log p(c = 0)   (10.2.11)

This decision rule can be expressed in the form: classify x* as class 1 if Σ_i w_i x_i* + a > 0 for a suitable choice of weights w_i and constant a, see exercise(133). The interpretation is that w specifies a hyperplane in the attribute space and x* is classified as 1 if it lies on the positive side of the hyperplane.
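The following sketch (with arbitrary example parameters, not taken from the text) rearranges the decision rule (10.2.11) into the linear form Σ_i w_i x_i* + a > 0 and checks that the two forms agree.

```python
# Sketch: the binary Naive Bayes decision rule rearranged as a linear rule
# sum_i w_i x_i + a > 0.  The parameter values here are arbitrary examples.
import numpy as np

theta1 = np.array([0.9, 0.2, 0.6])   # p(x_i=1 | c=1), assumed values
theta0 = np.array([0.3, 0.5, 0.4])   # p(x_i=1 | c=0), assumed values
pc1, pc0 = 0.5, 0.5                  # class priors

# Collect the x_i-dependent terms into weights w, and the rest into the constant a.
w = np.log(theta1 / theta0) - np.log((1 - theta1) / (1 - theta0))
a = np.sum(np.log((1 - theta1) / (1 - theta0))) + np.log(pc1 / pc0)

x = np.array([1, 0, 1])
linear = w @ x + a > 0
direct = (np.sum(x * np.log(theta1) + (1 - x) * np.log(1 - theta1)) + np.log(pc1)
          > np.sum(x * np.log(theta0) + (1 - x) * np.log(1 - theta0)) + np.log(pc0))
print(linear, direct)   # the two rules agree
```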

Figure 10.2: (a) English tastes over the attributes (shortbread, lager, whiskey, porridge, football). Each column represents the tastes of an individual. (b) Scottish tastes.

Example 48 (Are they Scottish?). Consider the following vector of attributes:

(likes shortbread, likes lager, drinks whiskey, eats porridge, watched England play football)

A vector x = (1, 0, 1, 1, 0)^T would describe a person who likes shortbread, does not like lager, drinks whiskey, eats porridge, and has not watched England play football. Together with each vector x, there is a label nat describing the nationality of the person, dom(nat) = {scottish, english}, see fig(10.2). We wish to classify the vector x = (1, 0, 1, 1, 0)^T as either scottish or english. We can use Bayes' rule to calculate the probability that x is Scottish or English:

p(scottish|x) = p(x|scottish) p(scottish) / p(x) = p(x|scottish) p(scottish) / [ p(x|scottish) p(scottish) + p(x|english) p(english) ]

By Maximum Likelihood the prior class probability p(scottish) is given by the fraction of people in the database that are Scottish, and similarly p(english) is given by the fraction of people in the database that are English. This gives p(scottish) = 7/13 and p(english) = 6/13. For p(x|nat), under the Naive Bayes assumption:

p(x|nat) = p(x_1|nat) p(x_2|nat) p(x_3|nat) p(x_4|nat) p(x_5|nat)

so that, knowing whether or not someone is Scottish, we do not need to know anything else to calculate the probability of their likes and dislikes. Based on the table in fig(10.2) and using Maximum Likelihood we have:

p(x_1 = 1|english) = 1/2    p(x_1 = 1|scottish) = 1
p(x_2 = 1|english) = 1/2    p(x_2 = 1|scottish) = 4/7
p(x_3 = 1|english) = 1/3    p(x_3 = 1|scottish) = 3/7
p(x_4 = 1|english) = 1/2    p(x_4 = 1|scottish) = 5/7
p(x_5 = 1|english) = 1/2    p(x_5 = 1|scottish) = 3/7

For x = (1, 0, 1, 1, 0)^T, we get p(scottish|x) ≈ 0.81. Since this is greater than 0.5, we would classify this person as being Scottish.

Small data counts

In example(48), consider trying to classify the vector x = (0, 1, 1, 1, 1)^T. In the training data, all Scottish people say they like shortbread. This means that for this particular x, p(x, scottish) = 0, and therefore that we make the extremely confident classification p(scottish|x) = 0. This demonstrates a difficulty of using Maximum Likelihood with sparse data.
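A sketch of Example 48 using the Maximum Likelihood estimates quoted above; it also reproduces the sparse-data problem just described.

```python
# Sketch of Example 48 using the ML estimates quoted in the text.
import numpy as np

p1 = {'scottish': np.array([1, 4/7, 3/7, 5/7, 3/7]),   # p(x_i=1 | nat)
      'english':  np.array([1/2, 1/2, 1/3, 1/2, 1/2])}
prior = {'scottish': 7/13, 'english': 6/13}

def posterior_scottish(x):
    x = np.asarray(x)
    joint = {nat: np.prod(np.where(x == 1, p1[nat], 1 - p1[nat])) * prior[nat]
             for nat in prior}
    return joint['scottish'] / (joint['scottish'] + joint['english'])

print(posterior_scottish([1, 0, 1, 1, 0]))   # about 0.81, so classify as Scottish
print(posterior_scottish([0, 1, 1, 1, 1]))   # exactly 0: the sparse-data problem
```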

One way to ameliorate this is to smooth the probabilities, for example by adding a small number to the frequency counts of each attribute. This ensures that there are no zero probabilities in the model. An alternative is to use a Bayesian approach that discourages extreme probabilities, as discussed in section(10.3).

Potential pitfalls with encoding

In many off-the-shelf packages implementing Naive Bayes, binary attributes are assumed. In practice, however, the case of non-binary attributes often occurs. Consider the following attribute: age. In a survey, a person's age is marked down using the variable a ∈ {1, 2, 3}: a = 1 means the person is between 0 and 10 years old, a = 2 means the person is between 10 and 20 years old, and a = 3 means the person is older than 20. One way to transform the variable a into a binary representation would be to use three binary variables (a_1, a_2, a_3), with (1, 0, 0), (0, 1, 0), (0, 0, 1) representing a = 1, a = 2, a = 3 respectively. This is called 1-of-M coding since only 1 of the binary variables is active in encoding the M states. By construction, the variables a_1, a_2, a_3 are dependent: for example, if we know that a_1 = 1, we know that a_2 = 0 and a_3 = 0. Regardless of any class conditioning, these variables will always be dependent, contrary to the assumption of Naive Bayes. A correct approach is to use variables with more than two states, as explained in section(10.2.2).

10.2.2 Multi-state variables

For a variable x_i with more than two states, dom(x_i) = {1, ..., S}, the likelihood of observing a state x_i = s is denoted

p(x_i = s|c) = θ_s^i(c)

with Σ_s p(x_i = s|c) = 1. For a set of data vectors x^n, n = 1, ..., N, belonging to class c, under the i.i.d. assumption the likelihood of the NB model generating the data is

∏_{n=1}^N p(x^n|c^n) = ∏_{n=1}^N ∏_{i=1}^D ∏_{s=1}^S ∏_{c=1}^C [ θ_s^i(c) ]^{ I[x_i^n = s] I[c^n = c] }

which gives the class conditional log-likelihood

L = Σ_{n=1}^N Σ_{i=1}^D Σ_{s=1}^S Σ_{c=1}^C I[x_i^n = s] I[c^n = c] log θ_s^i(c)

We can optimise with respect to the parameters θ using a Lagrange multiplier (one for each of the attributes i and classes c) to ensure normalisation:

L(θ, λ) = Σ_{n=1}^N Σ_{i=1}^D Σ_{s=1}^S Σ_{c=1}^C I[x_i^n = s] I[c^n = c] log θ_s^i(c) + Σ_{c=1}^C Σ_{i=1}^D λ_i^c ( 1 - Σ_{s=1}^S θ_s^i(c) )

To find the optimum of this function we may differentiate with respect to θ_s^i(c) and equate to zero. Solving the resulting equation we obtain

θ_s^i(c) = (1/λ_i^c) Σ_{n=1}^N I[x_i^n = s] I[c^n = c]

Hence, by normalisation,

θ_s^i(c) = p(x_i = s|c) = Σ_n I[x_i^n = s] I[c^n = c] / Σ_{s', n'} I[x_i^{n'} = s'] I[c^{n'} = c]

The Maximum Likelihood setting for the parameter p(x_i = s|c) equals the relative number of times that attribute i is in state s for class c.
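A sketch of this multi-state Maximum Likelihood estimate: for each class c and attribute i, θ_s^i(c) is simply the fraction of class-c datapoints with x_i = s. The data below is made up.

```python
# Sketch of the multi-state ML estimate: theta_s(c) is the fraction of class-c
# datapoints in state s.  Data below is made up.
import numpy as np

S, C = 3, 2                                         # attribute states and classes
x = np.array([0, 2, 1, 2, 2, 0, 1, 1])              # one multi-state attribute, x^n in {0,1,2}
c = np.array([0, 0, 0, 1, 1, 1, 1, 1])              # class labels

theta = np.zeros((C, S))
for cls in range(C):
    counts = np.bincount(x[c == cls], minlength=S)  # times each state s occurs in class cls
    theta[cls] = counts / counts.sum()              # normalise over s, as the Lagrange multiplier enforces
print(theta)            # each row sums to 1
```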

Figure 10.3: Bayesian Naive Bayes with a factorised prior on the class conditional attribute probabilities p(x_i = s|c). For simplicity we assume that the class probability θ_c ≡ p(c) is learned with Maximum Likelihood, so that no distribution is placed over this parameter.

Text classification

Consider a set of documents about politics, and another set about sport. Our interest is to make a method that can automatically classify a new document as pertaining to either sport or politics. We search through both sets of documents to find the 100 most commonly occurring words. Each document is then represented by a 100-dimensional vector representing the number of times that each of these words occurs in that document, the so-called bag of words representation (this is a crude representation of the document since it discards word order). A Naive Bayes model specifies a distribution over these numbers of occurrences p(x_i|c), where x_i is the count of the number of times word i appears in documents of type c. One can achieve this using either a multistate representation (as discussed in section(10.2.2)) or a continuous x_i representing the frequency of word i in the document, in which case p(x_i|c) could be conveniently modelled using, for example, a Beta distribution.

Despite the simplicity of Naive Bayes, it can classify documents surprisingly well [125]. Intuitively, a potential justification for the conditional independence assumption is that if we know a document is about politics, this is a good indication of the kinds of other words we will find in the document. Because Naive Bayes is a reasonable classifier in this sense, and has minimal storage and fast training, it has been applied to time- and storage-critical applications, such as automatically classifying webpages into types [289] and spam filtering [9].

10.3 Bayesian Naive Bayes

To predict the class c of an input x*, we use

p(c|x*, D) ∝ p(x*, D, c) ∝ p(x*|D, c) p(c|D)   (10.3.1)

For convenience we will simply set p(c|D) using Maximum Likelihood,

p(c|D) = (1/N) Σ_n I[c^n = c]   (10.3.2)

However, as we have seen, setting the parameters of p(x|D, c) using Maximum Likelihood training can yield over-confident predictions in the case of sparse data. A Bayesian approach that addresses this difficulty is to use priors on the probabilities p(x_i = s|c) ≡ θ_s^i(c) that discourage extreme values. The model is depicted in fig(10.3).

The prior

We will use a prior on the table entries and make the global factorisation assumption (see section(9.3))

p(θ) = ∏_{i,c} p(θ^i(c))   (10.3.3)
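The full Bayesian treatment continues beyond this excerpt. As a minimal sketch of the practical effect of such a prior, a Dirichlet/Beta prior on the table entries adds pseudo-counts to the frequency counts, which removes the zero probabilities of pure Maximum Likelihood; the function below is an illustrative assumption, not the book's derivation.

```python
# Sketch (not the book's derivation): pseudo-count smoothing as the practical
# effect of a Dirichlet/Beta prior on the table entries.
import numpy as np

def smoothed_theta(x_col, c, cls, n_states, alpha=1.0):
    """Estimate of p(x_i = s | c = cls) with pseudo-count alpha added to each state."""
    counts = np.bincount(x_col[c == cls], minlength=n_states).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_states)

x_col = np.array([0, 0, 0, 1, 2])      # made-up attribute values
c     = np.array([0, 0, 0, 1, 1])      # made-up class labels
print(smoothed_theta(x_col, c, cls=0, n_states=3))   # no entry is exactly zero
```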
