9.2 Maximum A Posteriori and Maximum Likelihood
- Tyler Pope
- 6 years ago
Transcription
DRAFT March 9, 2010

In the above,

$$p(\theta < 0.5 \mid \mathcal{V}) = \int_0^{0.5} p(\theta \mid \mathcal{V})\, d\theta \tag{9.1.29}$$

$$= \frac{1}{B(\alpha + N_H, \beta + N_T)} \int_0^{0.5} \theta^{\alpha + N_H - 1} (1 - \theta)^{\beta + N_T - 1}\, d\theta \tag{9.1.30}$$

$$\equiv I_{0.5}(\alpha + N_H, \beta + N_T) \tag{9.1.31}$$

where $I_x(a, b)$ is the regularised incomplete Beta function. For the former case of $N_H = 2$, $N_T = 8$, under a flat prior ($\alpha = \beta = 1$),

$$p(\theta < 0.5 \mid \mathcal{V}) = I_{0.5}(N_H + 1, N_T + 1) = 0.9673 \tag{9.1.32}$$

Since the events are exclusive, $p(\theta \geq 0.5 \mid \mathcal{V}) = 1 - 0.9673 = 0.0327$. Hence the expected utility of saying heads is more likely is

$$\ldots \tag{9.1.33}$$

Similarly, the utility of saying tails is more likely is

$$\ldots \tag{9.1.34}$$

So we are better off taking the decision that the coin is more likely to come up tails.

If we modify the above so that we lose 100 million dollars if we guess tails when in fact it is heads, the expected utility of saying tails would be $\ldots$, in which case we would be better off saying heads. In this case, even though we are more confident that the coin is likely to come up tails, the penalty for mistakenly saying tails is so large that it is in fact better to say heads.

9.2 Maximum A Posteriori and Maximum Likelihood

9.2.1 Summarising the posterior

Definition 86 (Maximum Likelihood and Maximum a Posteriori). Maximum Likelihood sets the parameter $\theta$, given data $\mathcal{V}$, using

$$\theta^{ML} = \arg\max_{\theta}\, p(\mathcal{V} \mid \theta) \tag{9.2.1}$$

Maximum A Posteriori uses the setting of $\theta$ that maximises the posterior distribution of the parameter,

$$\theta^{MAP} = \arg\max_{\theta}\, p(\mathcal{V} \mid \theta)\, p(\theta) \tag{9.2.2}$$

where $p(\theta)$ is the prior distribution.

A crude summary of the posterior is given by a distribution with all its mass in a single most likely state, $\delta(\theta, \theta^{MAP})$. In making such an approximation, potentially useful information concerning the reliability of the parameter estimate is lost. In contrast, the full posterior reflects our beliefs about the range of possibilities and their associated credibilities.

One can motivate MAP from a decision theoretic perspective. If we assume a utility that is zero for all but the correct $\theta$,

$$U(\theta_{true}, \theta) = \mathbb{I}[\theta_{true} = \theta] \tag{9.2.3}$$
then the expected utility of $\theta$ is

$$U(\theta) = \sum_{\theta_{true}} \mathbb{I}[\theta_{true} = \theta]\, p(\theta_{true} \mid \mathcal{V}) = p(\theta \mid \mathcal{V}) \tag{9.2.4}$$

This means that the maximum utility decision is to return that $\theta$ with the highest posterior value.

Figure 9.4: (a): A model for the relationship between lung Cancer, Asbestos exposure and Smoking. (b): Plate notation replicating the observed n datapoints and placing priors over the CPTs, tied across all datapoints.

When a flat prior $p(\theta) = \text{const.}$ is used, the MAP parameter assignment is equivalent to the Maximum Likelihood setting

$$\theta^{ML} = \arg\max_{\theta}\, p(\mathcal{V} \mid \theta) \tag{9.2.5}$$

The term Maximum Likelihood refers to the parameter $\theta$ for which the observed data is most likely to be generated by the model.

Since the logarithm is a strictly increasing function, for a positive function $f(\theta)$

$$\theta_{opt} = \arg\max_{\theta} f(\theta) \;\Leftrightarrow\; \theta_{opt} = \arg\max_{\theta} \log f(\theta) \tag{9.2.6}$$

so that the MAP parameters can be found either by optimising the MAP objective or, equivalently, its logarithm,

$$\log p(\theta \mid \mathcal{V}) = \log p(\mathcal{V} \mid \theta) + \log p(\theta) - \log p(\mathcal{V}) \tag{9.2.7}$$

where the normalisation constant, $p(\mathcal{V})$, is not a function of $\theta$. The log likelihood is convenient since under the i.i.d. assumption it is a summation of data terms,

$$\log p(\theta \mid \mathcal{V}) = \sum_n \log p(v^n \mid \theta) + \log p(\theta) - \log p(\mathcal{V}) \tag{9.2.8}$$

so that quantities such as derivatives of the log-likelihood w.r.t. $\theta$ are straightforward to compute.

Example 36. In the coin-tossing experiment of section(9.1.1) the ML setting is $\theta = 0.2$ in both $N_H = 2, N_T = 8$ and $N_H = 20, N_T = 80$.

9.2.2 Maximum likelihood and the empirical distribution

Given a dataset of discrete variables $\mathcal{X} = \{x^1, \ldots, x^N\}$ we define the empirical distribution as

$$q(x) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{I}[x = x^n] \tag{9.2.9}$$
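As a minimal sketch of (9.2.9), the Python snippet below builds the empirical distribution by counting. The function name `empirical_distribution` and the list of tosses are illustrative assumptions; the tosses are chosen to match $N_H = 2$, $N_T = 8$ from Example 36, so that q(H) takes the same value, 0.2, as the ML setting quoted there.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(samples):
    """Return q(x) = (1/N) * sum_n I[x = x^n] as a dict mapping state -> probability."""
    counts = Counter(samples)
    n = len(samples)
    return {state: Fraction(count, n) for state, count in counts.items()}


# Hypothetical coin data matching N_H = 2, N_T = 8 from Example 36.
tosses = ["H"] * 2 + ["T"] * 8
q = empirical_distribution(tosses)
print(q["H"], q["T"])  # 1/5 4/5
```

Exact fractions are used here only to make the relative frequencies easy to read; floats would serve equally well.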
In the case that $x$ is a vector of variables, the indicator in (9.2.9) factorises over the components,

$$\mathbb{I}[x = x^n] = \prod_i \mathbb{I}[x_i = x_i^n] \tag{9.2.10}$$

The Kullback-Leibler divergence between the empirical distribution $q(x)$ and a distribution $p(x)$ is

$$KL(q \mid p) = \langle \log q(x) \rangle_{q(x)} - \langle \log p(x) \rangle_{q(x)} \tag{9.2.11}$$

Our interest is the functional dependence of $KL(q \mid p)$ on $p$. Since the entropic term $\langle \log q(x) \rangle_{q(x)}$ is independent of $p(x)$ we may consider this constant and focus on the second term alone. Hence

$$KL(q \mid p) = -\langle \log p(x) \rangle_{q(x)} + \text{const.} = -\frac{1}{N} \sum_{n=1}^{N} \log p(x^n) + \text{const.} \tag{9.2.12}$$

We recognise $\sum_{n=1}^{N} \log p(x^n)$ as the log likelihood under the model $p(x)$, assuming that the data is i.i.d. This means that setting parameters by maximum likelihood is equivalent to setting parameters by minimising the Kullback-Leibler divergence between the empirical distribution and the parameterised distribution. In the case that $p(x)$ is unconstrained, the optimal choice is to set $p(x) = q(x)$, namely the maximum likelihood optimal distribution corresponds to the empirical distribution.

9.2.3 Maximum likelihood training of belief networks

Consider the following model of the relationship between exposure to asbestos (a), being a smoker (s) and the incidence of lung cancer (c)

$$p(a, s, c) = p(c \mid a, s)\, p(a)\, p(s) \tag{9.2.13}$$

which is depicted in fig(9.4a). Each variable is binary, dom(a) = {0, 1}, dom(s) = {0, 1}, dom(c) = {0, 1}. We assume that there is no direct relationship between Smoking and exposure to Asbestos. This is the kind of assumption that we may be able to elicit from medical experts. Furthermore, we assume that we have a list of patient records, fig(9.5), where each row represents a patient's data.

Figure 9.5: A database containing information about the Asbestos exposure (1 signifies exposure), being a Smoker (1 signifies the individual is a smoker), and lung Cancer (1 signifies the individual has lung Cancer). Each row contains the information for an individual, so that there are 7 individuals in the database.
To learn the table entries $p(c \mid a, s)$ we count the number of times $c$ is in state 1 for each of the 4 parental states of $a$ and $s$:

$$p(c=1 \mid a=0, s=0) = 0, \quad p(c=1 \mid a=0, s=1) = 0.5, \quad p(c=1 \mid a=1, s=0) = 0.5, \quad p(c=1 \mid a=1, s=1) = 1 \tag{9.2.14}$$

Similarly, based on counting, $p(a = 1) = 4/7$ and $p(s = 1) = 4/7$. These three CPTs then complete the full distribution specification. Setting the CPT entries in this way by counting the relative number of occurrences corresponds mathematically to maximum likelihood learning under the i.i.d. assumption, as we show below.

Maximum likelihood corresponds to counting

For a BN there is a constraint on the form of $p(x)$, namely

$$p(x) = \prod_{i=1}^{K} p(x_i \mid \mathrm{pa}(x_i)) \tag{9.2.15}$$
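The counting procedure for the asbestos, smoking and cancer model can be sketched in a few lines of Python. The table of fig(9.5) is not reproduced in this text, so the seven records below are an assumed dataset, chosen only so that counting returns the values quoted in (9.2.14) together with p(a = 1) = p(s = 1) = 4/7. The helper names `ml_marginal` and `ml_cancer_table` are likewise hypothetical.

```python
from fractions import Fraction

# Assumed patient records (a, s, c); NOT taken from the source figure,
# only chosen to be consistent with the estimates quoted in the text.
records = [
    (1, 1, 1),
    (1, 0, 0),
    (0, 1, 1),
    (0, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (1, 0, 1),
]


def ml_marginal(records, index):
    """ML estimate of p(variable = 1) as a relative frequency."""
    return Fraction(sum(r[index] for r in records), len(records))


def ml_cancer_table(records):
    """ML estimate of p(c = 1 | a, s) for each parental state (a, s)."""
    table = {}
    for a in (0, 1):
        for s in (0, 1):
            matching = [c for (ra, rs, c) in records if (ra, rs) == (a, s)]
            # The estimate is undefined if a parental state never occurs.
            table[(a, s)] = Fraction(sum(matching), len(matching)) if matching else None
    return table


p_a = ml_marginal(records, 0)
p_s = ml_marginal(records, 1)
p_c = ml_cancer_table(records)
print(p_a, p_s)  # 4/7 4/7
print(p_c)
```

Each conditional table entry uses only the records that share the relevant parental state, which is why the entries are based on just one or two patients each.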
CHAPTER 10 Naive Bayes

10.1 Naive Bayes and Conditional Independence

Naive Bayes (NB) is a popular classification method and aids our discussion of conditional independence, overfitting and Bayesian methods. In NB, we form a joint model of observations $\mathbf{x}$ and the corresponding class label $c$ using a Belief network of the form

$$p(\mathbf{x}, c) = p(c) \prod_{i=1}^{D} p(x_i \mid c) \tag{10.1.1}$$

whose Belief Network is depicted in fig(10.1a). Coupled with a suitable choice for each conditional distribution $p(x_i \mid c)$, we can then use Bayes' rule to form a classifier for a novel attribute vector $\mathbf{x}^*$:

$$p(c \mid \mathbf{x}^*) = \frac{p(\mathbf{x}^* \mid c)\, p(c)}{p(\mathbf{x}^*)} = \frac{p(\mathbf{x}^* \mid c)\, p(c)}{\sum_c p(\mathbf{x}^* \mid c)\, p(c)} \tag{10.1.2}$$

In practice it is common to consider only two classes, dom(c) = {0, 1}. The theory we describe below is valid for any number of classes $c$, though our examples are restricted to the binary class case. Also, the attributes $x_i$ are often taken to be binary, as we shall do initially below as well. The extension to more than two attribute states, or continuous attributes, is straightforward.

Example 47. EZsurvey.org considers that Radio station listeners conveniently fall into two groups, the young and old. They assume that, given the knowledge that a customer is either young or old, this is sufficient to determine whether or not a customer will like a particular Radio station, independent of their likes or dislikes for any other stations:

$$p(R1, R2, R3, R4 \mid age) = p(R1 \mid age)\, p(R2 \mid age)\, p(R3 \mid age)\, p(R4 \mid age) \tag{10.1.3}$$

where each of the variables R1, R2, R3, R4 can take the states either like or dislike, and the age variable can take the value either young or old. Thus the information about the age of the customer determines the individual product preferences without needing to know anything else. To complete the specification, given that a customer is young, she has a 95% chance to like Radio1, a 5% chance to like Radio2, a 2% chance to like Radio3 and a 20% chance to like Radio4. Similarly, an old listener has a 3% chance to like Radio1, an 82% chance to like Radio2, a 34% chance to like Radio3 and a 92% chance to like Radio4. They know that 90% of the listeners are old.
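The model of Example 47 is small enough to evaluate (10.1.2) directly. The Python sketch below encodes the stated probabilities and returns the posterior probability that a listener is young for any pattern of likes and dislikes; the names `p_like`, `p_age`, `joint` and `posterior_young` are illustrative assumptions, not part of the original example.

```python
# Probabilities of liking Radio1..Radio4 given age, as stated in Example 47.
p_like = {
    "young": [0.95, 0.05, 0.02, 0.20],
    "old": [0.03, 0.82, 0.34, 0.92],
}
# 90% of listeners are old, so 10% are young.
p_age = {"young": 0.1, "old": 0.9}


def joint(likes, age):
    """p(R1..R4 = likes, age) under the Naive Bayes factorisation (10.1.3)."""
    prob = p_age[age]
    for like, p in zip(likes, p_like[age]):
        prob *= p if like else 1.0 - p
    return prob


def posterior_young(likes):
    """p(age = young | R1..R4) by Bayes' rule (10.1.2)."""
    num = joint(likes, "young")
    return num / (num + joint(likes, "old"))


# A listener who likes Radio1 and Radio3 but dislikes Radio2 and Radio4.
likes = [True, False, True, False]
print(round(posterior_young(likes), 4))  # 0.9161
```

Although only 10% of listeners are young, this pattern of preferences is so much more typical of a young listener that the posterior moves strongly towards young.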
Figure 10.1: Naive Bayes classifier. (a): The central assumption is that given the class $c$, the attributes $x_i$ are independent. (b): Assuming the data is i.i.d., Maximum Likelihood learns the optimal parameters of the distribution $p(c)$ and the class-dependent attribute distributions $p(x_i \mid c)$.

Given this model, and a new customer that likes Radio1 and Radio3, but dislikes Radio2 and Radio4, what is the probability that they are young? This is given by

$$p(age = young \mid R1 = like, R2 = dislike, R3 = like, R4 = dislike) = \frac{p(R1 = like, R2 = dislike, R3 = like, R4 = dislike \mid age = young)\, p(age = young)}{\sum_{age} p(R1 = like, R2 = dislike, R3 = like, R4 = dislike \mid age)\, p(age)} \tag{10.1.4}$$

Using the Naive Bayes structure, the numerator above is given by

$$p(R1 = like \mid age = young)\, p(R2 = dislike \mid age = young)\, p(R3 = like \mid age = young)\, p(R4 = dislike \mid age = young)\, p(age = young) \tag{10.1.5}$$

Plugging in the values we obtain

$$0.95 \times 0.95 \times 0.02 \times 0.8 \times 0.1 = 0.001444$$

The denominator is given by this value plus the corresponding term evaluated assuming the customer is old,

$$0.03 \times 0.18 \times 0.34 \times 0.08 \times 0.9 = 1.3219 \times 10^{-4}$$

giving $0.001444 + 1.3219 \times 10^{-4} = 0.001576$. This gives

$$p(age = young \mid R1 = like, R2 = dislike, R3 = like, R4 = dislike) = \frac{0.001444}{0.001576} = 0.9161 \tag{10.1.6}$$

10.2 Estimation using Maximum Likelihood

Learning the table entries for NB is a straightforward application of the more general BN learning discussed in section(9.2.3). For a fully observed dataset, Maximum Likelihood learning of the table entries corresponds to counting the number of occurrences in the training data, as we show below.

10.2.1 Binary attributes

Consider a dataset $\{\mathbf{x}^n, n = 1, \ldots, N\}$ of binary attributes, $x_i^n \in \{0, 1\}$, $i = 1, \ldots, D$. Each datapoint $\mathbf{x}^n$ has an associated class label $c^n$. The number of datapoints from class $c = 0$ is denoted $n_0$ and the number from class $c = 1$ is denoted $n_1$. For each attribute of the two classes, we need to estimate the values $p(x_i = 1 \mid c) \equiv \theta_i^c$. The other probability, $p(x_i = 0 \mid c)$, is given by the normalisation requirement, $p(x_i = 0 \mid c) = 1 - p(x_i = 1 \mid c) = 1 - \theta_i^c$.
Based on the NB conditional independence assumption, the probability of observing a vector $\mathbf{x}$ can be compactly written

$$p(\mathbf{x} \mid c) = \prod_{i=1}^{D} p(x_i \mid c) = \prod_{i=1}^{D} (\theta_i^c)^{x_i} (1 - \theta_i^c)^{1 - x_i} \tag{10.2.1}$$

In the above expression, $x_i$ is either 0 or 1 and hence each $i$ term contributes a factor $\theta_i^c$ if $x_i = 1$ or $1 - \theta_i^c$ if $x_i = 0$. Together with the assumption that the training data is i.i.d. generated, the log likelihood of the attributes and class labels is

$$L = \sum_n \log p(\mathbf{x}^n, c^n) = \sum_n \log \Big[ p(c^n) \prod_i p(x_i^n \mid c^n) \Big] \tag{10.2.2}$$

$$= \sum_{i,n} \Big[ x_i^n \log \theta_i^{c^n} + (1 - x_i^n) \log (1 - \theta_i^{c^n}) \Big] + n_0 \log p(c = 0) + n_1 \log p(c = 1) \tag{10.2.3}$$

This can be written more explicitly in terms of the parameters as

$$L = \sum_{i,n} \Big\{ \mathbb{I}[x_i^n = 1, c^n = 0] \log \theta_i^0 + \mathbb{I}[x_i^n = 0, c^n = 0] \log (1 - \theta_i^0) + \mathbb{I}[x_i^n = 1, c^n = 1] \log \theta_i^1 + \mathbb{I}[x_i^n = 0, c^n = 1] \log (1 - \theta_i^1) \Big\} + n_0 \log p(c = 0) + n_1 \log p(c = 1) \tag{10.2.4}$$

We can find the Maximum Likelihood optimal $\theta_i^c$ by differentiating w.r.t. $\theta_i^c$ and equating to zero, giving

$$\theta_i^c = p(x_i = 1 \mid c) = \frac{\sum_n \mathbb{I}[x_i^n = 1, c^n = c]}{\sum_n \big( \mathbb{I}[x_i^n = 0, c^n = c] + \mathbb{I}[x_i^n = 1, c^n = c] \big)} \tag{10.2.5}$$

$$= \frac{\text{number of times } x_i = 1 \text{ for class } c}{\text{number of datapoints in class } c} \tag{10.2.6}$$

Similarly, optimising equation (10.2.3) with respect to $p(c)$ gives

$$p(c) = \frac{\text{number of times class } c \text{ occurs}}{\text{total number of data points}} \tag{10.2.7}$$

Classification boundary

We classify a novel input $\mathbf{x}^*$ as class 1 if

$$p(c = 1 \mid \mathbf{x}^*) > p(c = 0 \mid \mathbf{x}^*) \tag{10.2.8}$$

Using Bayes' rule and writing the log of the above expression, this is equivalent to

$$\log p(\mathbf{x}^* \mid c = 1) + \log p(c = 1) - \log p(\mathbf{x}^*) > \log p(\mathbf{x}^* \mid c = 0) + \log p(c = 0) - \log p(\mathbf{x}^*) \tag{10.2.9}$$

From the definition of the classifier, this is equivalent to (the normalisation constant $-\log p(\mathbf{x}^*)$ can be dropped from both sides)

$$\sum_i \log p(x_i^* \mid c = 1) + \log p(c = 1) > \sum_i \log p(x_i^* \mid c = 0) + \log p(c = 0)$$

Using the binary encoding $x_i \in \{0, 1\}$, we classify $\mathbf{x}^*$ as class 1 if

$$\sum_i \Big[ x_i^* \log \theta_i^1 + (1 - x_i^*) \log (1 - \theta_i^1) \Big] + \log p(c = 1) > \sum_i \Big[ x_i^* \log \theta_i^0 + (1 - x_i^*) \log (1 - \theta_i^0) \Big] + \log p(c = 0)$$

This decision rule can be expressed in the form: classify $\mathbf{x}^*$ as class 1 if $\sum_i w_i x_i^* + a > 0$ for some suitable choice of weights $w_i$ and constant $a$, see exercise(133). The interpretation is that $\mathbf{w}$ specifies a hyperplane in the attribute space and $\mathbf{x}^*$ is classified as 1 if it lies on the positive side of the hyperplane.
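The counting estimates (10.2.5) to (10.2.7) and the linear form of the decision rule can be sketched as follows. Collecting the terms in $x_i^*$ from the inequality above gives one possible choice of weights, $w_i = \log(\theta_i^1 / \theta_i^0) - \log\big((1 - \theta_i^1)/(1 - \theta_i^0)\big)$ and $a = \log\big(p(c=1)/p(c=0)\big) + \sum_i \log\big((1 - \theta_i^1)/(1 - \theta_i^0)\big)$. The function names and the toy dataset are assumptions made for illustration, and the sketch requires every estimated $\theta_i^c$ to lie strictly between 0 and 1 so that the logarithms are finite.

```python
import math


def fit_binary_nb(X, c):
    """ML estimates by counting: theta[k][i] = p(x_i = 1 | c = k), prior[k] = p(c = k)."""
    theta, prior = {}, {}
    for k in (0, 1):
        rows = [x for x, label in zip(X, c) if label == k]
        prior[k] = len(rows) / len(X)
        theta[k] = [sum(col) / len(rows) for col in zip(*rows)]
    return theta, prior


def hyperplane(theta, prior):
    """Weights w and offset a such that class 1 is chosen when sum_i w_i x_i + a > 0."""
    w = [
        math.log(t1 / t0) - math.log((1 - t1) / (1 - t0))
        for t0, t1 in zip(theta[0], theta[1])
    ]
    a = math.log(prior[1] / prior[0]) + sum(
        math.log((1 - t1) / (1 - t0)) for t0, t1 in zip(theta[0], theta[1])
    )
    return w, a


def classify(x, w, a):
    """Return 1 if x lies on the positive side of the hyperplane, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + a > 0 else 0


# Hypothetical toy data with D = 2 binary attributes; no count is zero.
X = [(1, 0), (1, 1), (0, 0), (1, 0), (0, 1), (0, 1), (1, 1), (0, 0)]
c = [1, 1, 1, 1, 0, 0, 0, 0]
theta, prior = fit_binary_nb(X, c)
w, a = hyperplane(theta, prior)
print(theta, prior)
print(classify((1, 0), w, a), classify((0, 1), w, a))  # 1 0
```

In this toy dataset the first attribute is more common in class 1 and the second in class 0, so the first weight is positive and the second negative.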
Figure 10.2: (a): English tastes over attributes (shortbread, lager, whiskey, porridge, football). Each column represents the tastes of an individual. (b): Scottish tastes.

Example 48 (Are they Scottish?). Consider the following vector of attributes:

(likes shortbread, likes lager, drinks whiskey, eats porridge, watched England play football)

A vector $\mathbf{x} = (1, 0, 1, 1, 0)^T$ would describe that a person likes shortbread, does not like lager, drinks whiskey, eats porridge, and has not watched England play football. Together with each vector $\mathbf{x}$, there is a label nat describing the nationality of the person, dom(nat) = {scottish, english}, see fig(10.2).

We wish to classify the vector $\mathbf{x} = (1, 0, 1, 1, 0)^T$ as either scottish or english. We can use Bayes' rule to calculate the probability that $\mathbf{x}$ is Scottish or English:

$$p(scottish \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid scottish)\, p(scottish)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid scottish)\, p(scottish)}{p(\mathbf{x} \mid scottish)\, p(scottish) + p(\mathbf{x} \mid english)\, p(english)}$$

By Maximum Likelihood the prior class probability p(scottish) is given by the fraction of people in the database that are Scottish, and similarly p(english) is given as the fraction of people in the database that are English. This gives p(scottish) = 7/13 and p(english) = 6/13.

For $p(\mathbf{x} \mid nat)$ under the Naive Bayes assumption:

$$p(\mathbf{x} \mid nat) = p(x_1 \mid nat)\, p(x_2 \mid nat)\, p(x_3 \mid nat)\, p(x_4 \mid nat)\, p(x_5 \mid nat)$$

so that knowing whether or not someone is Scottish, we don't need to know anything else to calculate the probability of their likes and dislikes. Based on the table in fig(10.2) and using Maximum Likelihood we have:

$$\begin{array}{ll}
p(x_1 = 1 \mid english) = 1/2 & p(x_1 = 1 \mid scottish) = 1 \\
p(x_2 = 1 \mid english) = 1/2 & p(x_2 = 1 \mid scottish) = 4/7 \\
p(x_3 = 1 \mid english) = 1/3 & p(x_3 = 1 \mid scottish) = 3/7 \\
p(x_4 = 1 \mid english) = 1/2 & p(x_4 = 1 \mid scottish) = 5/7 \\
p(x_5 = 1 \mid english) = 1/2 & p(x_5 = 1 \mid scottish) = 3/7
\end{array}$$

For $\mathbf{x} = (1, 0, 1, 1, 0)^T$, we get

$$p(scottish \mid \mathbf{x}) = \frac{1 \times \frac{3}{7} \times \frac{3}{7} \times \frac{5}{7} \times \frac{4}{7} \times \frac{7}{13}}{1 \times \frac{3}{7} \times \frac{3}{7} \times \frac{5}{7} \times \frac{4}{7} \times \frac{7}{13} + \frac{1}{2} \times \frac{1}{2} \times \frac{1}{3} \times \frac{1}{2} \times \frac{1}{2} \times \frac{6}{13}} = 0.8076$$

Since this is greater than 0.5, we would classify this person as being Scottish.

Small data counts

In example(48), consider trying to classify the vector $\mathbf{x} = (0, 1, 1, 1, 1)^T$.
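Both this vector and the $\mathbf{x} = (1, 0, 1, 1, 0)^T$ of Example 48 can be evaluated directly from the quoted table entries. The sketch below is an illustration only: the names `theta`, `prior`, `joint` and `p_scottish` are assumptions, and exact fractions are used so that a zero probability shows up as an exact 0 rather than a rounding artefact.

```python
from fractions import Fraction as F

# ML table entries p(x_i = 1 | nat) and class priors quoted in Example 48.
theta = {
    "english": [F(1, 2), F(1, 2), F(1, 3), F(1, 2), F(1, 2)],
    "scottish": [F(1), F(4, 7), F(3, 7), F(5, 7), F(3, 7)],
}
prior = {"english": F(6, 13), "scottish": F(7, 13)}


def joint(x, nat):
    """p(x, nat) = p(nat) * prod_i p(x_i | nat) under the Naive Bayes assumption."""
    prob = prior[nat]
    for xi, t in zip(x, theta[nat]):
        prob *= t if xi == 1 else 1 - t
    return prob


def p_scottish(x):
    """Posterior p(scottish | x) by Bayes' rule."""
    s, e = joint(x, "scottish"), joint(x, "english")
    return s / (s + e)


print(float(p_scottish((1, 0, 1, 1, 0))))  # approximately 0.8076
print(p_scottish((0, 1, 1, 1, 1)))         # 0
```

A single zero factor in the product is enough to force the whole joint probability, and hence the posterior, to zero.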
In the training data, all Scottish people say they like shortbread. This means that for this particular $\mathbf{x}$, $p(\mathbf{x}, scottish) = 0$, and therefore that we make the extremely confident classification $p(scottish \mid \mathbf{x}) = 0$. This demonstrates a difficulty using Maximum Likelihood with sparse data. One way to ameliorate this is to smooth the probabilities, for example by adding a certain small number to the frequency counts of each attribute. This ensures that there are no zero probabilities in the model. An alternative is to use a Bayesian approach that discourages extreme probabilities, as discussed in section(10.3).

Potential pitfalls with encoding

In many off-the-shelf packages implementing Naive Bayes, binary attributes are assumed. In practice, however, the case of non-binary attributes often occurs. Consider the following attribute: age. In a survey, a person's age is marked down using the variable $a \in \{1, 2, 3\}$. $a = 1$ means the person is between 0 and 10 years old, $a = 2$ means the person is between 10 and 20 years old, $a = 3$ means the person is older than 20. One way to transform the variable $a$ into a binary representation would be to use three binary variables $(a_1, a_2, a_3)$ with (1, 0, 0), (0, 1, 0), (0, 0, 1) representing $a = 1$, $a = 2$, $a = 3$ respectively. This is called 1-of-M coding since only 1 of the binary variables is active in encoding the M states. By construction, this means that the variables $a_1, a_2, a_3$ are dependent: for example, if we know that $a_1 = 1$, we know that $a_2 = 0$ and $a_3 = 0$. Regardless of any class conditioning, these variables will always be dependent, contrary to the assumption of Naive Bayes. A correct approach is to use variables with more than two states, as explained in section(10.2.2).

10.2.2 Multi-state variables

For a variable $x_i$ with more than two states, dom($x_i$) = {1, ..., S}, the likelihood of observing a state $x_i = s$ is denoted

$$p(x_i = s \mid c) = \theta_s^i(c)$$

with $\sum_s p(x_i = s \mid c) = 1$. For a set of data vectors $\mathbf{x}^n$, $n = 1, \ldots, N$, belonging to class $c$, under the i.i.d. assumption, the likelihood of the NB model generating data from class $c$ is

$$\prod_{n=1}^{N} p(\mathbf{x}^n \mid c^n) = \prod_{n=1}^{N} \prod_{i=1}^{D} \prod_{s=1}^{S} \prod_{c=1}^{C} \theta_s^i(c)^{\mathbb{I}[x_i^n = s]\, \mathbb{I}[c^n = c]}$$

which gives the class conditional log-likelihood

$$L = \sum_{n=1}^{N} \sum_{i=1}^{D} \sum_{s=1}^{S} \sum_{c=1}^{C} \mathbb{I}[x_i^n = s]\, \mathbb{I}[c^n = c] \log \theta_s^i(c)$$

We can optimise with respect to the parameters $\theta$ using a Lagrange multiplier (one for each of the attributes $i$ and classes $c$) to ensure normalisation:

$$L(\theta) = \sum_{n=1}^{N} \sum_{i=1}^{D} \sum_{s=1}^{S} \sum_{c=1}^{C} \mathbb{I}[x_i^n = s]\, \mathbb{I}[c^n = c] \log \theta_s^i(c) + \sum_{c=1}^{C} \sum_{i=1}^{D} \lambda_i^c \Big( 1 - \sum_{s=1}^{S} \theta_s^i(c) \Big)$$

To find the optimum of this function we may differentiate with respect to $\theta_s^i(c)$ and equate to zero. Solving the resulting equation we obtain

$$\sum_{n=1}^{N} \frac{\mathbb{I}[x_i^n = s]\, \mathbb{I}[c^n = c]}{\theta_s^i(c)} = \lambda_i^c$$

Hence, by normalisation,

$$\theta_s^i(c) = p(x_i = s \mid c) = \frac{\sum_n \mathbb{I}[x_i^n = s]\, \mathbb{I}[c^n = c]}{\sum_{s', n'} \mathbb{I}[x_i^{n'} = s']\, \mathbb{I}[c^{n'} = c]}$$

The Maximum Likelihood setting for the parameter $p(x_i = s \mid c)$ equals the relative number of times that attribute $i$ is in state $s$ for class $c$.
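A minimal Python sketch of this counting rule for multi-state attributes is given below. The function name `fit_multistate_nb`, its `alpha` parameter and the toy data are assumptions made for illustration. With `alpha = 0` the function returns the Maximum Likelihood relative frequencies; a positive `alpha` adds the same pseudo-count to every state, which is one common way of realising the smoothing mentioned under "Small data counts".

```python
from collections import Counter


def fit_multistate_nb(X, c, n_states, alpha=0.0):
    """Estimate theta[(i, cls)][s] = p(x_i = s | cls) from relative counts.

    alpha = 0 gives the Maximum Likelihood setting; alpha > 0 adds a
    pseudo-count to every state so that no probability is exactly zero.
    """
    theta = {}
    n_attributes = len(X[0])
    for cls in sorted(set(c)):
        rows = [x for x, label in zip(X, c) if label == cls]
        total = len(rows) + alpha * n_states
        for i in range(n_attributes):
            counts = Counter(row[i] for row in rows)
            theta[(i, cls)] = {
                s: (counts[s] + alpha) / total for s in range(1, n_states + 1)
            }
    return theta


# Hypothetical data: one attribute (age band a in {1, 2, 3}) and two classes.
X = [(1,), (1,), (2,), (3,), (3,), (3,)]
c = [0, 0, 0, 1, 1, 1]
ml = fit_multistate_nb(X, c, n_states=3)
smoothed = fit_multistate_nb(X, c, n_states=3, alpha=1.0)
print(ml[(0, 0)])        # ML estimates for class 0; state 3 gets probability 0
print(smoothed[(0, 1)])  # class 1 after add-one smoothing; no state is zero
```

Treating the age band as a single three-state variable avoids the dependence introduced by 1-of-M coding, since only one table row per class is estimated.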
Figure 10.3: Bayesian Naive Bayes with a factorised prior on the class conditional attribute probabilities $p(x_i = s \mid c)$. For simplicity we assume that the class probability $\theta_c \equiv p(c)$ is learned with Maximum Likelihood, so that no distribution is placed over this parameter.

10.2.3 Text classification

Consider a set of documents about politics, and another set about sport. Our interest is to make a method that can automatically classify a new document as pertaining to either sport or politics. We search through both sets of documents to find the 100 most commonly occurring words. Each document is then represented by a 100 dimensional vector representing the number of times that each of the words occurs in that document: the so-called bag of words representation (this is a crude representation of the document since it discards word order). A Naive Bayes model specifies a distribution of these numbers of occurrences $p(x_i \mid c)$, where $x_i$ is the count of the number of times word $i$ appears in documents of type $c$. One can achieve this using either a multistate representation (as discussed in section(10.2.2)) or using a continuous $x_i$ to represent the frequency of word $i$ in the document. In this case $p(x_i \mid c)$ could be conveniently modelled using, for example, a Beta distribution.

Despite the simplicity of Naive Bayes, it can classify documents surprisingly well[125]. Intuitively, a potential justification for the conditional independence assumption is that if we know a document is about politics, this is a good indication of the kinds of other words we will find in the document.
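The bag of words representation described above can be sketched in Python as follows. The two-document corpus, the vocabulary size of 4 (the text uses 100) and the whitespace tokenisation are all assumptions made to keep the example small; `build_vocabulary` and `bag_of_words` are hypothetical helper names.

```python
from collections import Counter


def build_vocabulary(documents, size):
    """Return the `size` most frequent words across all documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [word for word, _ in counts.most_common(size)]


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical miniature corpus: one politics document and one sport document.
docs = [
    "the minister said the vote was close",
    "the striker said the match was close",
]
vocab = build_vocabulary(docs, size=4)
print(vocab)
print(bag_of_words("the vote the vote", vocab))
```

Words outside the vocabulary, such as "vote" here, are simply ignored, and two documents containing the same words in a different order receive the same vector.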
Because Naive Bayes classifies documents reasonably well, and has minimal storage and fast training, it has been applied to time-storage critical applications, such as automatically classifying webpages into types[289], and spam filtering[9].

10.3 Bayesian Naive Bayes

To predict the class $c$ of an input $\mathbf{x}$ we use

$$p(c \mid \mathbf{x}, \mathcal{D}) \propto p(\mathbf{x}, \mathcal{D}, c) \propto p(\mathbf{x} \mid \mathcal{D}, c)\, p(c \mid \mathcal{D}) \tag{10.3.1}$$

For convenience we will simply set $p(c \mid \mathcal{D})$ using Maximum Likelihood

$$p(c \mid \mathcal{D}) = \frac{1}{N} \sum_n \mathbb{I}[c^n = c] \tag{10.3.2}$$

However, as we've seen, setting the parameters of $p(\mathbf{x} \mid \mathcal{D}, c)$ using Maximum Likelihood training can yield over-confident predictions in the case of sparse data. A Bayesian approach that addresses this difficulty is to use priors on the probabilities $p(x_i = s \mid c) \equiv \theta_s^i(c)$ that discourage extreme values. The model is depicted in fig(10.3).

The prior

We will use a prior on the table entries $\theta$ and make the global factorisation assumption (see section(9.3))

$$p(\theta) = \prod_{i,c} p(\theta^i(c)) \tag{10.3.3}$$
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationChapter 1. Probability
Chapter. Probablty Mcroscopc propertes of matter: quantum mechancs, atomc and molecular propertes Macroscopc propertes of matter: thermodynamcs, E, H, C V, C p, S, A, G How do we relate these two propertes?
More informationarxiv: v2 [stat.me] 26 Jun 2012
The Two-Way Lkelhood Rato (G Test and Comparson to Two-Way χ Test Jesse Hoey June 7, 01 arxv:106.4881v [stat.me] 6 Jun 01 1 One-Way Lkelhood Rato or χ test Suppose we have a set of data x and two hypotheses
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationInstance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification
Instance-Based earnng (a.k.a. memory-based learnng) Part I: Nearest Neghbor Classfcaton Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationLaboratory 1c: Method of Least Squares
Lab 1c, Least Squares Laboratory 1c: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationThe big picture. Outline
The bg pcture Vncent Claveau IRISA - CNRS, sldes from E. Kjak INSA Rennes Notatons classes: C = {ω = 1,.., C} tranng set S of sze m, composed of m ponts (x, ω ) per class ω representaton space: R d (=
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationOther NN Models. Reinforcement learning (RL) Probabilistic neural networks
Other NN Models Renforcement learnng (RL) Probablstc neural networks Support vector machne (SVM) Renforcement learnng g( (RL) Basc deas: Supervsed dlearnng: (delta rule, BP) Samples (x, f(x)) to learn
More informationUniversity of Washington Department of Chemistry Chemistry 453 Winter Quarter 2015
Lecture 2. 1/07/15-1/09/15 Unversty of Washngton Department of Chemstry Chemstry 453 Wnter Quarter 2015 We are not talkng about truth. We are talkng about somethng that seems lke truth. The truth we want
More informationELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM
ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look
More informationArtificial Intelligence Bayesian Networks
Artfcal Intellgence Bayesan Networks Adapted from sldes by Tm Fnn and Mare desjardns. Some materal borrowed from Lse Getoor. 1 Outlne Bayesan networks Network structure Condtonal probablty tables Condtonal
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationSemi-Supervised Learning
Sem-Supervsed Learnng Consder the problem of Prepostonal Phrase Attachment. Buy car wth money ; buy car wth wheel There are several ways to generate features. Gven the lmted representaton, we can assume
More informationSTAT 3008 Applied Regression Analysis
STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING
1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationExpectation Maximization Mixture Models HMMs
-755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More information1 Binary Response Models
Bnary and Ordered Multnomal Response Models Dscrete qualtatve response models deal wth dscrete dependent varables. bnary: yes/no, partcpaton/non-partcpaton lnear probablty model LPM, probt or logt models
More informationOpen Systems: Chemical Potential and Partial Molar Quantities Chemical Potential
Open Systems: Chemcal Potental and Partal Molar Quanttes Chemcal Potental For closed systems, we have derved the followng relatonshps: du = TdS pdv dh = TdS + Vdp da = SdT pdv dg = VdP SdT For open systems,
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More informationPh 219a/CS 219a. Exercises Due: Wednesday 23 October 2013
1 Ph 219a/CS 219a Exercses Due: Wednesday 23 October 2013 1.1 How far apart are two quantum states? Consder two quantum states descrbed by densty operators ρ and ρ n an N-dmensonal Hlbert space, and consder
More informationRetrieval Models: Language models
CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum
More informationBayesian decision theory. Nuno Vasconcelos ECE Department, UCSD
Bayesan decson theory Nuno Vasconcelos ECE Department UCSD Notaton the notaton n DHS s qute sloppy e.. show that error error z z dz really not clear what ths means we wll use the follown notaton subscrpts
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationCS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015
CS 3710: Vsual Recognton Classfcaton and Detecton Adrana Kovashka Department of Computer Scence January 13, 2015 Plan for Today Vsual recognton bascs part 2: Classfcaton and detecton Adrana s research
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More informationLecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding
Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an
More informationGoodness of fit and Wilks theorem
DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),
More informationHere is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)
Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,
More informationLecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding
Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study
More information17 Support Vector Machines
17 We now dscuss an nfluental and effectve classfcaton algorthm called (SVMs). In addton to ther successes n many classfcaton problems, SVMs are responsble for ntroducng and/or popularzng several mportant
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationVapnik-Chervonenkis theory
Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More informationLaboratory 3: Method of Least Squares
Laboratory 3: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly they are correlated wth
More information} Often, when learning, we deal with uncertainty:
Uncertanty and Learnng } Often, when learnng, we deal wth uncertanty: } Incomplete data sets, wth mssng nformaton } Nosy data sets, wth unrelable nformaton } Stochastcty: causes and effects related non-determnstcally
More information